Python: How to add columns across the top of an Excel xlsx file when scraping data with BeautifulSoup



Hi, I'm new to coding. I'm trying to scrape news headlines from cnn.com into an Excel file laid out like the one in the image attached below.

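The problem is that I don't know how to add a separate column for each section (World, Politics, Health, and so on). My code only keeps the data from the last element of the list ("politics" in this code).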

Here is my code. Thanks in advance.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import pandas as pd


path = "C:/Users/Desktop/chromedriver.exe"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(path))

# per section

a = ['world', 'health', 'politics']

for section in a:
    nl = []  # re-created on every pass, so after the loop it holds only the last section
    driver.get("https://edition.cnn.com/" + section)
    driver.implicitly_wait(3)
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    find_ingre = soup.select("span.cd__headline-text")

    for headline in find_ingre:
        nl.append(headline.get_text())

# make dataframe --> save xlsx

df = pd.DataFrame(nl)
df.to_excel("cnn_recent_topics.xlsx", index=False)

Current result --->

The result I want --->
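
The desired layout is one column per section. Below is a minimal offline sketch of that shape. It assumes the scraped headlines are collected into a dict keyed by section instead of overwriting a single list on every pass; the headline strings are made-up placeholders, not real CNN data.

```python
import pandas as pd

# Placeholder headlines standing in for the scraped results (hypothetical data)
scraped = {
    "world": ["World headline 1", "World headline 2", "World headline 3"],
    "health": ["Health headline 1"],
    "politics": ["Politics headline 1", "Politics headline 2"],
}

# Wrapping each list in a Series lets sections with different numbers of
# headlines line up by row position; missing cells become NaN.
df = pd.DataFrame({section: pd.Series(headlines) for section, headlines in scraped.items()})
print(df)
```

With the real scraper, you would fill `scraped[section]` inside the loop and then call `df.to_excel("cnn_recent_topics.xlsx", index=False)`. Writing .xlsx needs an engine such as openpyxl or XlsxWriter to be installed.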

You can try this. Leave a comment if you need an explanation:

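# Assumes the setup from the question: driver, BeautifulSoup and pd are already defined
def custom_scrape(topic):
    """Return the headline texts from one CNN section page."""
    nl = []
    driver.get("https://edition.cnn.com/" + str(topic))
    driver.implicitly_wait(3)
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    find_ingre = soup.select("span.cd__headline-text")

    for i in find_ingre:
        nl.append(i.get_text())

    return nl

topics = ['world', 'health', 'politics']
frames = []
for topic in topics:
    # one single-column DataFrame per topic, built from the function's return value
    frames.append(pd.DataFrame(custom_scrape(topic), columns=[topic]))

# place the columns side by side; shorter columns are padded with NaN
result = pd.concat(frames, axis=1)
result.to_excel("cnn_recent_topics.xlsx", index=False)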
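
A note on `pd.concat`'s `ignore_index`: it discards the labels along the concatenation axis. With `axis=1` those labels are the column names (the topics), so `ignore_index=True` replaces them with 0, 1, 2 rather than resetting the row index. Here is a small offline check with placeholder data:

```python
import pandas as pd

# Two one-column frames of different lengths (placeholder data)
world = pd.DataFrame({"world": ["w1", "w2"]})
health = pd.DataFrame({"health": ["h1"]})

# ignore_index=True on axis=1 renumbers the columns
numbered = pd.concat([world, health], axis=1, ignore_index=True)
print(list(numbered.columns))   # [0, 1]

# Without it, the topic names are kept as column headers
named = pd.concat([world, health], axis=1)
print(list(named.columns))      # ['world', 'health']
```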

Hi, the code actually gives me several NameErrors and I don't know exactly why. Could you explain?
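
A likely source of those NameErrors is code that uses names the snippet never defines. A list like `nl` that is assigned inside `custom_scrape` is local to that function, so module-level code cannot read it and has to use the function's return value instead. Likewise, running the answer on its own leaves `driver`, `BeautifulSoup` and `pd` undefined unless the question's setup code runs first. Below is a minimal sketch of the scope issue, using a hypothetical stand-in function so no browser is needed:

```python
def fake_scrape(topic):
    # Hypothetical stand-in for custom_scrape: builds a local list and returns it
    nl = [f"{topic} headline"]
    return nl

caught = None
try:
    print(nl)                     # nl only existed inside fake_scrape
except NameError as err:
    caught = err
    print("NameError:", err)      # prints the NameError message

headlines = fake_scrape("world")  # use the return value instead
print(headlines)                  # ['world headline']
```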