Python pandas export to CSV
My code is as follows:
from xlsxwriter import Workbook
import os, shutil
import requests
import pandas
from bs4 import BeautifulSoup

MAX_RETRIES = 20
base_url = 'https://pagellapolitica.it/politici/sfoggio/9/matteo-renzi?page='

for page in range(1, 32, 1):
    l = []
    session = requests.Session()
    adapter = requests.adapters.HTTPAdapter(max_retries=MAX_RETRIES)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    site = base_url + str(page) + ".html"
    # print(site)
    c = session.get(site)
    r = c.content
    soup = BeautifulSoup(r, 'html.parser')
    all = soup.find_all("div", {"class": "clearfix"})
    for d in all:
        links = d.find_all("a")
        len(links)
        l = []
        # workbook = Workbook('bbb.xlsx')
        # worksheet = workbook.add_worksheet()
        # row += 0
        # worksheet.write(row, 0, 'Link')
        # worksheet.write(row, 1, 'Name ')
        # row += 1
        for a in links[5:17]:
            d = {}
            href = a["href"]
            basic_url = 'https://pagellapolitica.it/'
            site = basic_url + href
            # print(site)
            c = requests.get(site)
            r = c.content
            soup = BeautifulSoup(r, 'html.parser')
            Name = soup.find("h3", {"class": "pull-left"}).text
            Fact_checking = soup.find("label", {"class": "verdict-analisi"}).text
            quote = soup.find("div", {"class": "col-xs-12 col-sm-6 col-smm-6 col-md-5 col-lg-5"}).text
            all = soup.find_all("span", {"class": "item"})
            Topic = all[0].text
            Date = all[2].text
            a = all[3].find("a", {"class": ""})
            Link = a["href"]
            Text = soup.find("div", {"class": "col-xs-12 col-md-12 col-lg-12"}).text
            d["Name"] = Name
            d["Fact_checking"] = Fact_checking
            d["Quote"] = quote
            d["Economic_topic"] = Topic
            d["Date"] = Date
            d["Link"] = Link
            d["Text"] = Text
            l.append(d)
        df = pandas.DataFrame(l)
        df.to_csv("outing.csv")
The problem is that when I export the data to CSV, I only get 6 rows of results. When I print(df) and print(l), all of the data in the list is printed, but when I check len(l) I only get 6. Do you know why this happens?

Thanks in advance.

Consider using pandas.DataFrame() to build each list of dictionaries into its own DataFrame, then use pandas.concat() to join the list of individual DataFrames into a final DataFrame:
df_list = []
for d in all:
    links = d.find_all("a")
    l = []
    for a in links[5:17]:
        d = {}
        ...
        d["Name"] = Name
        d["Fact_checking"] = Fact_checking
        d["Quote"] = quote
        d["Economic_topic"] = Topic
        d["Date"] = Date
        d["Link"] = Link
        d["Text"] = Text
        l.append(d)
    df_list.append(pandas.DataFrame(l))
final_df = pandas.concat(df_list)
final_df.to_csv("outing.csv")
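Here is a minimal, self-contained sketch of that pattern, with made-up records standing in for the scraped dictionaries (the `pages` data is hypothetical, only there to make the example runnable):

```python
import pandas as pd

# Hypothetical stand-ins for the dictionaries built while scraping each page.
pages = [
    [{"Name": "A", "Date": "2020-01-01"}, {"Name": "B", "Date": "2020-01-02"}],
    [{"Name": "C", "Date": "2020-01-03"}],
]

df_list = []
for records in pages:
    # One DataFrame per page, like pandas.DataFrame(l) in the loop above.
    df_list.append(pd.DataFrame(records))

# Concatenate everything once; ignore_index renumbers the rows 0..n-1.
final_df = pd.concat(df_list, ignore_index=True)
print(len(final_df))  # 3 -- every page's rows survive
```

Because nothing is written to disk until after the loop, no batch can overwrite an earlier one.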
Can you add the output of df.info() at the bottom of the question? Please format it so it is readable before adding it.

On every iteration of the inner loop you overwrite the CSV file… move df = pandas.DataFrame(l) and df.to_csv("outing.csv") outside the loops.
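To illustrate why that fixes the 6-row symptom, here is a hedged sketch (a dummy loop stands in for the scraping): each to_csv call replaces the file, so only the last batch survives, whereas accumulating into one list and writing once at the end keeps every row.

```python
import pandas as pd

all_rows = []  # accumulate across every page/div instead of resetting l = []
for batch in ([{"x": 1}], [{"x": 2}], [{"x": 3}]):  # stand-in for the scraping loops
    all_rows.extend(batch)

df = pd.DataFrame(all_rows)           # build the frame once, after the loops
df.to_csv("outing.csv", index=False)  # write once; earlier rows are not overwritten
```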