Python: only the last article from the website is written. I need all of them
Below is my code. print(news_csv) prints all the articles I want just fine, but news_csv.to_csv('bbb.csv') only writes the last article.
import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all('article'):
    if article.a is None:
        continue
    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    #print(headline)
    #print(summary)
    #print(link)
    news_csv = pd.DataFrame({'Headline': [headline],
                             'Summary': [summary],
                             'Link': [link],
                             })
    print(news_csv)
    news_csv.to_csv('bbb.csv')
    #print()
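Only the last article ends up in the CSV. You defined the variable news_csv inside the for loop, which means it is overwritten on every iteration over the articles. That is why only the last article appears in the csv file. In fact, the file itself is overwritten each time, because to_csv is also called inside the loop.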
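To see the overwriting in isolation, here is a minimal sketch that skips the network request. The sample rows and the file name demo.csv are made up for illustration; the point is only that DataFrame.to_csv opens the file in write mode by default, so each call replaces what the previous one wrote.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the scraped (headline, summary, link) values.
rows = [
    ("First headline", "First summary", "https://example.com/1"),
    ("Second headline", "Second summary", "https://example.com/2"),
    ("Third headline", "Third summary", "https://example.com/3"),
]

# Write into a temporary directory so the demo leaves no file behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

for headline, summary, link in rows:
    one_row = pd.DataFrame({"Headline": [headline],
                            "Summary": [summary],
                            "Link": [link]})
    # to_csv uses mode="w" by default, so every call replaces
    # whatever the previous iteration wrote.
    one_row.to_csv(path)

result = pd.read_csv(path, index_col=0)
print(len(result))                 # 1
print(result["Headline"].iloc[0])  # Third headline
```

Three rows go in, but only the one from the final iteration is left in the file.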
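Instead, your content should be appended to a container object and saved as csv only once, after the for loop has finished.
If you do want to use a pandas DataFrame, you should follow the last example given in the documentation:
append the contents of every article to a list, then use pd.concat() to build the DataFrame object.
I would write it like this: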
import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

articles = []
for article in soup.find_all('article'):
    if article.a is None:
        continue
    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    articles.append((headline, summary, link))
    print(f'Headline: {headline}\nSummary: {summary}\nLink: {link}')
    #print(news_csv)

news_dataframe = pd.concat([pd.DataFrame([article], columns='Headline Summary Link'.split()) for article in articles], ignore_index=True)
news_dataframe.to_csv('bbb.csv')
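A possible simplification, which is not what the code above does: since articles is already a list of equal-length tuples, the pd.DataFrame constructor can take it directly, so the per-row pd.concat is not strictly needed. The sketch below uses made-up sample rows instead of the live site and a temporary file instead of bbb.csv; those names are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data in the same (headline, summary, link) shape
# that the scraping loop collects.
articles = [
    ("Headline A", "Summary A", "https://example.com/a"),
    ("Headline B", "Summary B", "https://example.com/b"),
]
columns = ["Headline", "Summary", "Link"]

# The construction used in the answer: one single-row frame per article.
via_concat = pd.concat(
    [pd.DataFrame([article], columns=columns) for article in articles],
    ignore_index=True,
)

# The shorter alternative: hand the whole list to the constructor.
direct = pd.DataFrame(articles, columns=columns)

print(direct.equals(via_concat))

# Write once, after all rows are collected. index=False is optional and
# just leaves the numeric index column out of the file.
path = os.path.join(tempfile.mkdtemp(), "articles.csv")
direct.to_csv(path, index=False)
written = pd.read_csv(path)
print(len(written))
```

Either way, the important part is the same as in the answer: collect everything first, then call to_csv a single time outside the loop.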