Python - it only prints the last article from the website. I need it to print all of them


Below is my code. print(news_csv) prints all of the articles I want just fine, but news_csv.to_csv('bbb.csv') only writes the last article.

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all('article'):
    if article.a is None:
        continue
    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    #print(headline)
    #print(summary)
    #print(link)
    news_csv = pd.DataFrame({'Headline': [headline],
                             'Summary': [summary],
                             'Link': [link],
                             })
    print(news_csv)
    news_csv.to_csv('bbb.csv')





You have defined the variable news_csv inside the for loop, which means it is overwritten on every iteration over the articles. That is why only the last article ends up in the CSV file: the file itself keeps being overwritten on every pass.
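To see that overwriting behaviour in isolation, here is a minimal standalone sketch (the file name demo.csv is just for illustration and not part of your code):

import pandas as pd

# Each to_csv() call rewrites the whole file, so once the loop ends
# only the last one-row DataFrame is left on disk.
for i in range(3):
    pd.DataFrame({'value': [i]}).to_csv('demo.csv')

print(open('demo.csv').read())  # only the row for i == 2 remains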

Instead, your content should be appended to a container object, and only saved to CSV after the for loop has finished.
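For example, if you do not need pandas at all, a minimal sketch using the standard csv module would look like this (same Headline/Summary/Link fields as in your code):

import csv
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

rows = []  # container that accumulates one dict per article
for article in soup.find_all('article'):
    if article.a is None:
        continue
    rows.append({
        'Headline': article.a.text,
        'Summary': article.p.text,
        'Link': "https://www.vanglaini.org" + article.a['href'],
    })

# Write everything in one go, only after the loop has finished
with open('bbb.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['Headline', 'Summary', 'Link'])
    writer.writeheader()
    writer.writerows(rows)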

If you really want to use a pandas DataFrame, you should follow the last example given in the documentation:

Append the content of every article to a list, then use pd.concat() to build the DataFrame object.

I would write it like this:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
articles = []
for article in soup.find_all('article'):
    if article.a is None:
        continue
    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']

    articles.append((headline, summary, link))
    print(f'Headline: {headline}\nSummary: {summary}\nLink: {link}')

news_dataframe = pd.concat([pd.DataFrame([article], columns='Headline Summary Link'.split()) for article in articles ], ignore_index=True)

news_dataframe.to_csv('bbb.csv')
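
As a side note (not part of the original answer), since articles is already a list of equal-length tuples you could also build the frame in a single call, and index=False keeps the automatic row index out of the file:

# Build the DataFrame directly from the accumulated list of tuples
news_dataframe = pd.DataFrame(articles, columns=['Headline', 'Summary', 'Link'])
news_dataframe.to_csv('bbb.csv', index=False)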