Python 3.x BeautifulSoup4: find_all() overwrites the previous data set instead of showing all target data


I am scraping the web page whose URL appears in the webpages list below.

Code:

import requests as r
from bs4 import BeautifulSoup as soup

webpages=['https://zh.wikisource.org/wiki/%E8%AE%80%E9%80%9A%E9%91%92%E8%AB%96/%E5%8D%B701']

for item in webpages:
    headers = {'User-Agent': 'Mozilla/5.0'}
    data = r.get(item, headers=headers)
    data.encoding = 'utf-8'
    page_soup = soup(data.text, 'html5lib')
    headline = page_soup.find_all(class_='mw-headline')
    for el in headline:
        headline_text = el.get_text()
    p = page_soup.find_all('p')
    for el in p:
        p_text = el.get_text()
    text = headline_text + p_text
    with open(r'sample_srape.txt', 'a', encoding='utf-8') as file:
        file.write(text)
        file.close()

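The output txt file only shows the last headline_text + p_text data set. It seems that whenever new data is retrieved, it overwrites the previous data set. How can I stop it from overwriting the previous data and show every set of target data?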

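You need the 'a' argument to open() so that writes are appended.

I would expect the indentation of your two inner for loops to be different, so that you are not only using the last item of each match. If you make multiple requests, you can use a Session; re-using the connection is more efficient.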
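The point about the inner loops can be seen in isolation, without any scraping. The following is only a minimal sketch: the list contents and the names last_only and collected are made up for illustration and do not come from the question.

```python
items = ["first", "second", "third"]

# Pattern from the question: the variable is reassigned on every pass,
# so only the value from the final iteration is left after the loop.
for item in items:
    last_only = item.upper()

# One possible fix: collect each value as it is produced.
collected = []
for item in items:
    collected.append(item.upper())

print(last_only)   # THIRD
print(collected)   # ['FIRST', 'SECOND', 'THIRD']
```

Anything that should happen for every item, such as building up text or writing to a file, has to sit inside the loop body or work on a collection like collected.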

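The version below also concatenates the paragraphs under a given headline and uses clearer variable names in some places.

You do not need close(), because the with statement handles that. Perhaps something like this: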

import requests
from bs4 import BeautifulSoup as soup

webpages = ['https://zh.wikisource.org/wiki/%E8%AE%80%E9%80%9A%E9%91%92%E8%AB%96/%E5%8D%B701']
headers = {'User-Agent': 'Mozilla/5.0'}

# A session re-uses the underlying connection across requests.
with requests.Session() as s:

    for link in webpages:
        data = s.get(link, headers=headers)
        data.encoding = 'utf-8'
        page_soup = soup(data.text, 'html5lib')
        headlines = page_soup.find_all(class_='mw-headline')

        # 'a' appends to the file; the with block closes it automatically.
        with open(r'sample_scrape.txt', 'a', encoding='utf-8') as file:

            for headline in headlines:
                headline_text = headline.get_text()
                # Note: this matches every <p> on the page, not only those under this headline.
                paragraphs = page_soup.find_all('p')
                text = ''

                for paragraph in paragraphs:
                    paragraph_text = paragraph.get_text()
                    # Accumulate inside the loop so no paragraph is lost.
                    text += paragraph_text

                text = headline_text + text
                file.write(text)
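Because find_all('p') returns every paragraph on the page, the code above writes the full page text once per headline. If the goal is to pair each headline with only the paragraphs that follow it, one possible approach is to walk the siblings of the heading element until the next heading. This is a rough sketch under assumptions: SAMPLE_HTML is an invented fragment, group_paragraphs_by_headline is a hypothetical helper, and the real page is assumed to place each mw-headline span inside a heading tag whose following siblings are the paragraphs. The built-in html.parser is used here only to keep the sketch free of the html5lib dependency.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page structure may differ.
SAMPLE_HTML = """
<div>
  <h2><span class="mw-headline">Heading A</span></h2>
  <p>First paragraph under A.</p>
  <p>Second paragraph under A.</p>
  <h2><span class="mw-headline">Heading B</span></h2>
  <p>Only paragraph under B.</p>
</div>
"""

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def group_paragraphs_by_headline(html):
    """Return a list of (headline_text, [paragraph_text, ...]) tuples."""
    page_soup = BeautifulSoup(html, "html.parser")
    sections = []
    for headline in page_soup.find_all(class_="mw-headline"):
        # Assumption: the span sits inside a heading tag; fall back to the span itself.
        heading_tag = headline.find_parent(HEADING_TAGS) or headline
        paragraphs = []
        for sibling in heading_tag.find_next_siblings():
            if sibling.name in HEADING_TAGS:
                break  # the next section starts here
            if sibling.name == "p":
                paragraphs.append(sibling.get_text())
        sections.append((headline.get_text(), paragraphs))
    return sections


for title, paragraphs in group_paragraphs_by_headline(SAMPLE_HTML):
    print(title + "".join(paragraphs))
```

Each tuple can then be written to the file inside the loop, so every headline is followed only by its own paragraphs.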

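I changed open() from write mode to append mode; the same problem still exists.

I noticed that you are running two for loops; because of the indentation, in both cases you only use the last value from the loop.

I think my for loops are also a problem. Could you be more specific about how to fix them? Thanks.

I think I need your feedback to adjust this appropriately.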
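The file mode and the loop structure are two separate issues, which may be why switching modes alone did not help. As a small sketch of the file-mode half only: write_chunks and read_text are hypothetical helpers, and the temporary file names are invented for the demonstration.

```python
import os
import tempfile


def write_chunks(path, chunks, mode):
    """Open `path` once per chunk using `mode` and write that chunk."""
    for chunk in chunks:
        with open(path, mode, encoding="utf-8") as file:
            file.write(chunk)


def read_text(path):
    """Return the full text content of `path`."""
    with open(path, encoding="utf-8") as file:
        return file.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    w_path = os.path.join(tmp_dir, "write_mode.txt")
    a_path = os.path.join(tmp_dir, "append_mode.txt")
    chunks = ["one\n", "two\n", "three\n"]

    write_chunks(w_path, chunks, "w")  # 'w' truncates the file on every open
    write_chunks(a_path, chunks, "a")  # 'a' keeps what was written before

    write_result = read_text(w_path)
    append_result = read_text(a_path)

print(repr(write_result))   # 'three\n'
print(repr(append_result))  # 'one\ntwo\nthree\n'
```

Append mode preserves earlier writes, but each write still contains only what the variable holds at that moment, so text that was reduced to the last loop item before writing stays incomplete in either mode.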