Python: How to use Beautiful Soup and requests to extract an article that a website splits across several pages


python web-scraping


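How can I use Beautiful Soup and requests to extract each article in full when the website splits it across several pages?

Take this website, for example.

Thanks, everyone!

Here is a snippet that may help you: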

import os

import requests

# Page-name templates, one per chapter; %d is the page number.
TOC = ["Chapter_I_p%d.html", "Chapter_III_p%d.html", ...]

# Get all pages of chapter I.
for i in range(1, 10):  # upper bound 10 found by looking at the site
    url = "http://www.pagebypagebooks.com/F_Scott_Fitzgerald/The_Lees_Of_Happiness/Chapter_I_p%d.html" % i
    r = requests.get(url)
    if r.status_code not in (200, 201):
        continue  # skip pages that could not be fetched
    content = r.content
    # Save each page under its own file name, e.g. Chapter_I_p1.html.
    with open(os.path.split(url)[-1], "wb") as fout:
        # Write the whole content.
        # bs4 can be used here to select 'p' tags...
        fout.write(content)

Note: for every other chapter, you can loop over its pages in the same way.
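
A rough sketch of that per-chapter loop, in case it helps. It reuses the two templates from `TOC` above (the list is incomplete, so the remaining chapters have to be read off the site). The names `BASE`, `fetch` and `download_chapter` are made up for this example, and it assumes the site answers with a non-200 status once a chapter runs out of pages, which I have not verified.

```python
from urllib.parse import urljoin

import requests

BASE = "http://www.pagebypagebooks.com/F_Scott_Fitzgerald/The_Lees_Of_Happiness/"
# Only the two templates shown in the snippet above; the list is incomplete.
CHAPTER_TEMPLATES = ["Chapter_I_p%d.html", "Chapter_III_p%d.html"]


def fetch(url):
    """Return the page body as bytes, or None if the request did not succeed."""
    r = requests.get(url, timeout=10)
    return r.content if r.status_code == 200 else None


def download_chapter(template, get_page=fetch, max_pages=50):
    """Collect pages 1, 2, 3, ... of one chapter until a page is missing.

    Assumption: a missing page is reported with a non-200 status code.
    max_pages is a safety cap in case that assumption does not hold.
    """
    pages = []
    for page_number in range(1, max_pages + 1):
        body = get_page(urljoin(BASE, template % page_number))
        if body is None:
            break  # no such page, so the chapter is complete
        pages.append(body)
    return pages
```

Calling `download_chapter(template)` for each entry of `CHAPTER_TEMPLATES` would then give one list of raw pages per chapter, without hard-coding the page count the way `range(1, 10)` does.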

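My approach to scraping paginated sites is either to identify the pattern in the URLs (here, the page number) or to use a bs4 selector to find the "next page" link and follow it.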
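
The second option (following the "next page" link) could look roughly like this. It is only a sketch: I am assuming the link is an `<a>` tag whose visible text contains the word "next" and that the article text sits in `<p>` tags, so check the real markup and adjust. `parse_page`, `fetch_html` and `crawl_article` are names invented for this example.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def parse_page(html, page_url):
    """Return (paragraph texts, absolute URL of the next page or None)."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    next_url = None
    # Assumption: the next-page link is an <a> whose text contains "next".
    for a in soup.find_all("a", href=True):
        if "next" in a.get_text().lower():
            next_url = urljoin(page_url, a["href"])  # resolve relative links
            break
    return paragraphs, next_url


def fetch_html(url):
    """Download one page and return its HTML as text."""
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    return r.text


def crawl_article(start_url, get_html=fetch_html, max_pages=100):
    """Follow next-page links from start_url and join all paragraphs."""
    text, url, seen = [], start_url, set()
    # Stop when there is no next link, on a loop, or at the safety cap.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        paragraphs, url = parse_page(get_html(url), url)
        text.extend(paragraphs)
    return "\n\n".join(text)
```

Starting it with the URL of the first page of a chapter should return that chapter as one string, as long as the assumptions about the markup hold.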

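Hi, welcome to Stack Overflow! When you provide a bit more background (especially the code you have tried), it really helps us answer your question better. Sometimes questions like this look like homework, which can keep you from getting the help you need. Looking at this website, you should consider using a tool like this for such a task.

Thanks! I want to find the next-page button and read the text until the end.

@EveBao, if the answer works for you, it would be best to accept it :)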