Python: Why is my code stuck in an infinite loop while scraping?

python web-scraping

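I'm learning basic web scraping with Python 3, and in this example I'm trying to scrape the names of all authors from a website. I tried to write the code so that it works without knowing the total number of pages on the site. However, when I run it, the editor stops responding. Is something wrong with the code, or should I just let it run longer?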

import requests
import bs4
i = 0
authors = set()
while True:
    try:
        if i == 0:
            url = "http://quotes.toscrape.com"
        else: 
            url = "http://quotes.toscrape.com/page/{}/".format(i+1)
        
        res = requests.get(url)
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        
        for name in soup.select('.author'):
            authors.add(name.text)
            
        
        i += 1
        
    except:
        break

Try:

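import requests
import bs4
i = 0
authors = set()
while True:
    url = "http://quotes.toscrape.com" if i == 0 else \
        f"http://quotes.toscrape.com/page/{i}/"
    res = requests.get(url)
    # The site answers with this text once the page number is past the last page.
    if res.text.find('No quotes found!') < 0:
        soup = bs4.BeautifulSoup(res.text, "lxml")
        for name in soup.select('.author'):
            authors.add(name.text)
        i += 1
    else:
        break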

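I think the problem is that this site returns a valid response even when the requested page number has no quotes. As a result, you will most likely never (or at least not for a very long time) hit an error that triggers the break statement. You should break out of the loop when you encounter text such as "No quotes found!", as in the last code listing below.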

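Because the site always returns a valid response? There is no condition that terminates the while loop: you have a while True, which never exits unless an exception is raised.

I think checking that the current page contains no authors/quotes is more reliable.

Yes, that's a valid point, @bereal. Checking for either quotes or authors gives a good way to detect when to stop the loop.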

import requests
import bs4
i = 0
authors = set()
while True:
    try:
        if i == 0:
            url = "http://quotes.toscrape.com"
        else: 
            url = "http://quotes.toscrape.com/page/{}/".format(i+1)
    
        res = requests.get(url)
        soup = bs4.BeautifulSoup(res.text, 'lxml')

        if "No quotes found!" in str(soup):
            break
    
        for name in soup.select('.author'):
            authors.add(name.text)
        
    
        i += 1
    
    except requests.RequestException:
        # Stop on network errors only; a bare except would also hide real bugs.
        break
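
The suggestion from the comments, stopping as soon as a page contains no authors instead of searching for a specific message, could be sketched roughly as follows. This is only an illustration, not code from the answers: the names fetch_page and collect_authors, the max_pages safety cap, and the 10-second timeout are all assumptions added here, and the built-in "html.parser" is used in place of lxml so nothing extra needs to be installed.

```python
import bs4
import requests

BASE_URL = "http://quotes.toscrape.com"


def fetch_page(page):
    """Download one listing page and return its HTML (hypothetical helper)."""
    res = requests.get(f"{BASE_URL}/page/{page}/", timeout=10)  # timeout is an assumption
    res.raise_for_status()  # turn HTTP error codes into exceptions
    return res.text


def collect_authors(fetch=fetch_page, max_pages=1000):
    """Collect author names page by page until a page has no authors.

    max_pages is an assumed safety cap so the loop always terminates,
    even if the site never serves an empty page.
    """
    authors = set()
    for page in range(1, max_pages + 1):
        soup = bs4.BeautifulSoup(fetch(page), "html.parser")
        names = [tag.get_text(strip=True) for tag in soup.select(".author")]
        if not names:
            # No authors on this page: we are past the last page.
            break
        authors.update(names)
    return authors
```

Calling collect_authors() with no arguments would scrape the live site. Passing a different fetch function makes the loop easy to try out without network access, and the stop condition no longer depends on the exact wording of the site's "No quotes found!" message.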