Python 3.x: running the following code without a timeout error
Tags: python-3.x, web-scraping, beautifulsoup, python-requests

What improvements can I make to the following code so that it runs faster and gives me the result I want? The code takes too long to run. I tried running it on my computer, but I get a timeout error. The code loops over 3564 pages. How can I improve it to get rid of the timeout error? It only works for a small range of pages.
import pandas as pd
from bs4 import BeautifulSoup, Tag
import requests

data = []
s = "https://www.cupcakemaps.com/search_results?page="
for x in range(1, 3564):
    res = requests.get(s + str(x), timeout=20)
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.find_all(class_='grid_element')
    for listing in listings:
        # Each field stays None when the element is missing from the listing.
        listing_name = listing.find('span', {'class': 'h3 bold inline-block rmargin member-search-full-name'})
        if isinstance(listing_name, Tag):
            listing_name = listing_name.text.strip()
        listing_description = listing.find('p', {'class': 'small member-search-description'})
        if isinstance(listing_description, Tag):
            listing_description = listing_description.text.strip()
        listing_location = listing.find('span', {'class': 'small member-search-location rmargin rpad'})
        if isinstance(listing_location, Tag):
            listing_location = listing_location.text.strip()
        full_dict = {'Title': listing_name, 'Description': listing_description, 'Location': listing_location}
        data.append(full_dict)
df = pd.DataFrame(data)
print(df)
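I want the code to print a DataFrame with 3 columns.

Have you tried assigning None to res and then, inside a while loop, checking for None and catching Timeout in a try/except?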
import time

# Reuses requests, BeautifulSoup, Tag, data and s from the question.
for x in range(1, 3564):
    res = None
    # Compare with None explicitly: a Response with a 4xx/5xx status is falsy,
    # so "while not res" would request an error page forever.
    while res is None:
        try:
            res = requests.get(s + str(x), timeout=20)
        except requests.exceptions.Timeout:
            time.sleep(5)  # wait 5 seconds and try again
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.find_all(class_='grid_element')
    for listing in listings:
        listing_name = listing.find('span', {'class': 'h3 bold inline-block rmargin member-search-full-name'})
        if isinstance(listing_name, Tag):
            listing_name = listing_name.text.strip()
        listing_description = listing.find('p', {'class': 'small member-search-description'})
        if isinstance(listing_description, Tag):
            listing_description = listing_description.text.strip()
        listing_location = listing.find('span', {'class': 'small member-search-location rmargin rpad'})
        if isinstance(listing_location, Tag):
            listing_location = listing_location.text.strip()
        full_dict = {'Title': listing_name, 'Description': listing_description, 'Location': listing_location}
        data.append(full_dict)
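So we simply initialize res to None, test whether it is still None, and repeat the request if it is. Whenever a requests.exceptions.Timeout exception is raised, we catch it, wait 5 seconds, and go back to the top of the while loop. If the request raises a different exception, you can try replacing the except line with: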
except requests.exceptions.RequestException:
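
The loop above retries without limit, so a page that never responds would stall the whole run. A bounded variant might look like the sketch below. The names fetch_with_retry, get, max_attempts and wait_seconds are hypothetical and not part of the original answer; the get parameter exists only so the function can be tried out without touching the network.

```python
import time

import requests


def fetch_with_retry(url, get=requests.get, max_attempts=5, wait_seconds=5, timeout=20):
    """Return the response for url, retrying on any requests exception.

    All names here are illustrative assumptions, not an established API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return get(url, timeout=timeout)
        except requests.exceptions.RequestException:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            time.sleep(wait_seconds)  # pause before the next attempt
```

Inside the page loop this would replace the while block, for example res = fetch_with_retry(s + str(x)). Re-raising after the last attempt makes a permanently failing page visible instead of hanging silently.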
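Unfortunately, this doesn't work. I now get 2 errors: 1. socket.timeout: The read operation timed out, 2. NameError: name 'Timeout' is not defined

My mistake, Timeout is an exception class in requests; I have updated the snippet with its full module path. If the socket timeout error persists, try replacing the exception with the most general one (the last line). Let me know if it works.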
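
If the timeouts persist, one alternative that nobody in this thread suggested is to let requests handle the retries through a Session. A Session also reuses the underlying connection between pages, which may help with speed. This is only a sketch: make_session and its parameters are hypothetical names, and the retry count, backoff factor and status codes are assumptions rather than values tuned for this site.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=5, backoff_factor=1.0):
    """Build a Session that retries failed requests with increasing delays.

    make_session and its defaults are illustrative assumptions.
    """
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,
        status_forcelist=[500, 502, 503, 504],  # also retry on these server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Use the retrying adapter for both URL schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

The loop would then create session = make_session() once before iterating and call session.get(s + str(x), timeout=20) in place of requests.get.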