Python retry request mechanism


python web-scraping


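I'm building a web scraper project, and one of the things I'm trying to implement is a smart retry mechanism using urllib3 and requests together with Beautiful Soup.

When I set timeout=1 to force the requests to fail so I can check the retry behavior, the retry raises an exception that breaks the code. The code is as follows: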

import json
import logging
import re
import sys
import time

import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# this get_items method takes a dict of links and scrapes the items on each linked page

def get_items(self, links):
    itemdict = {}
    for k, v in links.items():
        boolean = True
        # here, we fetch the content from the url, using the requests library
        while boolean:
            try:
                a = requests.Session()
                retries = Retry(total=3, backoff_factor=0.1, status_forcelist=[301, 500, 502, 503, 504])
                a.mount('https://', HTTPAdapter(max_retries=retries))
                page_response = a.get('https://www.XXXXXXX.il' + v, timeout=1)
            except requests.exceptions.Timeout:
                print("Timeout occurred")
                logging.basicConfig(level=logging.DEBUG)
            else:
                boolean = False

        # we use the html parser to parse the url content and store it in a variable.
        page_content = BeautifulSoup(page_response.content, "html.parser")
        for i in page_content.find_all('div', attrs={'class': 'prodPrice'}):
            parent = i.parent.parent.contents[0]
            getparentfunc = parent.find("a", attrs={"href": "javascript:void(0)"})
            itemid = re.search(r".*'(\d+)'.*", getparentfunc.attrs['onclick']).groups()[0]
            itemName = re.sub(r'\W+', ' ', i.parent.contents[0].text)
            priceitem = re.sub(r'[\D.]+ ', ' ', i.text)
            itemdict[itemid] = [itemName, priceitem]

I would really appreciate an efficient retry mechanism, or any other simple approach. Thanks.
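A likely reason the `except requests.exceptions.Timeout` branch in the question's code does not fire: with a `Retry` mounted on the adapter, urllib3 itself retries the timed-out request, and once those retries are exhausted requests typically re-raises the failure as `requests.exceptions.ConnectionError` (wrapping urllib3's `MaxRetryError`) rather than as `Timeout`. The sketch below assumes that behavior. It builds the session once instead of on every loop pass and catches the common base class `requests.exceptions.RequestException`. The names `make_session` and `fetch` are hypothetical. It also drops 301 from `status_forcelist`, on the assumption that a redirect is not a transient failure worth retrying.

```python
import logging

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def make_session(total=3, backoff_factor=0.1):
    """Build one Session whose HTTPS adapter retries failed requests (hypothetical helper)."""
    retries = Retry(
        total=total,
        backoff_factor=backoff_factor,
        # retry only transient server errors; 301 is a redirect, not a failure
        status_forcelist=[500, 502, 503, 504],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retries))
    return session


def fetch(session, url, timeout=1):
    """Return the response for url, or None once the adapter's retries are used up."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # turn a remaining 4xx/5xx response into HTTPError
        return response
    except requests.exceptions.RequestException as err:
        # RequestException is the base of Timeout, ConnectionError, RetryError and
        # HTTPError, so an exhausted timeout is caught even if it arrives as ConnectionError
        logging.warning("Giving up on %s: %s", url, err)
        return None
```

In `get_items`, the session would then be created once before the `for` loop, and each iteration would call `fetch(session, 'https://www.XXXXXXX.il' + v)` and skip the link when it returns `None`, which replaces the `while` loop entirely.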

Iso

I usually do the following:

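import requests

def get(url, retries=3):
    try:
        # without a timeout, a stalled server can block forever and never trigger a retry
        r = requests.get(url, timeout=5)
        return r
    except requests.exceptions.RequestException as err:
        # network failures raise RequestException subclasses (Timeout, ConnectionError, ...)
        print(err)
        if retries < 1:
            raise ValueError('No more retries!') from err
        return get(url, retries - 1)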
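The helper above retries immediately. The iterative variant below adds exponential backoff. It is a minimal sketch that assumes only network-level failures (timeouts, refused connections) should be retried. It waits `backoff * 2**attempt` seconds between attempts and, once all attempts fail, re-raises the last requests exception instead of a generic `ValueError`. The `backoff` and `fetch` parameters are hypothetical additions; `fetch` exists only so the function can be exercised without a network.

```python
import time

import requests


def get(url, retries=3, backoff=0.5, timeout=5, fetch=requests.get):
    """Fetch url, retrying on network errors and timeouts.

    Makes at most retries + 1 attempts, sleeping backoff * 2**attempt seconds
    after each failure, and re-raises the last error if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url, timeout=timeout)
        except requests.exceptions.RequestException as err:
            if attempt == retries:
                raise  # out of retries: let the caller decide what to do
            print(f"Attempt {attempt + 1} failed ({err}), retrying")
            time.sleep(backoff * 2 ** attempt)
```

With `retries=3` this makes up to four attempts, sleeping 0.5, 1 and 2 seconds in between. A response with an error status such as 503 is returned as-is rather than retried; the `status_forcelist` option of urllib3's `Retry` covers that case.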
