Python 3.x: How to use scrapy rotating proxies with full settings, or rotate the IP per request?

Tags: python-3.x, api, proxy, scrapy, http-proxy

Hi everyone, I am scraping a website and using scrapy-rotating-proxies. I have also tried other proxy middlewares, but they did not fit my requirements, or I could not implement them the way I wanted.

There are many API URLs I want to fetch, and I would like one IP per request. When I run the scraper I have up to 100 IPs available, but the middleware does not use all of them and instead reanimates some of them. You can see part of the logs below (a sketch of my settings follows the logs):

2020-11-06 09:35:56 [scrapy.extensions.logstats] INFO: Crawled 21 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-11-06 09:35:56 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 1, unchecked: 87, reanimated: 1, mean backoff time: 122s)
2020-11-06 09:36:26 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 1, unchecked: 87, reanimated: 1, mean backoff time: 122s)
2020-11-06 09:36:56 [scrapy.extensions.logstats] INFO: Crawled 21 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-11-06 09:36:56 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 1, unchecked: 87, reanimated: 1, mean backoff time: 122s)
2020-11-06 09:37:26 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 1, unchecked: 87, reanimated: 1, mean backoff time: 122s)
2020-11-06 09:37:56 [scrapy.extensions.logstats] INFO: Crawled 21 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-11-06 09:37:56 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 1, unchecked: 87, reanimated: 1, mean backoff time: 122s)
2020-11-06 09:37:56 [rotating_proxies.middlewares] DEBUG: 1 proxies moved from 'dead' to 'reanimated'
2020-11-06 09:37:59 [rotating_proxies.expire] DEBUG: Proxy <https://92.60.190.249:50335> is DEAD
2020-11-06 09:37:59 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://shopee.com.my/api/v2/search_items/?by=sales&limit=50&match_id=243&newest=0&order=desc&page_type=search&version=2> with another proxy (failed 3 times, max retries: 5)
2020-11-06 09:37:59 [scrapy_user_agents.middlewares] DEBUG: Proxy is detected https://92.60.190.249:50335
2020-11-06 09:37:59 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36
2020-11-06 09:38:10 [rotating_proxies.expire] DEBUG: Proxy <https://162.223.89.220:8080> is GOOD
2020-11-06 09:38:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://shopee.com.my/api/v2/search_items/?by=sales&limit=50&match_id=243&newest=0&order=desc&page_type=search&version=2> (referer: https://shopee.com.my/api/v2/search_items/?by=sales&limit=50&match_id=243&newest=0&order=desc&page_type=search&version=2)
2020-11-06 09:38:10 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
2020-11-06 09:38:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 1, unchecked: 85, reanimated: 2, mean backoff time: 249s)
2020-11-06 09:38:56 [scrapy.extensions.logstats] INFO: Crawled 22 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2020-11-06 09:38:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 1, unchecked: 85, reanimated: 2, mean backoff time: 249s)
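For context, this is roughly how scrapy-rotating-proxies is wired up in settings.py. A minimal sketch, assuming the proxies are listed inline; the two IPs are taken from the logs above and the list contents are placeholders:

# settings.py -- minimal scrapy-rotating-proxies setup (sketch)

# Proxies the middleware rotates through; alternatively use
# ROTATING_PROXY_LIST_PATH to load them from a file, one proxy per line.
ROTATING_PROXY_LIST = [
    'https://92.60.190.249:50335',
    'https://162.223.89.220:8080',
    # ... up to ~100 proxies
]

DOWNLOADER_MIDDLEWARES = {
    # Picks a proxy for each request and tracks good/dead/reanimated state.
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    # Marks responses that look like bans so the proxy is moved to 'dead'.
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}

# How long a dead proxy is benched before being 'reanimated' (seconds);
# these are the library defaults.
ROTATING_PROXY_BACKOFF_BASE = 300
ROTATING_PROXY_BACKOFF_CAP = 3600

# How many times a page is retried with a different proxy
# (the 'max retries: 5' seen in the logs above).
ROTATING_PROXY_PAGE_RETRY_TIMES = 5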
In a try-and-except block I check whether a 200 response is actually an empty JSON object; if it is, I yield the request again, hoping that yielding again will change the IP. If it does not, how can I force an IP change? See the parse code below:

import json

from scrapy import Request


def parse(self, response, subcat, category):
    try:
        jdata = json.loads(response.body.decode('utf-8'))
    except Exception as e:
        print('This is a failed subcat url: {0}, trying again.'.format(subcat))
        print('and the exception is: {0}'.format(e))
        # Re-yield the same URL; dont_filter=True keeps the dupefilter
        # from dropping the retry as a duplicate request.
        yield Request(response.url, dont_filter=True, callback=self.parse,
                      cb_kwargs={'subcat': subcat, 'category': category})
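One thing worth noting: a freshly yielded Request does not inherit the old response's meta, so its meta has no 'proxy' key and RotatingProxyMiddleware should assign it a new proxy on the retry. Below is a sketch of the callback with a bounded retry counter; the 'retries' cb_kwarg, the MAX_RETRIES cap, and the empty-object check are my own additions, not part of the original spider:

import json

from scrapy import Request


def parse(self, response, subcat, category, retries=0):
    MAX_RETRIES = 5  # hypothetical cap so an always-empty endpoint cannot loop forever

    try:
        jdata = json.loads(response.body.decode('utf-8'))
        if not jdata:
            # 200 response but an empty JSON object -- treat it as a failure.
            raise ValueError('empty JSON payload')
    except Exception as e:
        if retries >= MAX_RETRIES:
            self.logger.error('Giving up on subcat %s: %s', subcat, e)
            return
        # The retry Request is built from scratch, so its meta carries no
        # 'proxy' key and the rotating-proxies middleware picks a fresh one.
        yield Request(response.url, dont_filter=True, callback=self.parse,
                      cb_kwargs={'subcat': subcat, 'category': category,
                                 'retries': retries + 1})
        return

    # ... parse jdata and yield items here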