Scrapy pauses periodically (tags: web-scraping, scrapy)

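I am trying to do a one-off scrape of a website with more than 30,000 pages using Scrapy. However, my spider pauses periodically (for example, at pages 148, 285, 425, and 558) and resumes after a few minutes, as the log below shows. I am using scrapy_proxy_pool and scrapy-user-agents for IP and user-agent rotation. I have tried setting a download delay and AutoThrottle, but the problem persists. Any help would be much appreciated.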
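
For reference, a setup like the one described might look roughly like this in `settings.py`. The question does not show the actual settings, so this is only a sketch: the middleware paths and priorities follow the READMEs of scrapy-proxy-pool and scrapy-user-agents, and the delay value is an assumption.

```python
# settings.py (sketch; the asker's real values are not shown in the question)

# Throttling options mentioned in the question. The delay value is an assumption.
DOWNLOAD_DELAY = 1            # seconds between requests to the same site
AUTOTHROTTLE_ENABLED = True   # let Scrapy adapt the delay to server latency

# scrapy-proxy-pool: rotate requests through proxies taken from public lists.
PROXY_POOL_ENABLED = True

DOWNLOADER_MIDDLEWARES = {
    # scrapy-user-agents: replace the built-in User-Agent middleware.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": 400,
    # scrapy-proxy-pool: proxy rotation plus ban detection.
    "scrapy_proxy_pool.middlewares.ProxyPoolMiddleware": 610,
    "scrapy_proxy_pool.middlewares.BanDetectionMiddleware": 620,
}
```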

2020-02-25 22:11:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 12 items/min)
2020-02-25 22:12:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:13:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:14:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:15:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:16:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:17:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:18:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), scraped 2768 items (at 0 items/min)
2020-02-25 22:18:33 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
2020-02-25 22:18:33 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): free-proxy-list.net:443
2020-02-25 22:18:33 [urllib3.connectionpool] DEBUG: https://free-proxy-list.net:443 "GET /anonymous-proxy.html HTTP/1.1" 200 None
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.us-proxy.org:443
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: https://www.us-proxy.org:443 "GET / HTTP/1.1" 200 None
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): free-proxy-list.net:443
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: https://free-proxy-list.net:443 "GET /uk-proxy.html HTTP/1.1" 200 None
2020-02-25 22:18:34 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.sslproxies.org:443
2020-02-25 22:18:35 [urllib3.connectionpool] DEBUG: https://www.sslproxies.org:443 "GET / HTTP/1.1" 200 None
2020-02-25 22:18:35 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.free-proxy-list.net:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.free-proxy-list.net:80 "GET / HTTP/1.1" 301 None
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.free-proxy-list.net:443
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: https://www.free-proxy-list.net:443 "GET / HTTP/1.1" 200 None
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:36 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
2020-02-25 22:18:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:37 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:37 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:37 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:37 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
2020-02-25 22:18:37 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:37 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:37 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.myanmartradeportal.gov.mm/commodity-search/view/30989> (referer: https://www.myanmartradeportal.gov.mm/commodity-search/view/1)
2020-02-25 22:18:46 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
2020-02-25 22:18:46 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:46 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:46 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:46 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.myanmartradeportal.gov.mm/commodity-search/view/30988> (referer: https://www.myanmartradeportal.gov.mm/commodity-search/view/1)
2020-02-25 22:18:47 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36
2020-02-25 22:18:47 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2020-02-25 22:18:47 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 200 185
2020-02-25 22:18:47 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2020-02-25 22:18:47 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2020-02-25 22:18:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET SCRAPING_URL> (referer: REFERER_URL)
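
To see how long each pause lasts over a full run, the `logstats` lines above can be scanned for consecutive reports of `0 pages/min`. The following is a minimal sketch, assuming the log uses the default format shown in the excerpt; the function name `find_stalls` is made up for illustration.

```python
import re
from datetime import datetime

# Matches Scrapy LogStats lines such as:
# 2020-02-25 22:11:08 [scrapy.extensions.logstats] INFO: Crawled 558 pages (at 0 pages/min), ...
LOGSTATS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[scrapy\.extensions\.logstats\] "
    r"INFO: Crawled (\d+) pages \(at (\d+) pages/min\)"
)


def find_stalls(lines):
    """Return (first_report, last_report, pages_crawled) for each run of
    consecutive logstats reports that show 0 pages/min."""
    stalls = []
    start = end = pages = None
    for line in lines:
        match = LOGSTATS_RE.match(line)
        if not match:
            continue  # not a logstats line
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        crawled, rate = int(match.group(2)), int(match.group(3))
        if rate == 0:
            if start is None:
                start, pages = timestamp, crawled
            end = timestamp
        elif start is not None:
            # Crawling resumed: close the current stall.
            stalls.append((start, end, pages))
            start = None
    if start is not None:
        # The log ended while the spider was still stalled.
        stalls.append((start, end, pages))
    return stalls
```

Passing the lines of a log file to `find_stalls` gives one tuple per pause, which makes it easier to check whether the pauses recur at a fixed interval.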

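Can you provide more information? In particular, the log lines starting from the
[scrapy.utils.log] INFO: Versions:
[scrapy.core.engine] INFO: Spider opened
messages. Could this be caused by the proxy or proxy extension you are using?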
2020-02-25 22:53:06 [scrapy_proxy_pool.middlewares] INFO: Blacklist is cleared.
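
One way to look into the commenter's question about the proxy extension is to tally how many log lines each component emits at each level. This is a rough sketch, assuming the default Scrapy log format seen in the excerpt; `count_by_logger` is a hypothetical helper, not part of Scrapy.

```python
import re
from collections import Counter

# Captures the logger name and level from lines such as:
# 2020-02-25 22:18:36 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
LOG_LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[([^\]]+)\] (\w+):"
)


def count_by_logger(lines):
    """Count log lines per (logger name, level) pair, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

A large share of `scrapy_proxy_pool.middlewares` WARNING entries around a pause would point toward the proxy pool, although this alone does not prove it is the cause.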