Python: using a proxy with Selenium and Scrapy

python selenium-webdriver web-scraping scrapy

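I am building a spider that drives Selenium through a proxy. The main goal is to make the spider as robust as possible so that it does not get caught scraping. I know Scrapy has the scrapy-rotating-proxies module, but I cannot verify whether Scrapy checks if chromedriver's page request succeeded and, if it failed because the spider got caught, runs the process that switches the proxy.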

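Second, I am a little unsure how my computer handles proxies. When I set a proxy value, does that value apply to everything that makes requests on my machine? For example, will Scrapy and the WebDriver share the same proxy value as long as one of them has it set? In particular, if Scrapy has a proxy value, will any Selenium WebDriver instantiated inside the class definition inherit that proxy?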
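For context on the second question: a proxy set on a Scrapy request (through `request.meta['proxy']`) applies only to that request, and a Chrome instance started by Selenium does not inherit it; Chrome needs its own `--proxy-server` switch. The following is a minimal sketch, assuming a plain `ip:port` HTTP proxy without authentication. The helper names `normalize_proxy`, `scrapy_proxy_meta`, `chrome_proxy_argument`, and `make_driver` are hypothetical and not part of either library.

```python
def normalize_proxy(proxy: str) -> str:
    """Return the proxy as a URL, adding an http:// scheme if none is given."""
    return proxy if "://" in proxy else f"http://{proxy}"


def scrapy_proxy_meta(proxy: str) -> dict:
    """Meta dict for a scrapy.Request; HttpProxyMiddleware reads the 'proxy' key."""
    return {"proxy": normalize_proxy(proxy)}


def chrome_proxy_argument(proxy: str) -> str:
    """Command-line switch that makes Chrome route its traffic through the proxy."""
    return f"--proxy-server={normalize_proxy(proxy)}"


def make_driver(proxy: str):
    """Start a Chrome WebDriver that uses the given proxy (requires selenium)."""
    # Imported lazily so the pure helpers above work without selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(chrome_proxy_argument(proxy))
    return webdriver.Chrome(options=options)
```

Inside a spider, the same proxy string would then be passed to both sides explicitly, for example `scrapy.Request(url, meta=scrapy_proxy_meta(proxy))` and `make_driver(proxy)`.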

I am inexperienced with these tools and would greatly appreciate any help.

I have tried to find a way to test and check the proxy value Selenium is using and compare it with the one Scrapy is using.
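One way to run such a check is to ask an echo service which IP address it sees from each client and compare the answers. The sketch below assumes `https://httpbin.org/ip` (the same endpoint used in the code further down), which returns JSON of the form `{"origin": "..."}`. The function names are hypothetical, and reading the JSON from a `<pre>` element is an assumption about how Chrome renders a raw JSON response.

```python
import json


def extract_origin(body: str) -> str:
    """Pull the 'origin' field out of an https://httpbin.org/ip JSON body."""
    return json.loads(body)["origin"]


def selenium_origin(driver) -> str:
    """Ask httpbin which IP it sees for the browser behind `driver`."""
    # Lazy import; this function requires selenium to be installed
    from selenium.webdriver.common.by import By

    driver.get("https://httpbin.org/ip")
    # Assumption: Chrome wraps the raw JSON response in a <pre> element
    return extract_origin(driver.find_element(By.TAG_NAME, "pre").text)


def same_exit_ip(scrapy_body: str, selenium_body: str) -> bool:
    """True when both clients were seen coming from the same address."""
    return extract_origin(scrapy_body) == extract_origin(selenium_body)
```

On the Scrapy side, the body would be `response.text` from a request to the same URL. If the two origins differ, the two clients are not going out through the same proxy.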

# Gets the proxies and sets the value of the Scrapy proxy list in settings
from itertools import cycle

import requests
from lxml.html import fromstring

import settings  # assumption: the Scrapy project's settings module is importable under this name


def get_proxies():
    url = 'https://free-proxy-list.net/'
    response = requests.get(url, timeout=10)
    parser = fromstring(response.text)
    proxies = set()
    for row in parser.xpath('//tbody/tr')[:10]:
        # Keep only rows whose seventh column says "yes" (HTTPS support)
        if row.xpath('.//td[7][contains(text(),"yes")]'):
            # Grab the IP and the corresponding port
            proxy = ":".join([row.xpath('.//td[1]/text()')[0],
                              row.xpath('.//td[2]/text()')[0]])
            proxies.add(proxy)

    if not proxies:
        # cycle() over an empty set would raise StopIteration on next()
        return []

    proxy_pool = cycle(proxies)

    url = 'https://httpbin.org/ip'
    new_proxy_list = []
    for _ in range(1, 30):
        # Get a proxy from the pool
        proxy = next(proxy_pool)
        try:
            response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=5)
            response.raise_for_status()
        except requests.RequestException:
            # Most free proxies fail often with connection errors; skip
            # retries and move on to the next proxy
            print("Skipping. Connection error")
            continue
        # Keep the proxy only if the request succeeded, and only once
        if proxy not in new_proxy_list:
            new_proxy_list.append(proxy)

    # Add to the settings proxy list. This only takes effect if it runs
    # before Scrapy loads the settings.
    settings.ROTATING_PROXY_LIST = new_proxy_list
    return new_proxy_list

Is the question how to obtain and check free proxies? Take a look at proxybroker.

It is about how to make sure Selenium uses the values from the free-proxy function and gets updated by the scrapy-rotating-proxies library.

Selenium cannot change its proxy once it has started. You need a proper proxy rotator. That is why rotators with 3-, 5-, and 15-minute intervals exist for Selenium… or other long sessions.
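Because a running browser session keeps the proxy it was started with, rotating on the Selenium side means quitting the browser and starting a new one. Pages loaded directly through Selenium also do not pass through Scrapy's downloader middlewares, so a rotation library on the Scrapy side would not see them. The sketch below is one possible way to handle this by hand, not a feature of either library. `RotatingDriver` is a hypothetical class; `driver_factory` is any callable that takes a proxy string and returns a started WebDriver; `is_blocked` is a caller-supplied heuristic, since WebDriver does not expose HTTP status codes.

```python
from itertools import cycle
from typing import Callable, Iterable


class RotatingDriver:
    """Restart the browser on the next proxy whenever a page load fails or looks blocked."""

    def __init__(self, proxies: Iterable[str],
                 driver_factory: Callable[[str], object],
                 is_blocked: Callable[[object], bool]):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)
        self._factory = driver_factory
        self._is_blocked = is_blocked
        self.proxy = next(self._pool)
        self.driver = self._factory(self.proxy)

    def rotate(self) -> None:
        """Quit the current browser and start a new one on the next proxy."""
        self.driver.quit()
        self.proxy = next(self._pool)
        self.driver = self._factory(self.proxy)

    def get(self, url: str, max_attempts: int = 3) -> bool:
        """Load url, rotating after each failed attempt; True once a page is not blocked."""
        for _ in range(max_attempts):
            try:
                self.driver.get(url)
                if not self._is_blocked(self.driver):
                    return True
            except Exception:
                # A dead proxy typically surfaces as an exception from driver.get()
                pass
            self.rotate()
        return False
```

A block heuristic could be as simple as `lambda d: "captcha" in d.page_source.lower()`, though what counts as "caught" depends entirely on the target site.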