Python Selenium fails when certain threads create a Webdriver

python multithreading selenium

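I have a thread that takes a URL, requests it in Selenium, and parses the data.

Most of the time this thread works fine. But sometimes it seems to hang while creating the webdriver, and I can't seem to handle it.

This is the start of the thread: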

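# Python 2 fragment. ProxyList and UseragentsList are defined elsewhere in the script.
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def GetLink(eachlink):
    trry = 0  # 10 attempts at getting the data
    while trry < 10:
        print "Scraping:  ", eachlink
        try:
            Numbergrab = []
            Namegrab = []
            Positiongrab = []

            # Pick a random proxy and user agent for this attempt
            nextproxy = random.choice(ProxyList)
            nextuseragent = random.choice(UseragentsList)
            # Build a single string; a comma here would create a tuple instead
            proxywrite = '--proxy=' + nextproxy
            service_args = [
                proxywrite,
                '--proxy-type=http',
                '--ignore-ssl-errors=true',
            ]

            dcap = dict(DesiredCapabilities.PHANTOMJS)
            dcap["phantomjs.page.settings.userAgent"] = nextuseragent
            pDriver = webdriver.PhantomJS(r'C:\phantomjs.exe', desired_capabilities=dcap, service_args=service_args)
            pDriver.set_window_size(1024, 768)  # optional
            pDriver.set_page_load_timeout(20)

            print "Requesting link: ", eachlink
            pDriver.get(eachlink)
            try:
                WebDriverWait(pDriver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='seat-setting']")))
            except:
                # The element did not show up in time; wait a bit and carry on
                time.sleep(10)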

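But it will happily sit there for hours, saying nothing, until I actually close the console. Then it throws a socket.error and prints the "Scraping: link" message for the thread that died.

That suggests it is failing before the while loop even starts, but trry is set to 0 at the start of that thread and isn't referenced anywhere else. Also, without the Selenium webdriver there would be no socket.error, so the hang must be holding back the earlier message as well.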

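Update #2:

When running a single thread with exactly the same code, it seems happy to run for hours.

But a thread lock made no difference.

Boy, am I stumped. Going to try subprocesses instead of threads to see what that does.

Update #3:

Threads aren't stable over long runs, but subprocesses are. OK, Python.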
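To illustrate the switch from threads to processes described in the update, here is a minimal Python 3 sketch. It is not the asker's actual script: `scrape_one` is a hypothetical stand-in for the per-link work, and `scrape_all` and its parameters are made up for illustration. The idea is only that each job runs in its own process, so a browser that misbehaves in one job cannot wedge the others, and a failure is recorded per URL instead of killing the run.

```python
from concurrent.futures import (ProcessPoolExecutor, ThreadPoolExecutor,
                                as_completed)


def scrape_one(url):
    """Hypothetical stand-in for the per-link work done by GetLink.

    A real version would create its own webdriver here, inside the child
    process, and quit it before returning.
    """
    return url, len(url)


def scrape_all(urls, worker=scrape_one, max_workers=4,
               executor_cls=ProcessPoolExecutor):
    """Run worker(url) for every URL in a pool and collect the outcomes.

    executor_cls defaults to processes; passing ThreadPoolExecutor runs the
    same jobs in threads, which is handy for comparing the two.
    """
    results, errors = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for
        futures = {pool.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure for this URL and keep going
                errors[url] = exc
    return results, errors
```

With the default process pool, `worker` has to be a module-level function so it can be pickled, and on Windows the call to `scrape_all` needs to sit under an `if __name__ == "__main__":` guard.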

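I've run into this with both multithreading and multiprocessing, and with Firefox, Chrome, and PhantomJS. For whatever reason, the call that instantiates the browser (e.g. driver = webdriver.Chrome()) never returns.

Most of my scripts are relatively short-lived and use only a few threads/processes, so the problem doesn't show up often. I do have some scripts, however, that run for several hours and create and destroy several hundred browser objects, and with those I'm guaranteed to hit the hang a few times per run.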

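My solution is to put the browser instantiation into its own function/method, and then decorate that function/method with one of the many timeout and retry decorators available on PyPI:

(this is untested)

timeoutcontext only works in the main thread, because it uses signal.alarm() and Python delivers all signals to the main thread only. So if you are multithreading and calling webdriver from a non-main thread, you can't use this solution; you will have to use multiprocessing (or rig up some way for the main thread to create all new webdriver instances for you).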
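One way the "have the main thread create the instances" idea could be rigged up is with a pair of queues. This is only a sketch under assumptions: `DriverFactoryService`, `request`, and `serve_one` are hypothetical names, and `factory` stands for whatever callable builds the browser (for example a timeout-wrapped function). Worker threads post a request and block; the main thread runs the factory and hands the result, or the exception, back.

```python
import queue
import threading


class DriverFactoryService:
    """Lets worker threads ask the main thread to build objects for them."""

    def __init__(self, factory):
        self._factory = factory         # callable that creates one browser
        self._requests = queue.Queue()  # holds one reply queue per request

    def request(self, timeout=None):
        """Call from a worker thread; blocks until the main thread replies."""
        reply = queue.Queue(maxsize=1)
        self._requests.put(reply)
        ok, value = reply.get(timeout=timeout)
        if not ok:
            raise value  # re-raise the factory's exception in the worker
        return value

    def serve_one(self, timeout=None):
        """Call from the main thread; handles a single pending request.

        Returns False if no request arrived within `timeout` seconds.
        """
        try:
            reply = self._requests.get(timeout=timeout)
        except queue.Empty:
            return False
        try:
            reply.put((True, self._factory()))
        except Exception as exc:
            reply.put((False, exc))
        return True
```

The main thread would then loop over something like `service.serve_one(timeout=0.5)` for as long as any worker is alive, while each worker calls `service.request()` whenever it needs a fresh browser.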

        # Handler for the outer try in GetLink: count the failed attempt and retry
        except:
            trry += 1
            e = sys.exc_info()[0]
            print "Problem scraping link: ", e

from retrying import retry
from selenium import webdriver
from timeoutcontext import timeout, TimeoutException


def retry_if_timeoutexception(exception):
    return isinstance(exception, TimeoutException)


@retry(retry_on_exception=retry_if_timeoutexception, stop_max_attempt_number=3)
@timeout(30)  # Allow the function 30 seconds to create and return the object
def get_browser():
    return webdriver.Chrome()
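For comparison, here is a dependency-free Python 3 sketch of the same timeout-and-retry idea that does not rely on signals, so it can be called from any thread. This is a swapped-in technique, not what the decorators above do internally: `call_with_timeout`, `create_with_retries`, and `CreationTimeout` are hypothetical names, and `factory` would be something like `webdriver.Chrome`. The important caveat is that a stuck call is abandoned rather than killed, so a hung browser process may be left behind.

```python
import threading


class CreationTimeout(Exception):
    """Raised when the factory does not return within the allowed time."""


def call_with_timeout(factory, seconds):
    """Run factory() in a daemon thread and wait at most `seconds` for it."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = factory()
        except BaseException as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        # The stuck call is abandoned, not killed; it may still finish later
        raise CreationTimeout("factory did not return in %s seconds" % seconds)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def create_with_retries(factory, seconds=30, attempts=3):
    """Try the factory up to `attempts` times, retrying only on timeouts."""
    for attempt in range(1, attempts + 1):
        try:
            return call_with_timeout(factory, seconds)
        except CreationTimeout:
            if attempt == attempts:
                raise
```

A call such as `create_with_retries(webdriver.Chrome, seconds=30, attempts=3)` would mirror the 30-second limit and three attempts used above.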