Python 2.7: Selenium + geckodriver troubleshooting

Tags: python-2.7, selenium, selenium-webdriver, geckodriver, firefox-marionette

I am scraping forum post titles using Selenium with the Firefox Gecko driver in Python, but I have run into a problem I can't seem to solve.

~$ geckodriver --version
geckodriver 0.19.0

The source code of this program is available from
testing/geckodriver in https://hg.mozilla.org/mozilla-central.

This program is subject to the terms of the Mozilla Public License 2.0.
You can obtain a copy of the license at https://mozilla.org/MPL/2.0/.
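I am trying to collect several years' worth of post titles from a forum, and my code works fine for a while. I have sat and watched it run for about 20-30 minutes, and it does exactly what it is supposed to do. However, when I start the script, go to bed, and wake up the next morning, I find that it has crashed after processing about 22,000 posts. The site I am currently scraping has 25 posts per page, so it got through about 880 separate URLs before crashing.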

When it crashes, it throws the following error:

WebDriverException: Message: Tried to run command without establishing a connection
Initially, my code looked like this:

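from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

FirefoxProfile = webdriver.FirefoxProfile('/home/me/jupyter-notebooks/FirefoxProfile/')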
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
driver.close()
I have also tried:

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
    driver.close()

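I get the same error in all 3 scenarios (the third creates a new driver inside the loop for every URL), but only after the script has run successfully for quite a while, and I don't know how to determine why it fails.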


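How can I determine why I get this error after several hundred URLs have been processed successfully? Or are there best practices for processing this many pages with Selenium/Firefox that I am not following?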

All 3 code blocks are nearly perfect, but each has a minor flaw:

Your first code block is:

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
driver.close()
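This code block looks promising apart from one issue. In the last step, as per best practice, we must call driver.quit() instead of driver.close(), which prevents dangling webdriver instances from lingering in system memory. The difference between the two is that driver.close() closes only the current browser window, while driver.quit() ends the whole session and shuts down the driver process.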

Your second code block is:

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
    driver.close()
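This block is error prone. Once execution enters the for loop and processes a URL, we close the browser session/instance at the end of that iteration. So when the loop starts its second iteration, the script fails at driver.get(url) because there is no active browser session.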
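The failure mode of the second code block can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySession` and `run_second_variant` are hypothetical names, the class is a stand-in and not Selenium, and it models just one thing, namely that no commands can be sent once the only window has been closed.

```python
class ToySession(object):
    """Toy stand-in for a browser session with a single window (not Selenium)."""

    def __init__(self):
        self.open = True
        self.visited = []

    def get(self, url):
        # Commands are only accepted while the window is still open.
        if not self.open:
            raise RuntimeError("no active browser session")
        self.visited.append(url)

    def close(self):
        # Closing the only window leaves nothing to send commands to.
        self.open = False


def run_second_variant(urls):
    """Mimic the second code block: close() is called inside the loop."""
    session = ToySession()
    for url in urls:
        session.get(url)   # fails on the second iteration
        session.close()
    return session.visited
```

With a single URL the loop completes; with two or more, the second `get` raises, which mirrors the behaviour described for the second code block.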

Your third code block is:

for url in urls:
    driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
    driver.get(url)
    ### code to process page here ###
    driver.close()
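This code block is well composed apart from the same issue as the first one. In the last step we must call driver.quit() instead of driver.close(), which prevents dangling webdriver instances from lingering in system memory. Dangling webdriver instances keep consuming resources and occupying ports, so at some point WebDriver cannot find a free port or cannot open a new browser session/connection. Hence you see the error WebDriverException: Message: Tried to run command without establishing a connection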

Solution:
As per best practice, call driver.quit() instead of driver.close(), and open a new WebDriver instance with a new web browser session.
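The recommendation above can be sketched as follows. This is a minimal illustration rather than the answerer's code: `scrape_in_batches`, `make_driver`, `process_page`, and `batch_size` are hypothetical names, and restarting the browser after a fixed number of pages is an assumption about what "open a new WebDriver instance" could look like in a long-running scrape. The only driver methods used are `get()` and `quit()`.

```python
def scrape_in_batches(urls, make_driver, process_page, batch_size=100):
    """Visit every URL, using one fresh driver per batch of URLs.

    make_driver:  zero-argument callable returning an object with
                  get(url) and quit() methods (hypothetical parameter).
    process_page: callable that takes the driver and returns a result.
    """
    results = []
    for start in range(0, len(urls), batch_size):
        driver = make_driver()  # new WebDriver instance and browser session
        try:
            for url in urls[start:start + batch_size]:
                driver.get(url)
                results.append(process_page(driver))
        finally:
            # quit() rather than close(): end the whole session, even if
            # get() or process_page() raised an exception.
            driver.quit()
    return results
```

With Selenium, `make_driver` could be something like `lambda: webdriver.Firefox(capabilities=firefox_capabilities)`. Putting `quit()` in a `finally` clause means a crash partway through a batch still releases the browser and driver process.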
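Thank you for the detailed explanation! This solved that problem, but now I get an IOError after it has been running for a while that I can't figure out. I will open a new question for it. IOError: [Errno 2] No such file or directory: '/tmp/tmpYDZJ4E/webdriver-py-profilecopy/user.js'

Actually, this error seems to be related to the change to driver.quit(). Firefox works fine the first time it opens, but after .quit() it throws this error when it tries to open the next window. Thanks for your help.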
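One plausible cause of this follow-up error, stated here as an assumption that the thread does not confirm, is that the profile object created once before the loop points at a temporary copy of the profile directory, and that this copy no longer exists after `quit()`. If so, building a new profile object for every new driver would avoid it. The sketch below uses hypothetical names (`make_driver_factory`, `profile_cls`, `driver_cls`) and takes the classes as parameters so that it does not depend on any particular Selenium version.

```python
def make_driver_factory(profile_dir, profile_cls, driver_cls):
    """Return a zero-argument function that builds a driver with a new profile.

    profile_cls and driver_cls are passed in (hypothetical parameters), for
    example webdriver.FirefoxProfile and webdriver.Firefox.
    """
    def make_driver():
        # Build the profile object again on every call instead of reusing
        # one created before the loop (assumption: its temporary copy is
        # gone once the previous driver has quit).
        profile = profile_cls(profile_dir)
        return driver_cls(profile)
    return make_driver
```

With the names from the question, this might be called as `make_driver_factory('/home/me/jupyter-notebooks/FirefoxProfile/', webdriver.FirefoxProfile, webdriver.Firefox)`, and the returned function used wherever a fresh driver is needed.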