Python Selenium WebDriver stops with [Errno 10054]

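I am trying to run a Python 2.7.0 routine that uses Selenium 2.37.2 to launch a Firefox 26.0 browser and submit queries to the Google n-grams site (all on my Windows 8 machine). The program works fine for the first ten entries in the input file, then stops with the following traceback: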

Traceback (most recent call last):
  File "C:\Python27\lib\lib-tk\Tkinter.py", line 1410, in __call__
    return self.func(*args)
  File "C:\Users\Douglas\Desktop\n-grams\n_gram_api.py", line 43, in query_n_gra
ms
    driver.get("https://books.google.com/ngrams")
  File "C:\Python27\lib\site-packages\selenium-2.37.2-py2.7.egg\selenium\webdriv
er\remote\webdriver.py", line 176, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\site-packages\selenium-2.37.2-py2.7.egg\selenium\webdriv
er\remote\webdriver.py", line 162, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\site-packages\selenium-2.37.2-py2.7.egg\selenium\webdriv
er\remote\remote_connection.py", line 355, in execute
    return self._request(url, method=command_info[0], data=data)
  File "C:\Python27\lib\site-packages\selenium-2.37.2-py2.7.egg\selenium\webdriv
er\remote\remote_connection.py", line 402, in _request
    response = opener.open(request)
  File "C:\Python27\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 409, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1173, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1148, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10054] An existing connection was forcibly close
d by the remote host>
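I have found many sites that discuss this error message, but I have not been able to figure out why my own process stops after ten iterations of the for loop. Here is the code I am running (sorry it is a bit long; I did not want to trim it in case the culprit is hiding in the GUI):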

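from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from Tkinter import *
import Tkinter as tk
from tkFileDialog import askopenfilename
import time

#out
out = open("n_grams_outfile.txt", "w")
out.write("search string" + "\t" + "pub year" + "\t" + "frequency" + "\n")

#create a function that will return the filepath for a file provided by the user
user_defined_filepath = {}
def selectfile():
    user_defined_filepath['filename'] = askopenfilename(filetypes=[("Text","*.txt")]) # user_defined_filepath['filename'] may now be accessed in the global scope.

#create function we'll call when start button is pressed
def query_n_grams(event = "<Button>"):

    #create binary switch we'll use to only start new browser in first pass. Set default to true
    first_pass = 1

    #identify the input file
    inputfile = user_defined_filepath['filename']
    readinputfile = open(inputfile).read()
    stringinputfile = str(readinputfile)

    #assume input file = tsv. Left hand column = string of len ...

I launched Firefox 23 instead of 26, which solved the problem: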

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from Tkinter import *
import Tkinter as tk
from tkFileDialog import askopenfilename
import time
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

#out
out = open("n_grams_outfile.txt", "w")
out.write("search string" + "\t" + "pub year" + "\t" + "frequency" + "\n")

#create a function that will return the filepath for a file provided by the user
user_defined_filepath = {}
def selectfile():
    user_defined_filepath['filename'] = askopenfilename(filetypes=[("Text","*.txt")]) # user_defined_filepath['filename'] may now be accessed in the global scope.

#create function we'll call when start button is pressed
def query_n_grams(event = "<Button>"):

    #create binary switch we'll use to only start new browser in first pass. Set default to true
    first_pass = 1

    #identify the input file
    inputfile = user_defined_filepath['filename']
    readinputfile = open(inputfile).read()
    stringinputfile = str(readinputfile)

    #assume input file = tsv. Left hand column = string of len <= 6; right hand column = pub year of text
    split_by_row = stringinputfile.split("\n")
    for row in split_by_row:

        #an empty line (e.g. at the end of the input file) has no second column when split on "\t", so skip any row that raises IndexError
        try:
            search_terms = row.split("\t")[0]
            actual_pub_year = row.split("\t")[1]
        except IndexError:
            continue

        pub_year_minus_five = int(actual_pub_year) - 5
        pub_year_plus_five = int(actual_pub_year) + 5        

        #you now have terms and pub year. Fire up webdriver and ride, cowboy
        if first_pass == 1:

            #raw string so the backslashes in the Windows path are never read as escape sequences
            binary = FirefoxBinary(r'C:\Text\Professional\Digital Humanities\Programming Languages\Python\Query Literature Online\LION 3.0\Firefox Versions\Firefox23\FirefoxPortable.exe')
            driver = webdriver.Firefox(firefox_binary=binary)

            first_pass = 0

        #otherwise, use extant driver
        driver.implicitly_wait(10)
        driver.get("https://books.google.com/ngrams")
        driver.refresh()
        driver.implicitly_wait(10)

        #send keys
        driver.implicitly_wait(10)
        keyword = driver.find_element_by_class_name("query")
        driver.implicitly_wait(10)
        keyword.clear()
        driver.implicitly_wait(10)
        keyword.send_keys(str(search_terms))
        driver.implicitly_wait(10)

        #find start year
        driver.implicitly_wait(10)
        start_year = driver.find_element_by_name("year_start")
        driver.implicitly_wait(10)
        start_year.clear()
        driver.implicitly_wait(10)
        start_year.send_keys(str(pub_year_minus_five))
        driver.implicitly_wait(10)

        #find end year
        driver.implicitly_wait(10)
        end_year = driver.find_element_by_name("year_end")
        driver.implicitly_wait(10)
        end_year.clear()
        driver.implicitly_wait(10)
        end_year.send_keys(str(pub_year_plus_five))
        driver.implicitly_wait(10)

        #click enter
        driver.implicitly_wait(10)
        submit_button = driver.find_element_by_class_name("kd_submit")
        driver.implicitly_wait(10)
        submit_button.click()
        driver.implicitly_wait(10)

        #grab html
        driver.implicitly_wait(10)
        html = driver.page_source
        driver.implicitly_wait(10)

        #if you run a search that yields no hits, can't split the html, so use try/except
        try:

            #we want the list that comes right after "timeseries" and before the closing bracket
            desired_percent_figures = html.split('"timeseries": [')[1].split("]")[0]

            #now desired_percent_figures contains a comma-separated list of percents, some in scientific notation (with e)
            percents_as_list = desired_percent_figures.split(",")

            #convert to floats, which also handles the scientific notation
            percent_list_as_ints = [float(i) for i in percents_as_list]

            #take your list and find mean
            mean_percent = sum(percent_list_as_ints) / float(len(percent_list_as_ints))

            out.write(str(search_terms) + "\t" + str(actual_pub_year) + "\t" + str(mean_percent) + "\n")

        #you'll get IndexError if you run a query like "Hello Garrett" for which there are no entries in the database at all. (Other queries, like 'animal oeconomy' for year 1700, yields result 0, but because search string is in database elsewhere, won't throw IndexError)
        except IndexError:

            mean_percent = "0.0"

            #because we got an index error, we know that the search yielded no results. so let's type 0.0 as percent
            out.write(str(search_terms) + "\t" + str(actual_pub_year) + "\t" + str(mean_percent) + "\n")

#create TK frame
root = tk.Tk()
canvas = tk.Canvas(root, width=157, height=100)
canvas.pack()

#create label for tk
ngram_label = tk.Button(root, text = "Google N-Gram API", command = "", anchor = 'w', width = 14, activebackground = "#33B5E5")
ngram_label_canvas = canvas.create_window(20, 20, anchor='nw', width = 119, window=ngram_label)

#create a button that allows users to find a file for analysis
file_label = tk.Button(root, text = "Input file", command = selectfile, anchor = 'w', width = 7, activebackground = "#33B5E5")
file_label_canvas = canvas.create_window(20, 60, anchor='nw', window=file_label)

#create a start button that allows users to submit selected parameters and run the "startviewing" processes
start_label = tk.Button(root, text = "Go!", command = query_n_grams, anchor = 'w', width = 3, activebackground = "#33B5E5")
start_label_canvas = canvas.create_window(107, 60, anchor='nw', window=start_label)

root.mainloop()
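
If changing the browser version is not an option, one possible stopgap is to retry the failing call. The traceback shows the URLError surfacing inside driver.get, in the HTTP request that the Selenium client sends to the browser, so a small retry wrapper could be placed around that call. This is only a sketch: call_with_retry and its parameters are hypothetical names, not part of Selenium, and retrying will not help if the browser session itself has died.

```python
import time


def call_with_retry(func, exceptions, attempts=3, delay=1.0):
    """Call func(); if it raises one of `exceptions`, wait and try again.

    The last exception is re-raised once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)  # brief pause before the next attempt
```

In the script above this might be used as call_with_retry(lambda: driver.get("https://books.google.com/ngrams"), (URLError,)), with URLError imported from urllib2 on Python 2.7.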
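I was facing the same problem. The cause was the new Firefox update (from 46 to 47), which has a serious bug :)

Anyway, this is how I solved it.

Download and install Firefox 46, i.e. downgrade from 47.0 to 46.0 (separate installers exist for 32-bit and 64-bit).

Note: before installing the old version you need to remove your current Firefox :) Be careful, this is really important.

That's it, you are now ready to go :)

If you have any trouble with the download links, you can find the installers on your own.

Have fun.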
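
Both fixes come down to pinning a Firefox version that the installed Selenium release can drive. A script could check this up front instead of failing partway through a run. The sketch below assumes a version string such as "Mozilla Firefox 47.0" has already been obtained by some means; parse_major_version, is_known_good, and the max_good_major threshold are hypothetical and would need to be chosen for your own Selenium version.

```python
import re


def parse_major_version(version_text):
    """Return the major version number found in text like 'Mozilla Firefox 47.0'."""
    match = re.search(r"(\d+)", version_text)
    if match is None:
        raise ValueError("no version number found in %r" % version_text)
    return int(match.group(1))


def is_known_good(version_text, max_good_major):
    """True if the browser's major version does not exceed the newest one known to work."""
    return parse_major_version(version_text) <= max_good_major
```

With max_good_major set to 46, as in the answer above, is_known_good("Mozilla Firefox 47.0", 46) is False, so the script could stop with a clear message before opening the browser.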
