Python Mechanize Browser: HTTP Error 460


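I am trying to log in to a site with the mechanize browser and I get an HTTP 460 error. That appears to be a made-up status code, so I am not sure how to handle it. Here is the code: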

import cookielib  # Python 2 standard library
import mechanize

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open("https://foo.com/login")
br.select_form(nr=1)

br[br.form.controls[2].name] = "login@gmail.com" #I can't select the form or controls by name because they change every time
br[br.form.controls[3].name] = "mypassword"
br.method = "post"

response = br.submit()

Here is the error that appears when mechanize debug messages are enabled:

>>> response = br.submit()
send: 'POST /login/signin.logincomponent_0.signinform HTTP/1.1\r\nAccept-Encodin
g: identity\r\nContent-Length: 599\r\nConnection: close\r\nUser-Agent: Mozilla/5
.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 F
irefox/3.0.1\r\nHost: myaccount.foo.com\r\nCookie: DC=origin1; STUB_SESS=fil
ler%7E%5E%7E0%7Cguid%7E%5E%7E30909BA355883C551B421713700871E5%7E%5E%7E04%2F09%2F
2014; TLTHID=33A41894C02B10C01D1CF554572C7A31; TLTSID=FFDDD892C02A10C01C0BF55457
2C7A31; STUB_SESSION=filler%7E%5E%7E0%7Cstub_sid%7E%5E%7E0%7E%5E%7E04%2F09%2F201
4; JSESSIONID=B6E04AC06D5885942E299F67EE421640\r\nReferer: https://myaccount.foo
.com/login/Signin?\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\
ntLGGfIeGONQt=H4sIAAAAAAAAAJWQvUoDURCFx0AQEmwEa1ES7G4sTKNVCgUhkeBqLbN3Z9cr98%2B5
N25sfBSfQPISKex8Bx%2FA1spCsxo7w9p%2BzDnfYZ7eoFl2YDdRhVX2ULtCWemMd5ZsvNoXFSCDSgeG
vuNCoEd5TSKipxD5vi%2BkY9IqFSkGEoP0C6KMJ4p01kkoTnz3ct5%2B3Xr%2BaMDaENrS2chOn6GhCJ
vDG7zDnkZb9JLIyhZHUx%2BhVVmPF9ba2wb%2F3TZmJymEZJIaFYJydj7LDvL3x5cGwNSXe9Bd6fUYQu
k4C7fwABBho6LjH1o7vkg3yx3Y%2Fus6GqZc%2FWrWL0bnlJ9mNSLf1Sv%2Bx2TIpMSGlu2tJRpRvWDl
%2BATZ7YRMRAIAAA%3D%3D&GkhkHrHNkEGO=N&NgSEvMJNtPPU=login%40gmail.com&tl
BhliqPEpQP=mypassword&NTFAoHFKrewo=184f4acf-1300-4e65-a81d-3092301d87c213970777534
13&signIn=signIn&shs8q2kGs88H=1979975621'
reply: 'HTTP/1.1 460 Unknown\r\n'
header: Content-Type: text/html
header: Content-Length: 0
header: Content-Language: en
header: Cache-Control: no-cache
header: Cache-Control: no-store
header: Cache-Control: must-revalidate
header: Cache-Control: max-age=0
header: Cache-Control: s-maxage=0
header: Cache-Control: private
header: Expires: Wed, 09 Apr 2014 21:09:40 GMT
header: Pragma: no-cache
header: Date: Wed, 09 Apr 2014 21:09:40 GMT
header: Connection: close
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 541, in submit
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 203, in open
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 255, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 460: Unknown
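
The traceback shows the 460 reply surfacing as an exception, so one way to look at it more closely is to catch the exception around br.submit() and print what the server sent. The sketch below assumes the exception behaves like a standard HTTPError (mechanize's httperror_seek_wrapper is, as far as I know, a subclass of it). The helper name summarize_http_error and the hand-built stand-in error are illustrative and not part of the original code.

```python
import io

try:
    from urllib.error import HTTPError  # Python 3
except ImportError:
    from urllib2 import HTTPError  # Python 2, as in the code above


def summarize_http_error(err):
    """Collect the status code, reason, headers and body size of an HTTPError."""
    body = err.read()  # the error object is also a readable response
    return {
        "code": err.code,
        "reason": err.msg,
        "headers": dict(err.headers.items()),
        "body_length": len(body),
    }


# Stand-in for the failure in the traceback (assumption: built by hand here,
# in real use it would come from `except HTTPError as err:` around br.submit()).
fake_error = HTTPError(
    "https://foo.com/login", 460, "Unknown",
    {"Content-Length": "0"}, io.BytesIO(b""),
)
summary = summarize_http_error(fake_error)
print(summary)
```

In this case the reply has an empty body, so the headers and the status line are all the information the server provides.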

I also tried br.form.click(), but I don't think I am using br.click() correctly here.

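I determined that the site uses JavaScript as part of its login authentication mechanism, so mechanize cannot simulate a browser correctly. I switched to Selenium, which can handle JavaScript, and that let me log in successfully.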

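Could you ask the people who maintain the site you are trying to access? As you correctly point out, the problem is that 460 is not a publicly defined error. It belongs to the 4xx "Client Error" family, which means (assuming the developers chose it for that reason) that the application does not like your request. Beyond that, this seems to be a question only the people who developed the site can answer.

Scarily, I can recognize the site just from what was posted. (You're right, Selenium is a good choice.)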
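
To make the status-class point concrete, here is a minimal sketch that splits a status code into its class (the first digit) and looks up the reason phrase in the standard library's table of known codes. The helper name describe_status is hypothetical, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from http.client import responses  # Python 3
except ImportError:
    from httplib import responses  # Python 2

# First digit of the status code -> class name
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe_status(code):
    """Return (status class, standard reason phrase or None if unregistered)."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    return status_class, responses.get(code)


print(describe_status(404))
print(describe_status(460))
```

A code such as 460 still falls into the client-error class, but it has no entry in the standard table, which is consistent with the server replying "460 Unknown".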
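For reference, the br.form.click() attempt mentioned in the question:

# Note: HTMLForm.click() builds and returns a request object, not a response;
# it does not send anything by itself.
response = br.form.click(br.form.controls[6].name)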