Error when using Python/mechanize select_form()?

python web-scraping

I am trying to scrape some data from a website. The script I am trying to write should fetch the content of this page:

http://www.atpworldtour.com/Rankings/Singles.aspx

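It should simulate a user going through every option of the Additional Standings and Date drop-downs and clicking Go. After retrieving the data, it should use the back function.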

So far, I have been trying to select this option for Additional Standings:

            <option value="101" >101-200</option>

However, it simply fails at select_form(nr=0), which should select the first form.

This is the log returned by Python:

>>> from mechanize import Browser
>>>
>>> from BeautifulSoup import BeautifulSoup
>>> import re
>>> import urllib2
>>>
>>>
>>>
>>> br = Browser();
>>> br.open("http://www.atpworldtour.com/Rankings/Singles.aspx");
<response_seek_wrapper at 0x311bb48L whose wrapped object = <closeable_response
at 0x311be88L whose fp = <socket._fileobject object at 0x0000000002C94408>>>
>>> br.select_form(nr=0);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 505, in select_
form
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 546, in __getattr__
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 559, in forms
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 228, in forms
mechanize._html.ParseError

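I could not find a proper explanation of all the functions on the mechanize home page. Could someone point me to a good tutorial on using forms with mechanize, or help me with this problem?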

Anthony

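I think you are using the library correctly, but the parser seems to have trouble with that particular page. I used the library in the same way on another page ("") and it did not raise an error.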
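
For reference, here is a rough sketch of the workflow described in the question, assuming the form on the page can be parsed. The function name and `control_name` are placeholders I made up; the real name of the select control has to be read from the page source. I am also assuming that submitting the form is equivalent to clicking Go, which may not hold for this page.

```python
def fetch_option_pages(br, control_name, option_values):
    """Submit the first form once per option value and collect the result pages.

    `br` is expected to behave like a mechanize.Browser that has already
    opened the page. `control_name` is a hypothetical placeholder for the
    name of the <select> control.
    """
    pages = {}
    for value in option_values:
        br.select_form(nr=0)          # select the first form again after each navigation
        br[control_name] = [value]    # select controls take a list of values
        response = br.submit()        # assumed to stand in for clicking "Go"
        pages[value] = response.read()
        br.back()                     # return to the rankings page
    return pages
```

With a real browser this would be called after `br.open(...)`, for example with `option_values=["101"]` for the 101-200 option from the question.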

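I just ran into the same problem. The page I was accessing passed W3C validation, so I did not think it was a markup problem. However, HTML Tidy complained that the page had an error inside a span. Once I fixed that, mechanize started working.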
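
If you cannot fix the page itself, one way to apply the same idea is to clean the markup after downloading it and hand the cleaned copy back to mechanize before selecting the form. This is only a sketch: the function name and the `clean` callable are my own placeholders, and what `clean` has to repair depends on what HTML Tidy (or a similar tool) reports for the page in question.

```python
def reparse_cleaned(br, clean):
    """Replace the browser's current response body with a cleaned copy.

    `br` is expected to behave like a mechanize.Browser that has already
    opened a page. `clean` is a caller-supplied function (hypothetical here)
    that takes the raw markup and returns repaired markup.
    """
    response = br.response()                 # copy of the current response
    fixed = clean(response.get_data())       # repair the markup
    response.set_data(fixed)                 # store the repaired body
    br.set_response(response)                # make mechanize parse the repaired body
```

After this call, `br.select_form(nr=0)` works on the cleaned markup instead of the original. `clean` could wrap HTML Tidy or a lenient parser, whichever manages to repair the page.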

Also, I saw a reply to this question on the mailing list. I just want to point out that adding factory=mechanize.RobustFactory() to mechanize.Browser() does not change the result.

Hint: specify more settings on mechanize.Browser().
