Error using Python/mechanize select_form()?
Tags: python, mechanize, web-scraping


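I am trying to scrape some data from a website. The script I am writing should fetch the content of this page:

http://www.atpworldtour.com/Rankings/Singles.aspx
It should simulate a user going through each option for additional standings and date, simulate clicking Go, and then, once the data has been retrieved, use the back function.

At the moment I am trying to select this option for the additional standings:

            <option value="101" >101-200</option>
However, it fails right at select_form(nr=0), which should select the first form.

This is the log returned by Python: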

>>> from mechanize import Browser
>>>
>>> from BeautifulSoup import BeautifulSoup
>>> import re
>>> import urllib2
>>>
>>>
>>>
>>> br = Browser();
>>> br.open("http://www.atpworldtour.com/Rankings/Singles.aspx");
<response_seek_wrapper at 0x311bb48L whose wrapped object = <closeable_response
at 0x311be88L whose fp = <socket._fileobject object at 0x0000000002C94408>>>
>>> br.select_form(nr=0);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 505, in select_
form
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 546, in __getattr__
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 559, in forms
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 228, in forms
mechanize._html.ParseError
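I could not find a proper explanation of all the functions on the mechanize home page. Can anyone point me to a proper tutorial on using forms with mechanize, or help me with this particular problem?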
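For reference, the flow described in the question (choose an option, submit the form, read the result, go back) might look like the sketch below once select_form() succeeds. It is only an illustration under assumptions: `fetch_option_page` and the `control_name` argument are hypothetical, the real name of the select control on the ATP page is not given in the question, and `br` is expected to be a mechanize.Browser instance created by the caller.

```python
def fetch_option_page(br, url, control_name, value):
    """Open url, pick one option in the first form, submit, and go back.

    br           -- a mechanize.Browser created by the caller
    control_name -- name of the <select> control (hypothetical; inspect
                    the page to find the real one)
    value        -- the option's value attribute, e.g. "101"
    """
    br.open(url)
    br.select_form(nr=0)              # first form on the page
    br.form[control_name] = [value]   # select controls take a list of values
    response = br.submit()            # equivalent of clicking the submit button
    html = response.read()
    br.back()                         # return to the page with the form
    return html


# Possible usage (needs mechanize installed and network access):
#   import mechanize
#   br = mechanize.Browser()
#   page = fetch_option_page(
#       br,
#       "http://www.atpworldtour.com/Rankings/Singles.aspx",
#       "name-of-the-select-control",   # assumption, not from the question
#       "101",
#   )
```

Passing the browser in keeps the helper easy to try against a stub object before pointing it at the real site.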


Anthony

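I think you are using the library correctly, but the parser seems to have trouble with that particular page. I used the library in the same way on another page ("") and it did not raise an error.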

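I just ran into the same problem. The page I was accessing passes W3C validation, so I assumed it was not a markup issue. However, HTML Tidy complained that the page had an error inside a span. Once I fixed that, mechanize started working.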
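When the page cannot be fixed at the source, one possible way to act on this is to locate the suspicious markup and hand a repaired copy back to mechanize. The sketch below is not taken from the answer. `find_markup_problems` is a hypothetical, stdlib-only helper that roughly flags unbalanced tags (it will also flag tags HTML allows to stay open, such as <p> or <li>). `open_with_repaired_markup` assumes a caller-supplied `repair` function (for example a wrapper around HTML Tidy) that takes and returns the raw page bytes, and it relies on the response's get_data()/set_data() and on Browser.set_response().

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Records tags that are closed out of order or never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for every element still open
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in [t for t, _ in self.stack]:
            self.problems.append("line %d: stray </%s>" % (self.getpos()[0], tag))
            return
        # Pop back to the matching open tag and report what was skipped.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append("line %d: <%s> not closed" % (line, open_tag))


def find_markup_problems(html):
    """Return a list of rough findings about unbalanced tags in html (a str)."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    for open_tag, line in checker.stack:
        checker.problems.append("line %d: <%s> not closed" % (line, open_tag))
    return checker.problems


def open_with_repaired_markup(br, url, repair):
    """Open url with a mechanize.Browser and replace the body with repair(body)."""
    response = br.open(url)
    response.set_data(repair(response.get_data()))
    br.set_response(response)   # later calls such as select_form() see the repaired HTML
    return br
```

For instance, `find_markup_problems("<div><span>x</div>")` reports the span that was never closed, which is the kind of error the answer attributes to HTML Tidy's output.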


P.S. I saw the reply to this question on the mailing list. I just want to point out that adding factory=mechanize.RobustFactory() to mechanize.Browser() does not change the result.

Hint: specify more details when setting up mechanize.Browser().
