Python urllib2.urlopen through a proxy fails after a few calls


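Edit: after much fiddling, it appears that urlgrabber succeeds where urllib2 fails, even when it is told to close the connection after each file. There seems to be something wrong with the way urllib2 handles proxies, or with the way I use it! Anyway, here is the simplest possible code to retrieve a file in a loop: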

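import urlgrabber

for i in range(1, 100):
    url = "http://www.iana.org/domains/example/"
    # The proxy credentials are embedded in the proxy URL.
    urlgrabber.urlgrab(url, proxies={'http': 'http://<user>:<password>@<proxy url>:<proxy port>'}, keepalive=1, close_connection=1, throttle=0)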
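
For comparison, the urlgrabber call passes the credentials inside the proxy URL. The same idea can be sketched with urllib2's `ProxyHandler`, which, as far as I know, then sends the `Proxy-Authorization` header up front instead of waiting for a challenge. Whether this avoids the failure described here is untested. `make_proxy_url` and `build_proxy_opener` are hypothetical helper names, and the fallback import exists only so the sketch also runs on Python 3.

```python
try:  # Python 2
    import urllib2
    from urllib import quote
except ImportError:  # Python 3: the same classes live in urllib.request
    import urllib.request as urllib2
    from urllib.parse import quote


def make_proxy_url(user, password, host, port):
    """Return an http proxy URL with percent-encoded credentials (hypothetical helper)."""
    return "http://{}:{}@{}:{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port)


def build_proxy_opener(user, password, host, port):
    """Build an opener whose ProxyHandler carries the credentials in the proxy URL."""
    proxy_handler = urllib2.ProxyHandler(
        {"http": make_proxy_url(user, password, host, port)})
    return urllib2.build_opener(proxy_handler)
```

The opener could then be installed with `urllib2.install_opener(...)` in place of the `ProxyBasicAuthHandler` setup, leaving the download loop unchanged.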

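Hi all,

I am trying to write a very simple Python script that grabs a bunch of files via urllib2.

The script needs to work through the proxy at work (the problem does not occur when I grab files on the intranet, i.e. without the proxy).

The script fails after two requests with "HTTPError: HTTP Error 401: basic auth failed". Any idea why? The proxy seems to be rejecting my authentication, but why? The first two urlopen requests go through correctly.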
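
One way to narrow this down is to look at the status code and challenge header of the failing response. By HTTP convention a proxy asks for credentials with 407 and a `Proxy-Authenticate` header, while 401 with `WWW-Authenticate` normally comes from the origin server; urllib2's `ProxyBasicAuthHandler` only reacts to 407. The helper below is a hypothetical sketch, not part of urllib2, and its wording of the result is arbitrary.

```python
try:  # Python 2
    from urllib2 import HTTPError
except ImportError:  # Python 3
    from urllib.error import HTTPError


def describe_auth_failure(err):
    """Summarise who is asking for credentials in a failed response (hypothetical helper)."""
    headers = err.hdrs
    if err.code == 407:
        source = "proxy"
        challenge = headers.get("Proxy-Authenticate")
    elif err.code == 401:
        # Normally the origin server, although a proxy may also answer with 401.
        source = "origin server (or a proxy answering with 401)"
        challenge = headers.get("WWW-Authenticate")
    else:
        source = "other"
        challenge = None
    return "HTTP {} from {}; challenge: {}".format(err.code, source, challenge)
```

In the download loop, the `urlopen` call could be wrapped in `try: ... except HTTPError as err:` and the result of `describe_auth_failure(err)` printed for the first failing request.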

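Edit: adding a 10-second sleep between requests, to avoid any throttling the proxy might be doing, does not change the result.

Here is a simplified version of my script (with identifying information removed, obviously):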

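import urllib2

# Register the proxy credentials for any realm (None).
passmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passmgr.add_password(None, '<proxy url>:<proxy port>', '<my user name>', '<my password>')
authinfo = urllib2.ProxyBasicAuthHandler(passmgr)

# Send http requests through the proxy and install the opener globally.
proxy_support = urllib2.ProxyHandler({"http": "<proxy http address>"})
opener = urllib2.build_opener(authinfo, proxy_support)
urllib2.install_opener(opener)

for i in range(100):
    with open("e:/tmp/images/tst{}.htm".format(i), "w") as outfile:
        f = urllib2.urlopen("http://www.iana.org/domains/example/")
        outfile.write(f.read())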

Thanks in advance.

The proxy might be throttling your requests. I guess it thinks you look like a bot.

You could add a timeout and see if that gets you through.
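
The pause-and-retry idea could be sketched as follows. `fetch_with_retry` is a hypothetical helper; the `fetch` and `sleep` parameters are injected so the logic can be tried without a network, and the attempt count and delay are arbitrary values, not recommendations.

```python
import time

try:  # Python 2
    from urllib2 import HTTPError
except ImportError:  # Python 3
    from urllib.error import HTTPError


def fetch_with_retry(fetch, url, attempts=3, delay=10.0, sleep=time.sleep):
    """Call fetch(url); on HTTPError wait `delay` seconds and retry, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except HTTPError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            sleep(delay)
```

With urllib2 it might be called as `fetch_with_retry(lambda u: urllib2.urlopen(u).read(), url)`. If the proxy rejects every request after the first few regardless of the pause, retrying will not help and the error is simply re-raised.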

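You could use the keepalive handler from the urlgrabber module to minimize the number of connections.

I'm not sure this will work correctly with your proxy setup.

You may have to hack the keepalive module.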

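Thanks for the suggestion! Despite a full 10-second sleep between requests, still no joy... This is really odd! It seems the keepalive module has disappeared from urlgrabber (see jwat's answer). However, urlgrabber.urlgrab supports proxies and successfully retrieved all the files. I have added the relevant code to the question.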

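# Keepalive example for the answer above: route urllib2 through a keep-alive
# HTTP handler so that connections are reused instead of reopened per request.
import urllib2
# Import path as given in the answer; it assumes keepalive.py is importable
# directly (inside urlgrabber it may be `urlgrabber.keepalive` instead).
from keepalive import HTTPHandler

keepalive_handler = HTTPHandler()
opener = urllib2.build_opener(keepalive_handler)
urllib2.install_opener(opener)

fo = urllib2.urlopen('http://www.python.org')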