HTTP request overload/timeout with Python

python http python-2.7 python-3.x https

I run a Python script that requests about 1000 URLs over HTTP and logs their responses. Here is the function that downloads the page behind a URL:

import os
import urllib2
from urlparse import urlsplit

# ImageDestinationPath, _Save_image_to_s3 and update_database are defined
# elsewhere in the script.

def downld_url(url, output):
    print "Entered Downld_url and scraping the pdf/doc/docx file now..."
    global error
    try:
        # Download the whole response body into memory.
        f = urllib2.urlopen(url)
        data = f.read()
        # Treat whatever follows the last '.' in the URL path as the extension.
        dlfn = urlsplit(url).path.split('.')[-1]
        print "The extension of the file is: " + str(dlfn)
        filename = output + "." + dlfn
        dwnladfn = ImageDestinationPath + "/" + filename
        with open(dwnladfn, "wb") as code:
            code.write(data)  # the with-block closes the file
        _Save_image_to_s3(filename, dwnladfn)
        print dlfn + " file saved to S3"
        os.remove(dwnladfn)
        print dlfn + " file removed from local folder"
        update_database(output, filename, None)
    except Exception as e:
        error = "download error: " + str(e)
        print "Error in downloading file: " + error

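This runs smoothly for the first 100-200 URLs in the pipeline, but after that the responses become very slow and eventually time out. I guess this is caused by request overload. Is there an efficient way to do this without overloading the requests?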

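I don't know where the problem comes from, but if it is caused by too many requests in a single process, you could try spreading the work across several processes as a workaround.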

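It can also speed up the whole job, because several tasks run at the same time (for example, one process downloads while another writes to disk, ...). I did this for a similar task and it worked noticeably better (the total download speed improved as well).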
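A minimal sketch of that workaround, assuming the standard-library `multiprocessing.Pool` is an acceptable way to split the work. The names `fetch_status` and `download_all`, the pool size of 4 and the 10-second timeout are illustrative assumptions, not part of the original script; in practice the worker would be a function like `downld_url`.

```python
import multiprocessing


def fetch_status(url, timeout=10):
    """Hypothetical worker: fetch one URL, return (url, size or error text)."""
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2.7
    try:
        # An explicit timeout keeps one stalled server from blocking a worker.
        response = urlopen(url, timeout=timeout)
        try:
            return url, len(response.read())
        finally:
            response.close()  # release the socket
    except Exception as exc:
        return url, "download error: " + str(exc)


def download_all(urls, worker=fetch_status, processes=4,
                 pool_factory=multiprocessing.Pool):
    """Run worker over urls in a pool; return {url: outcome}.

    worker must return a (url, outcome) pair. pool_factory is an assumption
    of this sketch: passing multiprocessing.dummy.Pool uses threads instead
    of processes through the same API.
    """
    pool = pool_factory(processes)
    try:
        # imap_unordered yields each result as soon as a worker finishes it.
        return dict(pool.imap_unordered(worker, urls))
    finally:
        pool.close()
        pool.join()
```

With real processes, call `download_all(list_of_urls)` from under an `if __name__ == "__main__":` guard so that child processes can import the module safely. A small, fixed pool size also caps how many requests are in flight at once.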

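Note: most of these URLs are downloads of .png and .pdf files.

Unrelated: use urlparse to parse the URL and os.path, posixpath to manipulate paths. Which part slows down: urlopen(), _Save_image_to_s3, update_database(), or something else?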
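Both suggestions in this comment can be sketched with the standard library. The helper names `file_extension` and `timed`, and the `timings` dictionary, are hypothetical. Note that `path.split('.')[-1]` in the question returns the whole path when the URL has no dot in it, whereas `posixpath.splitext` returns an empty extension in that case.

```python
import posixpath
import time
from contextlib import contextmanager

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7


def file_extension(url):
    """Return the extension of the URL path without the dot, or ''."""
    # URL paths always use '/', so posixpath is the matching path module.
    return posixpath.splitext(urlsplit(url).path)[1].lstrip(".")


@contextmanager
def timed(label, timings):
    """Add the wall-clock seconds spent in the with-block to timings[label]."""
    start = time.time()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.time() - start


# Possible use inside downld_url (function names taken from the question):
#     with timed("urlopen", timings):
#         data = urllib2.urlopen(url).read()
#     with timed("s3", timings):
#         _Save_image_to_s3(filename, dwnladfn)
#     with timed("database", timings):
#         update_database(output, filename, None)
```

Printing `timings` every hundred URLs or so should show whether the time goes into the download, the S3 upload or the database update.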