How do I modify this download function in Python? (python, http, url, urllib)

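Right now it is hit or miss. Gzip, images: sometimes it doesn't work.

How do I modify this download function so that it works with anything, regardless of gzip or any other header?

How do I automatically "detect" whether the response is gzip? I don't want to always pass True/False the way I do now.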

def download(source_url, g = False, correct_url = True):
    try:
        socket.setdefaulttimeout(10)
        agents = ['Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)','Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)','Microsoft Internet Explorer/4.0b1 (Windows 95)','Opera/8.00 (Windows NT 5.1; U; en)']
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent',random.choice(agents))
        ree.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener()
        h = opener.open(ree).read()
        if g:
            compressedstream = StringIO(h)
            gzipper = gzip.GzipFile(fileobj=compressedstream)
            data = gzipper.read()
            return data
        else:
            return h
    except Exception, e:
        return ""

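To detect what type of data you are downloading, replace
h = opener.open(ree).read() with
h = opener.open(ree)

Now h holds the response object. You can inspect the headers through h.headers, a dict-like object. The two headers of most interest are Content-Type and Content-Encoding; together they tell you what was sent.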

def download(source_url, correct_url = True):
    try:
        socket.setdefaulttimeout(10)
        agents = ['Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)','Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)','Microsoft Internet Explorer/4.0b1 (Windows 95)','Opera/8.00 (Windows NT 5.1; U; en)']
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent',random.choice(agents))
        ree.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener()
        h = opener.open(ree)
        # Parentheses are required for the condition to span two lines.
        if ('gzip' in h.headers.get('content-type', '') or
                'gzip' in h.headers.get('content-encoding', '')):
            compressedstream = StringIO(h.read())
            gzipper = gzip.GzipFile(fileobj=compressedstream)
            data = gzipper.read()
            return data
        else:
            return h.read()
    except Exception, e:
        return ""
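
The header inspection described above can also be pulled out into a small function that is easy to try without a network connection. This is only a sketch: `describe_headers` is a hypothetical helper name, not part of the original answer, and it assumes `headers` is any object with a dict-style `.get()` (a real response's `headers` object matches names case-insensitively, while the plain dicts used here do not).

```python
def describe_headers(headers):
    """Return (media_type, is_gzip) for a dict-like headers object.

    media_type is the Content-Type without parameters such as charset,
    lower-cased; is_gzip is True when Content-Encoding mentions gzip.
    """
    content_type = headers.get('Content-Type') or ''
    content_encoding = headers.get('Content-Encoding') or ''
    # "text/html; charset=UTF-8" -> "text/html"
    media_type = content_type.split(';')[0].strip().lower()
    is_gzip = 'gzip' in content_encoding.lower()
    return media_type, is_gzip


if __name__ == '__main__':
    print(describe_headers({'Content-Type': 'text/html; charset=UTF-8',
                            'Content-Encoding': 'gzip'}))
    print(describe_headers({'Content-Type': 'image/png'}))
```

With the media type in hand, the caller can decide, for example, to treat `image/*` responses as raw bytes and only decompress when the second value is true.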

Check the Content-Encoding header:

import urllib2
import socket
import random
import StringIO
import gzip

def download(source_url):
    try:
        socket.setdefaulttimeout(10)
        agents = ['Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)','Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)','Microsoft Internet Explorer/4.0b1 (Windows 95)','Opera/8.00 (Windows NT 5.1; U; en)']
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent',random.choice(agents))
        ree.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener()
        response = opener.open(ree)
        encoding=response.headers.getheader('Content-Encoding')
        content = response.read()
        if encoding and 'gzip' in encoding:
            compressedstream = StringIO.StringIO(content)
            gzipper = gzip.GzipFile(fileobj=compressedstream)
            data = gzipper.read()
            return data
        else:
            return content
    except urllib2.URLError as e:
        return ""

data=download('http://api.stackoverflow.com/1.0/questions/3708418?type=jsontext')
print(data)

If the server you are dealing with does not report the Content-Encoding as gzip, you can be more aggressive and simply try to decompress first:

def download(source_url):
    try:
        socket.setdefaulttimeout(10)
        agents = ['Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)','Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)','Microsoft Internet Explorer/4.0b1 (Windows 95)','Opera/8.00 (Windows NT 5.1; U; en)']
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent',random.choice(agents))
        ree.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener()
        response = opener.open(ree)
        content = response.read()
        compressedstream = StringIO.StringIO(content)
        gzipper = gzip.GzipFile(fileobj=compressedstream)
        try:
            data = gzipper.read()
        except IOError:
            data = content
        return data        
    except urllib2.URLError as e:
        return ""
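
A different technique from the try/except above is to look at the first two bytes of the body: a gzip stream starts with the magic bytes 0x1f 0x8b. The sketch below assumes Python 3 (it uses `gzip.decompress`, which Python 2 lacks), and `maybe_gunzip` is a hypothetical name. Like the try-first approach, it will also unpack a `.gz` file you meant to download as-is.

```python
import gzip
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def maybe_gunzip(content):
    """Return content decompressed if it looks like gzip, else unchanged."""
    if content[:2] != GZIP_MAGIC:
        return content
    try:
        return gzip.decompress(content)
    except (OSError, EOFError, zlib.error):
        # The magic bytes matched but the stream is invalid or truncated;
        # fall back to the raw bytes rather than failing.
        return content


if __name__ == '__main__':
    print(maybe_gunzip(gzip.compress(b'hello')))
    print(maybe_gunzip(b'plain text'))
```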

"It's hit or miss. Gzip, images, sometimes it doesn't work." What does that mean?

import urllib2
import StringIO
import gzip

req = urllib2.Request('http://foo/')
resp = urllib2.urlopen(req)
data = resp.read()
# Use .get() so a missing Content-Encoding header does not raise KeyError.
if 'gzip' in resp.headers.get('Content-Encoding', ''):
    compressedstream = StringIO.StringIO(data)
    gzipper = gzip.GzipFile(fileobj=compressedstream)
    data = gzipper.read()

# etc...
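
All of the code in this thread targets Python 2. For readers on Python 3, a rough equivalent of the header-checking approach might look like the following. It is a sketch under assumptions rather than a drop-in replacement: `urllib2` is replaced by `urllib.request`, `StringIO` by `io.BytesIO`, the single User-Agent string is a made-up placeholder, and the timeout is passed per request instead of through `socket.setdefaulttimeout`.

```python
import gzip
import io
import urllib.error
import urllib.request

# Placeholder User-Agent, chosen only for illustration.
USER_AGENT = 'Mozilla/5.0 (compatible; example-fetcher)'


def download(source_url, timeout=10):
    """Fetch source_url and return the body as bytes (b'' on URLError)."""
    request = urllib.request.Request(source_url)
    request.add_header('User-Agent', USER_AGENT)
    request.add_header('Accept-Encoding', 'gzip')
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            encoding = response.headers.get('Content-Encoding') or ''
            content = response.read()
    except urllib.error.URLError:
        return b''
    if 'gzip' in encoding.lower():
        # The server compressed the body; unpack it in memory.
        return gzip.GzipFile(fileobj=io.BytesIO(content)).read()
    return content
```

Returning an empty value on failure mirrors the original function; in practice you may prefer to let the exception propagate so the caller can tell a failed request from an empty response.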