Image download problem using Python

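I am downloading images from URLs in Python using the requests.get() function. When I pass the function a single URL, the download works. But when I call it in a for loop over about 1000 URLs, some of the resulting images are corrupted. If I open the URL of a corrupted image in a browser, the image displays fine, so the URL itself does not seem to be the problem. Why does this happen?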
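This looks like an anti-crawler countermeasure. What you need to do is customize the HTTP headers that your Python script sends. By default, the User-Agent field of the HTTP headers tells the website that the client is a Python script (requests identifies itself as "python-requests/<version>"), which makes the requests easy for a site to detect and block.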
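As a quick check of that default, the following minimal sketch prints the User-Agent that requests would send and then overrides it on a session. The BROWSER_UA string is only an illustrative placeholder, not a value from the original question; a real one can be copied from the browser's developer tools.

```python
import requests

# What requests sends when no User-Agent is set, e.g. "python-requests/2.x.y".
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Hypothetical browser-like value (an assumption for illustration only).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Headers set on the session are sent with every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})
print(session.headers["User-Agent"])
```

Setting the header once on the session avoids repeating it for each of the URLs in the loop.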
In Python, you can try something like this:
# _*_coding:utf-8 _*_
# @Time    : 2019/4/22 15:51
# @Author  : Shek 
# @FileName: m2.py
# @Software: PyCharm

import requests

# Build a browser-like set of request headers
def get_header(agent, referer, host):
    # Example values only; copy the real ones from your browser's developer tools (F12 in Chrome)
    header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',  # 'br' left out: requests can only decode Brotli when the brotli package is installed
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Host': host,
        'Cache-Control': 'max-age=0',
        'Referer': referer,
        'Cookie':'bla bla bla',
        'User-Agent': agent
    }
    return header

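# Request part: replace the placeholder values with a real URL, User-Agent, Referer and Host
req_session = requests.Session()
req = req_session.get(
    url='your.url',
    headers=get_header(agent='your.agent', referer='your.referer', host='your.host'),
    timeout=10,
)
req.raise_for_status()  # fail early on HTTP errors instead of saving an error page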

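# Save part: req.content is bytes, so the file must be opened in binary mode ('wb')
with open('filename.jpg', 'wb') as file_wr:
    file_wr.write(req.content)
# No explicit close() is needed; the with block closes the file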

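When the same request runs in a loop over many URLs, it may also help to verify each response before saving it, since a blocked request can return an error page instead of image data. The sketch below is one possible way to do that, not a definitive implementation. The names IMAGE_SIGNATURES, looks_like_image and download_image, as well as the retries and delay parameters, are assumptions made for illustration; session is expected to behave like requests.Session(), and only JPEG, PNG and GIF signatures are checked.

```python
import time
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF
    b"GIF89a",
)


def looks_like_image(data):
    """Return True if the bytes start with a known JPEG, PNG or GIF signature."""
    return data.startswith(IMAGE_SIGNATURES)


def download_image(session, url, path, headers=None, retries=3, delay=1.0):
    """Save url to path; return True on success, False once all attempts fail.

    session is assumed to behave like requests.Session.
    """
    for attempt in range(1, retries + 1):
        try:
            resp = session.get(url, headers=headers, timeout=10)
            resp.raise_for_status()  # raises on 4xx/5xx responses
            if looks_like_image(resp.content):
                Path(path).write_bytes(resp.content)  # binary write
                return True
            print(f"{url}: attempt {attempt} did not return image data")
        except OSError as exc:
            # requests' exceptions derive from OSError (IOError)
            print(f"{url}: attempt {attempt} failed: {exc}")
        if attempt < retries:
            time.sleep(delay)  # brief pause so the server is not hammered
    return False
```

Inside the for loop this could be called once per URL, for example download_image(req_session, url, 'img_0001.jpg', headers=get_header(...)), and any URL for which it returns False can be logged and retried later.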

Could you post your code so we can check whether there is a problem in it? Please read