How to download an image with an https URL in Python 3?


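I am trying to write a short bulk-download script in Python that stores a list of images locally.

It works perfectly well for http image URLs, but it fails to download any image with an https URL. The lines of code in question are: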

import urllib.request
urllib.request.urlretrieve(url, filename)
For example,
https://cdn.discordapp.com/attachments/299398003486097412/303580387786096641/FB_IMG_1490534565948.jpg
results in HTTP Error 403: Forbidden, and so does any other https image.

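This leaves me with two questions:

  • How can I download such an image using Python?
  • If images are basically just files, why do they even have https URLs?

Edit: Here is the stack trace: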

    Traceback (most recent call last):
      File "img_down.py", line 52, in <module>
        main()
      File "img_down.py", line 38, in main
        save_img(d, l)
      File "img_down.py", line 49, in save_img
        stream = read_img(url)
      File "img_down.py", line 42, in read_img
        with urllib.request.urlopen(url) as response:
      File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
        response = meth(req, response)
      File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
        'http', request, response, code, msg, hdrs)
      File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
        return self._call_chain(*args)
      File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
        result = func(*args)
      File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden
    
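    This might help you.

    I wrote this but never finished it (the end goal was to have it run automatically every day).

    But so as not to be the kind of person who puts off answering, here is the piece of code you are interested in: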

        def downloadimg(self):
            import datetime
            imgurl = self.getdailyimg()
            imgfilename = datetime.datetime.today().strftime('%Y%m%d') + '_' + imgurl.split('/')[-1]
            with open(IMGFOLDER + imgfilename, 'wb') as f:
                f.write(self.readimg(imgurl))
    
    Hope it helps.

    Edited

    PS: this uses Python 3.

    Full script:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    
    import os
    IMGFOLDER = os.getcwd() + '/images/'
    
    
    class BingImage(object):
        """docstring for BingImage"""
        BINGURL = 'http://www.bing.com/'
        JSONURL = 'HPImageArchive.aspx?format=js&idx=0&n=1&mkt=pt-BR'
        LASTIMG = None
    
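        def __init__(self):
            super(BingImage, self).__init__()
            try:
                # make sure the images folder exists before writing into it
                self.checkfolder()
                self.downloadimg()
            except Exception:
                # best effort: on failure LASTIMG stays None and the caller
                # falls back to a previously downloaded image
                pass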
    
        def getdailyimg(self):
            import json
            import urllib.request
            with urllib.request.urlopen(self.BINGURL + self.JSONURL) as response:
                rawjson = response.read().decode('utf-8')
                parsedjson = json.loads(rawjson)
                return self.BINGURL + parsedjson['images'][0]['url'][1:]
    
        def downloadimg(self):
            import datetime
            imgurl = self.getdailyimg()
            imgfilename = datetime.datetime.today().strftime('%Y%m%d') + '_' + imgurl.split('/')[-1]
            with open(IMGFOLDER + imgfilename, 'wb') as f:
                f.write(self.readimg(imgurl))
            self.LASTIMG = IMGFOLDER + imgfilename
    
        def checkfolder(self):
            d = os.path.dirname(IMGFOLDER)
            if not os.path.exists(d):
                os.makedirs(d)
    
        def readimg(self, url):
            import urllib.request
            with urllib.request.urlopen(url) as response:
                return response.read()
    
    
    def DefineBackground(src):
        import platform
        if platform.system() == 'Linux':
            MAINCMD = "gsettings set org.gnome.desktop.background picture-uri"
            os.system(MAINCMD + ' file://' + src)
    
    
    def GetRandomImg():
        """Return a random image already downloaded from the images folder"""
        import random
        f = []
        for (dirpath, dirnames, filenames) in os.walk(IMGFOLDER):
            f.extend(filenames)
            break
        return IMGFOLDER + random.choice(f)
    
    
    if __name__ == '__main__':
        # fetch today's image from Bing
        img = BingImage()
        # fall back to a random stored image if no new one was downloaded
        if img.LASTIMG:
            DefineBackground(img.LASTIMG)
        else:
            DefineBackground(GetRandomImg())
        print('Background defined')
    
    Hope this helps:

    import requests
    with open('FB_IMG_1490534565948.jpg', 'wb') as f:
        f.write(requests.get('https://url/to/image.jpg').content)
    

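    Here is an updated answer to this question. I use OpenCV to store the image and urllib.request to fetch it. It also handles batch operations and can be added as common code.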

    import numpy as np
    from urllib.request import urlopen
    import cv2
    import os
    download_dir = os.path.join(os.getcwd(), "Downloaded")
    # exist_ok=True makes this a no-op when the folder is already there
    os.makedirs(download_dir, exist_ok=True)
    
    def downloadImage(url):
        try:
            print("Downloading %s" % (url))
            image_name = str(url).split('/')[-1]
            resp = urlopen(url)
            # turn the raw bytes into an array and decode it as a color image
            image = np.asarray(bytearray(resp.read()), dtype="uint8")
            image = cv2.imdecode(image, cv2.IMREAD_COLOR)
            # os.path.join builds the output path portably
            cv2.imwrite(os.path.join(download_dir, image_name), image)
        except Exception as error:
            print(error)
    
    if __name__ == '__main__':
        urls = ["https://www.google.com/logos/doodles/2019/st-georges-day-2019-6234830302871552.20-2x.png"]
        for url in urls:
            downloadImage(url)
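
Decoding with `cv2.imdecode` and writing with `cv2.imwrite` re-encodes the image, so the saved file is not necessarily byte-identical to the one on the server, and `cv2.IMREAD_COLOR` drops any alpha channel. If an unchanged copy is all that is needed, the response can be copied straight to disk instead. The following is a minimal sketch under that assumption; `save_stream` is a hypothetical helper name, and the in-memory bytes only stand in for a real `urlopen()` response.

```python
import io
import os
import shutil
import tempfile


def save_stream(source, path):
    """Copy a binary file-like object to path in chunks, without decoding it."""
    with open(path, "wb") as f:
        shutil.copyfileobj(source, f)
    return path


# In-memory bytes standing in for a urlopen() response (PNG signature + filler).
fake_response = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 32)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.png")
save_stream(fake_response, demo_path)
print(os.path.getsize(demo_path))
```

With a real download, the object returned by `urlopen(...)` could be passed as `source` inside a `with` block.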
    

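    According to the linked post, even with @Thiago Cardoso's solution you may still get an "HTTP Error 403: Forbidden" error, because the server does not know where the request comes from. Some sites check the User-Agent header to block unusual access, so you should send header information that looks like it comes from a browser.

    So I changed the readimg method to: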

    def readimg(self, img_url):
        from urllib.request import urlopen, Request
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}
        req = Request(url=img_url, headers=headers) 
        with urlopen(req) as response:
            return response.read()
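
For the batch case described in the question, the User-Agent fix can be combined with per-URL error handling so that one refused image does not stop the whole run. This is only a sketch: the helper names `build_request`, `filename_from_url` and `download_all` are hypothetical, and the header value is an assumption that may not satisfy every server.

```python
import os
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumption: a generic browser-like value; some servers may expect more.
USER_AGENT = "Mozilla/5.0"


def build_request(url):
    """Wrap the URL in a Request that carries a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlsplit(url).path)


def download_all(urls, folder="."):
    """Download every URL into folder and return the URLs that failed."""
    failed = []
    for url in urls:
        target = os.path.join(folder, filename_from_url(url))
        try:
            # urlopen runs first, so no empty file is created on an HTTP error
            with urlopen(build_request(url)) as response, open(target, "wb") as f:
                f.write(response.read())
        except (HTTPError, URLError) as error:
            # e.g. "HTTP Error 403: Forbidden" if the server still refuses
            print("Failed: %s (%s)" % (url, error))
            failed.append(url)
    return failed
```

Returning the failed URLs rather than raising keeps the loop going and leaves it to the caller to retry or log them.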
    

    You need to set a user agent. This is probably a server security feature that blocks unknown user agents.

    If you set a known browser user agent, it works:

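    from urllib.request import Request, urlopen

    def download_img(img_url, img_name):
        # send a browser-like User-Agent so the server does not reject the request
        request = Request(img_url, headers={'User-Agent': 'Mozilla/5.0'})
        with urlopen(request) as response, open(img_name, "wb") as f:
            f.write(response.read())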
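
The question used `urllib.request.urlretrieve`, which has no parameter for headers. One way to apply the same idea there is to install a global opener that sends a custom User-Agent. The sketch below assumes a hypothetical helper `make_browser_opener` and an arbitrary browser-like header value. It also prints the default agent string that urllib sends, which is the kind of unknown agent a server may reject.

```python
import urllib.request


def make_browser_opener(user_agent="Mozilla/5.0"):
    """Build an opener that sends user_agent instead of urllib's default."""
    opener = urllib.request.build_opener()
    # replaces the default ('User-agent', 'Python-urllib/x.y') entry
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# By default urllib identifies itself as "Python-urllib/<version>".
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_agent)

# After install_opener, urlopen and urlretrieve both go through the new opener.
urllib.request.install_opener(make_browser_opener())
# urllib.request.urlretrieve(url, filename)  # url and filename as in the question
```

Because `install_opener` changes process-wide state, passing a `Request` with explicit headers may be preferable when only some requests need the header.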
    

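    Have you tried the solution from this post?

    I tried it. It also raises a 403 HTTP error when it encounters this URL.

    Pretty sure you did not read the link I posted above... It is part of a class I wrote to download the image of the day from the Bing site... What you are really interested in is how to open a file (note the 'wb' part, which opens it not just for writing but in binary mode) and save the raw content in it.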
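
The remark about `'wb'` can be checked without any network access. The following small sketch uses made-up bytes in place of a real image and a temporary directory, both of which are assumptions for illustration. It shows that text mode rejects bytes, while binary mode stores them unchanged.

```python
import os
import tempfile

# Stand-in for response.read(): bytes resembling the start of a JPEG file.
payload = b"\xff\xd8\xff\xe0" + bytes(range(16))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.jpg")

    # Text mode ('w') only accepts str, so writing bytes raises TypeError.
    try:
        with open(path, "w") as f:
            f.write(payload)
        text_mode_error = None
    except TypeError as error:
        text_mode_error = error

    # Binary mode ('wb') writes the bytes exactly as received.
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        round_trip = f.read()

print(type(text_mode_error).__name__, round_trip == payload)
```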