How can I download images with https URLs in Python 3?

python image https download

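I am trying to write a short bulk-download script in Python that stores a list of images locally.

It works perfectly for http image URLs, but it fails to download any image with an https URL. The lines of code in question are:

import urllib.request
urllib.request.urlretrieve(url, filename)

For example,

https://cdn.discordapp.com/attachments/299398003486097412/303580387786096641/FB_IMG_1490534565948.jpg

results in HTTP Error 403: Forbidden, and so does every other https image.

This leaves me with two questions:

How can I download such an image using Python?

If images are basically just files, why do they even have https URLs?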

Edit: here is the stack trace:

Traceback (most recent call last):
  File "img_down.py", line 52, in <module>
    main()
  File "img_down.py", line 38, in main
    save_img(d, l)
  File "img_down.py", line 49, in save_img
    stream = read_img(url)
  File "img_down.py", line 42, in read_img
    with urllib.request.urlopen(url) as response:
  File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\Users\Jan\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
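
The traceback ends in urllib.error.HTTPError, which means the HTTPS connection itself succeeded and the server answered with status 403. A minimal sketch for telling that case apart from a connection-level failure might look like this; `try_fetch` is a hypothetical helper name, not part of the original script:

```python
import urllib.error
import urllib.request


def try_fetch(url):
    """Return (status, body); body is None if the server replied with an HTTP error."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # An HTTPError is a real HTTP response: the server was reached
        # (TLS included) and refused the request, e.g. with 403.
        return err.code, None
```

A failure to connect or a certificate problem would instead surface as urllib.error.URLError, which this sketch deliberately lets propagate.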

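This may help you.

I wrote this but never finished it (the eventual goal was to have it run automatically every day).

But so as not to be the kind of person who puts off answering, here is the piece of code you are interested in: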

    def downloadimg(self):
        import datetime
        imgurl = self.getdailyimg()
        imgfilename = datetime.datetime.today().strftime('%Y%m%d') + '_' + imgurl.split('/')[-1]
        with open(IMGFOLDER + imgfilename, 'wb') as f:
            f.write(self.readimg(imgurl))
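
The snippet above builds a dated file name from the URL and writes the downloaded bytes in binary mode. As a standalone sketch of just that step (the helper names `dated_filename` and `save_bytes`, and the use of `os.path.join`, are assumptions and not part of the original class):

```python
import datetime
import os


def dated_filename(url, day=None):
    """Prefix the last path segment of the URL with a YYYYMMDD date."""
    day = day or datetime.date.today()
    return day.strftime('%Y%m%d') + '_' + url.split('/')[-1]


def save_bytes(folder, filename, data):
    """Write raw bytes to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # 'wb' opens the file for writing in binary mode, so the image
    # bytes are stored exactly as they were received.
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

Keeping the file name logic separate from the write makes it easy to check each part without touching the network.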

Hope it helps.

Edited

PS: this uses python3

Full script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
IMGFOLDER = os.getcwd() + '/images/'


class BingImage(object):
    """docstring for BingImage"""
    BINGURL = 'http://www.bing.com/'
    JSONURL = 'HPImageArchive.aspx?format=js&idx=0&n=1&mkt=pt-BR'
    LASTIMG = None

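    def __init__(self):
        super(BingImage, self).__init__()
        try:
            # make sure the target folder exists before writing into it
            self.checkfolder()
            self.downloadimg()
        except Exception:
            # download failed; LASTIMG stays None and the caller falls back
            pass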

    def getdailyimg(self):
        import json
        import urllib.request
        with urllib.request.urlopen(self.BINGURL + self.JSONURL) as response:
            rawjson = response.read().decode('utf-8')
            parsedjson = json.loads(rawjson)
            return self.BINGURL + parsedjson['images'][0]['url'][1:]

    def downloadimg(self):
        import datetime
        imgurl = self.getdailyimg()
        imgfilename = datetime.datetime.today().strftime('%Y%m%d') + '_' + imgurl.split('/')[-1]
        with open(IMGFOLDER + imgfilename, 'wb') as f:
            f.write(self.readimg(imgurl))
        self.LASTIMG = IMGFOLDER + imgfilename

    def checkfolder(self):
        d = os.path.dirname(IMGFOLDER)
        if not os.path.exists(d):
            os.makedirs(d)

    def readimg(self, url):
        import urllib.request
        with urllib.request.urlopen(url) as response:
            return response.read()


def DefineBackground(src):
    import platform
    if platform.system() == 'Linux':
        MAINCMD = "gsettings set org.gnome.desktop.background picture-uri"
        os.system(MAINCMD + ' file://' + src)


def GetRandomImg():
    """Return a random image already downloaded from the images folder"""
    import random
    f = []
    for (dirpath, dirnames, filenames) in os.walk(IMGFOLDER):
        f.extend(filenames)
        break
    return IMGFOLDER + random.choice(f)


if __name__ == '__main__':
    # try to fetch today's image from Bing
    img = BingImage()
    # use the new image if one was downloaded, otherwise reuse an old one
    if img.LASTIMG:
        DefineBackground(img.LASTIMG)
    else:
        DefineBackground(GetRandomImg())
    print('Background defined')

Hope this helps:

import requests
with open('FB_IMG_1490534565948.jpg', 'wb') as f:
    f.write(requests.get('https://url/to/image.jpg').content)

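Here is an up-to-date answer to this question. I use OpenCV to store the image and urlopen from urllib.request to fetch it; it also handles batches of URLs and can be reused as common code.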

import os
from urllib.request import urlopen

import cv2
import numpy as np

# folder that receives the downloaded images; created if it does not exist
download_dir = os.path.join(os.getcwd(), "Downloaded")
os.makedirs(download_dir, exist_ok=True)

def downloadImage(url):
    try:
        print("Downloading %s" % (url))
        image_name = str(url).split('/')[-1]
        resp = urlopen(url)
        # decode the raw bytes into an OpenCV image (it is re-encoded on write)
        image = np.asarray(bytearray(resp.read()), dtype="uint8")
        image = cv2.imdecode(image, cv2.IMREAD_COLOR)
        cv2.imwrite(os.path.join(download_dir, image_name), image)
    except Exception as error:
        print(error)

if __name__ == '__main__':
    urls = ["https://www.google.com/logos/doodles/2019/st-georges-day-2019-6234830302871552.20-2x.png"]
    for url in urls:
        downloadImage(url)
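
If the goal is only to store the files, decoding them with OpenCV is not strictly necessary, since the bytes returned by the server can be written to disk unchanged. A hedged sketch of a batch loop along those lines, using only the standard library; `download_all` and its `fetch` parameter are hypothetical names introduced here so the network call can be swapped out:

```python
import os
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher: return the raw response body for url."""
    with urlopen(url) as response:
        return response.read()


def download_all(urls, folder, fetch=fetch_bytes):
    """Save each URL into folder; return (saved_paths, failures)."""
    os.makedirs(folder, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        path = os.path.join(folder, url.split('/')[-1])
        try:
            data = fetch(url)
        except Exception as error:
            # Record the failure and keep going with the remaining URLs.
            failed.append((url, error))
            continue
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
    return saved, failed
```

Returning the failures instead of only printing them lets the caller decide whether to retry, for instance with different request headers.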

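Even with @Thiago Cardoso's solution you may still run into an "HTTP Error 403: Forbidden" error, because the server cannot tell where the request is coming from. Some websites check the User-Agent header to block unusual clients, so you should send browser-like User-Agent information with the request.

So I changed the readimg method to: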

def readimg(self, img_url):
    from urllib.request import urlopen, Request
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}
    req = Request(url=img_url, headers=headers) 
    with urlopen(req) as response:
        return response.read()

You need to set a User-Agent. This is probably a server security feature that blocks unknown user agents.

If you set a known browser User-Agent, it works:

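from urllib.request import Request, urlopen

def download_img(img_url, img_name):
    # send a browser-like User-Agent so the server does not reject the request
    request = Request(img_url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request) as response, open(img_name, "wb") as f:
        f.write(response.read())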
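
The question uses urllib.request.urlretrieve, which does not take a headers argument. One way to apply the same User-Agent idea there is to install a global opener whose default headers include it. This is a sketch under that assumption; `make_opener` is a hypothetical helper and the header value is only an example:

```python
import urllib.request


def make_opener(user_agent='Mozilla/5.0'):
    """Build an opener that sends the given User-Agent with every request."""
    opener = urllib.request.build_opener()
    # addheaders replaces the default 'Python-urllib/x.y' User-Agent.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener
```

After `urllib.request.install_opener(make_opener())`, later calls to `urllib.request.urlopen` and `urllib.request.urlretrieve(url, filename)` go through this opener. Whether a given server then accepts the request still depends on its own rules.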

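Have you tried the solution from this post?

I tried it. It also raises an HTTP 403 error for this URL.

I'm sure you didn't read the link I posted above... It is part of a class of mine that downloads the image of the day from the Bing site... What you are really interested in is how to open a file (note the 'wb' part, which opens it not just for writing but in binary mode) and save the raw content into it.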