Downloading images from an HTTPS aspx page with Python

python asp.net https python-requests binaryfiles

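I'm trying to download some images from the NASS case viewer.

The example case has an image viewer with its own link, which may not be visible to you, I think because of HTTPS. In any case, the image I want is simply the second one from the front.

The actual link to the image is (or should be?) the GetBinary.aspx URL in the code below. Requesting it just downloads the aspx binary.

My problem is that I don't know how to store that binary data as a proper jpg file.

The code I have tried is: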

import requests 
test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)

with open("test_image.jpg", "wb+") as myfile:
    myfile.write(str.encode(pull_image.text))

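But this does not produce a valid jpg file. I also checked pull_image.raw.read() and found that it is empty.

What could be wrong here? Are my URLs incorrect? I put them together with BeautifulSoup and checked them by inspecting the HTML of several pages.

Am I saving the binary data incorrectly?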

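.text decodes the response content to a string, so your image file ends up corrupted. Instead, you should use .content, which holds the binary response content: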

import requests 

test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)

with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.content)
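
To see why going through .text damages the file, the following minimal sketch round-trips a few JPEG-like bytes through a string, much as str.encode(pull_image.text) does. The sample bytes and the choice of UTF-8 for decoding are assumptions made for illustration; the encoding that requests actually picks for a given response may differ.

```python
# First bytes of a typical JPEG file: start-of-image marker, then an APP0 segment.
original = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF\x00"

# Decode to text. Bytes that are not valid in the chosen encoding
# (UTF-8 is an assumption here) are replaced with U+FFFD.
as_text = original.decode("utf-8", errors="replace")

# Encode the string back to bytes, as str.encode(...) does by default (UTF-8).
round_tripped = as_text.encode("utf-8")

print(original.hex())
print(round_tripped.hex())
print(round_tripped == original)  # False: the JPEG header is no longer intact
```

Once the leading 0xFF 0xD8 marker has been replaced, image viewers no longer recognise the file as a JPEG, which matches the symptom described in the question.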

.raw.read() also returns bytes, but to use it you must set the stream parameter to True:

pull_image = requests.get(test_image, stream=True)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.raw.read())
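
For larger files, a common alternative to raw.read() is to stream the body in chunks with iter_content, which also applies content decoding such as gzip that raw does not apply by default. The sketch below is one possible way to do that. The helper names save_response and looks_like_jpeg are hypothetical, and the JPEG check only inspects the first two bytes (0xFF 0xD8).

```python
def save_response(response, path, chunk_size=8192):
    """Write a streamed, requests-style response to `path` in chunks.

    `response` is assumed to offer raise_for_status() and iter_content(),
    as requests.Response does.
    """
    # Fail early on 4xx/5xx instead of saving an error page as an image.
    response.raise_for_status()
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            out.write(chunk)
    return path


def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG start-of-image marker."""
    with open(path, "rb") as f:
        return f.read(2) == b"\xff\xd8"
```

With requests, this would be called as save_response(requests.get(test_image, stream=True), "test_image.jpg"). If looks_like_jpeg then returns False, the server probably sent something other than an image, for example an HTML error page.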

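I'd like to follow up on @t.m.adam's answer and provide a complete answer for anyone interested in using this data for their own projects.

Below is my code for extracting all the images for a sample of case IDs. It is fairly untidy code, but I think it gives you what you need to get started.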

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm


CaseIDs = [149006673, 149006651, 149006672, 149006673, 149006692, 149006693]

url_part1 = 'https://www-nass.nhtsa.dot.gov/nass/cds/'

data = []
with requests.Session() as sesh:
    for caseid in tqdm(CaseIDs):
        url_full = f"https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?ViewText&CaseID={caseid}&xsl=textonly.xsl&websrc=true"
        #print(url_full)
        source = sesh.get(url_full).text
        soup = BeautifulSoup(source, 'lxml')
        # Each image sits in its own <tr> that carries a page-break style.
        tr_tags = soup.find_all('tr', style="page-break-after: always")
        for tag in tr_tags:
            # The three 'label' rows are expected to hold, in order,
            # the image ID, the image type and the part name.
            tag_list = tag.find_all('tr', class_='label')
            labels = [x.find('td').text for x in tag_list]
            img_id, img_type, part_name = labels
            img_id = img_id.replace(":", "")
            # Drop characters that are awkward in file names.
            part_name = part_name.replace(":", "").replace("/", "")
            image_name = " ".join([img_type, part_name, img_id]) + ".jpg"
            # The src attribute is relative to the case viewer base URL.
            img = tag.find('img')
            img_url = url_part1 + img.get('src')
            print(img_url)
            # .content holds the raw bytes of the response.
            pull_image = sesh.get(img_url)
            with open(image_name, "wb") as myfile:
                myfile.write(pull_image.content)
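
The script above builds image URLs by string concatenation and cleans file names with chained replace calls. A slightly more defensive variant, sketched here with the standard library only, resolves the src value with urljoin and strips a wider set of problematic characters. The helper names absolute_image_url and safe_filename are hypothetical, and the label strings in any call are whatever the page supplies.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www-nass.nhtsa.dot.gov/nass/cds/"


def absolute_image_url(src, base=BASE_URL):
    """Resolve an <img src="..."> value against the page's base URL."""
    return urljoin(base, src)


def safe_filename(*parts, ext=".jpg"):
    """Join label fragments into a file name, dropping characters that
    are not allowed in file names on common file systems."""
    name = " ".join(p.strip() for p in parts if p and p.strip())
    name = re.sub(r'[\\/:*?"<>|]', "", name)
    return name + ext
```

urljoin leaves an already absolute src untouched, so the same call works whether the page uses relative or absolute image links.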