Web scraping images with Python (tags: python, list, image, exception, web-scraping)

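I am learning Python, and the code below saves the images from a list of URLs. However, I want to skip a URL when it is not an image, and I want to save the files in .png format.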


import pandas as pd
import urllib.request

def url_to_jpg(i, url, file_path):
    filename = 'image-{}.jpg'.format(i)
    full_path = '{}{}'.format(file_path, filename)
    urllib.request.urlretrieve(url, full_path)

    return None

FILENAME = 'C:/Users/Home/AppData/Roaming/Microsoft/Windows/Start Menu/Programs/Python 3.8/image_url.csv'
FILE_PATH = 'C:/Users/Home/AppData/Roaming/Microsoft/Windows/Start Menu/Programs/Python 3.8/imagens2/'

urls = pd.read_csv(FILENAME)

for i, url in enumerate(urls.values):
    url_to_jpg(i, url[0], FILE_PATH)


You can use the Content-Type header.

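import urllib.request

# Without a filename argument, urlretrieve saves to a temporary file
result = urllib.request.urlretrieve('https://www.jhsph.edu/sebin/j/k/public-health-on-call.jpg')

result[1].__dict__

You will see that urlretrieve returns a tuple whose second element holds the response headers: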

{'_charset': None,
 '_default_type': 'text/plain',
 '_headers': [('Server', 'nginx/1.17.6'),
  ('Date', 'Sat, 04 Apr 2020 22:00:21 GMT'),
  ('Content-Type', 'image/jpeg'),
  ('Content-Length', '129747'),
  ('Connection', 'close'),
  ('Last-Modified', 'Wed, 04 Mar 2020 15:26:43 GMT'),
  ('ETag', '"3632864f39f2d51:0"'),
  ('X-Powered-By', 'ASP.NET'),
  ('Accept-Ranges', 'bytes')],
 '_payload': '',
 '_unixfrom': None,
 'defects': [],
 'epilogue': None,
 'policy': Compat32(),
 'preamble': None}

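The Content-Type header tells you that the response is an image, and which type of image it is. Based on that, you can decide whether to save it and how.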
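As a rough sketch of that idea, the download step could check the Content-Type before writing anything to disk. The helper names `is_image` and `save_if_image` are made up for this illustration, and the sketch uses `urllib.request.urlopen` rather than `urlretrieve` so the headers can be inspected before the body is saved. It assumes the server reports the Content-Type honestly.

```python
import mimetypes
import urllib.request
from urllib.error import URLError


def is_image(content_type):
    """Return True when a Content-Type value such as 'image/jpeg' denotes an image."""
    return content_type.lower().startswith("image/")


def save_if_image(i, url, file_path):
    """Download url into file_path only when the response is reported as an image.

    Returns the path that was written, or None when the URL was skipped.
    (Hypothetical helper, not part of urllib.)
    """
    try:
        with urllib.request.urlopen(url) as response:
            # get_content_type() drops parameters such as '; charset=utf-8'
            content_type = response.headers.get_content_type()
            if not is_image(content_type):
                return None
            data = response.read()
    except (URLError, ValueError):
        # Unreachable host, HTTP error status, or malformed URL: skip this entry
        return None

    # Pick an extension that matches the reported type, e.g. '.png' for image/png
    extension = mimetypes.guess_extension(content_type) or ".img"
    full_path = "{}image-{}{}".format(file_path, i, extension)
    with open(full_path, "wb") as f:
        f.write(data)
    return full_path
```

In the loop from the question, `save_if_image(i, url[0], FILE_PATH)` would take the place of `url_to_jpg(i, url[0], FILE_PATH)`. Note that this keeps each file in the format the server sent; naming a JPEG file `.png` does not convert it. Actually re-encoding to PNG would need an image library such as Pillow, which is not shown here.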

If you are doing web scraping, you should use