Downloading images from URLs in a CSV file using Python

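I have the code below, which is supposed to download images from the URLs given in a CSV file into specified directories. All of the directories have already been created.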

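import csv
import os
import urllib.request
from urllib.parse import urlparse

with open('images.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the first row of the file
    for row in csv_reader:
        # row[0] is the URL, row[1] the label, row[2] the split
        basename = os.path.basename(urlparse(row[0]).path)
        filename = '{}/{}/{}'.format(row[2], row[1], basename)
        urllib.request.urlretrieve(row[0], filename)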

The CSV file is organized as follows:

http://farm2.static.flickr.com/1245/1259825348_6a2aa94e8d.jpg,cat,train
http://farm1.static.flickr.com/146/350588612_d84d71cc59.jpg,cat,test
http://farm1.static.flickr.com/32/99029168_940da3a1e5.jpg,cat,val
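
To make the mapping from a CSV row to a file on disk concrete, here is a minimal sketch. It assumes each row is exactly URL, label, split, as in the sample above. The helper name destination_for and its root parameter are hypothetical and are not part of the original code.

```python
import os
from urllib.parse import urlparse


def destination_for(row, root='.'):
    """Return the path that a [url, label, split] row would be saved to."""
    url, label, split = row[0], row[1], row[2]
    # The file name is the last component of the URL path.
    basename = os.path.basename(urlparse(url).path)
    # Files are grouped as <root>/<split>/<label>/<file name>.
    return os.path.join(root, split, label, basename)


row = ['http://farm2.static.flickr.com/1245/1259825348_6a2aa94e8d.jpg',
       'cat', 'train']
print(destination_for(row))
```

With the first sample row, the file name is 1259825348_6a2aa94e8d.jpg and it lands in the cat folder under train.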

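But when I run the code, I get the following error. I only learned today how to download images from URLs with Python, so I would really appreciate any help with this.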

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-36-6e201d3625d3> in <module>
      5         basename = os.path.basename(urlparse(row[0]).path)
      6         filename = '{}/{}/{}'.format(row[2], row[1], basename)
----> 7         urllib.request.urlretrieve(row[0], filename)

~\Anaconda3\lib\urllib\request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 

~\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
    561             http_err = 0
    562         args = (dict, proto, meth_name) + args
--> 563         result = self._call_chain(*args)
    564         if result:
    565             return result

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~\Anaconda3\lib\urllib\request.py in http_error_302(self, req, fp, code, msg, headers)
    753         fp.close()
    754 
--> 755         return self.parent.open(new, timeout=req.timeout)
    756 
    757     http_error_301 = http_error_303 = http_error_307 = http_error_302

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

I copied and pasted what you have, created the directories, and was able to download all the cat pictures. As far as I can tell, the only difference is that I am not using Anaconda; I am using a venv on Python 3, so I had to replace urlparse with urllib.parse.urlparse.

When I run the code, it manages to download the first 15 URLs before this error appears.

In that case, since it is a 404 error, my guess is that the URL for that particular picture is the problem. Try opening the offending URL manually in your browser and see whether the image can be found.

Thanks for your help! Some of the images in the CSV file return HTTPError 404 and 410.
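
Since some URLs in the file are dead, one option is to catch the error and move on to the next row rather than letting the first 404 stop the whole run. The following is a minimal sketch of that idea, not the method used in the thread. The function name download_images and its root and fetch parameters are hypothetical; fetch defaults to urllib.request.urlretrieve and exists only so the loop can be tried without network access.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import urlparse


def download_images(rows, root='.', fetch=urllib.request.urlretrieve):
    """Download every [url, label, split] row; return the rows that failed.

    Returns a list of (url, http_status_code) pairs for skipped URLs.
    """
    failed = []
    for row in rows:
        url, label, split = row[0], row[1], row[2]
        basename = os.path.basename(urlparse(url).path)
        target_dir = os.path.join(root, split, label)
        os.makedirs(target_dir, exist_ok=True)  # no-op if it already exists
        try:
            fetch(url, os.path.join(target_dir, basename))
        except urllib.error.HTTPError as err:
            # Dead links (404, 410, ...) are recorded and skipped.
            failed.append((url, err.code))
    return failed
```

To use it with the original script, pass csv_reader (after the next(csv_reader) call) as rows and print or log the returned list. Only HTTPError is caught here; other failures, such as an unreachable host, still raise, so they are not silently hidden.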