
Error downloading images from Wikipedia via a Python script

Tags: python, web-crawler, beautifulsoup

I am trying to download all the images from a particular Wikipedia page. Here is the code snippet:

from bs4 import BeautifulSoup as bs
import urllib2
import urlparse
from urllib import urlretrieve

site = "http://en.wikipedia.org/wiki/Pune"
hdr = {'User-Agent': 'Mozilla/5.0'}
outpath = ""
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
soup = bs(page)
tag_image = soup.findAll("img")
for image in tag_image:
    print "Image: %(src)s" % image
    urlretrieve(image["src"], "/home/mayank/Desktop/test")
After running the program, I see the following error in the stack trace:

Image: //upload.wikimedia.org/wikipedia/commons/thumb/0/04/Pune_Montage.JPG/250px-Pune_Montage.JPG
Traceback (most recent call last):
  File "download_images.py", line 15, in <module>
    urlretrieve(image["src"], "/home/mayank/Desktop/test")
  File "/usr/lib/python2.7/urllib.py", line 93, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 239, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 460, in open_file
    return self.open_ftp(url)
  File "/usr/lib/python2.7/urllib.py", line 543, in open_ftp
    ftpwrapper(user, passwd, host, port, dirs)
  File "/usr/lib/python2.7/urllib.py", line 864, in __init__
    self.init()
  File "/usr/lib/python2.7/urllib.py", line 870, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.7/ftplib.py", line 132, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
IOError: [Errno ftp error] [Errno 111] Connection refused

Can anyone help with what is causing this error?

`//` is shorthand for "the current protocol". Wikipedia is using protocol-relative URLs, so you have to specify HTTP explicitly instead of letting Python guess the scheme (for some reason it falls back to FTP):
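You can see the missing scheme directly by parsing the `src` value from the traceback above. A quick illustration, written with Python 3's `urllib.parse` (in Python 2 the same function lives in the `urlparse` module):

```python
from urllib.parse import urlparse  # Python 2: `from urlparse import urlparse`

src = "//upload.wikimedia.org/wikipedia/commons/thumb/0/04/Pune_Montage.JPG/250px-Pune_Montage.JPG"
parts = urlparse(src)
print(parts.scheme)  # '' -- no scheme at all, so urllib has to guess one
print(parts.netloc)  # upload.wikimedia.org
```

With an empty scheme, Python 2's `urlopener` walks down its fallback chain (`open_file`, then `open_ftp`), which is why the traceback ends in an FTP connection error.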


Thanks @Blender: this solved my problem. But I'd like to add one note so that anyone referring to this question isn't misled: prepending `http:` to `image` directly, as mentioned in the answer, doesn't work, because `image` is a tag object rather than a string. Instead I did it like this: `urlretrieve('http:' + image["src"], outpath)`
for image in tag_image:
    src = 'http:' + image["src"]
    urlretrieve(src, outpath)
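For readers on Python 3, where `urllib2` and `urlparse` were merged into `urllib.request` and `urllib.parse`, here is a sketch of the same crawler. `urljoin` resolves protocol-relative URLs against the page's scheme, so no manual `'http:'` prefix is needed. The `fetch`/`download_images` names and the output-directory handling are my own choices, not from the original post, and the output directory must already exist:

```python
import os
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch(url):
    # Wikipedia rejects requests without a User-Agent header.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return resp.read()


def download_images(site, outdir):
    # Deferred import so `fetch` is usable even without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(fetch(site), "html.parser")
    for image in soup.find_all("img"):
        # urljoin turns //upload.wikimedia.org/... into
        # https://upload.wikimedia.org/... using the page's scheme.
        src = urljoin(site, image["src"])
        target = os.path.join(outdir, os.path.basename(src))
        with open(target, "wb") as f:
            f.write(fetch(src))


# Example (hits the network):
# download_images("https://en.wikipedia.org/wiki/Pune", "/home/mayank/Desktop/test")
```

Note that the original code passed a single fixed path to `urlretrieve`, so every image would overwrite the previous one; deriving the filename from the URL's basename avoids that.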