How to crawl images with Python 3.x
I am trying to crawl images with Python. Crawling a single image succeeds, but crawling multiple images fails.
    # -*- coding: utf-8 -*-
    from bs4 import BeautifulSoup
    import urllib.request
    import random


    def download_image(url):
        # Save the image under a random numeric file name
        name = random.randrange(1, 1000)
        full_name = str(name) + ".jpg"
        urllib.request.urlretrieve(url, full_name)


    if __name__ == "__main__":
        print("Crawling!!!!!!!!!!!!!!!")
        hdr = {'User-Agent': 'Mozilla/5.0', 'referer': 'http://m.naver.com'}
        req = urllib.request.Request("https://www.google.co.kr/search?hl=ko&site=imghp&tbm=isch&source=hp&biw=1600&bih=770&q=sad", headers=hdr)
        data = urllib.request.urlopen(req).read()
        bs = BeautifulSoup(data, 'html.parser')
        imgs = bs.findAll(name='img')
        # Download every <img> src found on the page (this is the step that fails)
        for img in imgs:
            temp = img.get('src')
            download_image(temp)
This is the error:
    Crawling!!!!!!!!!!!!!!!
    Traceback (most recent call last):
      File "C:/Users/ajh46/PycharmProjects/untitled1/Crawling.py", line 25, in <module>
        download_image(temp)
      File "C:/Users/ajh46/PycharmProjects/untitled1/Crawling.py", line 10, in download_image
        urllib.request.urlretrieve(url, full_name)
      File "C:\Users\ajh46\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 248, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "C:\Users\ajh46\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Users\ajh46\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 511, in open
        req = Request(fullurl, data)
      File "C:\Users\ajh46\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 329, in __init__
        self.full_url = url
      File "C:\Users\ajh46\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 355, in full_url
        self._parse()
      File "C:\Users\ajh46\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 384, in _parse
        raise ValueError("unknown url type: %r" % self.full_url)
The target URL returns a 404 error, so urllib.request raises an error. Try opening the page in a browser.
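Separately from any HTTP status problem, the traceback in the question ends in ValueError("unknown url type: ..."). urllib.request raises that when the string it is given has no scheme, which is what a relative src such as "/images/a.png" looks like. The following is only a minimal sketch of one way to guard against that, not a definitive fix. The helper names absolute_image_urls and download_images, the "image" file prefix, and the numbered ".jpg" naming are assumptions made for illustration; only urljoin, urlsplit, urlretrieve and HTTPError come from the standard library.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit


def absolute_image_urls(page_url, srcs):
    """Turn raw <img src> values into absolute http(s) URLs.

    Hypothetical helper: missing values (None or empty) and
    non-HTTP schemes such as data: are skipped.
    """
    urls = []
    for src in srcs:
        if not src:
            continue  # <img> tag without a usable src attribute
        # urljoin resolves relative paths ("/a.png") and
        # scheme-relative URLs ("//host/a.png") against the page URL
        full = urljoin(page_url, src)
        if urlsplit(full).scheme in ("http", "https"):
            urls.append(full)
    return urls


def download_images(urls, prefix="image"):
    """Hypothetical helper: save each URL under a numbered file name.

    HTTP errors such as 404 are reported and skipped so that one
    bad URL does not stop the whole crawl.
    """
    for index, url in enumerate(urls):
        # Assumption: every image is stored with a .jpg extension,
        # as in the question's code
        filename = "{}_{}.jpg".format(prefix, index)
        try:
            urllib.request.urlretrieve(url, filename)
        except urllib.error.HTTPError as err:
            print("skipped", url, "HTTP status", err.code)
```

In the question's loop this would mean collecting img.get('src') for every tag, passing that list together with the page URL to absolute_image_urls, and handing the result to download_images. Numbered file names are used instead of random.randrange(1, 1000) because random names can collide and silently overwrite an earlier image.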