Python: how to save images from a website to a local folder
I need to save the images from this website to a folder. This is what I tried:
import os
from lxml import html
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

class ImageScraper:
    def __init__(self, url, download_path):
        self.url = url
        self.download_path = download_path
        self.session = requests.Session()

    def scrape_images(self):
        html = urlopen(url)
        bs4 = bs(html, 'html.parser')
        images = bs4.find_all('img', {})

scraper = ImageScraper(url="http://www.photobirdireland.com/garden-birds.html")
scraper.scrape_images()
f = open('Users/Lu/Desktop/Images','wb') # folder
f.write(img)
f.close()
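But I get neither a result nor an error.
I'm pretty sure something in the code isn't working.
Could you take a look and tell me what's wrong?

This line:
html = urlopen(url)
should be:
html = urlopen(self.url)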
Edit: you can collect the URLs like this:
def scrape_images(self):
    html = urlopen(self.url)
    bs4 = bs(html, 'html.parser')
    urls = []
    # collect the src attribute of every <img> tag on the page
    for img in bs4.find_all('img'):
        urls.append(img.attrs.get("src"))
    return urls
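The src values collected this way are often relative paths (and can be None when an img tag has no src), so they usually need to be resolved against the page URL before anything can be downloaded. A minimal sketch of that step, assuming the page URL is the right base; the helper name resolve_image_urls and the sample paths are hypothetical and only serve as illustration:

```python
from urllib.parse import urljoin


def resolve_image_urls(page_url, src_values):
    """Turn raw src attribute values into absolute URLs.

    Entries that are None or empty (an <img> tag without src) are skipped.
    """
    absolute = []
    for src in src_values:
        if not src:
            continue
        # Some pages use backslashes in paths; normalise them first.
        absolute.append(urljoin(page_url, src.replace("\\", "/")))
    return absolute


if __name__ == "__main__":
    page = "http://www.photobirdireland.com/garden-birds.html"
    # "images/robin.jpg" and "/logo.png" are made-up sample values.
    print(resolve_image_urls(page, ["images/robin.jpg", None, "/logo.png"]))
```

urljoin leaves values that are already absolute URLs untouched, so the helper can be applied to a mixed list.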
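The next step is to work out how to download them.

Try the following code to download the images. It uses urlretrieve to download each image's src value to a local file: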
from urllib.request import urlretrieve
import requests
from bs4 import BeautifulSoup
import os
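url = 'http://www.photobirdireland.com/garden-birds.html'
data = requests.get(url).text
soup = BeautifulSoup(data, "html.parser")
# build the absolute URL of every image on the page
images = ['http://www.photobirdireland.com/' + image['src'] for image in soup.find_all('img')]
for img in images:
    # save each image in the current working directory under its original file name
    urlretrieve(img, os.path.basename(img))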
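Your code is incomplete: you first need to run a loop over images = bs4.find_all('img', {}).
An example of that loop, followed by the complete code, is given after the comments below.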
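Comments:

Hi Fede, thanks for your answer. I tried what you suggested, but unfortunately it still doesn't work. It doesn't save anything in the folder.

OK, my comment was a fix for "images" not getting anything. To save the images, you first have to retrieve the src from each img, then download the image stored at that address, and then save that image to a file.

Thank you @Fede Calendino. I was trying to figure out how to do that when I got stuck. Do you mean looping over images with images.append(image['src']) and print(image['src'])? I get the following error: TypeError: string indices must be integers, in scraper.scrape_images(). I think my definition, or the way I use def __init__, is wrong.

Thanks, @KunduK. This is what I was looking for. How can I store the files in a separate folder (different from the folder where the code is saved)? (I have never saved files or worked with paths/directories before.)

@LucaDiMauro: if you are working on Windows, do it like this: urlretrieve(img, "D:\KK/" + os.path.basename(img)) or urlretrieve(img, "Users/Lu/Desktop/Images/" + os.path.basename(img))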
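The folder question from the comments can also be handled without concatenating strings by hand. A minimal sketch, assuming the target folder may not exist yet; the helper name destination_path is hypothetical, and it only builds the local path, it does not download anything:

```python
import os
from urllib.parse import urlparse


def destination_path(folder, image_url):
    """Build the local file path for an image URL inside `folder`.

    The folder is created if it does not exist yet.
    """
    os.makedirs(folder, exist_ok=True)
    # Use only the path part of the URL so that a query string
    # does not end up in the file name.
    file_name = os.path.basename(urlparse(image_url).path)
    return os.path.join(folder, file_name)
```

With this in place, the download call from the answer above would become urlretrieve(img, destination_path(folder, img)), where folder is whatever directory you want to save into. os.path.join picks the correct separator for the operating system, which avoids mixing backslashes and forward slashes.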
Example:
for image in images:
    # get the img url
    img_url = image.get('src').replace('\\', '/')
    real_url = "http://www.photobirdireland.com/" + img_url
    # get the image name
    img_name = str(img_url.split('/')[-1])
    # now download the image using - import urllib.request & import os
    print("downloading {}".format(img_url))
    urllib.request.urlretrieve(real_url, os.path.join(path, img_name))
The complete code should look like this:
import os
import urllib.request
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup as Bs

class ImageScraper:
    def __init__(self, url, download_path):
        self.url = url
        self.download_path = download_path
        self.session = requests.Session()

    def scrape_images(self):
        path = self.download_path
        # create the target folder if it does not exist yet
        os.makedirs(path, exist_ok=True)
        html = urlopen(self.url)
        bs4 = Bs(html, 'html.parser')
        images = bs4.find_all('img', {})
        for image in images:
            # get the img url
            img_url = image.get('src').replace('\\', '/')
            real_url = "http://www.photobirdireland.com/" + img_url
            print(real_url)
            # get the image name
            img_name = str(img_url.split('/')[-1])
            print(img_name)
            print("downloading {}".format(img_url))
            urllib.request.urlretrieve(real_url, os.path.join(path, img_name))

scraper = ImageScraper(
    url="http://www.photobirdireland.com/garden-birds.html", download_path=r"D:\Temp\Images")
scraper.scrape_images()
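One broken image link is enough to stop a loop built on urlretrieve, because the exception propagates out of the loop. A minimal sketch of a more forgiving download step; note that it swaps urlretrieve for urlopen plus shutil.copyfileobj, and that the helper name download_file and the 10-second timeout are assumptions made for illustration, not part of the answers above:

```python
import os
import shutil
from urllib.error import URLError
from urllib.request import urlopen


def download_file(url, target_path, timeout=10):
    """Download `url` to `target_path`.

    Returns True on success and False if the request or the write fails,
    so the caller can carry on with the next image.
    """
    try:
        with urlopen(url, timeout=timeout) as response, open(target_path, "wb") as out:
            # Stream the response body to disk in chunks.
            shutil.copyfileobj(response, out)
    except (URLError, OSError) as exc:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        print("skipping {}: {}".format(url, exc))
        return False
    return True
```

Inside scrape_images, the urlretrieve call could then be replaced by download_file(real_url, os.path.join(path, img_name)), and the return value can be used to count how many images were actually saved.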