Web scraping to download at least 2K images from Google Images in Python
I am trying to use a Python script to extract 2,000 images from Google, but I can only download 80. Can someone help me modify the code below so that it fetches 2K images?
import os
import requests
from bs4 import BeautifulSoup
Google_Image = 'https://www.google.com/search?site=&tbm=isch&source=hp&biw=1873&bih=990&'
u_agnt = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}
Image_Folder = 'Images_1'
def main():
    if not os.path.exists(Image_Folder):
        os.mkdir(Image_Folder)
    download_images()
def download_images():
    data = input('Enter your search keyword: ')
    num_images = int(input('Enter the number of images you want: '))
    print('Searching Images....')
    search_url = Google_Image + 'q=' + data  # 'q=' because it's a query
    # request url, without u_agnt the permission gets denied
    response = requests.get(search_url, headers=u_agnt)
    html = response.text  # read the HTML data in text mode
    # find all img where class='rg_i Q4LuWd'
    b_soup = BeautifulSoup(html, 'html.parser')  # html.parser is used to parse the HTML
    results = b_soup.find_all('img', {'class': 'rg_i Q4LuWd'})
    # extract the links of the requested number of images from the 'data-src'
    # attribute and append those links to the list 'imagelinks';
    # keep looping when a tag has no 'data-src' attribute
    count = 0
    imagelinks = []
    for res in results:
        try:
            link = res['data-src']
            imagelinks.append(link)
            count = count + 1
            if count >= num_images:
                break
        except KeyError:
            continue
    print(f'Found {len(imagelinks)} images')
    print('Start downloading...')
    for i, imagelink in enumerate(imagelinks):
        # open each image link and save the file
        response = requests.get(imagelink)
        imagename = Image_Folder + '/' + data + str(i+1) + '.jpg'
        with open(imagename, 'wb') as file:
            file.write(response.content)
    print('Download Completed!')
if __name__ == '__main__':
    main()
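One detail in the script above is that the keyword is concatenated into the URL as-is, so a keyword containing spaces or `&` may not be sent the way it was typed. A minimal sketch of building the same search URL with the standard library's `urlencode` follows. The names `GOOGLE_IMAGE_BASE` and `build_search_url` are illustrative and not part of the original script, and the empty `site=` parameter from the original URL is left out on the assumption that it has no effect.

```python
from urllib.parse import urlencode

# Hypothetical constant: the original URL without its query string.
GOOGLE_IMAGE_BASE = 'https://www.google.com/search'


def build_search_url(keyword):
    """Return the image-search URL with the keyword percent-encoded."""
    params = {
        'tbm': 'isch',   # image search, as in the original URL
        'source': 'hp',
        'biw': 1873,     # viewport size values copied from the original URL
        'bih': 990,
        'q': keyword,    # urlencode escapes spaces, '&', and similar characters
    }
    return GOOGLE_IMAGE_BASE + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_search_url('polar bears & cubs'))
```

With `requests`, the same effect can be had by passing the dictionary through the `params=` argument of `requests.get` instead of building the string by hand.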
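Please let me know how to solve this, and why this limit occurs when downloading images from Google. Do I need to use a tool such as proxycrawl, and if so, how do I use it?

There is a Python library that can do this.

@kuldeepsingsidhu, thanks for the information. But it cannot download anything and shows this message: "Unfortunately all 6 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!" The code is:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords": "dog", "limit": 6, "format": "jpg", "print_url": True}
path = response.download(arguments)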
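On the 80-image limit: one likely explanation, offered as an assumption rather than something confirmed in this thread, is that a single `requests.get` call only receives the first batch of results in the HTML, while a browser loads further batches as the page is scrolled. Under that assumption, reaching 2K links means gathering several HTML batches and merging them. The sketch below shows only the merging step, using the standard library parser. How the batches are obtained (a scripted browser, several related queries, or an official API) is deliberately left open, and `ImgLinkParser`, `extract_links`, and `collect_unique_links` are hypothetical names. Unlike the original code, it does not filter on the `rg_i Q4LuWd` class.

```python
from html.parser import HTMLParser


class ImgLinkParser(HTMLParser):
    """Collect the 'data-src' attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('data-src')
            if src:
                self.links.append(src)


def extract_links(html):
    """Return all 'data-src' links found in one HTML document."""
    parser = ImgLinkParser()
    parser.feed(html)
    return parser.links


def collect_unique_links(html_batches, target):
    """Merge links from several HTML batches, skipping duplicates,
    and stop as soon as `target` links have been collected."""
    seen = set()
    ordered = []
    for html in html_batches:
        for link in extract_links(html):
            if link not in seen:
                seen.add(link)
                ordered.append(link)
                if len(ordered) >= target:
                    return ordered
    return ordered


if __name__ == '__main__':
    # Two made-up batches standing in for separately fetched result pages.
    batch1 = '<img data-src="https://example.com/a.jpg"><img src="x.png">'
    batch2 = ('<img data-src="https://example.com/a.jpg">'
              '<img data-src="https://example.com/b.jpg">')
    print(collect_unique_links([batch1, batch2], target=2000))
```

Deduplicating matters here because overlapping batches would otherwise inflate the count without adding new images, and the returned list can be fed straight into the download loop from the script above.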