Selenium WebDriver Python 3.6 image crawling

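I am scraping images from Google Image Search.

What I have tried:

1. Open the Chrome driver with Selenium

2. Scroll down to the end of the page

3. Use BeautifulSoup to get the image URLs and save the images

The problem is that the saved images are too small.

I then found that each image has an original-image src.

It is in the src attribute (ending in ".jpg") of the image element with class irc_mi.

But I don't know how to extract it.

I tried using find_all with the class name, but it failed.

What should I do?

Here is the source code: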

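from time import sleep
import urllib.request

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import WebDriverException


def Remainder_All_ImagesURLs_Google(searchText):

    def scroll_page():
        # Scroll to the bottom several times so more thumbnails get loaded.
        for i in range(7):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            sleep(3)

    def click_button():
        # Click the button with id 'smb' that loads more images.
        more_imgs_button_xpath = "//*[@id='smb']"
        element = driver.find_element_by_xpath(more_imgs_button_xpath)
        element.click()
        sleep(3)

    def create_soup():
        # Parse the page as currently rendered by the browser.
        html_source = driver.page_source
        soup = BeautifulSoup(html_source, 'html.parser')
        return soup

    def find_imgs():
        # Collect every <img> src that is an http(s) URL; tags without a src are skipped.
        soup = create_soup()
        imgs_urls = []
        for img in soup.find_all('img'):
            src = img.get('src')
            if src and src.startswith('http'):
                imgs_urls.append(src)
        return imgs_urls

    driver = webdriver.Chrome('C:/chromedriver.exe')
    driver.maximize_window()
    sleep(2)

    searchUrl = "https://www.google.com/search?q={}&site=webhp&tbm=isch".format(searchText)
    driver.get(searchUrl)

    try:
        scroll_page()
        click_button()
        scroll_page()
    except WebDriverException:
        # Retry once if scrolling or clicking failed (narrowed from a bare except).
        click_button()
        scroll_page()

    imgs_urls = find_imgs()
    driver.close()
    return imgs_urls


def download_image(url, filename):
    full_name = str(filename) + ".jpg"
    # Assumes 'C:/Python/Picture' is meant to be a directory, so a separator is added
    # (the original concatenation produced names like 'C:/Python/Picture1.jpg').
    urllib.request.urlretrieve(url, 'C:/Python/Picture/' + full_name)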
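
The search URL in the question's code inserts searchText into the query string verbatim, which can misbehave for queries containing spaces, `&`, or non-ASCII characters. A minimal sketch of percent-encoding the query with the standard library follows; the helper name `build_search_url` is hypothetical and not part of the original code, while the query parameters are the ones used in the question.

```python
from urllib.parse import urlencode


def build_search_url(search_text):
    """Return the image-search URL with the query percent-encoded."""
    # Same parameters as the question's URL; urlencode escapes spaces, '&', etc.
    params = {"q": search_text, "site": "webhp", "tbm": "isch"}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("red panda & cub"))
```

The returned string could be passed to `driver.get(...)` in place of the `str.format` call.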

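The problem is that BeautifulSoup cannot find any src or href for the image, because the source (src) is returned by a JavaScript-based function. I therefore suggest using Selenium to click the image tag, wait for the image src, and extract it.

Use the following to click the element, then look up the image src:

element=driver.find_element_by_class_name("some_class")
element.click()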
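
The "wait for the image src" step could be sketched as a small polling loop, run after the click. This is only an illustration under stated assumptions: the function name `wait_for_image_src` and its parameters are hypothetical, the class name `irc_mi` is taken from the question and may not match the current page layout, and the only driver methods relied on are `find_elements(by, value)` and `get_attribute("src")`.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CLASS_NAME.
BY_CLASS_NAME = "class name"


def wait_for_image_src(driver, class_name="irc_mi", timeout=10.0, poll=0.5):
    """Poll until an element with `class_name` has an http(s) src.

    Returns the src string, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        for element in driver.find_elements(BY_CLASS_NAME, class_name):
            src = element.get_attribute("src")
            # Skip placeholders such as inline data: URIs.
            if src and src.startswith("http"):
                return src
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

After `element.click()`, a call such as `wait_for_image_src(driver)` would return the full-size URL once the page has filled it in, which can then be passed to `download_image`. Selenium's own `WebDriverWait` offers the same polling pattern and is likely the better choice in real code; the hand-written loop is used here only to keep the sketch free of browser dependencies.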