Python: Scroll down Google reviews using Selenium

Tags: python, selenium, screen-scraping

I'm trying to scrape the reviews from this link:

To load the page, I'm using the following code:

from selenium import webdriver
import datetime
import time
import argparse
import os
import time

#Define the argument parser to read in the URL

url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"


# Initialize the Chrome webdriver and open the URL
#driver = webdriver.Chromium()


profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko")
#driver = webdriver.Firefox(profile)
# https://stackoverflow.com/questions/22476112/using-chromedriver-with-selenium-python-ubuntu
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

driver.get(url)

driver.implicitly_wait(2)



SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

The page loads fine, but it doesn't scroll down. I've used the same code on other sites (such as LinkedIn), and it works there.

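Here is logic for scrolling down without using JavaScript. It is simple and effective: accessing an element's location_once_scrolled_into_view property scrolls that element into view.

In the logic below, we scroll to the last loaded review and then check whether the requested number of reviews has been loaded.

Required imports: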

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

Change the value of the desiredReviewsCount variable in the code below to suit your needs.

wait = WebDriverWait(driver, 10)
url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"
driver.get(url)

# XPath matching every review block in the reviews panel
review_xpath = "//div[@class='gws-localreviews__general-reviews-block']//div[@class='WMbnJf gws-localreviews__google-review']"
desiredReviewsCount = 30
x = 0

# Wait until at least one review is present in the DOM
wait.until(EC.presence_of_all_elements_located((By.XPATH, review_xpath)))
while x < desiredReviewsCount:
    # Accessing this property scrolls the last loaded review into view
    # (find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed)
    driver.find_element(By.XPATH, "(" + review_xpath + ")[last()]").location_once_scrolled_into_view
    x = len(driver.find_elements(By.XPATH, review_xpath))

print(len(driver.find_elements(By.XPATH, review_xpath)))
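
Note that the loop above never exits if the page holds fewer reviews than desiredReviewsCount, or if scrolling stops triggering new loads (as reported for Chrome in the comments). One possible guard, sketched below, is to stop after several scrolls in a row that load nothing new. The helper name scroll_until_loaded and its parameters (count_loaded, scroll_to_last, max_stalls, pause) are hypothetical and not part of the original answer or of Selenium; the two callables are meant to wrap the Selenium calls shown above.

```python
import time


def scroll_until_loaded(count_loaded, scroll_to_last, desired_count,
                        max_stalls=5, pause=0.5):
    """Scroll until desired_count items are loaded or loading stalls.

    count_loaded   -- callable returning how many items are currently loaded
    scroll_to_last -- callable that scrolls the last loaded item into view
    max_stalls     -- consecutive scrolls without new items before giving up
    pause          -- seconds to wait after each scroll for content to load
    Returns the number of items loaded when the loop stops.
    """
    loaded = count_loaded()
    stalls = 0
    while loaded < desired_count and stalls < max_stalls:
        scroll_to_last()
        time.sleep(pause)
        new_loaded = count_loaded()
        # Reset the stall counter whenever the scroll loaded something new
        stalls = 0 if new_loaded > loaded else stalls + 1
        loaded = new_loaded
    return loaded


# Possible wiring with Selenium (assumes `driver` and `By` from the code above,
# and a hypothetical REVIEW_XPATH constant holding the review XPath):
#
# reviews = lambda: driver.find_elements(By.XPATH, REVIEW_XPATH)
# total = scroll_until_loaded(
#     count_loaded=lambda: len(reviews()),
#     scroll_to_last=lambda: reviews()[-1].location_once_scrolled_into_view,
#     desired_count=30,
# )
```

Passing callables keeps the stopping logic separate from the browser calls, so the same guard can be reused with a different locator or scrolling technique.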

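Do you need to scroll down to load elements on the page?

Yes, I need to scroll to get all the reviews.

Please check the answer below and let me know how it goes. I wasn't sure what you meant by "all", which is why I provided the desiredReviewsCount option in the script.

This doesn't work for me. It seems to move the sidebar to the end of the initial page, but it doesn't force the reviews below to load.

Which versions of Firefox and Selenium are you using? I don't see any issue (GIF attached). Sorry about the low-quality GIF; I couldn't upload a higher-quality one because of the screenshot size limit.

I'm using selenium-3.141 and Chromium 73 on Linux Mint. Which version are you using?

Aha... the problem seems to be with the Chrome browser (I was able to reproduce your issue in Chrome). Firefox doesn't have this problem. Can you switch to Firefox, or would you like me to dig deeper into the issue?

No problem, I can switch to Firefox. Thanks.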