Extracting reviews of a specific app from the Google Play Store with Selenium (Python)
Here I need to click a button ("Full Review") to see the full text of each review.
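Clicking one "Full Review" button per review generalizes to a loop over every matching button currently in the DOM. Below is a minimal sketch, assuming the button is a `<button>` whose visible text is exactly "Full Review" (an assumption to verify in DevTools, since Google Play's markup changes often). The helper name `expand_full_reviews` and its parameters are hypothetical; `XPATH` is a plain string equal to Selenium's `By.XPATH`, so the helper can be defined without importing Selenium.

```python
import time

# Same string value as selenium.webdriver.common.by.By.XPATH.
XPATH = "xpath"

# ASSUMPTION: the expand control is a <button> whose text is "Full Review".
# Google Play's markup changes often; re-check it in DevTools.
FULL_REVIEW_BUTTON = "//button[normalize-space()='Full Review']"


def expand_full_reviews(driver, locator=FULL_REVIEW_BUTTON, pause=0.5):
    """Click every visible 'Full Review' button currently in the DOM.

    Returns the number of buttons clicked. Reviews that are short enough
    to have no such button are simply not matched by the locator.
    """
    clicked = 0
    for button in driver.find_elements(XPATH, locator):
        if button.is_displayed():  # skip hidden or already-expanded buttons
            button.click()
            clicked += 1
            time.sleep(pause)  # give the page a moment to expand the text
    return clicked
```

With a live driver, call `expand_full_reviews(driver)` after `driver.get(baseUrl)`, and again each time a new batch of reviews has loaded, before reading the review text.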
from time import sleep
from webbrowser import Chrome
import selenium
from bs4 import BeautifulSoup as bsoup
import pandas as pd
from selenium import webdriver

class FindByXpathCss():
    def test(self):
        baseUrl = "https://play.google.com/store/apps/details?id=com.delta.mobile.android&hl=en_US&showAllReviews=true"
        driver = webdriver.Chrome("F:\\Chrome-webdriver\\chromedriver")
        driver.maximize_window()
        driver.get(baseUrl)
Here I read the full review text using XPath, but I want to read all of the app's reviews; this app alone has about 1,200 reviews. How can I use a loop here?
        # Locate one review's "Full Review" button by its absolute CSS path, then click it
        fullReviewbtn = driver.find_element_by_css_selector('#fcxH9b > div.WpDbMd > c-wiz > div > div.ZfcPIb > div > div.JNury.Ekdcne > div > div > div.W4P4ne > div:nth-child(2) > div > div:nth-child(2) > div > div.d15Mdf.bAhLNe > div.UD7Dzf > span:nth-child(1) > div > button')
        fullReviewbtn.click()
        sleep(1)
        elementByXpath = driver.find_element_by_xpath('//*[@id="fcxH9b"]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div/div[2]/div/div[2]/div[2]').text
        if elementByXpath is not None:
            print("We found an element using Xpath")
            #Review = elementByXpath.get_attribute("Review")
            print(elementByXpath)
        driver.close()

ff = FindByXpathCss()
ff.test()

import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

class FindByXpathCss():
    driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
    driver.maximize_window()
    baseUrl = "https://play.google.com/store/apps/details?id=com.delta.mobile.android&hl=en_US&showAllReviews=true"
    driver.get(baseUrl)

    # Scroll to the bottom 16 times so the page lazy-loads more reviews
    scrolls = 15
    while True:
        scrolls -= 1
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(3)
        if scrolls < 0:
            break

    # Click the "Show More" control to load the next batch of reviews
    elemtn = WebDriverWait(driver, 30).until(
        EC.element_to_be_clickable((By.XPATH, "//span[contains(@class,'RveJvd snByac')]")))
    elemtn.click()

    # Scroll 6 more times, then click "Show More" once more
    scrolls = 5
    while True:
        scrolls -= 1
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(3)
        if scrolls < 0:
            break
    elemtn = WebDriverWait(driver, 30).until(
        EC.element_to_be_clickable((By.XPATH, "//span[contains(@class,'RveJvd snByac')]")))
    elemtn.click()

    # Wait for the review-text elements and print each one
    reviewText = WebDriverWait(driver, 30).until(
        EC.presence_of_all_elements_located((By.XPATH, "//*[@class='UD7Dzf']")))
    # reviewText = driver.find_elements_by_xpath("//*[@class='UD7Dzf']")
    for textreview in reviewText:
        print(textreview.text)

Thanks Dipak, it works well, but there is one more problem: I still can't get all of the reviews. I also tried changing the XPath and CSS selector, but I still only get about 160 reviews, and there are 500+ more to fetch. Please help.
revtext = driver.find_elements_by_css_selector('#fcxH9b > div.WpDbMd > c-wiz > div > div.ZfcPIb > div > div.JNury.Ekdcne > div > div.W4P4ne > div:nth-child(2)
#reviewText = driver
But it doesn't return all of the app's reviews.

Yes, because the DOM is dynamic; also, if you inspect the DOM carefully, you'll see that a few reviews have no "Full Review" button associated with them. I'll chop it …
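The answer above scrolls a fixed number of times (16, then 6) and clicks "Show More" only twice, which may be why only part of the reviews get collected. One alternative is to keep scrolling, clicking "Show More" whenever it is visible, until the number of loaded reviews stops growing for a few consecutive rounds. This is a sketch under stated assumptions: the class names `UD7Dzf` and `RveJvd snByac` are copied from the answer and may no longer exist on the live page, and the function name `load_all_reviews` and its `pause`, `patience` and `max_rounds` parameters are hypothetical.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH

# ASSUMPTION: class names copied from the answer; Google Play's obfuscated
# class names change over time, so re-check them in DevTools.
REVIEW_TEXT = "//*[@class='UD7Dzf']"
SHOW_MORE = "//span[contains(@class,'RveJvd snByac')]"


def load_all_reviews(driver, review_locator=REVIEW_TEXT,
                     show_more_locator=SHOW_MORE, pause=2.0,
                     patience=3, max_rounds=500):
    """Scroll, and click "Show More" when it is visible, until the number of
    review elements has not grown for `patience` consecutive rounds.

    Returns the number of review elements loaded when it stops.
    """
    count, stalled = 0, 0
    for _ in range(max_rounds):
        # Scrolling to the bottom triggers the page's lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)

        # Click the first visible "Show More" control, if there is one.
        for button in driver.find_elements(XPATH, show_more_locator):
            if button.is_displayed():
                button.click()
                time.sleep(pause)
                break

        new_count = len(driver.find_elements(XPATH, review_locator))
        if new_count > count:
            count, stalled = new_count, 0
        else:
            stalled += 1  # nothing new this round (end of list or slow load)
            if stalled >= patience:
                break
    return count
```

With a real driver, call `load_all_reviews(driver)` after `driver.get(baseUrl)`, then read `.text` from each element returned by `driver.find_elements(By.XPATH, REVIEW_TEXT)`. Raising `pause` or `patience` trades speed for robustness on slow connections; `max_rounds` only bounds the total run time.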