Python Selenium unable to locate the "id-app-title" element when trying to load a Google Play page

Tags: python, selenium, web-scraping, beautifulsoup, google-play

I'm trying to scrape reviews from the Google Play store, but I keep getting the following error:

DevTools listening on ws://127.0.0.1:53044/devtools/browser/9de3e58b-6384-4809-bf01-31d47a57879f
Traceback (most recent call last):
  File "c:/Users/Emil/Documents/Guatrain_Reviews/guatrain_reviews.py", line 20, in <module>
    Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"id-app-title"}
  (Session info: chrome=71.0.3578.98)
  (Driver info: chromedriver=2.46.628402 (536cd7adbad73a3783fdc2cab92ab2ba7ec361e1),platform=Windows NT 10.0.17134 x86_64)
Could someone point out where I can find the id of the app I'm interested in, or help me work out what is going wrong?

Thanks
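
A side note on the app id: in a Play Store details URL, the app's id (its package name) is the `id` query parameter, which is unrelated to the `id-app-title` CSS class the script searches for. A minimal sketch using only the standard library; the helper name `app_id_from_url` is made up for illustration and is not part of the original script:

```python
from urllib.parse import urlparse, parse_qs

def app_id_from_url(url):
    """Return the value of the `id` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("id")
    return values[0] if values else None

link = "https://play.google.com/store/apps/details?id=uk.co.o2.android.myo2&hl=en_GB"
print(app_id_from_url(link))  # uk.co.o2.android.myo2
```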

Edit

The end result I want should look like this:

where I plug in the URL of an app and it extracts the ratings and reviews:


Thanks

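The code is from 2016, so I assume they have changed the page structure since then, which would be why there is no 'id-app-title' or anything else from that original code. That is only my assumption.

There is still a lot of work to do on this code (such as replacing time.sleep with Selenium's implicit waits, and frankly just making it more robust, since I only looked at the reviews for this one app. EDIT, see below). The HTML is quite complex, with lots of nested div and span tags whose attributes/classes carry no particular meaning, so I had a hard time pulling out each user review element.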

But basically, I was able to have a browser open the page and keep scrolling down until it can click "Show More", then repeat that x times.
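
The click-or-scroll idea can be sketched independently of the browser. This is only an illustration under assumptions: `load_reviews`, its two callables and `FakePage` are hypothetical names, and the fake page stands in for a real browser so the control flow can be run as-is.

```python
def load_reviews(click_show_more, scroll_to_bottom, max_clicks, max_scrolls):
    """Click "Show More" whenever possible, otherwise scroll; stop at either limit.

    click_show_more: callable that clicks the button or raises if it is not there.
    scroll_to_bottom: callable that scrolls the page down once.
    Returns the (clicks, scrolls) actually performed.
    """
    clicks = scrolls = 0
    while clicks < max_clicks and scrolls < max_scrolls:
        try:
            click_show_more()
            clicks += 1
        except Exception:
            # Button not present (yet): scroll so the page loads more reviews.
            scroll_to_bottom()
            scrolls += 1
    return clicks, scrolls


class FakePage:
    """Stand-in for a browser: the button only appears after 3 scrolls."""
    def __init__(self):
        self.scrolled = 0

    def scroll(self):
        self.scrolled += 1

    def click(self):
        if self.scrolled < 3:
            raise LookupError("Show More button not found")
        self.scrolled = 0


page = FakePage()
result = load_reviews(page.click, page.scroll, max_clicks=2, max_scrolls=50)
print(result)  # (2, 6)
```

With Selenium, `click_show_more` would wrap finding and clicking the "Show More" element, and `scroll_to_bottom` would send `Keys.END` to the page.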

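Once that is done, it iterates over the span tags. I found that every 10 span tags relate to a single user. However, if the app owner responded to the review, the position is offset by 2, so that has to be accounted for.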
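
The "10 spans per user, plus 2 when the owner replied" bookkeeping can be tried on plain lists first. This is a rough sketch and not the script's exact logic: `split_reviews`, `fake_block` and the `is_rating` predicate are hypothetical, and the predicate stands in for the aria-label lookup done on the real page.

```python
BLOCK = 10        # spans per review, as observed in the answer
REPLY_EXTRA = 2   # extra spans present when the app owner replied

def split_reviews(spans, is_rating):
    """Group a flat list of span texts into (name, rating, date, review) tuples."""
    reviews = []
    pos = 0
    while pos + BLOCK <= len(spans):
        if not is_rating(spans[pos + 1]):
            # Not the start of a review: assume these are the owner's reply spans.
            pos += REPLY_EXTRA
            continue
        block = spans[pos:pos + BLOCK]
        reviews.append((block[0], block[1], block[2], block[8]))
        pos += BLOCK
    return reviews

def fake_block(name, rating, date, text):
    # Positions 0, 1, 2 and 8 carry data; the other spans are filler.
    block = ["-"] * BLOCK
    block[0], block[1], block[2], block[8] = name, rating, date, text
    return block

spans = (fake_block("Ann", "Rated 4 stars", "1 February 2019", "Good")
         + ["Owner", "Thanks for the feedback"]
         + fake_block("Bob", "Rated 2 stars", "2 February 2019", "Slow"))

reviews = split_reviews(spans, lambda s: s.startswith("Rated"))
for r in reviews:
    print(r)
```

Retrying at the shifted position, rather than skipping ahead a whole block, keeps the review that follows an owner reply.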

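I'm relatively new to programming, so I apologize for the messy code and inefficiency. I'm sure the experts can provide a better solution, but hopefully this gets you started: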

# Load the webdriver from selenium, plus the parsing and data libraries
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import pandas as pd
import time

# Change this number to get more or less reviews
# Current set of x=100 yielded 11,312 reviews
x = 100

link = "https://play.google.com/store/apps/details?id=uk.co.o2.android.myo2&hl=en_GB"

driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(link + '&showAllReviews=true')

num_clicks = 0
num_scrolls = 0
while num_clicks <= x and num_scrolls <= x*5:
    try:
        show_more = driver.find_element_by_xpath('//*[@id="fcxH9b"]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div[2]/div/content/span')
        show_more.click()
        num_clicks += 1

    # "Show More" not found or not clickable yet: scroll to the bottom to load more
    except Exception:
        html = driver.find_element_by_tag_name('html')
        html.send_keys(Keys.END)
        num_scrolls +=1
        time.sleep(2)

soup = bs4.BeautifulSoup(driver.page_source, 'html.parser')
h2 = soup.find_all('h2')

results_df = pd.DataFrame()
for ele in h2:
    if ele.text == 'Reviews':
        c_wiz = ele.parent.parent.find_all('c-wiz')
        for sibling in c_wiz[0].next_siblings:
            try:
                #print (sibling)
                comment_shift = 0
                spans = sibling.find_all('span')
                for user_block in range(0,len(spans)):
                    i = user_block *10
                    name = spans[i+0+comment_shift].text
                    try:
                        rating = spans[i+1+comment_shift].div.next_element['aria-label']
                        rating = str(''.join(filter(str.isdigit, rating)))
                    except:
                        comment_shift += 2
                        continue
                    date = spans[i+2+comment_shift].text
                    review = spans[i+8+comment_shift].text
                    print ('Name: %s\nRating: %s\nDate: %s\nReview: %s\n' %(name, rating, date, review))
                    temp_df = pd.DataFrame([[date, rating, name, review]], columns = ['Date','Rating','User','Review'])

                    # pd.concat instead of DataFrame.append, which pandas 2.0 removed
                    results_df = pd.concat([results_df, temp_df])
            except:
                continue

results_df = results_df.reset_index(drop=True)
results_df.to_csv('C:/reviews.csv', index=False)

driver.close()

It now needs a slight improvement. Change 8 to 12 and you will get the reviews:

review = spans[i+12+comment_shift].text

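Can you add a snapshot of what you want from the Play Store? I am not able to find any element with the above class by hitting the URL provided.

Ptitle = driver.find_element_by_css_selector('[itemprop=name] span').text

@QHarr This results in a selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]/div/div"} error.

That would not result in the error you show. Personally, since I see hard-coded values used throughout rather than wait conditions, I can't see how the required click/scroll set-up (scroll, then click "Show More" on the page) would run before all the reviews have been generated. I also prefer CSS selectors to XPath for selecting, as modern browsers are optimized for CSS (with some exceptions as to which is faster, depending on the actual selector combinations used, their XPath equivalents, and how up to date the browser is). I didn't check for long, as by then I had already decided I would probably write it from scratch.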
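
On the point about wait conditions versus hard-coded sleeps: the idea is to poll for a condition with a deadline instead of sleeping a fixed 2 seconds. Selenium provides this through `WebDriverWait` together with `expected_conditions`; the sketch below shows the same pattern in plain Python so it can run without a browser. The helper `wait_until` and the `ready` function are hypothetical names for illustration.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out.

    Returns the truthy value; raises TimeoutError once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)

attempts = []

def ready():
    # Simulates an element that only becomes available on the third poll.
    attempts.append(1)
    return len(attempts) >= 3

wait_until(ready, timeout=2.0, poll=0.01)
print(len(attempts))  # 3
```

The script then proceeds as soon as the element is ready, and fails loudly if it never appears, rather than silently scrolling on.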
Printing the DataFrame from the script above gives:

print (results_df)
                   Date                        ...                                                                     Review
0       31 January 2019                        ...                          Was broken for pay as you go customers. Has no...
1       2 February 2019                        ...                          o2 just won't be happy until their customer se...
2       1 February 2019                        ...                                             Excellent quality piece of kit
3       6 February 2019                        ...                                                                        Gud