Python Selenium cannot find the "id-app-title" element when trying to load a Google Play page

python selenium web-scraping

I'm trying to scrape reviews from the Google Play Store, but I keep getting the following error:

DevTools listening on ws://127.0.0.1:53044/devtools/browser/9de3e58b-6384-4809-bf01-31d47a57879f
Traceback (most recent call last):
  File "c:/Users/Emil/Documents/Guatrain_Reviews/guatrain_reviews.py", line 20, in <module>
    Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"id-app-title"}
  (Session info: chrome=71.0.3578.98)
  (Driver info: chromedriver=2.46.628402 (536cd7adbad73a3783fdc2cab92ab2ba7ec361e1),platform=Windows NT 10.0.17134 x86_64)

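Could anyone point me to where I can find the id for the app I'm interested in, or help me work out what is going wrong?

Thanks

EDIT

The end result I want should look like this:

Whichever app URL I insert, it should extract the ratings and reviews:

Thanks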

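The code is from 2016, so I assume they have changed the page structure since then, which is why there is no "id-app-title" or anything else from the original code. That is only my assumption.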
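One way to test that assumption is to check whether the class name still occurs anywhere in the page source before asking Selenium to locate it. The sketch below uses only the standard library. `ClassCollector` and `page_has_class` are hypothetical helpers, and the two HTML strings are made-up stand-ins for `driver.page_source`, not real Google Play markup.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for attr_name, attr_value in attrs:
            if attr_name == "class" and attr_value:
                self.classes.update(attr_value.split())


def page_has_class(html, class_name):
    """Return True if any tag in `html` carries `class_name`."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Made-up stand-ins for driver.page_source (assumed markup, for illustration)
old_page = '<div class="id-app-title">My App</div>'
new_page = '<h1 itemprop="name"><span>My App</span></h1>'

print(page_has_class(old_page, "id-app-title"))
print(page_has_class(new_page, "id-app-title"))
```

If the check fails on the live page, the locator is stale and a new one has to be picked from the current markup.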

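This code still needs a lot of work (for example, replacing time.sleep with Selenium's implicit waits and, frankly, just making it more robust, since I only looked at this one app's reviews. EDIT: see below). The HTML is very complex, with lots of nested div and span tags whose attributes/classes carry no particular meaning, so I had a hard time pulling out each user review element.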

Basically, though, I can open the page in the browser and have it keep scrolling down until it can click "Show more", then repeat that x times.

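Once that is done, it iterates over the span tags. I found that every 10 span tags relate to a single user. However, if the app owner has responded to a review, that offsets things by 2, so this has to be accounted for.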
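The offset bookkeeping can be sketched on a plain Python list, without a browser. Everything here is an assumption made for illustration: the flat `spans` list stands in for the text of the real span tags, a rating label is taken to be any text containing "Rated", and an owner reply is modelled as 2 extra entries after a review. This is not the exact loop from my script: it re-checks the same position after shifting, rather than advancing by a fixed stride.

```python
BLOCK = 10        # spans per review, as observed on the page
REPLY_SHIFT = 2   # extra spans when the app owner replied (assumed layout)


def parse_rating(text):
    """Return the digits of a rating label, or None if it is not one."""
    if "Rated" not in text:
        return None
    digits = "".join(filter(str.isdigit, text))
    return digits or None


def group_reviews(spans):
    """Walk a flat list of span texts and return one dict per review."""
    reviews = []
    i = 0
    while i + BLOCK <= len(spans):
        rating = parse_rating(spans[i + 1])
        if rating is None:
            # Not the start of a review (e.g. an owner reply): skip 2 spans
            i += REPLY_SHIFT
            continue
        reviews.append({
            "name": spans[i],
            "rating": rating,
            "date": spans[i + 2],
            "review": spans[i + 8],
        })
        i += BLOCK
    return reviews


def fake_block(name, stars, date, text):
    """Build 10 made-up span texts laid out like one review."""
    block = [""] * BLOCK
    block[0], block[2], block[8] = name, date, text
    block[1] = "Rated %d stars out of five stars" % stars
    return block


spans = (fake_block("Ann", 4, "1 February 2019", "Works fine")
         + ["Owner", "Thanks for the feedback"]   # owner reply: 2 spans
         + fake_block("Bob", 2, "2 February 2019", "Keeps crashing"))

for row in group_reviews(spans):
    print(row)
```

With the made-up data, both reviews are recovered even though the owner reply sits between them.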

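I'm relatively new to programming, so I apologise for the messy code and the inefficiency. I'm sure an expert could provide a better solution, but hopefully this gets you started or gives you something to try: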

# Load the webdriver and helpers from selenium, plus bs4 and pandas
import time

import bs4
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Change this number to get more or fewer reviews
# Current setting of x=100 yielded 11,312 reviews
x = 100

link = "https://play.google.com/store/apps/details?id=uk.co.o2.android.myo2&hl=en_GB"

# Path to the local chromedriver executable
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(link + '&showAllReviews=true')

# Click "Show more" when it is available; otherwise scroll to the bottom and retry
num_clicks = 0
num_scrolls = 0
while num_clicks <= x and num_scrolls <= x*5:
    try:
        show_more = driver.find_element(By.XPATH, '//*[@id="fcxH9b"]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div[2]/div/content/span')
        show_more.click()
        num_clicks += 1

    except WebDriverException:
        html = driver.find_element(By.TAG_NAME, 'html')
        html.send_keys(Keys.END)
        num_scrolls += 1
        time.sleep(2)

soup = bs4.BeautifulSoup(driver.page_source, 'html.parser')
h2 = soup.find_all('h2')

# Collect rows in a list and build the DataFrame once at the end
rows = []
for ele in h2:
    if ele.text == 'Reviews':
        c_wiz = ele.parent.parent.find_all('c-wiz')
        for sibling in c_wiz[0].next_siblings:
            try:
                comment_shift = 0
                spans = sibling.find_all('span')
                for user_block in range(0, len(spans)):
                    # Every 10 span tags relate to a single user
                    i = user_block * 10
                    name = spans[i+0+comment_shift].text
                    try:
                        rating = spans[i+1+comment_shift].div.next_element['aria-label']
                        rating = ''.join(filter(str.isdigit, rating))
                    except (AttributeError, IndexError, KeyError, TypeError):
                        # No rating here: an owner reply offsets the spans by 2
                        comment_shift += 2
                        continue
                    date = spans[i+2+comment_shift].text
                    review = spans[i+8+comment_shift].text
                    print('Name: %s\nRating: %s\nDate: %s\nReview: %s\n' % (name, rating, date, review))
                    rows.append([date, rating, name, review])
            except (AttributeError, IndexError, KeyError, TypeError):
                # Non-tag sibling, or the index ran past the last review
                continue

results_df = pd.DataFrame(rows, columns=['Date', 'Rating', 'User', 'Review'])
results_df.to_csv('C:/reviews.csv', index=False)

driver.close()

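Can you add a snapshot of what you want from the Play Store? I could not find any element with the above class by visiting the URL provided.

Ptitle = driver.find_element_by_css_selector('[itemprop=name] span').text

@QHarr This results in a selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]/div/div"} error.

That would not cause the error you showed. Personally, I see hard-coded values used throughout rather than wait conditions, so I can't see how the required click/scroll sequence (scroll, then click "Show more" on the page) can run reliably before all the reviews have been generated. I also prefer CSS selectors to XPath, because modern browsers are optimised for CSS (with some exceptions: which one is faster depends on the actual selector combination used, its XPath equivalent, and how up to date the browser is). I didn't look for long, because by then I had decided I would probably write it from scratch.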
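The point about wait conditions can be illustrated without a browser. The sketch below is a generic polling helper, not Selenium's API: `wait_until` and `FakePage` are hypothetical names, and in a real script Selenium's `WebDriverWait` together with `expected_conditions` plays this role. The idea is to poll for the state you need and carry on as soon as it holds, instead of always sleeping for a fixed 2 seconds.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.1):
    """Call `condition` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


class FakePage:
    """Stand-in for a page whose button only appears after a few polls."""

    def __init__(self, polls_until_ready):
        self.remaining = polls_until_ready

    def find_button(self):
        self.remaining -= 1
        return "show-more" if self.remaining <= 0 else None


page = FakePage(polls_until_ready=3)
button = wait_until(page.find_button, timeout=2.0, poll=0.01)
print(button)
```

A timeout raises an exception instead of silently continuing, which makes a missing "Show more" button visible rather than hiding it behind a bare except.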
Printing the DataFrame built by the script gives:

print (results_df)
                   Date                        ...                                                                     Review
0       31 January 2019                        ...                          Was broken for pay as you go customers. Has no...
1       2 February 2019                        ...                          o2 just won't be happy until their customer se...
2       1 February 2019                        ...                                             Excellent quality piece of kit
3       6 February 2019                        ...                                                                      Gud

The code now needs a slight improvement. Change 8 to 12 and you will get the reviews:

review = spans[i+12+comment_shift].text