Python: finding the rating scores on a URL

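I'm trying to build a DataFrame made up of reviews for 20 banks. In the code below I'm trying to get the rating score values from the 20 customer reviews, but I'm finding it hard because I'm new to BeautifulSoup and web scraping.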

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')

Rating = []
rat_elem = soup.find_all('span')
for rate in rat_elem:
    # Raises AttributeError: find_all() returns a ResultSet (a list of tags), which has no .get() method
    Rating.append(rate.find_all('div').get('value'))

print(Rating)
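A quick way to see why the loop above fails, sketched offline against a small hypothetical HTML snippet (the markup is an assumption, not copied from bankbazaar.com): find_all() returns a ResultSet, which behaves like a list, while find() returns a single Tag whose attributes can be read with .get(). In the answers below, the rating is read from the span's text rather than from a value attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one review; the real page structure may differ.
html = '<div class="review"><span itemprop="ratingvalue">4.0</span></div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() always returns a ResultSet (a list of Tags), even when nothing matches.
all_spans = soup.find_all("span")
print(len(all_spans))  # 1

# Calling a Tag method such as .get() on the list raises AttributeError,
# which is what happens inside the question's loop.
try:
    all_spans.get("value")
    error = None
except AttributeError as exc:
    error = exc
print(type(error).__name__)  # AttributeError

# find() returns the first matching Tag (or None), so .get() reads an attribute
# and get_text() reads the text where the rating is stored.
span = soup.find("span")
print(span.get("itemprop"))  # ratingvalue
print(span.get_text())  # 4.0
```

Rule of thumb: if you called find_all(), loop over or index the result before using Tag methods; if you only want one element, call find().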

I prefer CSS selectors, so you should be able to target all of the ratings by selecting the span elements whose itemprop attribute is set to ratingvalue:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')

# Select every <span> whose itemprop attribute is "ratingvalue" and keep its text
Rating = []
for rate in soup.select('span[itemprop=ratingvalue]'):
    Rating.append(rate.get_text())

print(Rating)
Relevant output:

['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']

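Since the goal in the question is a DataFrame, here is a minimal sketch of the next step, assuming the scores come back as strings like the output above: convert each to a float and load them into pandas. It runs against a hypothetical two-review snippet instead of the live page, and the column name rating is an arbitrary choice.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the review markup; the live page may differ.
html = """
<div class="review"><span itemprop="ratingvalue">4.0</span></div>
<div class="review"><span itemprop="ratingvalue">4.5</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Same selector as the answer; strip whitespace and convert each score to a float.
ratings = [float(span.get_text(strip=True))
           for span in soup.select("span[itemprop=ratingvalue]")]

# One row per review; "rating" is an arbitrary column name.
df = pd.DataFrame({"rating": ratings})
print(df)
print(df["rating"].mean())  # 4.25
```

Converting to float up front makes numeric operations such as mean() or sorting behave as expected; other columns (bank name, review text) could be collected the same way and added to the dictionary.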
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')

# Find all the span elements where the "itemprop" attribute is "ratingvalue".
Rating = [item.text for item in soup.find_all('span', attrs={"itemprop": "ratingvalue"})]

print(Rating)
# The output
# ['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']

Edit: added the relevant output.
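Both answers match the same elements. A small offline check, using a hypothetical snippet that also contains a bestrating span, shows that the attrs dictionary, the equivalent keyword argument, and the CSS selector return the same ratings and skip the bestrating value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two user ratings plus a "bestrating" span that should be ignored.
html = (
    '<span itemprop="ratingvalue">5.0</span>'
    '<span itemprop="bestrating">5</span>'
    '<span itemprop="ratingvalue">4.0</span>'
)
soup = BeautifulSoup(html, "html.parser")

# find_all() with an attrs dictionary, as in the second answer.
via_attrs = [s.text for s in soup.find_all("span", attrs={"itemprop": "ratingvalue"})]
# find_all() also treats unknown keyword arguments as attribute filters.
via_kwarg = [s.text for s in soup.find_all("span", itemprop="ratingvalue")]
# The CSS attribute selector from the first answer.
via_css = [s.text for s in soup.select("span[itemprop=ratingvalue]")]

print(via_attrs)  # ['5.0', '4.0']
```

The attrs dictionary is needed when an attribute name is not a valid Python identifier (for example data-* attributes); otherwise the keyword form is shorter.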

Could you tell me why you chose "itemprop: ratingvalue"?

"ratingvalue" is the rating the user gave, while "bestrating" is the maximum value a user can give. To me, "ratingvalue" seemed like the more appropriate choice. (Not sure if that's what you were asking.)

I was looking through the HTML at the time but couldn't find it.

I use the Developer Tools in Chrome to browse elements: press Ctrl-Shift-I, or right-click an item on the page and click "Inspect". It takes you to that element in the page source.

In Firefox, the right-click "Inspect Element" shortcut is the same as in Chrome; there the tool is called the "Inspector" rather than "Developer Tools".

Agreed. Not sure why find seems so popular. I guess it's because for a long time select was so limited, so if you're new to bs4 you'll find more find/find_all examples.

Ah... I'm new to Python but not to CSS, so I benefit a lot from the recent changes. To me, find seems like the more expensive/slower approach, I'd guess.

Yes. For me, if you're scraping a lot of HTML, you'll know HTML and CSS. And if you know CSS, why use the clunkier find, unless you're taking advantage of something select can't do, like regex? But if you've been using bs4 since the days when find was the best option, old habits die hard.
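To illustrate two points from the comments, here is a sketch on hypothetical schema.org-style markup (the class name review, the author span, and the exact layout are assumptions): ratingvalue and bestrating can be read together to scale a score, and find_all() can filter with a compiled regular expression, which the CSS selectors accepted by select() cannot express. Here the regex keeps only spans whose text looks like a number.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for a single review; real pages may nest these differently.
html = (
    '<div class="review">'
    '<span itemprop="ratingvalue">4.5</span>'
    '<span itemprop="bestrating">5</span>'
    '<span itemprop="author">A. Customer</span>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
review = soup.find("div", class_="review")

# ratingvalue is the score the user gave; bestrating is the maximum possible score.
rating = float(review.find("span", itemprop="ratingvalue").get_text(strip=True))
best = float(review.find("span", itemprop="bestrating").get_text(strip=True))
normalized = rating / best
print(normalized)  # 0.9

# A regex filter on the text: keep only spans whose entire text is a number.
numeric_spans = review.find_all("span", string=re.compile(r"^\d+(\.\d+)?$"))
print([s["itemprop"] for s in numeric_spans])  # ['ratingvalue', 'bestrating']
```

Normalizing by bestrating only matters if different sites or sections use different scales; on a single page with a fixed maximum, the raw ratingvalue is enough.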