Python: finding the rating scores on a URL


python web-scraping nlp


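I am trying to create a data frame made up of reviews of 20 banks. In the code below I am trying to get the rating score values for the 20 customers, but I'm finding it hard because I'm new to BeautifulSoup and web scraping.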

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')


Rating = []
rat_elem = soup.find_all('span')
for rate in rat_elem:
    Rating.append(rate.find_all('div').get('value'))

print(Rating)
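The loop above fails because find_all() returns a ResultSet (a list of Tag objects), and a list has no .get() method; .get() belongs to a single Tag. A minimal offline sketch of the difference, assuming hypothetical markup where each rating sits in a div's value attribute (the real bankbazaar.com markup may well differ):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not the real page structure.
html = """
<span class="review"><div class="stars" value="4.0"></div></span>
<span class="review"><div class="stars" value="5.0"></div></span>
"""
soup = BeautifulSoup(html, "html.parser")

ratings = []
for span in soup.find_all("span"):
    # find() returns a single Tag (or None), so .get() works on it;
    # find_all() would return a ResultSet, which has no .get().
    div = span.find("div")
    if div is not None:
        ratings.append(div.get("value"))

print(ratings)  # ['4.0', '5.0']
```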

I prefer using CSS selectors, so you should be able to target all of the spans by selecting the ones whose itemprop attribute is set to ratingvalue:

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')

Rating = []
for rate in soup.select('span[itemprop=ratingvalue]'):
    Rating.append(rate.get_text()) 

print(Rating)

Relevant output:

['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']
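Since the goal in the question is a data frame, one way to get there is to loop over each review block and pick the fields out of it with select_one(), so a bank's name and its rating stay on the same row. This is a minimal sketch under assumptions: the itemprop values "review" and "name" and the markup below are hypothetical (only "ratingvalue" comes from the answer), so check the real page structure in the browser's developer tools first.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical review markup; itemprop="review" and itemprop="name" are assumptions.
html = """
<div itemprop="review"><span itemprop="name">Bank A</span><span itemprop="ratingvalue">4.0</span></div>
<div itemprop="review"><span itemprop="name">Bank B</span><span itemprop="ratingvalue">4.5</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for review in soup.select('div[itemprop="review"]'):
    # select_one() returns the first match inside this review block, or None.
    name = review.select_one('[itemprop="name"]')
    rating = review.select_one('span[itemprop="ratingvalue"]')
    rows.append({
        "Bank": name.get_text(strip=True) if name else None,
        # Convert the text to a number so it can be averaged or sorted later.
        "Rating": float(rating.get_text(strip=True)) if rating else None,
    })

df = pd.DataFrame(rows)
print(df)
```

Scoping each lookup to one review block keeps the columns aligned even if a block is missing a field, which two independent find_all() lists cannot guarantee.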

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')

# Find all the span elements where the "itemprop" attribute is "ratingvalue". 
Rating = [item.text for item in soup.find_all('span', attrs={"itemprop":"ratingvalue"})]


print(Rating)
# The output
# ['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']

Edit: added the relevant output.
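The find_all() answer and the CSS selector answer select the same elements. A small offline check on hypothetical markup, comparing the attrs dictionary, the keyword-argument form and select():

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the itemprop="name" span should be ignored by all three.
html = ('<span itemprop="ratingvalue">4.0</span>'
        '<span itemprop="name">Bank A</span>'
        '<span itemprop="ratingvalue">5.0</span>')
soup = BeautifulSoup(html, "html.parser")

via_attrs = [t.text for t in soup.find_all("span", attrs={"itemprop": "ratingvalue"})]
via_keyword = [t.text for t in soup.find_all("span", itemprop="ratingvalue")]
via_select = [t.text for t in soup.select('span[itemprop="ratingvalue"]')]

print(via_attrs)  # ['4.0', '5.0']
```

The attrs dictionary is only strictly needed when the attribute name cannot be passed as a Python keyword argument (for example data-rating) or clashes with one of find_all()'s own parameters, such as name.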


Can you tell me why you chose itemprop="ratingvalue"?

ratingvalue is the rating the user gave, while bestrating is the maximum value a user can give, so ratingvalue seemed the more appropriate choice. (Not sure if that is what you were asking.)

I was browsing the HTML at the time but couldn't find it.

I use the Developer Tools in Chrome to browse the elements: press Ctrl-Shift-I, or right-click an item on the page and click "Inspect". That takes you to the element in the page source.

In Firefox the right-click "Inspect Element" shortcut is the same as in Chrome, but the tool is called the "Inspector" rather than "Developer Tools".

Agreed. Not sure why find seems so popular. I guess it's because for a long time select was so limited. If you're new to bs4, you'll find far more find and find_all examples.

Ah... I guess I'm new to Python but not to CSS, so I've benefited a lot from the recent changes. find also seems to me like a more expensive/slower method.

Yes. For me, if you're scraping a lot of HTML, you know HTML and CSS, and if you know CSS, why use the clunkier find unless you need something select can't do, like regex? But if you were using bs4 back when find was the best option, old habits die hard.
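As an example of the regex point: find_all() accepts a compiled regular expression as an attribute filter, which select() does not. A minimal sketch, assuming hypothetical markup where the same itemprop value appears with different capitalisation:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup mixing capitalisations of the same itemprop value.
html = ('<span itemprop="ratingValue">4.0</span>'
        '<span itemprop="ratingvalue">5.0</span>'
        '<span itemprop="bestRating">5</span>')
soup = BeautifulSoup(html, "html.parser")

# The anchored, case-insensitive pattern matches "ratingvalue" in any letter
# case but not "bestRating".
pattern = re.compile(r"^ratingvalue$", re.IGNORECASE)
ratings = [t.get_text() for t in soup.find_all("span", attrs={"itemprop": pattern})]

print(ratings)  # ['4.0', '5.0']
```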