Python: extracting a variable element with Beautiful Soup


I'm trying to scrape the rating of a Yelp restaurant, without any luck. I'm using Beautiful Soup.

Basically, the source looks like this:

<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
    <img class="offscreen" height="303" src="https://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_design_web/9b34e39ccbeb/assets/img/stars/stars.png" width="84" alt="5.0 star rating">
</div>

But it doesn't seem to work.

If you need a partial match on the class names, use:

from bs4 import BeautifulSoup

s = """<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
    <img class="offscreen" height="303" src="https://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_design_web/9b34e39ccbeb/assets/img/stars/stars.png" width="84" alt="5.0 star rating">
</div>"""

r = BeautifulSoup(s, "html.parser")
rating = r.find_all('div')
for i in rating:
    # bs4 returns "class" as a list; default to [] so a div without a class doesn't raise KeyError
    if "i-stars i-stars--regular-5" in " ".join(i.get("class", [])):
        print(i.get('title', 'No title attribute'))
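The loop above only matches 5-star ratings because the string it looks for includes `regular-5`. A minimal sketch of the same partial-match idea for any star count, assuming the rating classes always start with `i-stars--regular-` (the sample markup and the helper name `star_ratings` are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question: two ratings and a div with no class.
html = """
<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating"></div>
<div class="i-stars i-stars--regular-3 rating-large" title="3.0 star rating"></div>
<div>no class here</div>
"""

def star_ratings(markup):
    """Return the title of every div that has a class starting with i-stars--regular-."""
    soup = BeautifulSoup(markup, "html.parser")
    titles = []
    for div in soup.find_all("div"):
        classes = div.get("class", [])  # list of class names, or [] if the attribute is absent
        if any(c.startswith("i-stars--regular-") for c in classes):
            titles.append(div.get("title", "No title attribute"))
    return titles

print(star_ratings(html))
```

Checking each class name with `startswith` avoids depending on the order in which the classes appear in the attribute.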

You can pass a function as the class_ filter to do this:

from bs4 import BeautifulSoup
import re

html_text = '<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">\n    <img class="offscreen" height="303" src="https://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_design_web/9b34e39ccbeb/assets/img/stars/stars.png" width="84" alt="5.0 star rating">\n</div>'
soup = BeautifulSoup(html_text, 'html.parser')
# bs4 calls the function with None for tags that have no class, so guard before re.search
matches = soup.find_all(class_=lambda x: x is not None and re.search(r"i-stars i-stars--regular-\d rating-\w+", x))
print(matches)

Adapt the regex pattern inside to your requirements. Here I've written it to match any rating size and any star count.
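As one example of adapting the pattern, the sketch below finds the rating div with a guarded function filter and then turns its title into a number. It assumes the title always has the form `"<number> star rating"`, as in the question's markup; the sample HTML and the name `is_rating_div` are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup following the question's structure.
html = '<div class="i-stars i-stars--regular-4 rating-large" title="4.0 star rating"></div>'
soup = BeautifulSoup(html, "html.parser")

def is_rating_div(css_class):
    # bs4 passes None for tags without a class attribute, so check for it first.
    return css_class is not None and re.search(r"i-stars--regular-\d", css_class) is not None

div = soup.find("div", class_=is_rating_div)
title = div.get("title", "")
# Extract the leading number from a title such as "4.0 star rating" (assumed format).
match = re.match(r"(\d+(?:\.\d+)?) star rating", title)
rating = float(match.group(1)) if match else None
print(rating)
```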

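find_all returns a list of all matches, so what you get back is always a list. To get a single item you would have to use find_all(...)[0], or better, use find, which returns the first match (or None if nothing matches).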
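A small sketch of the difference between `find_all` and `find`, using hypothetical markup with two rating divs:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two ratings.
html = """
<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating"></div>
<div class="i-stars i-stars--regular-2 rating-large" title="2.0 star rating"></div>
"""
soup = BeautifulSoup(html, "html.parser")

all_matches = soup.find_all("div", class_="i-stars")  # list of every matching div
first_via_index = all_matches[0]                        # raises IndexError if the list is empty
first_via_find = soup.find("div", class_="i-stars")     # first match only
missing = soup.find("div", class_="no-such-class")      # None instead of an exception

print(len(all_matches), first_via_find["title"], missing)
```

Because `find` returns None on no match, calling `.get(...)` on its result directly will raise AttributeError when the element is absent, so check for None first on pages where the rating might be missing.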

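Also, since the class names change with the rating, you can match on the classes that stay constant. For example, here i-stars and rating-large appear to be constant, so you can use: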

from bs4 import BeautifulSoup

html = '''
<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
    <img class="offscreen" height="303" src="https://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_design_web/9b34e39ccbeb/assets/img/stars/stars.png" width="84" alt="5.0 star rating">
</div>'''
soup = BeautifulSoup(html, 'lxml')  # needs the lxml package; 'html.parser' also works here
# Note: a list of classes matches a div that has ANY of them, not necessarily both
rating = soup.find('div', {'class': ['i-stars', 'rating-large']}).get('title', 'No title attribute')
print(rating)
# 5.0 star rating
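One caveat with passing a list of classes: Beautiful Soup treats the list as "any of these", so an unrelated div that only has `rating-large` would also match. The sketch below, using hypothetical markup with such a decoy, contrasts that with a CSS selector, where chained classes must all be present:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first div has only "rating-large", the second has both classes.
html = """
<div class="rating-large" title="decoy"></div>
<div class="i-stars i-stars--regular-4 rating-large" title="4.0 star rating"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# A list matches a tag having ANY listed class, so the decoy is returned first.
any_match = soup.find("div", {"class": ["i-stars", "rating-large"]})
# Chained classes in a CSS selector require ALL of them.
all_match = soup.select_one("div.i-stars.rating-large")

print(any_match.get("title"), "|", all_match.get("title"))
```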

Alternatively, a CSS selector with chained classes requires the div to have both classes:

ratings = soup.select_one('div.i-stars.rating-large').get('title', 'No title attribute')
print(ratings)
# 5.0 star rating

Comment: I thought he was asking for a dynamic rating, i.e. any rating; yours only gets 5-star ratings. The solution in my answer does a partial match inside the find_all function.

Using a regular expression:

import re
# r is the BeautifulSoup object from the first answer.
# A compiled pattern is tried against each class and against the space-joined class string.
rating = [x.get('title', 'No title attribute') for x in r.find_all('div', attrs={"class": re.compile("i-stars i-stars--regular-")})]
print(rating)
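Another option, not used in the answers above, is a CSS attribute substring selector. `[class*="..."]` matches when the class attribute, taken as one string, contains the given substring, so it picks up any star count. A sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two ratings and an unrelated element.
html = """
<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating"></div>
<div class="i-stars i-stars--regular-1 rating-large" title="1.0 star rating"></div>
<div class="offscreen"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# [class*="..."] is a substring match on the whole class attribute value.
ratings = [div.get("title", "No title attribute")
           for div in soup.select('div[class*="i-stars--regular-"]')]
print(ratings)
```

A substring match is looser than a class match (it would also hit a class such as `xi-stars--regular-`), which is usually acceptable for scraping but worth keeping in mind.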