Parsing HTML with BeautifulSoup in Python


The image is small, so here is another link:

I'm trying to extract the text of the review at the bottom. I tried this:

y = soup.find_all("div", style = "margin-left:0.5em;")
review = y[0].text

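The problem is that the unexpanded div tags contain unwanted text, and stripping that text out of the review content gets tedious. I just can't figure this one out. Can anyone help?

Edit: the HTML is: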

<div style="margin-left:0.5em;">
    <div style="margin-bottom:0.5em;"> 9 of 35 people found the following review helpful </div>
    <div style="margin-bottom:0.5em;">
    <div style="margin-bottom:0.5em;">
    <div class="tiny" style="margin-bottom:0.5em;">
        <b>
    </div>
    That is true. I tried it myself this morning. There's a little note on the Audible site that says "a few titles will require two credits" or something like that. A Dance with Dragons is one of those few.

The div tag just above the text looks like this when expanded:

<div class="tiny" style="margin-bottom:0.5em;">
    <b>
        <span class="h3color tiny">This review is from: </span>
        <a href="https://rads.stackoverflow.com/amzn/click/com/B005C7QVUE" rel="nofollow noreferrer">A Dance with Dragons: A Song of Ice and Fire: Book 5 (Audible Audio Edition)</a>
    </b>
</div>
That is true. I tried it myself this morning. There's a little note on the Audible site that says "a few titles will require two credits" or something like that. A Dance with Dragons is one of those few.

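It looks like .strings is what you want: it is a generator that yields every string inside the object. So if you convert it to a list and take the last item, you should get the text you are after. For example: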

$ python
>>> import bs4
>>> text = '<div style="mine"><div>unwanted</div>wanted</div>'
>>> soup = bs4.BeautifulSoup(text, "html.parser")
>>> soup.find_all("div", style="mine")[0].text
'unwantedwanted'
>>> list(soup.find_all("div", style="mine")[0].strings)[-1]
'wanted'
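
On real pages the last string is often padded with newlines and indentation, or followed by a whitespace-only string. A minimal sketch of the same idea using .stripped_strings, which skips blank strings and strips the rest, is shown below. The markup is a shortened, hypothetical stand-in for the review HTML in the question, not the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question: a nested div followed by trailing text.
html = """
<div style="margin-left:0.5em;">
    <div class="tiny" style="margin-bottom:0.5em;"><b>This review is from: </b></div>
    That is true. I tried it myself this morning.
</div>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", style="margin-left:0.5em;")

# .strings would keep whitespace-only strings and surrounding whitespace;
# .stripped_strings drops the former and strips the latter.
pieces = list(outer.stripped_strings)
review = pieces[-1]
print(review)
```

This only works when the review is the last non-blank string inside the outer div; if other elements follow it, the index would need to change.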

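To get the text in the tail of div.tiny:

review = soup.find("div", "tiny").findNextSibling(text=True)

Full example:

#!/usr/bin/env python
from bs4 import BeautifulSoup

html = """<div style="margin-left:0.5em;">
    <div style="margin-bottom:0.5em;"> 9 of 35 people found the following review helpful </div>
    <div class="tiny" style="margin-bottom:0.5em;">
        <b><span class="h3color tiny">This review is from: </span></b>
    </div>
    That is true. I tried it myself this morning. There's a little note on the Audible site that says "a few titles will require two credits" or something like that. A Dance with Dragons is one of those few.
</div>"""

soup = BeautifulSoup(html, "html.parser")
# The review is the text node that immediately follows the div with class "tiny".
review = soup.find("div", "tiny").findNextSibling(text=True)
print(review)

This prints the review text.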

Sorry for the confusion, the text is not under the div tag with class tiny. It is under the main div tag with style margin-left:0.5em.

@user1709173: does it work? If not, post the actual html as text rather than as a picture, and give enough context, i.e. include the elements around the text. The tail comes immediately after the tiny element, so the next sibling should work.

Sorry. I posted the HTML in an edit to my original post. Expanding the nested div tags eventually reveals the text, but it gets rather long, so I did not include it in my edit.

@user1709173: I've tried my code with the html and it works. What result do you get?

I got '\n'. I posted the expansion of the div tag with class tiny in my original post. Sorry for the confusion...
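
A result of '\n' suggests that the first text node after div.tiny is only whitespace and that the review sits further along among the siblings. The sketch below walks the following siblings and returns the first non-blank string. The helper name first_text_after and the br element in the sample markup are assumptions made for illustration; the real page may be laid out differently. It uses find_next_sibling(string=True), which is the current spelling of findNextSibling(text=True).

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: a blank text node and a <br/> separate div.tiny from the review.
html = (
    '<div style="margin-left:0.5em;">'
    '<div class="tiny" style="margin-bottom:0.5em;"><b>This review is from: </b></div>\n'
    '<br/>\n'
    'That is true. I tried it myself this morning.'
    '</div>'
)


def first_text_after(tag):
    """Return the first non-blank string among the siblings after tag, or None."""
    for sibling in tag.next_siblings:
        if isinstance(sibling, NavigableString) and sibling.strip():
            return sibling.strip()
    return None


soup = BeautifulSoup(html, "html.parser")
tiny = soup.find("div", "tiny")

naive = tiny.find_next_sibling(string=True)  # first text sibling, blank here
review = first_text_after(tiny)              # first text sibling with content
print(repr(naive))
print(review)
```

If the review can span several text nodes (for example, paragraphs separated by br tags), the loop would need to collect every non-blank string instead of returning the first.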