Python: Extracting text between HTML comments using BeautifulSoup

python python-3.x web-scraping

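Using Python 3 and BeautifulSoup 4, I would like to be able to extract text from an HTML page where the text is identified only by the comment above it. For example: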

<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text


I would like to get this text
I would also like to find this text

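I have found various ways to extract the text or the comments of a page, but no way to do what I am looking for. Any help would be greatly appreciated.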

Python's bs4 module has a Comment class. You can use it to extract the comments:

from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
soup = BeautifulSoup(html, 'lxml')
# Collect every comment node; find_all and string= are the current names for findAll and text=
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
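
To make it clearer what that call returns, here is a minimal sketch. It assumes the built-in "html.parser" backend instead of lxml (so no extra install is needed), and the shortened HTML string and the `<!-- PADDED -->` comment are made up for illustration. It shows that each Comment holds only the text inside the `<!-- -->` delimiters, with any surrounding whitespace preserved:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, shortened input; the second comment is padded with spaces on purpose
html = "<p>p tag text</p><!--UNIQUE COMMENT-->first<!-- PADDED -->second"

# Assumption: html.parser is used here rather than lxml
soup = BeautifulSoup(html, "html.parser")
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

# A Comment is a str subclass; str() yields the text without the <!-- --> delimiters
comment_texts = [str(c) for c in comments]
print(comment_texts)

# Whitespace inside the delimiters is kept, so strip before comparing against known names
stripped_texts = [c.strip() for c in comments]
print(stripped_texts)
```

If the comments in the real page are written with padding, comparing the stripped value is probably the safer choice.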

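You can simply iterate over all of the available comments, check whether each one is one of the entries you need, and then display the text of the element that follows it, like this: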

from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
soup = BeautifulSoup(html, 'lxml')

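# Walk every comment node and keep only the ones we are looking for
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
        # next_element is the text node that immediately follows the comment
        print(comment.next_element.strip())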

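An improvement on Martin's answer: you can search for the specific comments directly. There is no need to iterate over all of the comments and then check their values; it can be done in one go: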

comments_to_search_for = {'UNIQUE COMMENT', 'SECOND UNIQUE COMMENT'}
for comment in soup.find_all(string=lambda text: isinstance(text, Comment) and text in comments_to_search_for):
    print(comment.next_element.strip())

Prints:

I would like to get this text
I would also like to find this text
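
One caveat: next_element returns only the single node right after the comment. If the text between two comments is split across several nodes (for example because it contains a tag), a rough sketch like the one below may help. It assumes the comments and the text are siblings under the same parent, uses "html.parser" rather than lxml, and `text_after_comments` is a hypothetical helper name, not part of bs4. The `<b>` tag in the sample HTML is added for illustration.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Sample input based on the question, with a <b> tag added for illustration
html = """
<body>
<!--UNIQUE COMMENT-->
I would like to <b>get</b> this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
"""

def text_after_comments(soup):
    """Hypothetical helper: map each comment to the text up to the next comment."""
    result = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        parts = []
        for sibling in comment.next_siblings:
            if isinstance(sibling, Comment):
                break  # the next comment marks the end of this section
            # Tags contribute their inner text; plain strings are used as-is
            parts.append(sibling.get_text() if isinstance(sibling, Tag) else str(sibling))
        # Collapse newlines and repeated spaces into single spaces
        result[str(comment)] = " ".join("".join(parts).split())
    return result

# Assumption: html.parser is used here rather than lxml
soup = BeautifulSoup(html, "html.parser")
sections = text_after_comments(soup)
for name, text in sections.items():
    print(name, "->", text)
```

Because it only walks siblings, this sketch will not cross into a different parent element; pages where the comment and its text live in different containers would need a different traversal.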

I think the OP is trying to extract the text between the comments, not the comments themselves.

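"I would like to get this text" - this one?

Yes, that one. I can extract the comments just fine.

I was just about to do that. +

This is exactly what I needed. Thank you very much.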
