Python: Find an HTML tag by its text in BS4


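Suppose we have the following HTML:

<p class='cls1'> Hello </p>

I want to find the tag by searching for "Hello" with BS4 (I don't know in advance which tag surrounds the text).

It should look something like this: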

full_string = soup.find(text=re.compile('Hello'))
full_string.get_parent_tag() # <p>
full_string.get_parent_class() # cls1


Is this possible in BS4? Thanks!

Of course it's possible:

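import re

from bs4 import BeautifulSoup


your_html = """<p class='cls1'> Hello </p>"""
soup = BeautifulSoup(your_html, "html.parser")
# Keep <p> tags whose text contains "Hello". A bare re.compile("Hello") is
# always truthy, so the pattern has to be searched against the tag's text.
print(soup.find_all(lambda t: t.name == "p" and re.search("Hello", t.get_text())))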


Output:

[<p class="cls1"> Hello </p>]
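BS4 can also do what the question's pseudocode asks without naming the tag. A minimal sketch, assuming the same one-line HTML: find(string=...) returns the matching text node (a NavigableString), and its .parent attribute is the tag that directly encloses it. The get_parent_tag() and get_parent_class() methods in the question do not exist; .parent, .name and .get("class") are the real equivalents. string= is the newer spelling of the text= argument.

```python
import re

from bs4 import BeautifulSoup

your_html = """<p class='cls1'> Hello </p>"""
soup = BeautifulSoup(your_html, "html.parser")

# find(string=...) returns the text node itself, not the tag around it
full_string = soup.find(string=re.compile("Hello"))

# .parent is the tag that directly contains that text node
parent = full_string.parent
print(parent.name)          # p
print(parent.get("class"))  # ['cls1'] (class is multi-valued, so BS4 returns a list)
```

Note that class comes back as a list because an element can carry several classes; use parent.get("class")[0] if you only want the first one.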


If you don't know which tag you are looking for, you can try this approach:

from lxml import html


your_html = """<p class='cls1'> Hello </p>"""
# //* matches any element, so the tag name doesn't need to be known
print(html.fromstring(your_html).xpath("//*[contains(text(), 'Hello')]"))


Output:


[<Element p at 0x7f2b172ae5e0>]
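To get the tag name and class the question asked for from the lxml results, a minimal sketch assuming the same input: each match is an lxml element whose .tag attribute is the element name and whose .get("class") returns the raw attribute string (a plain string, unlike BS4's list).

```python
from lxml import html

your_html = """<p class='cls1'> Hello </p>"""
tree = html.fromstring(your_html)

# Same XPath as above: any element whose text contains 'Hello'
results = [(el.tag, el.get("class")) for el in tree.xpath("//*[contains(text(), 'Hello')]")]
print(results)  # [('p', 'cls1')]
```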


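To search for a tag by its text, you can use the CSS selector p:contains(). In recent versions of Soup Sieve (the library behind BS4's select()), :contains() is deprecated in favor of :-soup-contains().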
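A minimal sketch of the selector approach, assuming a Soup Sieve version recent enough to support :-soup-contains() (2.1 or later): select() returns a list of every tag matching the selector.

```python
from bs4 import BeautifulSoup

your_html = """<p class='cls1'> Hello </p>"""
soup = BeautifulSoup(your_html, "html.parser")

# :-soup-contains() keeps elements whose text contains the given substring
result = soup.select("p:-soup-contains('Hello')")
print(result)  # [<p class="cls1"> Hello </p>]
```

Dropping the tag name (for example "*:-soup-contains('Hello')") also works, but because the pseudo-class looks at all descendant text, in a full document it would match ancestors such as <html> and <body> as well.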

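Thanks for the reply! But what if I don't know the tag and only want to search by the text "Hello"?

You should know at least one tag you're looking for, but if not, you can try xpath with lxml.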

Thanks, I saw your update! This is exactly what I wanted :)
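Staying within BS4, the text-only search from the comments can be sketched by collecting every text node that matches and stepping up to its parent tag, which is similar in spirit to the XPath //*[contains(text(), 'Hello')]. The extra <span> and <a> elements below are made-up sample input, added only to show that matches with different tag names are all found.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample input with several different tags
sample_html = """
<div>
  <p class='cls1'> Hello </p>
  <span class='cls2'>Hello again</span>
  <a href='#'>Goodbye</a>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# find_all(string=...) returns matching text nodes in document order;
# .parent turns each one into the tag that directly contains it
matches = [s.parent for s in soup.find_all(string=re.compile("Hello"))]

for tag in matches:
    print(tag.name, tag.get("class"))
# p ['cls1']
# span ['cls2']
```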