Python: searching for HTML elements based on a specific word in the element's string


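I'm trying to write a program that uses the Beautiful Soup module to find specific elements and replace their tags. However, I'm having trouble "searching" for these elements by a specific word found in the element's string. Assuming I can get my code to "find" the elements by the word specified in the string, I would then "unwrap" each element's "p" tag and "wrap" the contents in a new "h1" tag.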

Here is some sample HTML as input:

<p> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </p>
<p> Example#2  this element ignored </p>
<p> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </p>
Given the sample HTML input above, I would like the output to look like this:

<h1> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </h1>
<p> Example#2  this element ignored </p>
<h1> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </h1>

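However, my code only finds elements whose string is exactly "ExampleStringWord#1" and excludes elements whose string contains any wording beyond that. I'm fairly sure I will need a regular expression to find elements by the specified word regardless of the wording that follows it. I'm not very familiar with regular expressions, though, so I'm not sure how to combine one with the BeautifulSoup module.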

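I have also read the Beautiful Soup documentation on passing a regular expression as a filter, but I couldn't get it to work in my case. I also went through other posts about passing regular expressions to BeautifulSoup, but none of them fully addressed my problem.
Thanks for any help.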

What if you locate the p elements that contain the specified substring (note the re.compile() part) and then change each element's name to h1?

import re

from bs4 import BeautifulSoup

data = """
<body>
    <p> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </p>
    <p> Example#2  this element ignored </p>
    <p> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </p>
</body>
"""

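soup = BeautifulSoup(data, "html.parser")
# string= with a compiled regex selects <p> tags whose text contains the pattern
for p in soup.find_all("p", string=re.compile("ExampleStringWord#1")):
    # Renaming the tag in place turns <p>...</p> into <h1>...</h1>
    p.name = 'h1'
print(soup)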

Prints:

<body>
    <h1> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </h1>
    <p> Example#2  this element ignored </p>
    <h1> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </h1>
</body>
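
The pattern above matches the word anywhere in the string, and as far as I know the string= filter only considers tags that have a single text child. If the word really must be the first word, or a p may contain nested markup, a function filter is one possible variation. This is only a sketch: the names html, word, first_word and starts_with_word are made up for illustration, and the sample input is shortened.

```python
import re

from bs4 import BeautifulSoup

# Shortened, hypothetical sample input (not the exact HTML from the question)
html = """
<body>
    <p> ExampleStringWord#1 plain text </p>
    <p> Example#2 mentions ExampleStringWord#1 later on </p>
    <p> ExampleStringWord#1 with <b>nested</b> markup </p>
</body>
"""

word = "ExampleStringWord#1"
# Anchor at the start (ignoring leading whitespace), escape the word so it is
# treated literally, and require whitespace or end of text right after it.
first_word = re.compile(r"^\s*" + re.escape(word) + r"(?=\s|$)")


def starts_with_word(tag):
    """Hypothetical helper: True for <p> tags whose full text starts with `word`."""
    return tag.name == "p" and first_word.search(tag.get_text()) is not None


soup = BeautifulSoup(html, "html.parser")
for p in soup.find_all(starts_with_word):
    p.name = "h1"  # rename in place; children such as <b> are kept

print(soup)
```

With this input the first and third elements become h1, while the second stays a p because the word is not at the start of its text.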