Edit all strings in an HTML document using BeautifulSoup 4 in Python


Suppose we have an HTML document like this:

<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
    <p>
    passage in division
    <b>bold in passage </b>
    </p>
</div>
</body>
</html>



I need to prepend the word "cool" to every string (a NavigableString, in bs4 terms) in the HTML document.

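I tried iterating over every element and checking whether it has any children, editing the string when it has none. That approach is not accurate.

Besides, the edit had no effect at all.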

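You can find all text nodes in the document by calling find_all() with the text=True argument (spelled string=True in Beautiful Soup 4.4 and later). Use replace_with() to replace each text node with the modified text: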

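from bs4 import BeautifulSoup

html = """
<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
    <p>
    passage in division
    <b>bold in passage </b>
    </p>
</div>
</body>
</html>
"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so replacing nodes inside the loop is safe.
# string=True matches every text node (older bs4 releases spell it text=True).
for element in soup.find_all(string=True):
    text = element.strip()
    if text:  # skip whitespace-only nodes
        element.replace_with("cool " + text)

print(soup.prettify())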


Prints:

<html>
 <head>
  <title>
   cool title
  </title>
 </head>
 <body>
  <div class="c1">
   cool division
   <p>
    cool passage in division
    <b>
     cool bold in passage
    </b>
   </p>
  </div>
 </body>
</html>
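
One side effect worth noting: because each string is passed through strip(), the whitespace around inline text is discarded, which can glue words together outside of prettify(). The following is only a sketch of one way around that, not part of the original answer; the sample markup and the `lead`/`trail` names are made up for illustration, and it assumes a bs4 release that accepts string=True.

```python
from bs4 import BeautifulSoup

# Hypothetical sample where the spaces around <b> matter.
html = "<p>one <b>two</b> three</p>"
soup = BeautifulSoup(html, "html.parser")

for element in soup.find_all(string=True):
    stripped = element.strip()
    if not stripped:
        continue  # leave whitespace-only nodes untouched
    # Split off the original leading and trailing whitespace so it can be kept.
    lead = element[: len(element) - len(element.lstrip())]
    trail = element[len(element.rstrip()):]
    element.replace_with(lead + "cool " + stripped + trail)

result = str(soup)
print(result)
```

Whether the surrounding whitespace should be kept depends on how the document is rendered afterwards; for pretty-printed output the simpler strip() version is usually enough.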



Awesome. Is there a way to leave all the <script> tags out?

@qed You can use find_parent(). For example, if not element.find_parent('script'): ...
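
To make the find_parent() suggestion concrete, here is a minimal sketch of how the check might slot into the loop. The sample markup (including the `var x = 1;` script body) is invented for illustration, and string=True assumes a reasonably recent bs4 release; older ones use text=True instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document containing a <script> tag.
html = """
<html>
<head>
<title>title</title>
<script>var x = 1;</script>
</head>
<body>
<div class="c1">division <b>bold</b></div>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

for element in soup.find_all(string=True):
    # find_parent() returns the nearest enclosing <script> tag, or None.
    if element.find_parent("script"):
        continue  # leave script contents untouched
    text = element.strip()
    if text:
        element.replace_with("cool " + text)

result = str(soup)
print(result)
```

The same pattern should extend to other tags such as <style> by checking element.find_parent("style") as well.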