Python: Replacing HTML tags with BeautifulSoup


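I'm currently reformatting some HTML pages with BeautifulSoup, and I've run into a bit of a problem.

My problem is that the original HTML contains things like this: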

<li><p>stff</p></li>

and

<li><div><p><strong>stff</strong></p></div><li>

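With BeautifulSoup, I'd like to eliminate the div and p tags (if they exist) but keep the strong tag.

I've been looking through the Beautiful Soup documentation but couldn't find anything. Any ideas?

Thanks.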

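You can use `replaceWith` to do what you want. You have to duplicate the element you want to use as the replacement and then pass it as the argument to `replaceWith`. The documentation is quite clear on how to do this.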
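A minimal sketch of that approach, assuming bs4, where the method is spelled `replace_with` (`replaceWith` is the older BeautifulSoup 3 name). The sample HTML is a well-formed variant of the snippet in the question, and the variable names are only illustrative:

```python
from bs4 import BeautifulSoup

# Assumed sample input: the question's snippet with a proper closing </li>.
html = "<ul><li><div><p><strong>stff</strong></p></div></li></ul>"
soup = BeautifulSoup(html, "html.parser")

div = soup.li.div
strong = div.find("strong")

# extract() detaches <strong> from the tree so it can serve as the replacement;
# replace_with() then swaps the whole <div> for it.
div.replace_with(strong.extract())

result = str(soup)
print(result)  # <ul><li><strong>stff</strong></li></ul>
```

This only covers the case where the wrapper contains a `<strong>`; plain-text items would need separate handling.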

You can write your own function to strip the tags:

import re

def strip_tags(string):
    return re.sub(r'<.*?>', '', string)

strip_tags("<li><div><p><strong>stff</strong></p></div><li>")
'stff'
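Note that the function above removes every tag, including `<strong>`. If only `div` and `p` should go, the pattern can be narrowed. The following is a rough sketch; the function name `strip_div_and_p` is made up for illustration, and regex-based HTML editing is fragile on unusual markup:

```python
import re

def strip_div_and_p(html):
    """Remove only <div> and <p> tags (opening and closing), keeping their contents."""
    # \b keeps the pattern from matching longer tag names such as <pre>.
    return re.sub(r"</?(?:div|p)\b[^>]*>", "", html)

cleaned = strip_div_and_p("<li><div><p><strong>stff</strong></p></div></li>")
print(cleaned)  # <li><strong>stff</strong></li>
```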


This question probably refers to an older version of BeautifulSoup, because with bs4 you can simply use the `unwrap` function:

s = BeautifulSoup('<li><div><p><strong>stff</strong></p></div><li>')
s.div.unwrap()
>> <div></div>
s.p.unwrap()
>> <p></p>
s
>> <html><body><li><strong>stff</strong></li><li></li></body></html>
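To handle every `div` and `p` in a document rather than just the first of each, the same `unwrap` call can be applied in a loop. This is a sketch that assumes well-formed input and the built-in `html.parser` (so no `<html><body>` wrapper is added):

```python
from bs4 import BeautifulSoup

# Assumed sample input covering both shapes from the question.
html = (
    "<li><p>stff</p></li>"
    "<li><div><p><strong>stff</strong></p></div></li>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so the tree can be modified while looping.
for tag in soup.find_all(["div", "p"]):
    tag.unwrap()  # removes the tag but keeps its children in place

result = str(soup)
print(result)  # <li>stff</li><li><strong>stff</strong></li>
```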


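A simple solution: take the whole node (here, the div) and:

1. Convert it to a string.
2. Replace the tag with the required tag/string.
3. Replace the corresponding closing tag with an empty string.
4. Convert the modified string back into a parsable tree by passing it to BeautifulSoup.

I have done this for mint.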

For example:
<div class="col-md-12 option" itemprop="text">
<span class="label label-info">A</span>

**-2<sup>31</sup> to 2<sup>31</sup>-1**


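I saw many answers to this simple question, and I came here hoping to find something useful, but unfortunately I didn't get what I wanted. After a few attempts I found a simple solution, and here it is:

soup = BeautifulSoup(htmlData, "html.parser")
h2_headers = soup.find_all("h2")
for header in h2_headers:
    header.name = "h1"  # replace the h2 tag with h1

All h2 tags are converted to h1. You can convert any tag by changing its name.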
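A self-contained version of the renaming approach, as a sketch. The sample HTML below is made up for illustration and stands in for `htmlData`:

```python
from bs4 import BeautifulSoup

# Hypothetical sample input standing in for htmlData.
html_data = "<h2>Title</h2><p>body</p><h2>Next</h2>"
soup = BeautifulSoup(html_data, "html.parser")

for header in soup.find_all("h2"):
    header.name = "h1"  # renaming the tag keeps its attributes and children

result = str(soup)
print(result)  # <h1>Title</h1><p>body</p><h1>Next</h1>
```

Renaming changes what a tag is, not whether it exists, so it suits swapping one tag for another rather than removing a wrapper.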
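Yes, but I actually want to keep the <strong> in there. Also, after looking through several random pages, only div and p need to be worried about.

FWIW, see the bs4 documentation.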

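The string-replacement steps, applied to the <sup> example (opts is assumed to hold the div node found earlier):

from bs4 import BeautifulSoup

sup = opts.sup
if sup:  # opts contains a <sup> tag
    # Convert the node to a string and rewrite <sup>...</sup> as ^...
    opt_html = str(opts).replace("<sup>", "^").replace("</sup>", "")

    # Parse the modified string back into a BeautifulSoup tree
    s = BeautifulSoup(opt_html, 'lxml')

    # Reassign the variable to the re-parsed node
    opts = s.find("div", class_="col-md-12 option")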

Result: -2^31 to 2^31-1
Without this manipulation, the text would read: -231 to 231-1
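The same result can probably be reached without the string round trip by editing the tree directly. This is a sketch under the assumption that bs4 is in use; the sample markup is a closed, single-line version of the snippet above:

```python
from bs4 import BeautifulSoup

# Assumed sample input: the example div, closed and put on one line.
html = (
    '<div class="col-md-12 option" itemprop="text">'
    '<span class="label label-info">A</span>'
    ' -2<sup>31</sup> to 2<sup>31</sup>-1</div>'
)
soup = BeautifulSoup(html, "html.parser")
opts = soup.find("div", class_="option")

for sup in opts.find_all("sup"):
    sup.insert_before("^")  # put a caret in front of the exponent
    sup.unwrap()            # drop the <sup> tag, keep its text

text = opts.get_text()
print(text)  # A -2^31 to 2^31-1
```

This avoids re-parsing, so `opts` stays attached to the original tree.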