Python: How to remove a "text" tag with Beautiful Soup


Please tell me how to remove the text tag from HTML like the following while keeping its child elements:

<text _ngcontent-c0="" _nghost-c2="">
    <p>sample text</p>
</text>
<image>
    <figure>
        <img alt="" src="xxxxx.jpg"/>
    </figure>
</image>

You can "unwrap" the element like this:

from bs4 import BeautifulSoup

content = '<text _ngcontent-c0="" _nghost-c2=""><p>sample text</p></text><image><figure><img alt="" src="xxxxx.jpg"/></figure></image>'

# Name the parser explicitly; html.parser does not add <html>/<body> wrappers.
soup = BeautifulSoup(content, 'html.parser')
for p in soup.find_all('p'):
    p.parent.unwrap()  # replace the enclosing <text> tag with its children
    print(p.parent)  # prints <p>sample text</p><image><figure><img alt="" src="xxxxx.jpg"/></figure></image>


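From the code you provided, it looks like you are not using BeautifulSoup at all; you are calling the unwrap method on a plain string, which is why you get the error you mentioned.

If you are using BeautifulSoup, please post the rest of the code you use to parse the HTML.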
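
To illustrate the point about strings, here is a small sketch under the assumption that the built-in html.parser is used. It shows that a str has no unwrap method, while the Tag object returned after parsing does; the shortened content value is only an illustration:

```python
from bs4 import BeautifulSoup, Tag

content = '<text><p>sample text</p></text>'

# A plain str has no unwrap() method, so the call fails before any HTML is touched.
string_has_unwrap = hasattr(content, "unwrap")
print(string_has_unwrap)

# Parsing first yields Tag objects, and Tag provides unwrap().
soup = BeautifulSoup(content, "html.parser")
text_tag = soup.find("text")
print(isinstance(text_tag, Tag))

text_tag.unwrap()
unwrapped = str(soup)
print(unwrapped)
```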

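Sorry for the incomplete description. I wanted to know how to handle the case where the content contains other elements as well. I have updated my question.

My example should also work for your updated case. It should return "sample text". I have updated my answer to address your question.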

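from bs4 import BeautifulSoup

# Triple quotes are required for a string literal that spans several lines.
content = '''<text _ngcontent-c0="" _nghost-c2="">
             <p>sample text</p>
           </text>
           <image>
             <figure>
               <img alt="" src="xxxxx.jpg"/>
             </figure>
           </image>'''

# content is still a plain str here, never parsed with BeautifulSoup,
# so content.text raises AttributeError.
while (content.text):
    content.text.unwrap()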
