How do I select everything except a certain HTML element using beautifulsoup4? (Python, html-parsing, beautifulsoup)

For example:

import bs4

html = '''
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p></div>
'''
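# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = bs4.BeautifulSoup(html, 'html.parser')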

Just search for it:

soup.find('p', class_='scroll-down')

I used the class to limit the search, but since there is no other <p> element, that is a little redundant here.

If, instead, you need to remove the tag, use the method above to find it first, then call .extract() on it to remove it from the document:

>>> soup.find('p', class_='scroll-down').extract()
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p>
>>> print(soup)

<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
</div>

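Two things: the removed tag is returned from the .extract() method, so you can save it for later use. The tag is removed from the document completely; if you still need it to be present in the document, you will have to re-add it manually later.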
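As a minimal sketch of the save-and-re-add pattern described above (the shortened HTML and the variable names `removed`, `p_gone`, and `p_back` are made up for illustration, and appending to the end of the <div> is just one possible place to put the tag back):

```python
import bs4

# Shortened, hypothetical version of the question's HTML.
html = '''<div class="short-description std">
<em>Android Apps Security</em> provides guiding principles.
<p class="scroll-down"><a href="#main-desc">Full Description</a></p></div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# extract() detaches the tag from the tree and returns it.
removed = soup.find('p', class_='scroll-down').extract()

# The document no longer contains the <p>, but we still hold a reference to it.
p_gone = soup.find('p') is None

# Re-add the saved tag manually, here at the end of the <div>.
soup.div.append(removed)
p_back = soup.find('p', class_='scroll-down') is not None
```

Between the extract() call and the append() call, anything that reads the document, such as str(soup) or soup.get_text(), sees it without the <p>.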

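Alternatively, you can use the .decompose() method, which removes the tag from the document completely without returning a reference. The tag will be gone forever.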
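For comparison, a small sketch of the .decompose() variant, again on a shortened, made-up version of the HTML; the names `result` and `remaining_html` are only illustrative:

```python
import bs4

# Shortened, hypothetical version of the question's HTML.
html = '''<div class="short-description std">
<em>Android Apps Security</em> provides guiding principles.
<p class="scroll-down"><a href="#main-desc">Full Description</a></p></div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# decompose() destroys the tag and everything inside it; it returns None,
# so there is nothing to keep for later.
result = soup.find('p', class_='scroll-down').decompose()

# The rest of the document is untouched.
remaining_html = str(soup)
```

Use this form when you are sure you will never need the removed tag again.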

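Sorry, Martijn, the question in the title was correct, but I worded the question in the example wrongly. I have edited it: it should be everything except the <p> element, not the <p> element. Your answer just gets the <p> element.

@Bentley4: It is still unclear what your expected output is and what you want to do with the remainder. Your question is also still incorrect: to filter out everything except the <p> element, you just select the <p> element. I am guessing you want it the other way around.

I used "filter out" as a synonym for "select". So what I want is to select everything except the <p> element. I have changed "filter" to "select" in the question.

You still have not said what you are trying to achieve. Do you want to print the HTML minus the <p> tag, do you need to extract the text, do you need to search for other tags not contained in the <p> tag, and so on? Each use calls for a different approach. And filtering out means removing something from the whole, deleting it: filtering sediment out of drinking water, for example. To filter is to select by removing everything you do not want. :-)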
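The three uses listed in the last comment could look roughly like this. It is only a sketch on a shortened, made-up version of the HTML, and the names `unwanted`, `other_names`, `html_without_p`, and `text` are illustrative, not part of the original answer:

```python
import bs4

# Shortened, hypothetical version of the question's HTML.
html = '''<div class="short-description std">
<em>Android Apps Security</em> provides guiding principles.
<p class="scroll-down"><a href="#main-desc">Full Description</a></p></div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
unwanted = soup.find('p', class_='scroll-down')

# Use 1: search for tags that are neither the <p> itself nor nested inside it.
other_tags = [
    tag for tag in soup.find_all(True)
    if tag is not unwanted
    and not any(parent is unwanted for parent in tag.parents)
]
other_names = [tag.name for tag in other_tags]

# Uses 2 and 3: remove the <p>, then print the remaining HTML or read its text.
unwanted.extract()
html_without_p = str(soup)
text = soup.get_text()
```

The identity check (`is`) is deliberate: comparing tags with `==` compares their name, attributes, and contents, so two distinct but identical tags would be treated as the same one.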