Web scraping: How do I get only a tag's own inner text in BeautifulSoup, excluding the text of embedded tags? [web-scraping, beautifulsoup, screen-scraping, urllib2, python-requests]


For example, given:

<ul>
    <li>
        <b>Hey, sexy!</b>
        Hello
    </li>
</ul>




I only need the "Hello" from the li tag.

If I use soup.find("ul").li.text, the result also includes the text of the b tag.
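The behavior described above can be reproduced with a short script. This is a minimal sketch, assuming the built-in html.parser backend (the question does not say which parser was used):

```python
from bs4 import BeautifulSoup

# Compact form of the markup from the question.
html = "<ul><li><b>Hey, sexy!</b> Hello </li></ul>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption

li = soup.find("ul").li

# .text concatenates every string below <li>, including the one inside <b>,
# so both "Hey, sexy!" and "Hello" show up in the result.
print(repr(li.text))
```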
You can use extract(), which removes a tag from the tree.
In your case:
soup.find("ul").b.extract() # removes the <b> tag
soup.find("ul").li.text     # contents of <li> without <b>
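Note that extract() changes the tree in place, so the b tag is gone from soup afterwards. If the original tree has to stay intact, one option is to do the removal on a copy. The following is a minimal sketch, assuming bs4 4.4 or later (where copy.copy() on a tag is supported) and the html.parser backend; the helper name own_text_via_extract is made up for illustration.

```python
import copy

from bs4 import BeautifulSoup

html = "<ul><li><b>Hey, sexy!</b> Hello </li></ul>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption


def own_text_via_extract(tag):
    """Return the text of `tag` with all child tags removed (hypothetical helper)."""
    clone = copy.copy(tag)  # detached copy; the original tree is not modified
    # find_all() returns a list first, so extracting while looping is safe.
    for child in clone.find_all(True, recursive=False):
        child.extract()
    return clone.get_text(strip=True)


result = own_text_via_extract(soup.find("ul").li)
print(result)
```

After the call, soup.find("ul").li.b is still present in the original tree.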

You can use the find function like this:
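from bs4 import BeautifulSoup

html = '''<ul><li><b>Hey, sexy!</b>Hello</li></ul>'''
# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(html, 'html.parser')
# text=True matches any string; recursive=False looks only at direct children.
# (bs4 4.4+ also accepts this argument under the name string=.)
print(soup.find('li').find(text=True, recursive=False))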
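One caveat with find(): it returns only the first direct string. With the indented markup from the question, that first string is the whitespace before the b tag rather than "Hello". A possible workaround is to collect all direct strings and drop the empty ones. This is a sketch under assumptions: bs4 4.4 or later (for the string= argument), the html.parser backend, and a helper name, direct_text, that is made up for illustration.

```python
from bs4 import BeautifulSoup

# Indented markup, as written in the question.
html = """
<ul>
    <li>
        <b>Hey, sexy!</b>
        Hello
    </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
li = soup.find("li")


def direct_text(tag):
    """Join the non-empty strings that are direct children of `tag` (hypothetical helper)."""
    pieces = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in pieces if p.strip())


first_string = li.find(string=True, recursive=False)  # whitespace only
result = direct_text(li)
print(result)
```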

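Thanks a lot, man! I don't know how I missed this in the documentation. One thing, though: what does text=True do?

It returns only the text value of that direct child (or None if there is no text) and does not consider child elements. If you remove recursive=False, you will only get "Hey, sexy!", because the search then recurses into the b tag.

This doesn't work for me. I used contents and indexed into the children between the tags instead.

@Adam Thanks. I'll take a look; it has been a while since I wrote this answer, so I need to refresh my memory of it. Why not consider posting a new answer?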
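The comment about contents does not include code, so the following is only one possible reading of it: a sketch that indexes into .contents, plus a variant that filters by type so it does not depend on position. It assumes the html.parser backend; the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

html = "<ul><li><b>Hey, sexy!</b>Hello</li></ul>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
li = soup.find("li")

# .contents is a plain list of direct children: the <b> tag, then the string.
last_child = li.contents[-1]

# Position-independent variant: keep only the direct children that are strings.
own_strings = [
    str(child).strip()
    for child in li.contents
    if isinstance(child, NavigableString)
]

print(last_child)
print(own_strings)
```

Indexing is brittle if the markup changes (for example, when whitespace or extra tags are added), which is why the type-based filter may be the safer choice.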