Python: getting specific data with BeautifulSoup


I used page.prettify() to tidy up the HTML, and this is the markup I now want to extract text from:

        <div class="item">
         <b>
          name
         </b>
         <br/>
         stuff here
        </div>



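My goal is to extract the text "stuff here" from this, but I'm stumped because it isn't wrapped in any tag other than the div, and the div already contains other content. The extra whitespace in front of each line makes it harder still.

What is the way to do this?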

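You can use the div element's .contents attribute to get all of its direct children, then pick out the one that is a string.

Edit:

This is the approach I mentioned: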

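from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup("""<div class='item'> <b> name </b>  <br/>  stuff here </div>""", "html.parser")
div = soup.find('div')
# Keep only the div's direct text nodes (not tags), strip whitespace, and join them
print(''.join(el.strip() for el in div.contents if isinstance(el, NavigableString)))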
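The same idea can be applied to a whole page of items. The following is only a sketch under assumptions: the second div in the sample HTML and the helper name direct_text are made up for illustration, and it relies on find_all(string=True, recursive=False) to return just the text nodes that are direct children of a tag.

```python
from bs4 import BeautifulSoup

# Sample markup; the second item is invented for illustration.
html = """
<div class="item"><b> name </b><br/> stuff here </div>
<div class="item"><b> other </b><br/> more stuff </div>
"""


def direct_text(tag):
    """Return text sitting directly inside `tag`, ignoring text in child tags."""
    # recursive=False limits the search to direct children, so the text
    # inside <b> is not picked up.
    pieces = tag.find_all(string=True, recursive=False)
    # Drop whitespace-only nodes and trim the padding left by prettify().
    return " ".join(p.strip() for p in pieces if p.strip())


soup = BeautifulSoup(html, "html.parser")
results = [direct_text(div) for div in soup.find_all("div", class_="item")]
print(results)
```

Stripping each piece before joining also takes care of the leading spaces mentioned in the question.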


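If you are really sure that the content you want starts after a specific tag and ends before the last tag, you can fall back on a regular expression at this point. It is not the most elegant approach, but it may work if your requirements are that specific.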
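As a rough illustration of the regular-expression route, here is a minimal sketch. It assumes the wanted text always sits between a br tag and the closing div tag, and that the pattern is run against the raw HTML string; the pattern itself is an assumption and not something the answer spelled out.

```python
import re

html = """<div class="item"> <b> name </b> <br/> stuff here </div>"""

# Capture everything between the <br> tag and the closing </div>.
# DOTALL lets "." span newlines, which matters for prettified HTML.
pattern = re.compile(r"<br\s*/?>(.*?)</div>", re.DOTALL | re.IGNORECASE)

match = pattern.search(html)
# strip() removes the indentation and newlines around the captured text.
text = match.group(1).strip() if match else None
print(text)
```

This breaks as soon as the markup changes shape (nested divs, a second br), which is why it only suits very specific requirements.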


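For the example you posted, a combination of find and next_sibling works (nextSibling is the older BeautifulSoup 3 spelling of the same attribute). Calling strip() on the result removes the surrounding whitespace:

from bs4 import BeautifulSoup

soup = BeautifulSoup(""" <div class="item"> <b> name </b>  <br/>  stuff here </div>""", "html.parser")
print(soup.find("div", "item").find("br").next_sibling.strip())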
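If some items might lack a br tag, the chained call above would raise an AttributeError. A slightly more defensive variant could look like the sketch below; the helper name text_after_br and the second, br-less div are hypothetical and only there to show the fallback.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# The second item has no <br/>; it is invented to exercise the fallback.
html = """
<div class="item"> <b> name </b> <br/> stuff here </div>
<div class="item"> <b> empty </b> </div>
"""


def text_after_br(div):
    """Return the stripped text node right after the first <br>, or None."""
    br = div.find("br")
    if br is None:
        return None
    sibling = br.next_sibling
    # The sibling could be another tag or missing, so check that it is text.
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return None


soup = BeautifulSoup(html, "html.parser")
values = [text_after_br(div) for div in soup.find_all("div", "item")]
print(values)
```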


That doesn't help; it just returns a list whose first item is the entire content.