Python: Getting text from nested span tags with BeautifulSoup


I have an HTML file:

...
<span class="value">401<span class="Suffix">st</span></span>
...

The output I get includes the text of the inner span (st).

Use .strings. It is a property that gives a generator over the individual string elements; .text or .get_text() is what joins those strings together.

>>> import bs4
>>> soup = bs4.BeautifulSoup('<span class="value">401<span class="Suffix">st</span></span>', 'html.parser')
>>> t = soup.find(class_='value')
>>> next(t.strings)
'401'
>>> list(t.strings)
['401', 'st']
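
As a minimal sketch of the relationship described above, the following compares .strings with get_text() on the same markup. The variable names (outer, pieces, joined, first) are only illustrative, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

html = '<span class="value">401<span class="Suffix">st</span></span>'
soup = BeautifulSoup(html, "html.parser")
outer = soup.find(class_="value")

# .strings yields every text node under the tag, in document order
pieces = list(outer.strings)

# get_text() with no arguments concatenates those same pieces
joined = outer.get_text()

# the first piece is the outer span's own leading text
first = pieces[0]
print(pieces, joined, first)
```

Taking the first element only works when the wanted text comes before any nested tag, as it does in this markup.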


You need to use recursive=False in BeautifulSoup.

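from bs4 import BeautifulSoup

html = '<span class="value">401<span class="Suffix">st</span></span>'
soup = BeautifulSoup(html, 'html.parser')

# Select the outer span(s) by class
parent_spans = soup.find_all('span', class_='value')
for parent_span in parent_spans:
    # recursive=False looks only at direct children, so the nested span's text is skipped
    ptext = parent_span.find(string=True, recursive=False)
    print(ptext)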

401
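
The recursive=False idea can be wrapped in a small helper. This is only a sketch: own_text is a hypothetical name, and the trailing " place" text in the sample markup is an assumption added to show that text after the nested span is also kept.

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Join only the strings that are direct children of tag (hypothetical helper)."""
    # recursive=False limits the search to direct children,
    # so text inside nested tags is left out
    return "".join(tag.find_all(string=True, recursive=False))


original = '<span class="value">401<span class="Suffix">st</span></span>'
extended = '<span class="value">401<span class="Suffix">st</span> place</span>'

for html in (original, extended):
    soup = BeautifulSoup(html, "html.parser")
    print(repr(own_text(soup.find(class_="value"))))
```

Using find_all rather than find collects every direct text node instead of only the first one.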

This will work fine:
from bs4 import BeautifulSoup
s = '''<span class="value">401<span class="Suffix">st</span></span>'''
soup = BeautifulSoup(s, 'html.parser')
get_text = soup.find(class_='value')
print(get_text.contents[0])

401
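
contents[0] works here because the outer span begins with a text node. A slightly more defensive sketch, assuming the element might instead begin with a child tag, picks the first direct child that is a NavigableString. The name first_own_string and the second sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString


def first_own_string(tag):
    """Return the first direct text child of tag, or None (hypothetical helper)."""
    for child in tag.contents:
        # skip child tags; keep only plain text nodes
        if isinstance(child, NavigableString):
            return str(child)
    return None


s = '<span class="value">401<span class="Suffix">st</span></span>'
soup = BeautifulSoup(s, "html.parser")
print(first_own_string(soup.find(class_="value")))
```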