Python BeautifulSoup get_text() returns an empty string


I am trying to get some text from a BeautifulSoup tag, as follows:

area = <td class="classified-table__data">
         134
         <span class="abbreviation">
             <span aria-hidden="true">
                m²
             </span>
             <span class="sr-only">
                mètres carrés
             </span>
         </span>
       </td>


I want to extract the value "134". If that is not possible, "134 m²" or "134 mètres carrés" would also work.

When I use
```
.get_text(separator="")
```
it returns an empty string.

When I use

.string.strip()

it raises an error:

AttributeError: 'NoneType' object has no attribute 'strip'

When I use
```
.strings
```
it returns a generator object rather than a string.

I'm a bit lost and don't know what else to try. Where is my mistake?
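The `.string` error described above can be reproduced with a minimal sketch. It assumes the markup shown in the question (whitespace slightly condensed) and the built-in `html.parser`; the variable names are illustrative only. Note that on this markup `get_text()` is not empty, which may suggest that the empty result came from a different element than the one shown, though that is only a guess.

```python
from bs4 import BeautifulSoup

html = """
<td class="classified-table__data">
    134
    <span class="abbreviation">
        <span aria-hidden="true">m²</span>
        <span class="sr-only">mètres carrés</span>
    </span>
</td>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", class_="classified-table__data")

# .string is only populated when a tag has a single child. This <td> holds
# text nodes plus a <span>, so .string is None and .strip() raises.
string_value = td.string
print(string_value)

# get_text() walks every descendant text node; strip=True trims each piece
# and drops the whitespace-only ones before joining with the separator.
joined = td.get_text(separator=" ", strip=True)
print(joined)
```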

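from bs4 import BeautifulSoup

html = """
<td class="classified-table__data">
    134
    <span class="abbreviation">
        <span aria-hidden="true">m²</span>
        <span class="sr-only">mètres carrés</span>
    </span>
</td>
"""
soup = BeautifulSoup(html, "html.parser")
# Remove the outer <span> (and everything nested in it) so only "134" is left
if soup.find("span"):
    soup.find("span").decompose()
print(soup.text.strip())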

Output:

134

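Or, if you don't want to remove the text inside the span elements, skip the decompose() step and collapse the whitespace instead:

result = " ".join(soup.text.split())
print(result)

Output:

134 m² mètres carrés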
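If modifying the tree with `decompose()` is undesirable, one alternative is to keep only the text nodes that are direct children of the `<td>`. This is a sketch, not part of the answer above; it assumes the markup from the question, and `direct_text` is an illustrative name.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<td class="classified-table__data">
    134
    <span class="abbreviation">
        <span aria-hidden="true">m²</span>
        <span class="sr-only">mètres carrés</span>
    </span>
</td>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", class_="classified-table__data")

# Join only the direct text children of <td>; text nested inside the
# <span> elements is skipped, and the tree itself is left untouched.
direct_text = "".join(
    child for child in td.children if isinstance(child, NavigableString)
).strip()
print(direct_text)
```

Because nothing is removed, the same `soup` can still be queried for the unit text afterwards.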

import re
from bs4 import BeautifulSoup
    
area = '''
    <td class="classified-table__data">
        134
        <span class="abbreviation">
            <span aria-hidden="true">
                m²
            </span>
            <span class="sr-only">
                mètres carrés
            </span>
        </span>
    </td>'''
    
soup = BeautifulSoup(area, 'html.parser')
text = soup.find_all('td', {"class": "classified-table__data"})
text = text[0].get_text().strip()
print(text)
'134\n        \n\n                m²\n            \n\n                mètres carrés'

text_split = re.split(r'\s+', text)
text_split
['134', 'm²', 'mètres', 'carrés']

text_split[0]
'134'

' '.join(text_split[:2])
'134 m²'
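The same split can probably be done without a regular expression by using Beautiful Soup's `stripped_strings` generator, which yields each text node with surrounding whitespace removed and skips whitespace-only nodes. A minimal sketch, assuming the markup from the question; `parts`, `value` and `with_unit` are illustrative names.

```python
from bs4 import BeautifulSoup

area = """
<td class="classified-table__data">
    134
    <span class="abbreviation">
        <span aria-hidden="true">m²</span>
        <span class="sr-only">mètres carrés</span>
    </span>
</td>
"""

soup = BeautifulSoup(area, "html.parser")
td = soup.find("td", class_="classified-table__data")

# One entry per non-empty text node, already stripped.
parts = list(td.stripped_strings)

value = parts[0]                  # the bare number
with_unit = " ".join(parts[:2])   # number plus abbreviated unit
print(value)
print(with_unit)
```

Unlike the regex split, this keeps "mètres carrés" together as one item, because the split happens per text node rather than per whitespace run.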


You can use

.find(text=True)

to get the first text node:

from bs4 import BeautifulSoup

html_doc = """
    <td class="classified-table__data">
         134
         <span class="abbreviation">
             <span aria-hidden="true">
                m²
             </span>
             <span class="sr-only">
                mètres carrés
             </span>
         </span>
    </td>
"""

soup = BeautifulSoup(html_doc, "html.parser")

v = soup.select_one(".classified-table__data").find(text=True)
print(v.strip())
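In Beautiful Soup 4.4.0 and later the `text` argument is also available under the name `string`, which the documentation prefers. The sketch below uses that spelling and, as an added assumption about what the value is for, converts the result to an integer; `first_text` and `area_m2` are illustrative names.

```python
from bs4 import BeautifulSoup

html_doc = """
<td class="classified-table__data">
    134
    <span class="abbreviation">
        <span aria-hidden="true">m²</span>
        <span class="sr-only">mètres carrés</span>
    </span>
</td>
"""

soup = BeautifulSoup(html_doc, "html.parser")
td = soup.find("td", class_="classified-table__data")

# string=True matches the first text node found under the tag.
first_text = td.find(string=True)

# Guard against a missing node before stripping and converting.
area_m2 = int(first_text.strip()) if first_text is not None else None
print(area_m2)
```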

Or:

.contents[0]

v = soup.select_one(".classified-table__data").contents[0]
print(v.strip())
