Python BeautifulSoup AttributeError: 'list' object has no attribute 'text'
I have the following HTML code:
<div>
<span class="test">
<span class="f1">
5 times
</span>
</span>
</span>
</div>
<div>
</div>
<div>
<span class="test">
<span class="f1">
6 times
</span>
</span>
</span>
</div>
This Python code works:
x=soup.select('.f1')
print(x)
and gives the following output:
[]
[]
[]
[]
[<span class="f1"> 19 times</span>]
[<span class="f1"> 12 times</span>]
[<span class="f1"> 6 times</span>]
[]
[]
[]
[<span class="f1"> 6 times</span>]
[<span class="f1"> 1 time</span>]
[<span class="f1"> 11 times</span>]
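However,
print(x.prettify)
throws the error above. I am basically trying to get the text between the span tags for every instance: empty when there is none, and a string when it is available.

I suggest you use the .find_all() method (also available under the older name .findAll()) and loop over the matching spans: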
from bs4 import BeautifulSoup
html = '''<div>
<span class="test">
<span class="f1">
5 times
</span>
</span>
</span>
</div>
<div>
</div>
<div>
<span class="test">
<span class="f1">
6 times
</span>
</span>
</span>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
aaa = soup.find_all('span', attrs={'class':'f1'})
for i in aaa:
    print(i.text)
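For example:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')  # needs the lxml package; html is the string defined above
for span in soup.findAll("span", class_="f1"):
    # skip spans whose text is only whitespace
    if span.text.isspace():
        continue
    else:
        print(span.text)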
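The .isspace() method checks whether the string consists only of whitespace (testing the string's truthiness does not work here, because the "empty" HTML text is made up of several whitespace characters and is therefore truthy).

select() returns a list of results, regardless of whether it contains zero items or more. Since a list object has no text attribute, you get the AttributeError. Likewise, prettify() is meant to make HTML more readable; it is not a method of list.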
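The point about select() returning a list can be checked directly. The following is a minimal sketch, assuming a trimmed-down copy of the HTML from the question and the built-in html.parser; the class name "f2" is a hypothetical selector chosen only because nothing matches it.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's HTML (assumption: one record per div).
html = """
<div><span class="test"><span class="f1"> 5 times </span></span></div>
<div></div>
<div><span class="test"><span class="f1"> 6 times </span></span></div>
"""

soup = BeautifulSoup(html, "html.parser")

matches = soup.select(".f1")     # list-like collection of Tag objects
no_matches = soup.select(".f2")  # hypothetical class with no matches -> empty collection

# The collection itself has no .text attribute; each Tag inside it does.
collection_has_text = hasattr(matches, "text")
element_texts = [tag.text for tag in matches]

print(isinstance(matches, list), len(no_matches), collection_has_text)
print(element_texts)
```

So the fix is always to index into the result or loop over it before asking for .text or calling .prettify().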
If all you want to do is extract the available texts:
texts = [''.join(i.stripped_strings) for i in x if i]
# ['5 times', '6 times']
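This strips all the extra whitespace and newlines from each string and gives you just the bare text. The trailing if i means the text is returned only if i is not None.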
If you do care about the whitespace and newlines, do the following instead:
texts = [i.text for i in x if i]
# ['\n 5 times\n ', '\n 6 times\n ']
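The question also asks for an empty value where a div has no matching span. One possible way to get that, sketched here under the assumption that each top-level div is one record, is to look inside every div with select_one(), which returns None when nothing matches. The list name per_div_texts is illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's HTML (assumption: one record per div).
html = """
<div><span class="test"><span class="f1"> 5 times </span></span></div>
<div></div>
<div><span class="test"><span class="f1"> 6 times </span></span></div>
"""

soup = BeautifulSoup(html, "html.parser")

per_div_texts = []
for div in soup.find_all("div"):
    span = div.select_one(".f1")  # a single Tag, or None when the div has no match
    # Keep one entry per div: stripped text when found, empty string otherwise.
    per_div_texts.append(span.get_text(strip=True) if span is not None else "")

print(per_div_texts)  # ['5 times', '', '6 times']
```

Unlike the list comprehensions above, this keeps one entry per div, so the positions of the missing values are preserved.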
Shouldn't it throw: AttributeError: 'list' object has no attribute 'prettify'?