Python BeautifulSoup: getting the class text
Suppose the following code:
for data in soup.findAll('div', {'class': 'value'}):
    print(data)
gives the following output:
<div class="value">
<p class="name">Michael Jordan</p>
</div>
<div class="value">
<p class="team">Real Madrid</p>
</div>
<div class="value">
<p class="Sport">Ping Pong</p>
</div>
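I can get the text with data.text.
But how do I get the class name, so that I can use it as the dictionary key (Person[key1], Person[key2]…)?
You can use the following approach: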
content = '''
<div class="value">
<p class="name">Michael Jordan</p>
</div>
<div class="value">
<p class="team">Real Madrid</p>
</div>
<div class="value">
<p class="Sport">Ping Pong</p>
</div>
'''
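from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(content, 'html.parser')
person = {}
for div in soup.find_all('div', {'class': 'value'}):
    # The first class of the <p> becomes the key, the stripped text the value
    person[div.find('p').attrs['class'][0]] = div.text.strip()
print(person)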
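The loop above assumes every div contains a <p> that has a class attribute. A slightly more defensive variant might look like the sketch below; the function name class_text_map is hypothetical and not part of BeautifulSoup, and skipping incomplete divs is an assumption about the desired behaviour.

```python
from bs4 import BeautifulSoup


def class_text_map(html):
    """Map the first class of each <p> inside div.value to its text."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for div in soup.find_all("div", class_="value"):
        p = div.find("p")
        if p is None:
            continue  # assumption: divs without a <p> are skipped
        classes = p.get("class")  # a list of class names, or None
        if not classes:
            continue  # assumption: a <p> without a class is skipped
        result[classes[0]] = p.get_text(strip=True)
    return result


sample = """
<div class="value"><p class="name">Michael Jordan</p></div>
<div class="value"><p class="team">Real Madrid</p></div>
<div class="value"><p class="Sport">Ping Pong</p></div>
"""
person = class_text_map(sample)
print(person)
```

Returning a dict from a function instead of filling a module-level variable makes it easy to build one dict per page or per block.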
You can do it like this:
person = {}
for data in soup.find_all('div', {'class': 'value'}):
    # Each div.value holds one <p>; its first class name is the key
    attr = data.p.attrs.get("class")[0]
    value = data.p.text
    person[attr] = value
print(person)
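Use this snippet:
html = '''<div class="value">
<p class="Sport other-name-class other">Ping Pong</p>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
# find() does not take CSS selectors; select_one() does
p = soup.select_one('div.value p')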
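Then call either p.get_attribute_list('class') or p.attrs['class'].
Both return a list containing all the class names, such as ['Sport', 'other-name-class', 'other'].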
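To make the multi-class case concrete, here is a minimal sketch using the same markup. It assumes the built-in html.parser and a BeautifulSoup version that ships select_one and get_attribute_list; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="value">'
    '<p class="Sport other-name-class other">Ping Pong</p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# select_one() takes a CSS selector and returns the first match (or None)
p = soup.select_one("div.value p")

# class is a multi-valued attribute, so both lookups give a list of names
classes_a = p.get_attribute_list("class")
classes_b = p.attrs["class"]

# Use the first class name as the dictionary key
person = {classes_a[0]: p.get_text(strip=True)}
print(classes_a)
print(person)
```

One practical difference: p.attrs['class'] raises KeyError when the tag has no class attribute, so p.get('class') may be the safer lookup on untrusted markup.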
Since your output is also well-formed XML, you can parse it as XML and extract the values you need.
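The XML route could be sketched with the standard library's xml.etree.ElementTree, as below. The wrapping <root> element is an assumption added here, because an XML document needs a single root and the output consists of several sibling divs.

```python
import xml.etree.ElementTree as ET

fragment = """
<div class="value"><p class="name">Michael Jordan</p></div>
<div class="value"><p class="team">Real Madrid</p></div>
<div class="value"><p class="Sport">Ping Pong</p></div>
"""

# Wrap the sibling divs in one root element so the parser accepts them
root = ET.fromstring(f"<root>{fragment}</root>")

person = {}
for div in root.findall("div[@class='value']"):
    p = div.find("p")
    if p is not None:
        # In XML the class attribute is one plain string, not a list
        person[p.get("class")] = (p.text or "").strip()
print(person)
```

This only works while the markup stays well-formed. Real-world HTML often is not, and in that case an HTML parser such as BeautifulSoup is the more forgiving choice.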