Python BeautifulSoup: get the class text

Suppose the following code:

for data in soup.findAll('div',{'class':'value'}):
    print(data)
gives the following output:

<div class="value">
<p class="name">Michael Jordan</p>
</div>


<div class="value">
<p class="team">Real Madrid</p>
</div>


<div class="value">
<p class="Sport">Ping Pong</p>
</div>

I can get the text with data.text, but how do I get the class name of each <p> so that I can use it as the dictionary key (Person[key1], Person[key2], ...)?

You can use the following approach:

content = '''
<div class="value">
<p class="name">Michael Jordan</p>
</div>

<div class="value">
<p class="team">Real Madrid</p>
</div>

<div class="value">
<p class="Sport">Ping Pong</p>
</div>
'''

from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(content, 'html.parser')

person = {}

for div in soup.findAll('div', {'class': 'value'}):
    person[div.find('p').attrs['class'][0]] = div.text.strip()

print(person)
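
The answer above takes the first entry of attrs['class'] and would raise a KeyError for a <p> that has no class at all. As a rough sketch of a more defensive variant (the helper name class_text_map and the sample markup are made up for illustration, not part of the original answer), something like this might work:

```python
from bs4 import BeautifulSoup


def class_text_map(html):
    """Map the first class name of each <p> directly under div.value to its text."""
    soup = BeautifulSoup(html, 'html.parser')
    result = {}
    for p in soup.select('div.value > p'):
        classes = p.get('class')  # list of class names, or None if absent
        if classes:  # skip <p> tags that carry no class attribute
            result[classes[0]] = p.get_text(strip=True)
    return result


# Hypothetical sample: the second <p> has no class and is skipped
sample = '''
<div class="value"><p class="name">Michael Jordan</p></div>
<div class="value"><p>no class here</p></div>
<div class="value"><p class="Sport other">Ping Pong</p></div>
'''

print(class_text_map(sample))
```

Only the first class name is used as the key here; if the tags can carry several classes, you would have to decide which one identifies the field.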

You can do it like this:

person = {}
for data in soup.findAll('div', {'class': 'value'}):
    # Each div.value holds the <p> directly; there is no nested div to loop over
    attr = data.p.attrs.get("class")[0]  # first class name becomes the key
    value = data.p.text
    person[attr] = value

print(person)
Use this snippet:

from bs4 import BeautifulSoup

html = '''<div class="value">
        <p class="Sport other-name-class other">Ping Pong</p>
       </div>'''
soup = BeautifulSoup(html, 'html.parser')

# find() does not take CSS selectors, so use select_one() here
p = soup.select_one('div.value p')
p.get_attribute_list('class')
p.attrs['class']

Both return a list containing all the class names, like this:
['Sport', 'other-name-class', 'other']

Since your output is valid XML, you can treat it as XML and get the values you need.
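
The suggestion to treat the output as XML comes without code in the thread. A minimal sketch with the standard library's xml.etree.ElementTree could look like the following. Note the assumptions: the three sibling <div> elements have no single root, so they are wrapped in a made-up <root> element, and this only works while the markup stays well-formed.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the question
fragment = '''
<div class="value">
<p class="name">Michael Jordan</p>
</div>
<div class="value">
<p class="team">Real Madrid</p>
</div>
<div class="value">
<p class="Sport">Ping Pong</p>
</div>
'''

# Wrap the siblings in a hypothetical <root> so the string parses as one XML document
root = ET.fromstring('<root>' + fragment + '</root>')

person = {}
for p in root.findall("./div[@class='value']/p"):
    # ElementTree returns the class attribute as a single string,
    # so split it and use the first class name as the key
    key = p.get('class').split()[0]
    person[key] = p.text.strip()

print(person)
```

Unlike BeautifulSoup, ElementTree fails on sloppy real-world HTML, so this is mainly worth considering when you control the markup.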