Python: How do I get multiple classes from a span using BeautifulSoup?


I'm trying to get the span classes inside a <p> element using BeautifulSoup.

The HTML looks something like this:

...
<p class="card-list">
<span class="span1 class1"></span>
<span class="span2 class2"></span>
<span class="span3 class3"></span>
<span class="span4 class4"></span>
</p>
The result I get is:

classes = ["span1", "class1", "span2", "class2", "span3", "class3", "span4", "class4"]
What I want is:

classes = ["span1 class1", "span2 class2","span3 class3","span4 class4"]
There are other <span> elements on the page as well. I only need the classes of the spans inside this <p> tag.

Try the following:

cards = """
<p class="card-list">
<span class="span1 class1"></span>
<span class="span2 class2"></span>
<span class="span3 class3"></span>
<span class="span4 class4"></span>
</p>
"""
from bs4 import BeautifulSoup as bs
soup = bs(cards,'lxml')
classes = []
for c in soup.select('span'):
    elem = ' '.join(map(str, c['class'])) 
    classes.append(elem)
print(classes)
Output:

['span1 class1','span2 class2','span3 class3','span4 class4']


Try the following code:

from bs4 import BeautifulSoup
html = """
<p class="card-list">
<span class="span1 class1"></span>
<span class="span2 class2"></span>
<span class="span3 class3"></span>
<span class="span4 class4"></span>
</p>
"""

soup = BeautifulSoup(html,'html.parser')
allclasses = []
for item in soup.find('p',class_='card-list').find_all('span'):
    classes=' '.join(item.attrs['class'])
    allclasses.append(classes)
print(allclasses)

Updated:

allclasses = []
for item in soup.select("p[class='contact-info '] span[class]"):
    classes=' '.join(item.attrs['class'])
    allclasses.append(classes)
print(allclasses)
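
The answers above join the list that BeautifulSoup returns for a multi-valued attribute such as class. As an alternative sketch, assuming a Beautiful Soup version that supports the multi_valued_attributes constructor argument (4.8.0 or later, as far as I know), the parser can be told to leave class as one plain string, so no join is needed. The html sample and the name raw_classes are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<p class="card-list">
<span class="span1 class1"></span>
<span class="span2 class2"></span>
<span class="span3 class3"></span>
<span class="span4 class4"></span>
</p>
"""

# multi_valued_attributes=None asks BeautifulSoup not to split attributes
# such as class into lists; each value stays a single string.
soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)

card = soup.find("p")  # the sample contains a single <p>
raw_classes = [span["class"] for span in card.find_all("span")]
print(raw_classes)
```

Note that with this setting every attribute on the page is returned as a string, so any other code that expects tag["class"] to be a list would need adjusting.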

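I solved it by creating a list, appending each span's class names to it, and then joining them with ' '.join(listname).

Afterwards, I appended the joined string to another list.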
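
The approach described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the poster's actual code: the sample HTML is copied from the question, and the names span_classes and all_classes are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<p class="card-list">
<span class="span1 class1"></span>
<span class="span2 class2"></span>
<span class="span3 class3"></span>
<span class="span4 class4"></span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

all_classes = []  # the outer list: one joined string per span
for span in soup.find_all("span"):
    span_classes = []  # the inner list: class names of this span only
    for name in span.get("class", []):
        span_classes.append(name)
    # Join this span's class names before storing them in the outer list.
    all_classes.append(" ".join(span_classes))

print(all_classes)
```

Resetting the inner list for every span is what keeps the class names grouped per element instead of ending up in one flat list.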

A solution using the SimplifiedDoc library:

from simplified_scrapy import SimplifiedDoc,req,utils
html='''
<li class="card-list">
<p class="card-info">
<span class="span1 class1"></span>
<span class="span2 class2"></span>
<span class="span3 class3"></span>
<span class="span4 class4"></span>
</p>
</li>'''
doc = SimplifiedDoc(html)
classes = doc.selects('li.card-list').select('p.card-info').selects('span>class()')
print (classes)

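KeyError: 'class' on classes=' '.join(item.attrs['class'])

This should not happen. Can you share your URL?

@Ganesh: Try the updated solution and let me know the status.

The URL is shorturl.at/mABES. I'm trying to get the span classes from it.

Error: AttributeError: 'NoneType' object has no attribute 'find_all' on for item in soup.find('p',class_='card-list').find_all('span'):

What exactly is the problem? Have you done any debugging?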
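
Both errors quoted in these comments have a common cause on real pages: the KeyError appears when a span has no class attribute, and the AttributeError appears when find() returns None because the container was not found. A minimal defensive sketch is shown below. The function name span_classes, its selector parameter, and the sample HTML are assumptions made for illustration; the real page may need a different selector.

```python
from bs4 import BeautifulSoup


def span_classes(html, selector="p.card-list"):
    """Return one space-joined class string per span inside the container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(selector)
    if container is None:
        # Container not found: return an empty list instead of raising
        # AttributeError on None.
        return []
    return [
        " ".join(span["class"])
        for span in container.find_all("span")
        if span.has_attr("class")  # skip spans without a class (avoids KeyError)
    ]


sample = """
<p class="card-list">
<span class="span1 class1"></span>
<span></span>
<span class="span2 class2"></span>
</p>
"""

print(span_classes(sample))
print(span_classes("<div></div>"))
```

If the second call prints an empty list on the real page, the selector does not match the markup that was actually downloaded, which is worth checking before changing the loop.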

Output of the SimplifiedDoc example:

[['span1 class1', 'span2 class2', 'span3 class3', 'span4 class4']]