Python: getting text content with Beautiful Soup


I have parsed an HTML page using BeautifulSoup:

badges = soup.body.find('div', attrs={'class': 'col-md-11'})

After this, my badges object looks like this:

<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>

Avocat definitiv
NEDELCU Paul-Iulian, Baroul Dolj
[activ]

Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.

paul_iulyan@yahoo.com

Now I want to extract NEDELCU Paul-Iulian, Baroul Dolj, [activ], and Sediu principal în Baroul Dolj.

I tried using badges.span.span, but it didn't work.

Use soup.find.

Demo:

from bs4 import BeautifulSoup
s = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""

soup = BeautifulSoup(s, "html.parser")
val = soup.find("font", {"style":"font-weight:bold;"})
print( "{} {}".format(val.text, val.next_sibling ).strip() )
print( soup.find("span", {"style":"color:green;font-weight:bold;"}).text.strip() )
print( soup.find("span", class_="fas fa-map-marker text-red padding-right-sm").next_sibling.strip() )
print( soup.find("span", class_="text-nowrap").text.strip() )

Output:

NEDELCU Paul-Iulian , Baroul Dolj
[activ]
Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
paul_iulyan@yahoo.com
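
As for why badges.span.span did not work: attribute access such as badges.span returns the first matching descendant, which here is the empty icon span, and that span has no nested span of its own. The sketch below shows this on a trimmed copy of the markup from the question (the trimming and the variable names first_span and label are only for illustration).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup, kept short for illustration.
html = """<div class="col-md-11"><h4>
<span class="fas fa-user-circle"></span><span class="label label-success">Avocat definitiv</span>
<font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
</h4></div>"""

badges = BeautifulSoup(html, "html.parser").find("div", class_="col-md-11")

# badges.span is the first <span> descendant: the empty icon span.
first_span = badges.span
print(first_span["class"])

# The icon span contains no <span>, so chaining .span again gives None.
print(first_span.span)

# The label span is a sibling of the icon span, not a child of it.
label = first_span.find_next_sibling("span")
print(label.get_text(strip=True))
```

Because the pieces of text sit in sibling elements and in bare text nodes, searching by tag and attributes (as in the demo above) is more dependable than chaining tag names.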

An optimized solution using a single select call:

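result = []  # collected strings, in document order

for el in badges.select('h4 font, h4 span:nth-of-type(3), p:nth-of-type(1), p:nth-of-type(2) > span.text-nowrap'):
    if el.name == 'font':
        # the <font> tag holds the name; the text node right after it holds the bar
        result.extend([el.text.strip(), el.next_sibling.strip()])
    else:
        result.append(el.text.strip())

print(result)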
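Output (formatted):

['NEDELCU Paul-Iulian',
 ', Baroul Dolj',
 '[activ]',
 'Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.',
 'paul_iulyan@yahoo.com']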
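
If named fields are more convenient than a flat list, the same selectors can feed a dict. This is only a sketch under the assumption that every entry has the same layout as the markup in the question; the function name extract_badge and the keys name, bar, status, address and email are hypothetical, not something the page or the library defines.

```python
from bs4 import BeautifulSoup

html = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""


def extract_badge(badge):
    """Return one entry as a dict; the key names are illustrative only."""
    font = badge.select_one("h4 font")
    return {
        "name": font.get_text(strip=True),
        # the text node after <font> looks like ", Baroul Dolj"; drop the comma
        "bar": font.next_sibling.strip().lstrip(",").strip(),
        "status": badge.select_one("h4 span:nth-of-type(3)").get_text(strip=True),
        "address": badge.select_one("p:nth-of-type(1)").get_text(strip=True),
        "email": badge.select_one("span.text-nowrap").get_text(strip=True),
    }


soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "col-md-11"})
record = extract_badge(badges)

for key, value in record.items():
    print(key, "=", value)
```

select_one returns None when nothing matches, so on pages where a field may be missing each lookup would need a check before calling get_text.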


You always come up with a different idea. Have a +1. Thanks.

I think the [activ] string should also be in the result (as shown in the question).