Python: Extracting data with BeautifulSoup
I need to extract "Ended 7 seconds ago" from the following HTML:
<div class="featured__columns">
<div class="featured__column"><i style="color:rgb(149,213,230);" class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a style="color:rgb(149,213,230);" href="/user/Eclipsy">Eclipsy</a></div><a href="/user/Eclipsy" class="global__image-outer-wrap global__image-outer-wrap--avatar-small">
<div class="global__image-inner-wrap" style="background-image:url(https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/dc/dc5b8424bd5d17e13dcfe613689921dfc29f4574_medium.jpg);"></div>
</a>
</div>
My code works, but my approach feels clumsy. Is there a different way to extract the data?
You can try the following:
f_c = soup.find_all('div', class_='featured__columns')[0]
print(f_c.find('div', class_='featured__column').span.get_text())
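Similarly, if several div tags have the class featured__column, you can loop over them and pull the data from each one.
The span you want is inside the first featured__column div: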
from bs4 import BeautifulSoup
html ="""<div class="featured__columns">
<div class="featured__column"><i style="color:rgb(149,213,230);" class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a style="color:rgb(149,213,230);" href="/user/Eclipsy">Eclipsy</a></div><a href="/user/Eclipsy" class="global__image-outer-wrap global__image-outer-wrap--avatar-small">
<div class="global__image-inner-wrap" style="background-image:url(https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/dc/dc5b8424bd5d17e13dcfe613689921dfc29f4574_medium.jpg);"></div>
</a>
</div>"""
print(BeautifulSoup(html, "html.parser").select("div.featured__column span")[0].text)
Ended 7 seconds ago
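If more than one featured__column div is present, the looping idea mentioned earlier might look like the sketch below. It is only an illustration: the markup is a trimmed copy of the question's HTML, the variable name span_texts is made up for this example, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (inline styles and avatar link removed).
html = """<div class="featured__columns">
<div class="featured__column"><i class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a href="/user/Eclipsy">Eclipsy</a></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any div whose class list contains "featured__column",
# so both columns are visited.
span_texts = []  # hypothetical name, used only in this sketch
for column in soup.find_all("div", class_="featured__column"):
    span = column.find("span")
    if span is not None:  # skip columns that have no span
        span_texts.append(span.get_text(strip=True))

print(span_texts)
```

Checking for None before calling get_text() keeps the loop from failing on a column that happens to lack a span.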
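Finally, to replicate your own logic exactly, taking just the first span in the HTML regardless of classes and so on, you can simplify to either of: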
BeautifulSoup(html).select("span:nth-of-type(1)")[0].text
BeautifulSoup(html).find("span").text
In [53]: BeautifulSoup(html).select("div.featured__column span")
Out[53]:
[<span title="Today, 11:49am">Ended 7 seconds ago</span>,
<span title="March 7, 2016, 10:50am">2 days ago</span>]
In [54]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(1)")
Out[54]: [<span title="Today, 11:49am">Ended 7 seconds ago</span>]
In [55]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(2)")
Out[55]: [<span title="March 7, 2016, 10:50am">2 days ago</span>]
In [56]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(2)")[0].text
Out[56]: u'2 days ago'
In [57]: BeautifulSoup(html).select("div.featured__column span:nth-of-type(1)")[0].text
Out[57]: u'Ended 7 seconds ago'
In [70]: BeautifulSoup(html).select("i.fa.fa-clock-o + span")
Out[70]: [<span title="Today, 11:49am">Ended 7 seconds ago</span>]
In [71]: BeautifulSoup(html).select("i.fa.fa-clock-o + span")[0].text
Out[71]: u'Ended 7 seconds ago'
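The session above looks like it was recorded with an older BeautifulSoup release (note the u'' strings). As far as I know, current releases delegate select() to the soupsieve package, which follows standard CSS semantics: span:nth-of-type(n) counts spans within each parent, not matches across the whole document. Under that assumption, the positional selectors would need to move onto the div, roughly as in this sketch (trimmed markup, "html.parser" assumed):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """<div class="featured__columns">
<div class="featured__column"><i class="fa fa-clock-o"></i> <span title="Today, 11:49am">Ended 7 seconds ago</span></div>
<div class="featured__column featured__column--width-fill text-right"><span title="March 7, 2016, 10:50am">2 days ago</span> by <a href="/user/Eclipsy">Eclipsy</a></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# CSS semantics: this matches every span that is the first span inside its
# own parent, so each column contributes one match.
per_parent = soup.select("div.featured__column span:nth-of-type(1)")

# To pick a column by position, apply :nth-of-type to the div instead.
first_col = soup.select_one("div.featured__column:nth-of-type(1) > span")
second_col = soup.select_one("div.featured__column:nth-of-type(2) > span")

# Adjacent-sibling combinator: the span that directly follows the clock icon.
after_icon = soup.select_one("i.fa.fa-clock-o + span")

print(len(per_parent))
print(first_col.get_text())
print(second_col.get_text())
print(after_icon.get_text())
```

select_one() returns None when nothing matches, so real code would want a None check before calling get_text(). The i.fa.fa-clock-o + span selector does not depend on column order at all, which may make it the more robust choice here.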