Python: navigating to the second text string with BeautifulSoup
This is the HTML, saved as sample.html:
<html>
<body>
<div class ="ecopyramid">
<ul id ="producers">
<li class ="producerlist">
<div class ="name">A1</div>
<div class ="number">100000</div>
</li>
<li class ="producerlist">
<div class ="name">B1</div>
<div class ="number">100000</div>
</li>
</ul>
<ul id ="primaryconsumers">
<li class ="primaryconsumerlist">
<div class ="name">A2</div>
<div class ="number">1000</div>
</li>
<li class ="primaryconsumerlist">
<div class ="name">B2</div>
<div class ="number">2000</div>
</li>
</ul>
<ul id ="secondaryconsumers">
<li class ="secondaryconsumerlist">
<div class ="name">A3</div>
<div class ="number">100</div>
</li>
<li class ="secondaryconsumerlist">
<div class ="name">B3</div>
<div class ="number">98</div>
</li>
</ul>
<ul id ="tertiaryconsumers">
<li class ="tertiaryconsumerlist">
<div class ="name">A4</div>
<div class ="number">80</div>
</li>
<li class ="tertiaryconsumerlist">
<div class ="name">B4</div>
<div class ="number">50</div>
</li>
</ul>
</div>
</body>
</html>
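With my code, I can first locate the parent of the text "A3" by the tag "ul" and the id "secondaryconsumers", then narrow it down with the ".li.div.string" suffix in the print statement, which outputs the desired text "A3". My questions are:
1) How do I write the code to retrieve/print the text "B3" in this example?
2) How do I write the code to retrieve/print the text "98" (below "B3") in this example?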
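I have tried many things without success: I can reach the first text object by navigation, but not the second text object under the same tag.
Any ideas?

You can get the names and numbers with CSS selectors: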
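from bs4 import BeautifulSoup
# Parse the saved file; the parser choice is an assumption, the original does not name one.
soup = BeautifulSoup(open('sample.html'), 'html.parser')
names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')
print([name.text for name in names])
print([number.text for number in numbers])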
Prints (under Python 2.7, hence the u prefixes):
[u'A3', u'B3']
[u'100', u'98']
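To pull out just the second item rather than the whole list, one option is to index the result, or to walk the tree with find_all. The sketch below is written for Python 3 and inlines a trimmed copy of the secondaryconsumers list so that it runs on its own. The variable names (html, second_li, and so on) and the use of the built-in html.parser are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the secondaryconsumers list from sample.html.
html = """
<ul id ="secondaryconsumers">
<li class ="secondaryconsumerlist">
<div class ="name">A3</div>
<div class ="number">100</div>
</li>
<li class ="secondaryconsumerlist">
<div class ="name">B3</div>
<div class ="number">98</div>
</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute-style navigation (.li, .div) always returns the FIRST matching
# descendant, which is why ul.li.div.string can only ever reach "A3".
ul = soup.find("ul", id="secondaryconsumers")
first_name = ul.li.div.string

# find_all returns every match in document order; index 1 is the second <li>.
second_li = ul.find_all("li", class_="secondaryconsumerlist")[1]
second_name = second_li.find("div", class_="name").get_text(strip=True)
second_number = second_li.find("div", class_="number").get_text(strip=True)

# The same lookup via CSS selectors: select() returns a list, so index it.
numbers = soup.select("ul#secondaryconsumers > li.secondaryconsumerlist > div.number")
second_number_css = numbers[1].get_text(strip=True)

print(first_name, second_name, second_number, second_number_css)  # A3 B3 98 98
```

Indexing with [1] raises an IndexError if the page has fewer than two matching items, so real scraping code would probably want to check the length of the list first.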
Sample code for the follow-up question in the comments:
from bs4 import BeautifulSoup
data = """
<div class="span9">
<table class="result-data table" border="0">
<tbody>
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</tbody>
</table>
</div>
"""
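# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(data, 'html.parser')
print(soup.find('td', class_="result-value-bold").get_text(strip=True))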
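Thanks, Alecxe. A follow-up question: your print statements loop over every match. Is it possible to print only, say, the second one, so that only "98" is printed?

@KubiK888 Sure. You only want to get 98 from the HTML above, right? Is it always in the second li with the class secondaryconsumerlist?

Thanks. A two-part question: 1) Yes, it is always in the second "li" with the "secondaryconsumerlist" class. How do I print/return just that value instead of all of them? 2) I tried applying the CSS selector approach to a real site, but it turns out that the class or id attributes contain spaces and hyphens, for example … section. Name: Robin Hood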
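# Using the soup object built from the Robin Hood sample, narrow the search step by step.
# class_ matches a single class name, so 'result-data' also matches class="result-data table".
table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print(tr.find('td', class_="result-value-bold").get_text(strip=True))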
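A class attribute that contains spaces, such as class="result-data table", is a space-separated list of class names rather than one name with a space in it, and hyphens are ordinary characters in a class name. A CSS selector can therefore chain the names with dots. The sketch below (Python 3, built-in html.parser, a shortened copy of the Robin Hood markup) shows a select-based equivalent. It assumes a bs4 release that provides select_one, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup from the follow-up question.
data = """
<table class="result-data table" border="0">
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</table>
"""
soup = BeautifulSoup(data, "html.parser")

# "result-data table" is two classes; chain them as .result-data.table.
cell = soup.select_one(
    "table.result-data.table tr.result-item td.result-value-bold"
)
name = cell.get_text(strip=True)

# class_ matches any single class from the list, so one name is enough here.
row = soup.find("tr", class_="highlighting")
same_name = row.find("td", class_="result-value-bold").get_text(strip=True)

print(name)  # Robin Hood
```

select_one returns None when nothing matches, so on a real site it is probably worth checking the result before calling get_text on it.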