Python: navigate to the second string text using BeautifulSoup

This is the HTML, saved as sample.html:

<html> 
    <body> 
    <div class ="ecopyramid"> 
        <ul id ="producers"> 
            <li class ="producerlist"> 
                <div class ="name">A1</div> 
                <div class ="number">100000</div> 
            </li> 
            <li class ="producerlist"> 
                <div class ="name">B1</div> 
                <div class ="number">100000</div> 
            </li> 
        </ul> 
        <ul id ="primaryconsumers"> 
            <li class ="primaryconsumerlist"> 
                <div class ="name">A2</div> 
                <div class ="number">1000</div> 
            </li> 
            <li class ="primaryconsumerlist"> 
                <div class ="name">B2</div> 
                <div class ="number">2000</div> 
            </li> 
        </ul> 
        <ul id ="secondaryconsumers"> 
            <li class ="secondaryconsumerlist"> 
                <div class ="name">A3</div> 
                <div class ="number">100</div> 
            </li>

            <li class ="secondaryconsumerlist"> 
                <div class ="name">B3</div> 
                <div class ="number">98</div>
            </li> 
        </ul> 
        <ul id ="tertiaryconsumers"> 
            <li class ="tertiaryconsumerlist"> 
                <div class ="name">A4</div> 
                <div class ="number">80</div> 
            </li> 
            <li class ="tertiaryconsumerlist"> 
                <div class ="name">B4</div> 
                <div class ="number">50</div> 
            </li> 
        </ul> 
    </div> 
    </body> 
</html>
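The snippet the question refers to is not shown, so the following is only a minimal sketch consistent with the description in the next paragraph. It assumes Python 3, the bs4 package with the standard-library html.parser backend, and an inline excerpt of sample.html in place of the file; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Inline excerpt of sample.html (assumption: only the relevant <ul> is kept).
html = """
<ul id="secondaryconsumers">
    <li class="secondaryconsumerlist">
        <div class="name">A3</div>
        <div class="number">100</div>
    </li>
    <li class="secondaryconsumerlist">
        <div class="name">B3</div>
        <div class="number">98</div>
    </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the parent <ul> by tag name and id.
parent = soup.find("ul", id="secondaryconsumers")

# Attribute-style navigation returns the FIRST matching descendant,
# so .li.div is the first <div> inside the first <li>.
first_text = parent.li.div.string
print(first_text)
```

Because .li and .div each stop at the first match, this style of navigation cannot reach "B3" or "98" on its own.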
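In this code, I first specify the parent location of the text "A3" through the tag "ul" and the id "secondaryconsumers", then narrow it down with the ".li.div.string" suffix in the print command, which outputs the desired text "A3". My questions are:

1) How do I write the code to call/print the text "B3" in this example?

2) How do I write the code to call/print the text "98" (below "B3") in this example?

I have tried many things without success. Through navigation I can reach the first text object, but not the second text object under the same kind of tag.

Any ideas?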

You can get the names and numbers with CSS selectors (soup is the parsed sample.html):

names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')

print [name.text for name in names]
print [number.text for number in numbers]
Prints:

[u'A3', u'B3']
[u'100', u'98']
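To pull out only the second item rather than the whole list, the selector results can be indexed like any Python list. This is a minimal sketch, assuming Python 3 (so print is a function), the bs4 package with the html.parser backend, and an inline excerpt of sample.html; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Inline excerpt of sample.html (assumption: only the relevant <ul> is kept).
html = """
<ul id="secondaryconsumers">
    <li class="secondaryconsumerlist">
        <div class="name">A3</div>
        <div class="number">100</div>
    </li>
    <li class="secondaryconsumerlist">
        <div class="name">B3</div>
        <div class="number">98</div>
    </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

names = soup.select("ul#secondaryconsumers > li.secondaryconsumerlist > div.name")
numbers = soup.select("ul#secondaryconsumers > li.secondaryconsumerlist > div.number")

# select() returns a list-like result, so index 1 is the second match.
second_name = names[1].get_text(strip=True)
second_number = numbers[1].get_text(strip=True)

# Alternative: pick the second <li> first, then look inside it.
second_li = soup.find("ul", id="secondaryconsumers").find_all("li")[1]
same_number = second_li.find("div", class_="number").get_text(strip=True)

print(second_name, second_number, same_number)
```

Indexing raises IndexError if the page has fewer than two matching items, so real-world code may want a length check first.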

Sample code for the follow-up question in the comments:

from bs4 import BeautifulSoup


data = """
<div class="span9">
    <table class="result-data table" border="0">
        <tbody>
        <tr class="result-item highlighting">
            <td class="result-category" scope="row">Name:</td>
            <td class="result-value-bold" colspan="4" itemprop="item">
                Robin Hood
            </td>
        </tr>
        </tbody>
    </table>
</div>
"""

soup = BeautifulSoup(data)
print soup.find('td', class_="result-value-bold").get_text(strip=True)

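Thanks, Alecxe. A follow-up question: your print command uses a for loop. Is it possible to print only, say, the second item, so that it prints just "98"?

@KubiK888 Sure. You only want to get 98 from the HTML above, right? Is it always in the second li with the class secondaryconsumerlist? Thanks.

A two-part question: 1) Yes, it is always in the second "li" with the "secondaryconsumerlist" class. How do I print/return that value instead of all of them? 2) I tried applying the CSS selector approach to a real site, but the class or id attributes there contain spaces and hyphens. Is there a way to adapt the CSS selectors in BeautifulSoup? I read through the documentation link you mentioned but could not find a solution.

@KubiK888 It is definitely solvable. Just to clarify, can you provide the HTML from the real site? Only the relevant part.

Name: Robin Hood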
Continuing from the sample code above (same data and soup), the cell can also be reached step by step through the table and the row. A space inside a class attribute separates several class names, so each element can be matched on just one of them:

table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print tr.find('td', class_="result-value-bold").get_text(strip=True)
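The follow-up asked about CSS selectors specifically. In a selector, the class names from one attribute can be chained with dots, and hyphens are valid inside a class name. This is a minimal sketch, assuming Python 3, the bs4 package with the html.parser backend, and a trimmed copy of the markup from the thread; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the thread (assumption: wrapper <div> and <tbody> dropped).
data = """
<table class="result-data table" border="0">
    <tr class="result-item highlighting">
        <td class="result-category" scope="row">Name:</td>
        <td class="result-value-bold" colspan="4" itemprop="item">
            Robin Hood
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(data, "html.parser")

# class="result-data table" holds two classes: "result-data" and "table".
# Chain them with dots; hyphens need no escaping.
cells = soup.select(
    "table.result-data.table tr.result-item.highlighting td.result-value-bold"
)

value = cells[0].get_text(strip=True)
print(value)
```

Listing every class is optional: a selector such as table.result-data td.result-value-bold matches the same cell here, so the chain only needs to be as specific as the page requires.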