Python: using XPath or BeautifulSoup (or another approach) to select and parse a specific div block

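Suppose a page contains several similar div blocks, and only some of them include an additional nested div. I need to extract data only from the blocks that have that additional div. How can I filter out just the div blocks I need?

For example, I need the data in [div class='level_33'], but only when the enclosing [div class='level_1'] also contains [div class='level_special']. In other words, how do I express the condition: if [div class='level_33'] belongs to a [div class='level_1'] that contains [div class='level_special'], then get the data from [div class='level_33']?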


Try the following XPath:

//div[@class='level_1'][.//div[@class='level_special']]//div[@class='level_33']
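
If a full XPath 1.0 engine is not available, the same condition can be sketched with the standard library's xml.etree.ElementTree. It supports only a subset of XPath and cannot evaluate the nested predicate above in one expression, so the "contains level_special" check is done in Python instead. This is a minimal sketch that assumes well-formed markup; the sample document, the text inside the divs, and the function name select_level_33 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, wrapped in a single root element so it parses as XML.
SAMPLE = """<root>
  <div class="level_1">
    <div class="level_2">
      <div class="level_31">a</div>
      <div class="level_33">wanted</div>
      <div class="level_special"></div>
    </div>
  </div>
  <div class="level_1">
    <div class="level_2">
      <div class="level_33">skipped</div>
    </div>
  </div>
</root>"""


def select_level_33(markup):
    """Return level_33 divs inside a level_1 div that also contains level_special."""
    root = ET.fromstring(markup)
    matches = []
    for block in root.iterfind(".//div[@class='level_1']"):
        # Skip any level_1 block that has no level_special descendant.
        if block.find(".//div[@class='level_special']") is None:
            continue
        matches.extend(block.iterfind(".//div[@class='level_33']"))
    return matches


if __name__ == "__main__":
    for div in select_level_33(SAMPLE):
        print(div.text)
```

Real-world HTML is often not well-formed XML, so this approach only fits pages that parse cleanly; otherwise an HTML-aware parser is the safer choice.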


XPath is a good solution. Here is another one, using the simplified_scrapy library. It may also solve your problem.

from simplified_scrapy.simplified_doc import SimplifiedDoc

# Two level_1 blocks: only the first one contains a level_special div.
html = '''<div class = 'level_1'>
      <div class = 'level_2'>
           <div class = 'level_31'></div>
           <div class = 'level_32'></div>
           <div class = 'level_33'></div>
           <div class = 'level_special'></div>
       </div>
    </div>

    <div class = 'level_1'>
      <div class = 'level_2'>
           <div class = 'level_31'></div>
           <div class = 'level_32'></div>
           <div class = 'level_33'></div>
      </div>
    </div>
'''
doc = SimplifiedDoc(html)
# Look for a level_special div, searching from the level_1 block onward.
div = doc.getElementByClass('level_special', start='level_1')
if div:
    # Go up to the parent (level_2) and pick its level_33 div.
    div = div.parent.getElementByClass('level_33')
    print(div)  # {'class': 'level_33', 'tag': 'div', 'html': ''}