Python: How do I select a div by class inside another div with Beautiful Soup?

Tags: python, beautifulsoup

I have a bunch of div tags nested inside div tags:

<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either
</div>
Here is what I tried:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'C:\test.htm'))
for each_div in soup.findAll('div',{'class':'foo'}):
     print(each_div.findAll('div',{'class':'bar'})).encode("utf-8")
What am I doing wrong? If I could drop the div with class "unwanted" from the selection, a simple print(each_div) would be enough.

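You can use find_all() to search for every <div> element that has foo as its class attribute and, for each of those elements, use find() to get the <div> that has bar as its class attribute, for example: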

from bs4 import BeautifulSoup
import sys 

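# Parse the file named on the command line with Python's built-in parser.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # find() returns only the first match, or None if there is none.
    bar = foo.find('div', attrs={'class': 'bar'})
    if bar is not None:
        print(bar.text)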
Run it like this:

python3 script.py htmlfile
This yields:

I want this
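
The same selection can also be written as a CSS selector instead of nested find_all()/find() calls. This is a minimal sketch, not part of the original answer: it assumes the HTML from the question is held in a string (the `html` and `wanted` names are illustrative) and uses the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, inlined so the sketch is self-contained.
html = """
<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.foo > div.bar" matches only div.bar elements that are direct
# children of a div.foo, so the top-level div.bar is skipped.
wanted = [div.get_text(strip=True) for div in soup.select("div.foo > div.bar")]
print(wanted)
```

If the bar elements can sit deeper than one level below foo, the descendant selector "div.foo div.bar" (with a space instead of ">") would be the variant to try.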

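Update: If there can be more than one <div> element with bar as its class attribute, the previous script will not work, because find() only returns the first one. You can instead get the descendants of each foo element and iterate over them, like this: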

from bs4 import BeautifulSoup
import sys 

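# Parse the file named on the command line with Python's built-in parser.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # Walk every node nested under this foo element.
    for d in foo.descendants:
        # Keep only div tags whose class list is exactly ['bar'].
        if d.name == 'div' and d.get('class', '') == ['bar']:
            print(d.text)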
With input like this:

<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>
<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>
It yields:

I want this
Also want this
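
A shorter route to collecting every bar element is to call find_all() on each foo element, since find_all() already searches all descendants of the tag it is called on. The sketch below is an assumption-laden illustration rather than the original answer's code: the markup is inlined as a string, the `html` and `texts` names are made up for the example, and `class_="bar"` is used, which also matches a div carrying additional classes, unlike the `== ['bar']` comparison.

```python
from bs4 import BeautifulSoup

# Illustrative input: one foo block with two bar children, plus a stray
# top-level bar that should not be selected.
html = """
<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>
<div class="bar">Don't want this either</div>
"""

soup = BeautifulSoup(html, "html.parser")

texts = []
for foo in soup.find_all("div", class_="foo"):
    # Searching from foo restricts matches to elements nested inside it.
    for bar in foo.find_all("div", class_="bar"):
        texts.append(bar.get_text(strip=True))

print(texts)
```

Collecting the text into a list instead of printing inside the loop makes it easier to post-process or encode the results afterwards.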