Python: non-recursive find_all with the lxml builder


I found that in Python 2.7, if I use the lxml builder, I can't perform a non-recursive bs4.BeautifulSoup.find_all.

Take the following sample HTML snippet:

<p> <b> Cats </b> are interesting creatures </p>

<p> <b> Dogs </b> are cool too </p>

<div>
<p> <b> Penguins </b> are pretty neat, but they're inside a div </p>
</div>

<p> <b> Llamas </b> don't live in New York </p>
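
With a plain find_all, both soups return every p element (b is built with the lxml builder; a is built with a different one, presumably html.parser):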

>>> a.find_all("p")
[<p> <b> Cats </b> are interesting creatures </p>, <p> <b> Dogs </b> are cool too </p>, <p> <b> Penguins </b> are pretty neat, but they're inside a div </p>, <p> <b> Llamas </b> don't live in New York </p>]
>>> b.find_all("p")
[<p> <b> Cats </b> are interesting creatures </p>, <p> <b> Dogs </b> are cool too </p>, <p> <b> Penguins </b> are pretty neat, but they're inside a div </p>, <p> <b> Llamas </b> don't live in New York </p>]

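With recursive=False, however, a returns only the top-level p elements while b returns an empty list. Why does this happen? Is it a bug, or am I doing something wrong? Does the lxml builder support non-recursive find_all?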

This happens because the lxml parser wraps the HTML in html/body when those elements are missing:

>>> b = bs4.BeautifulSoup(html, "lxml")
>>> print(b)
<html><body><p> <b> Cats </b> are interesting creatures </p>
<p> <b> Dogs </b> are cool too </p>
<div>
<p> <b> Penguins </b> are pretty neat, but they're inside a div </p>
</div>
<p> <b> Llamas </b> don't live in New York </p>
</body></html>
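
One way to see where the top-level elements end up is to list the direct children of each soup. This is only a minimal sketch: it assumes bs4 and lxml are installed, it assumes a was built with html.parser (the question does not show this), and the helper name direct_child_tags is made up for illustration.

```python
import bs4

html = """<p> <b> Cats </b> are interesting creatures </p>
<p> <b> Dogs </b> are cool too </p>
<div>
<p> <b> Penguins </b> are pretty neat, but they're inside a div </p>
</div>
<p> <b> Llamas </b> don't live in New York </p>"""


def direct_child_tags(node):
    """Return the names of the tags that sit directly under node (hypothetical helper)."""
    # Skip whitespace-only strings between elements; keep only real tags.
    return [child.name for child in node.children if isinstance(child, bs4.Tag)]


a = bs4.BeautifulSoup(html, "html.parser")  # assumption: the builder behind a
b = bs4.BeautifulSoup(html, "lxml")

print(direct_child_tags(a))       # ['p', 'p', 'div', 'p']
print(direct_child_tags(b))       # ['html']
print(direct_child_tags(b.body))  # ['p', 'p', 'div', 'p']
```

With html.parser the p elements are direct children of the soup, while with lxml the only direct child is html, so a non-recursive search from the soup root has nothing to match.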

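This seems inconsistent to me. Why would different parsers behave differently in this way?
@LukeTaylor I agree, this can be confusing. There is a paragraph in the documentation with some information about this. It all comes down to how different parsers turn malformed HTML into valid HTML; they simply do it differently.
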
For reference, these are the non-recursive calls from the question:

>>> a.find_all("p", recursive=False)
[<p> <b> Cats </b> are interesting creatures </p>, <p> <b> Dogs </b> are cool too </p>, <p> <b> Llamas </b> don't live in New York </p>]
>>> b.find_all("p", recursive=False)
[]

Because the top-level p elements are children of body rather than of the soup itself, run the non-recursive search from b.body:

>>> print(b.find_all("p", recursive=False))
[]
>>> print(b.body.find_all("p", recursive=False))
[<p> <b> Cats </b> are interesting creatures </p>, <p> <b> Dogs </b> are cool too </p>, <p> <b> Llamas </b> don't live in New York </p>]
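
If the same code has to work with more than one builder, the choice of starting node can be wrapped in a small helper. The following is a sketch under assumptions, not part of bs4: find_top_level is a hypothetical name, and it assumes the input is a fragment that does not contain its own body element.

```python
import bs4


def find_top_level(soup, name):
    """Non-recursive search among a fragment's top-level elements (hypothetical helper)."""
    # lxml wraps a fragment in <html><body>, so start from body when it exists;
    # html.parser adds no wrapper, so soup.body is None and the soup itself is the root.
    root = soup.body if soup.body is not None else soup
    return root.find_all(name, recursive=False)


fragment = "<p>Cats</p><div><p>Penguins</p></div><p>Llamas</p>"

for builder in ("html.parser", "lxml"):
    soup = bs4.BeautifulSoup(fragment, builder)
    texts = [p.get_text() for p in find_top_level(soup, "p")]
    print("%s: %s" % (builder, texts))
```

Both builders should then report only the Cats and Llamas paragraphs, leaving out the one nested inside the div.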