Python 2.7: findAll() in BeautifulSoup misses nodes

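The findAll() method in BeautifulSoup does not return all the elements in the XML. If you look at the code below and open the URL, you can see that the XML contains 10 PubmedArticle nodes, yet findAll finds only 6 of them: the output shows 6 asterisks instead of 10. What am I doing wrong?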

import urllib2
from bs4 import BeautifulSoup

URL = 'http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=23858559,23858558,23858557,23858521,23858508,23858506,23858494,23858473,23858461,23858404'
data = urllib2.urlopen(URL).read()

soup = BeautifulSoup(data)

for x in soup.findAll('pubmedarticle'):
    print '*'

Edit: I found that 'findAll' searches relative to the current node, so you can set the root node through soup.

The element in the provided XML is named "PubMedArticle"; try the following instead:

for x in soup.pubmedarticleset.findAll('pubmedarticle'):
    print '*'
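
A minimal sketch of why the lowercase name matters, assuming the document is parsed with an HTML parser (here Python's built-in html.parser). The SAMPLE string is a made-up stand-in for the efetch response, not real PubMed data. HTML parsers lowercase tag names, so an exact-case search finds nothing while the lowercase one matches, and find_all called on a tag only searches beneath that tag:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the efetch response (assumption, not real data).
SAMPLE = """<PubmedArticleSet>
<PubmedArticle><PMID>1</PMID></PubmedArticle>
<PubmedArticle><PMID>2</PMID></PubmedArticle>
<PubmedArticle><PMID>3</PMID></PubmedArticle>
</PubmedArticleSet>"""

# An HTML parser stores every tag name in lowercase.
soup = BeautifulSoup(SAMPLE, "html.parser")

# Lowercase search, matching the names as the parser stored them.
lower_hits = soup.find_all("pubmedarticle")

# Exact-case search against the original XML name.
exact_hits = soup.find_all("PubmedArticle")

# Search scoped to the subtree under the <pubmedarticleset> tag.
scoped_hits = soup.pubmedarticleset.find_all("pubmedarticle")

print("lowercase: %d, exact case: %d, scoped: %d"
      % (len(lower_hits), len(exact_hits), len(scoped_hits)))
```

find_all is the current bs4 spelling of findAll; both names call the same method.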

I solved this by adding the 'xml' parameter (make sure lxml is installed):

soup = BeautifulSoup(xmlData, 'xml')
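
For comparison, a sketch of what the 'xml' parser changes, assuming lxml is installed (BeautifulSoup relies on it for this parser). SAMPLE is again a made-up stand-in for the efetch response. With the XML parser, tag names keep their original case, so the search has to use the exact name from the document:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the efetch response (assumption, not real data).
SAMPLE = """<PubmedArticleSet>
<PubmedArticle><PMID>1</PMID></PubmedArticle>
<PubmedArticle><PMID>2</PMID></PubmedArticle>
<PubmedArticle><PMID>3</PMID></PubmedArticle>
</PubmedArticleSet>"""

# 'xml' selects lxml's XML parser, which preserves tag-name case.
xml_soup = BeautifulSoup(SAMPLE, "xml")

# Exact-case search now matches the elements.
articles = xml_soup.find_all("PubmedArticle")

# The lowercase name no longer matches anything under this parser.
lowercase_articles = xml_soup.find_all("pubmedarticle")

# Pull the PMID text out of each article that was found.
pmids = [article.find("PMID").get_text() for article in articles]

print("exact case: %d, lowercase: %d, PMIDs: %s"
      % (len(articles), len(lowercase_articles), ", ".join(pmids)))
```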

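Yes, I know. But if I do that, I get nothing, so I deliberately used lowercase.

Your code works for me and prints 10 '*' characters. Try using the lxml parser with BeautifulSoup:

soup = BeautifulSoup(data, "lxml")

(Make sure lxml is installed.)

In that case, why not just use lxml directly? ^^ (Joking aside, it has great XPath support.)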
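
A rough sketch of the lxml-only route, assuming lxml is installed. SAMPLE is a made-up stand-in for the efetch response, and the XPath expressions assume the simplified structure shown here rather than the full PubMed schema:

```python
from lxml import etree

# Hypothetical miniature of the efetch response (assumption, not real data).
SAMPLE = """<PubmedArticleSet>
<PubmedArticle><PMID>1</PMID></PubmedArticle>
<PubmedArticle><PMID>2</PMID></PubmedArticle>
<PubmedArticle><PMID>3</PMID></PubmedArticle>
</PubmedArticleSet>"""

# Parse the XML into an element tree; tag names keep their case.
root = etree.fromstring(SAMPLE)

# Select every PubmedArticle element anywhere in the document.
article_nodes = root.xpath("//PubmedArticle")

# Select the text of each PMID that is a direct child of an article.
pmid_texts = [str(t) for t in root.xpath("//PubmedArticle/PMID/text()")]

print("%d articles, PMIDs: %s" % (len(article_nodes), ", ".join(pmid_texts)))
```

With the real response, the bytes returned by urlopen(URL).read() could be passed to etree.fromstring in place of SAMPLE.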