
Python BS4 with SDMX

python python-2.7


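I want to retrieve the data contained in an SDMX file (like the one at the URL below). I tried BeautifulSoup, but it doesn't seem to see the tags. In the following code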

import urllib2  # Python 2.7 HTTP client
from bs4 import BeautifulSoup
url = "https://www.bundesbank.de/cae/servlet/StatisticDownload?tsId=BBK01.ST0304&its_fileFormat=sdmx"
# Download the raw SDMX (XML) document
html_source = urllib2.urlopen(url).read()
# Parse it with the 'lxml' parser and look for the series elements
soup = BeautifulSoup(html_source, 'lxml')
ts_series = soup.findAll("bbk:Series")

this gives me an empty result.

Is BS4 the wrong tool, or (more likely) am I doing something wrong? Thanks in advance.

findAll("bbk:series") will return results.

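In fact, in this case, even though you pass lxml as the parser, BeautifulSoup still parses the document as HTML. Because HTML tags are case-insensitive, the HTML parser lowercases all tag names (bbk:Series becomes bbk:series), which is why soup.findAll("bbk:series") works. See the official BeautifulSoup documentation.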
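A minimal sketch of the lowercasing behavior, assuming bs4 and lxml are installed and written for Python 3 rather than the question's Python 2.7. SAMPLE is a hypothetical, hand-written SDMX-like snippet, not the real Bundesbank file; its namespace URIs and attribute values are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written SDMX-like snippet (not the real Bundesbank file);
# the namespace URIs and attribute values are placeholders.
SAMPLE = """<message:MessageGroup
    xmlns:message="http://example.com/message"
    xmlns:bbk="http://example.com/bbk">
  <bbk:DataSet>
    <bbk:Series BBK_ID="BBK01.ST0304">
      <bbk:Obs TIME_PERIOD="2000-01" OBS_VALUE="1.0"></bbk:Obs>
    </bbk:Series>
  </bbk:DataSet>
</message:MessageGroup>"""

# 'lxml' selects lxml's HTML parser, which lowercases tag names.
html_soup = BeautifulSoup(SAMPLE, "lxml")

# The mixed-case name no longer exists in the tree, so this finds nothing.
mixed_case = html_soup.find_all("bbk:Series")
# The lowercased name (prefix included, since HTML has no namespaces) matches.
lower_case = html_soup.find_all("bbk:series")

print(len(mixed_case), len(lower_case))
print(lower_case[0].name)
```

Under the same HTML parser, attribute names are lowercased as well, so BBK_ID would have to be looked up as bbk_id.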

If you want to parse it as XML, use this instead:

soup = BeautifulSoup(html_source, 'xml')

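This also uses lxml, because lxml is the only XML parser BeautifulSoup supports. Now you can get the results with

ts_series = soup.findAll("Series")

because with the XML parser BeautifulSoup preserves the original case and strips the namespace prefix bbk from the tag name (the prefix is kept separately in tag.prefix).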
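A minimal sketch of the XML-parser route, again assuming bs4 and lxml are installed and written for Python 3. The snippet is hypothetical: the element and attribute names Obs, TIME_PERIOD and OBS_VALUE are assumptions about the file's layout, not taken from the real Bundesbank download. It looks up the series by its unprefixed, case-preserved name and collects each observation's attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical SDMX-like snippet; namespace URIs, element and attribute
# names (Obs, TIME_PERIOD, OBS_VALUE) and values are placeholders.
SAMPLE = """<message:MessageGroup
    xmlns:message="http://example.com/message"
    xmlns:bbk="http://example.com/bbk">
  <bbk:DataSet>
    <bbk:Series BBK_ID="BBK01.ST0304">
      <bbk:Obs TIME_PERIOD="2000-01" OBS_VALUE="1.0"/>
      <bbk:Obs TIME_PERIOD="2000-02" OBS_VALUE="2.0"/>
    </bbk:Series>
  </bbk:DataSet>
</message:MessageGroup>"""

# 'xml' selects lxml's XML parser: case is preserved and the namespace
# prefix is stored in tag.prefix rather than in tag.name.
xml_soup = BeautifulSoup(SAMPLE, "xml")

ts_series = xml_soup.find_all("Series")
series = ts_series[0]
print(series.name, series.prefix, series["BBK_ID"])

# Collect (period, value) pairs from the observations of every series.
observations = [
    (obs["TIME_PERIOD"], float(obs["OBS_VALUE"]))
    for s in ts_series
    for obs in s.find_all("Obs")
]
print(observations)
```

With the real download you would pass the bytes returned by urlopen(url).read() instead of SAMPLE; the lowercase name "series" would find nothing here, because the XML parser keeps "Series" as written.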
