使用python迭代xml以查找具有特定扩展名的url_Python_Xml_Regex_Xml Parsing

使用python迭代xml以查找具有特定扩展名的url

python xml regex

使用python迭代xml以查找具有特定扩展名的url,python,xml,regex,xml-parsing,Python,Xml,Regex,Xml Parsing,我有一个从url下载的xml文件。然后，我希望遍历xml以找到指向具有特定文件扩展名的文件的链接我的xml如下所示： <Foo> <bar> <file url="http://foo.txt"/> <file url="http://bar.doc"/> </bar> </Foo> import urllib2, re from xml.dom.minidom impor

我有一个从url下载的xml文件。然后，我希望遍历xml以找到指向具有特定文件扩展名的文件的链接

我的xml如下所示：

<Foo>
    <bar>
        <file url="http://foo.txt"/>
        <file url="http://bar.doc"/>
    </bar>
</Foo>

import urllib2, re
from xml.dom.minidom import parseString

file = urllib2.urlopen('http://foobar.xml')
data = file.read()
file.close()
dom = parseString(data)
xmlTag = dom.getElementsByTagName('file')

然后我想让这样的事情发生：

   i=0
    url = ''
    while( i < len(xmlTag)):
         if re.search('*.txt', xmlTag[i].toxml() ) is not None:
              url = xmlTag[i].toxml()
         i = i + 1;

** Some code that parses out the url **

i=0
url=“”
而（i


但这是一个错误。有人有更好的方法吗
谢谢
 坦白说，你的最后一段代码很恶心dom.getElementsByTagName（'file'）
提供树中所有
元素的列表。。。只需重复它
urls = []
for file_node in dom.getElementsByTagName('file'):
    url = file_node.getAttribute('url')
    if url.endswith('.txt'):
        urls.append(url)

顺便说一句，您不应该使用Python手动编制索引。即使在极少数情况下，您也需要索引号，只需使用enumerate：
mylist = ['a', 'b', 'c']
for i, value in enumerate(mylist):
    print i, value

使用lxml
、urlparse
和os.path
的示例：
from lxml import etree
from urlparse import urlparse
from os.path import splitext

data = """
<Foo>
    <bar>
        <file url="http://foo.txt"/>
        <file url="http://bar.doc"/>
    </bar>
</Foo>
"""

tree = etree.fromstring(data).getroottree()
for url in tree.xpath('//Foo/bar/file/@url'):
    spliturl = urlparse(url)
    name, ext = splitext(spliturl.netloc)
    print url, 'is is a', ext, 'file'

从lxml导入etree
从URLPRASE导入URLPRASE
从os.path导入拆分文本
data=”“”
"""
tree=etree.fromstring（数据）.getroottree（）
对于tree.xpath（'//Foo/bar/file/@url'）中的url：
spliturl=urlparse（url）
名称，ext=splitext（spliturl.netloc）
打印url“是一个”，ext“文件”
是的，今天有点恶心。我上周刚学了python。但这是完美的！只需将行“url=file\u node.getAttribute（'url'）”更改为“url=file\u node.getAttribute（'url'）”，它就像一个符咒。谢谢