Python: parse XML files and create a list of files
Under /var/packs there are many folders, each containing an info.xml file (/var/packs/{many folders}/info.xml). The folders differ from one another, but the information about their contents is in info.xml. I need to parse the info.xml in each of {many folders} and build a list of file paths. The path is the text inside the Path tag, and a file should be included only if its type is "config", which can be determined by checking whether "config" is the value inside the Type tag. The info.xml file looks like this:
<Files>
    <File>
        <Path>usr/share/doc/dialog/samples/form1</Path>
        <Type>doc</Type>
        <Size>1222</Size>
        <Uid>0</Uid>
        <Gid>0</Gid>
        <Mode>0755</Mode>
        <Hash>49744d73e8667d0e353923c0241891d46ebb9032</Hash>
    </File>
    <File>
        <Path>usr/share/doc/dialog/samples/form3</Path>
        <Type>config</Type>
        <Size>1294</Size>
        <Uid>0</Uid>
        <Gid>0</Gid>
        <Mode>0755</Mode>
        <Hash>f30277f73e468232c59a526baf3a5ce49519b959</Hash>
    </File>
</Files>
This is a very basic example: it does no error handling and expects a very strictly defined XML file, but it should serve as a starting point:
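import os
from xml.dom.minidom import parse


def parse_file(xml_path):
    """Return the <Path> text of every <File> whose <Type> is 'config'."""
    config_paths = []
    dom = parse(xml_path)
    try:
        for filetag in dom.getElementsByTagName('File'):
            # Assumes every <File> has non-empty <Type> and <Path> children.
            file_type = filetag.getElementsByTagName('Type')[0].firstChild.data
            if file_type == 'config':
                config_paths.append(
                    filetag.getElementsByTagName('Path')[0].firstChild.data)
    finally:
        dom.unlink()  # free the DOM tree even if a node lookup fails
    return config_paths


def main():
    config_files = []
    # Name os.walk's file list "filenames" so it does not overwrite the result list.
    for root, dirs, filenames in os.walk('/var/packs'):
        if 'info.xml' in filenames:
            config_files += parse_file(os.path.join(root, 'info.xml'))
    print('The list of desired files: %s' % config_files)


if __name__ == '__main__':
    main()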
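The answer above says it does no error handling. As a rough sketch of what a more tolerant version might look like, the helper below parses XML text with minidom and skips any File entry whose Type or Path is missing or empty instead of raising. The names `config_paths_from_xml` and `_child_text` are made up for illustration, and treating malformed XML as "no config files" is an assumption you may not want in your own code.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def _child_text(node, tag):
    """Return the stripped text of the first <tag> under node, or None."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.data.strip()


def config_paths_from_xml(xml_text):
    """Hypothetical helper: list <Path> values of <File> entries typed 'config'."""
    try:
        dom = parseString(xml_text)
    except ExpatError:
        return []  # malformed XML is treated as "no config files" (an assumption)
    try:
        paths = []
        for file_node in dom.getElementsByTagName('File'):
            if _child_text(file_node, 'Type') == 'config':
                path = _child_text(file_node, 'Path')
                if path:  # skip entries with a missing or empty <Path>
                    paths.append(path)
        return paths
    finally:
        dom.unlink()
```

Taking the XML as a string keeps the parsing logic separate from the directory walk, so it can be tried out without touching /var/packs.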
Writing this off the top of my head, but here goes. We'll use os.path.walk to recursively descend into the directories and minidom to do the parsing.
import os
from xml.dom import minidom


# Opens a given info.xml file and prints out the contents of each "Path" element.
def parseInfoXML(filename):
    doc = minidom.parse(filename)
    for fileNode in doc.getElementsByTagName("File"):
        # Warning: we assume the existence of a Path node, and that it contains a Text node.
        print(fileNode.getElementsByTagName("Path")[0].childNodes[0].data)
    doc.unlink()


# Visitor function that os.path.walk calls for every directory it enters.
def checkDirForInfoXML(arg, dirname, names):
    if "info.xml" in names:
        parseInfoXML(os.path.join(dirname, "info.xml"))


# Recursively walk the directory tree, calling our visitor function to check for info.xml in each dir.
# This will include packs as well, so be sure that there's no info.xml in there.
# Note: os.path.walk exists only in Python 2.
os.path.walk("/var/packs", checkDirForInfoXML, None)
I'm sure this isn't the most efficient way to do it, but it will do if you aren't expecting any errors or the like.
Using lxml and XPath:
import os
import lxml.etree

files = []
for root, dirnames, filenames in os.walk('/var/packs'):
    for filename in filenames:
        if filename != 'info.xml':
            continue
        tree = lxml.etree.parse(os.path.join(root, filename))
        # Select the Path text of every File that has a Type child whose text is "config".
        files.extend(tree.getroot().xpath('//File[Type[text()="config"]]/Path/text()'))
If lxml is not available, you can also use ElementTree from the standard library:
import os
import xml.etree.ElementTree

files = []
for root, dirnames, filenames in os.walk('/var/packs'):
    for filename in filenames:
        if filename != 'info.xml':
            continue
        tree = xml.etree.ElementTree.parse(os.path.join(root, filename))
        # findall('File') matches only direct children of the root element.
        for file_node in tree.findall('File'):
            type_node = file_node.find('Type')
            if type_node is not None and type_node.text == 'config':
                path_node = file_node.find('Path')
                if path_node is not None:
                    files.append(path_node.text)
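To make the standard-library variant easier to reuse and test, it could be wrapped in a function. The sketch below assumes the layout shown in the question (File elements carrying Type and Path children). The function name `collect_config_paths` and its `base_dir` parameter are illustrative and not part of the answer.

```python
import os
import xml.etree.ElementTree as ET


def collect_config_paths(base_dir):
    """Walk base_dir and gather <Path> text from config-typed <File> entries."""
    paths = []
    for root, dirnames, filenames in os.walk(base_dir):
        if 'info.xml' not in filenames:
            continue
        tree = ET.parse(os.path.join(root, 'info.xml'))
        # iter() matches <File> at any depth, so the root element's name does not matter.
        for file_node in tree.iter('File'):
            # findtext() returns None when the child element is absent.
            if file_node.findtext('Type') == 'config':
                path = file_node.findtext('Path')
                if path:  # skip missing or empty <Path> elements
                    paths.append(path)
    return paths
```

Calling `collect_config_paths('/var/packs')` would then build the same list as the loop above, and passing a temporary directory makes it possible to try the function without real package data.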
A side note: os.path.walk has been deprecated and was removed in Python 3.0 in favor of os.walk().
Aha, thank you. Unfortunately I'm still living in the Python 2.6 stone age, hehe.
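For readers on Python 3, the visitor-style traversal could be rewritten with os.walk(). This is only a sketch of the equivalent structure: `find_info_xml` is a made-up name, and it yields the location of each info.xml rather than parsing it, so any of the parsers shown earlier can be applied to the results.

```python
import os


def find_info_xml(base_dir):
    """Yield the full path of every info.xml found under base_dir."""
    # os.walk() yields a (dirpath, dirnames, filenames) tuple per directory,
    # which takes the place of the (arg, dirname, names) visitor callback.
    for dirpath, dirnames, filenames in os.walk(base_dir):
        if 'info.xml' in filenames:
            yield os.path.join(dirpath, 'info.xml')
```

Because it is a generator, the paths are produced lazily, for example with `for xml_path in find_info_xml('/var/packs'): ...`.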