Creating an xml tree from a text file using Python
While parsing a text file, I need to avoid creating duplicate branches in the xml tree. Suppose the text file looks like this (the lines are in random order):
branch1:branch11:message11
branch1:branch12:message12
branch2:branch21:message21
branch2:branch22:message22
The resulting xml tree should therefore have a root with two branches, each of which has two sub-branches. The Python code I use to parse this text file is:
import string
fh = open('xmlbasic.txt', 'r')
allLines = fh.readlines()
fh.close()
import xml.etree.ElementTree as ET
root = ET.Element('root')
for line in allLines:
    tempv = line.split(':')
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]
tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')
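The problem with this code is that a new branch is created in the xml tree for every line of the text file.
Any suggestions on how to avoid creating another branch in the xml tree when a branch with that name already exists?
Something like these suggestions?
Keep the first-level branches in a dict so they can be reused: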
b1map = {}
for line in allLines:
    tempv = line.split(':')
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]
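The logic is simple, and you already stated it in your question: before creating a branch, check whether it already exists in the tree.
Note that this can be inefficient, because you search the tree again for every line. That is because ElementTree is not designed for uniqueness.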
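The check described above could look roughly like the following. This is a minimal sketch, assuming the three-field `head:subhead:message` line format from the question; the helper name `build_tree` and the in-memory `sample` list are illustrative and not part of the original answer. `Element.find` with a bare tag name looks only at direct children of the element, and its result is compared against `None` because an element without children is falsy.

```python
import xml.etree.ElementTree as ET


def build_tree(lines):
    """Build root/branch/sub-branch elements, reusing existing first-level branches."""
    root = ET.Element('root')
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split into at most three fields so a ':' inside the message is kept
        head, subhead, message = line.split(':', 2)
        branch = root.find(head)  # scans the direct children of root
        if branch is None:  # do not use "if not branch": childless elements are falsy
            branch = ET.SubElement(root, head)
        ET.SubElement(branch, subhead).text = message
    return root


# Hypothetical sample data, in shuffled order as in the question
sample = [
    'branch2:branch21:message21',
    'branch1:branch11:message11',
    'branch2:branch22:message22',
    'branch1:branch12:message12',
]
root = build_tree(sample)
print(ET.tostring(root, encoding='unicode'))
```

Each line triggers one linear scan over the children of `root`, which is the cost the note above refers to; for a handful of first-level branches it is unlikely to matter.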
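If you need speed (you probably don't, especially for smaller trees!), a more efficient approach is to store the tree structure in a defaultdict first and convert it to an ElementTree afterwards: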
import collections
import xml.etree.ElementTree as ET

# Collect head -> {subhead: message}; repeated heads share one inner dict
root_dict = collections.defaultdict(dict)
with open("xmlbasic.txt") as lines_file:
    for line in lines_file:  # iterate over lines; read() would yield single characters
        line = line.strip()  # drop the trailing newline
        if not line:
            continue
        head, subhead, tail = line.split(":", 2)
        root_dict[head][subhead] = tail

# Convert the nested dict into an ElementTree
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail
tree = ET.ElementTree(root)
ET.dump(tree)
Thanks, this answer and the others all work well, but I will stick with defaultdict because the text and xml files are actually quite large.