Creating an XML tree from a text file using Python


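While parsing a text file, I need to avoid creating duplicate branches in the XML tree. Suppose the text file looks like this (the lines are in random order):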

branch1:branch11:message11
branch1:branch12:message12
branch2:branch21:message21
branch2:branch22:message22

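The resulting XML tree should therefore have a root with two branches, and each of those branches should have two sub-branches. The Python code I use to parse this text file is as follows: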

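import xml.etree.ElementTree as ET

# Read all lines of the input file
with open('xmlbasic.txt', 'r') as fh:
    allLines = fh.readlines()

root = ET.Element('root')

for line in allLines:
    # strip() drops the trailing newline so it does not end up in the element text
    tempv = line.strip().split(':')
    # A new first-level branch is created here for every line, even if one with this name already exists
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')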
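The problem with this code is that a new branch is created in the XML tree for every line of the text file.

If a branch with a given name already exists, how can I avoid creating another one in the XML tree? Any suggestions?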

Something along these lines? Keep the branch level you want to reuse in a dict.

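b1map = {}  # maps a first-level tag name to the element already created for it

for line in allLines:
    tempv = line.strip().split(':')
    # Reuse the first-level branch if it was created for an earlier line
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]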
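The same dict idea can be extended to the second level, in case the input repeats a sub-branch name as well. The following is only a sketch under that assumption: the function name `build_tree_cached` and the tuple-keyed `cache` are illustrative choices, not part of the original answer, and the sample lines stand in for the contents of the text file.

```python
import xml.etree.ElementTree as ET

def build_tree_cached(lines):
    """Build the tree, reusing branches at both levels via a dict (hypothetical helper)."""
    root = ET.Element('root')
    cache = {}  # maps a path tuple such as ('branch1', 'branch11') to its element
    for line in lines:
        parts = line.strip().split(':', 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, subhead, message = parts
        branch1 = cache.get((head,))
        if branch1 is None:
            branch1 = cache[(head,)] = ET.SubElement(root, head)
        branch2 = cache.get((head, subhead))
        if branch2 is None:
            branch2 = cache[(head, subhead)] = ET.SubElement(branch1, subhead)
        branch2.text = message  # a repeated sub-branch keeps the last message seen
    return root

sample_lines = [
    "branch2:branch21:message21\n",
    "branch1:branch11:message11\n",
    "branch2:branch22:message22\n",
    "branch1:branch12:message12\n",
    "branch1:branch11:message11b\n",
]
cached_root = build_tree_cached(sample_lines)
print(ET.tostring(cached_root, encoding='unicode'))
```

Each lookup is a single dict access, so the cost per line does not grow with the size of the tree.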
The logic is simple, and you already stated it in the question! Before creating a branch, just check whether it already exists in the tree.
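A minimal sketch of that check, assuming `Element.find()` is used to look for an existing child before `SubElement` is called. The helper names `get_or_create` and `build_tree` are made up for illustration, and the sample list stands in for the lines of the text file.

```python
import xml.etree.ElementTree as ET

def get_or_create(parent, tag):
    """Return the existing direct child named `tag`, or create it (hypothetical helper)."""
    child = parent.find(tag)  # with a bare tag name, find() looks at direct children only
    if child is None:
        child = ET.SubElement(parent, tag)
    return child

def build_tree(lines):
    root = ET.Element('root')
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, message = line.split(':', 2)
        branch1 = get_or_create(root, head)
        branch2 = get_or_create(branch1, subhead)
        branch2.text = message
    return root

sample = [
    "branch1:branch11:message11\n",
    "branch2:branch21:message21\n",
    "branch1:branch12:message12\n",
    "branch2:branch22:message22\n",
]
found_root = build_tree(sample)
print(ET.tostring(found_root, encoding='unicode'))
```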

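Note that this can be inefficient, because you are searching the tree again for every line. That is because ElementTree is not designed around uniqueness.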


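If you need speed (you probably don't, especially for smaller trees!), a more efficient approach is to store the tree structure in a defaultdict first and convert it to an ElementTree afterwards.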

import collections
import xml.etree.ElementTree as ET

# Collect the data first as {head: {subhead: message}}
root_dict = collections.defaultdict(dict)
with open("xmlbasic.txt") as lines_file:
    for line in lines_file:  # iterate over lines; read() would yield single characters
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, tail = line.split(":", 2)
        root_dict[head][subhead] = tail

# Then build the tree, creating each branch exactly once
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)

Thanks, this answer and the others are all good, but I will stick with defaultdict, because the text and XML files are actually quite large.