Creating an XML tree from a text file with Python


python xml


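When parsing a text file, I need to avoid creating duplicate branches in the XML tree. Suppose the text file looks like this (the lines can appear in any order):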

branch1:branch11:message11
branch1:branch12:message12
branch2:branch21:message21
branch2:branch22:message22

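The resulting XML tree should therefore have a root with two branches, each of which has two sub-branches. The Python code I use to parse this text file is as follows: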

import xml.etree.ElementTree as ET

with open('xmlbasic.txt', 'r') as fh:
    allLines = fh.readlines()

root = ET.Element('root')

for line in allLines:
    # Strip the trailing newline so it does not end up in the element text
    tempv = line.rstrip('\n').split(':')
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')

The problem with this code is that a new first-level branch is created in the XML tree for every line of the text file.
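To see the duplication concretely, here is a minimal sketch that runs the same loop over the four sample lines held in memory (the sample list is a hypothetical stand-in for xmlbasic.txt) and then inspects the children of root:

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the contents of xmlbasic.txt
sample = [
    "branch1:branch11:message11",
    "branch1:branch12:message12",
    "branch2:branch21:message21",
    "branch2:branch22:message22",
]

root = ET.Element("root")
for line in sample:
    first, second, message = line.split(":")
    # A new first-level element is created for every line, even when
    # an element with the same tag already exists under root
    branch1 = ET.SubElement(root, first)
    ET.SubElement(branch1, second).text = message

print(len(root))                      # 4 first-level elements instead of 2
print([child.tag for child in root])  # ['branch1', 'branch1', 'branch2', 'branch2']
```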

Any suggestions on how to avoid creating another branch in the XML tree when a branch with that name already exists?

Something like this? Keep the branch level you want to reuse in a dict:

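# Map each first-level tag to the element already created for it
b1map = {}

for line in allLines:
    tempv = line.rstrip('\n').split(':')
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        # First time this tag is seen: create the element and remember it
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]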

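The logic is simple, and you have already stated it in your question: before creating a branch, just check whether it already exists in the tree.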
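A minimal sketch of that check, assuming the tags are plain names like those in the sample (Element.find interprets its argument as an ElementPath expression, so tags containing characters such as "/" or "[" would need different handling). The sample list is a hypothetical stand-in for the file, in shuffled order:

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for xmlbasic.txt (line order is arbitrary)
sample = [
    "branch2:branch21:message21",
    "branch1:branch11:message11",
    "branch2:branch22:message22",
    "branch1:branch12:message12",
]

root = ET.Element("root")
for line in sample:
    first, second, message = line.rstrip("\n").split(":", 2)
    # find(tag) returns the first direct child of root with that tag, or None
    branch1 = root.find(first)
    if branch1 is None:
        branch1 = ET.SubElement(root, first)
    ET.SubElement(branch1, second).text = message

print(ET.tostring(root, encoding="unicode"))
```

First-level branches appear in the order in which they are first seen (here branch2 before branch1).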

Note that this can be inefficient, because the tree is searched again for every line. This is because ElementTree is not designed to enforce uniqueness.

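If you need speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict first and then convert it to an ElementTree: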

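import collections
import xml.etree.ElementTree as ET

# Collect the structure first: root_dict[head][subhead] = tail.
# defaultdict(dict) creates the inner dict the first time a head is seen,
# so each head appears only once however many lines mention it.
root_dict = collections.defaultdict(dict)
with open("xmlbasic.txt") as lines_file:
    for line in lines_file:  # iterate over lines, not over characters
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        head, subhead, tail = line.split(":", 2)
        root_dict[head][subhead] = tail

# Convert the nested dicts to an ElementTree in a single pass
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)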
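To try the defaultdict approach without a file on disk, the same logic can be wrapped in a function that accepts any iterable of lines. The function name build_tree and the io.StringIO input are illustrative assumptions, not part of the original answer:

```python
import collections
import io
import xml.etree.ElementTree as ET

def build_tree(lines):
    """Group 'head:subhead:tail' lines in a defaultdict, then build the XML once."""
    root_dict = collections.defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        head, subhead, tail = line.split(":", 2)
        root_dict[head][subhead] = tail  # a repeated head:subhead overwrites

    root = ET.Element("root")
    for head, branch in root_dict.items():
        head_element = ET.SubElement(root, head)
        for subhead, tail in branch.items():
            ET.SubElement(head_element, subhead).text = tail
    return ET.ElementTree(root)

# io.StringIO stands in for open("xmlbasic.txt"); iterating it yields lines
sample = io.StringIO(
    "branch1:branch11:message11\n"
    "branch2:branch21:message21\n"
    "branch1:branch12:message12\n"
    "branch2:branch22:message22\n"
)
tree = build_tree(sample)
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

One behavioural difference from the dict-of-elements answer: if the same head:subhead pair appears on two lines, the defaultdict version keeps only the last message, whereas creating SubElements directly produces two sibling elements.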

Thanks, this and the other answers are all good, but I will stick with the defaultdict approach because the actual text and XML files are fairly large.
