Python: merging XML files with nested elements without using external libraries
I am trying to merge multiple XML files together using Python, without using external libraries. The XML files have nested elements.

Sample file 1:

<root>
    <element1>textA</element1>
    <elements>
        <nested1>text now</nested1>
    </elements>
</root>

Sample file 2:

<root>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>

What I want:

<root>
    <element1>textA</element1>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>

What I get:

<root>
    <element1>textA</element1>
    <elements>
        <nested1>text now</nested1>
    </elements>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>
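I hope you can see and understand my problem. I am looking for a proper solution; any guidance would be appreciated.

To clarify the problem: with my current solution, nested elements are not merged.

What the code you posted does is combine all the elements, regardless of whether an element with the same tag already exists. So you need to iterate over the elements and manually check and combine them the way you see fit, since this is not a standard way of handling XML files. I can't explain it better than the code does, so here it is, more or less commented: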
from xml.etree import ElementTree as et


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the string representation
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    # parenthesized single-argument print works on both Python 2 and Python 3
    print('-' * 20)
    print(r)
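To see what the recursive merge does without creating files on disk, here is a minimal sketch that applies the same tag-based logic to the two sample documents held in strings. The names FILE_1 and FILE_2 and the standalone combine_element function are illustrative assumptions, not part of the answer above, and encoding="unicode" assumes Python 3.

```python
from xml.etree import ElementTree as et

# The two sample documents from the question, as in-memory strings (illustrative names)
FILE_1 = ("<root><element1>textA</element1>"
          "<elements><nested1>text now</nested1></elements></root>")
FILE_2 = ("<root><element2>textB</element2>"
          "<elements><nested1>text after</nested1>"
          "<nested2>new text</nested2></elements></root>")


def combine_element(one, other):
    """Merge the children of `other` into `one`, matching children by tag."""
    mapping = {el.tag: el for el in one}
    for el in other:
        match = mapping.get(el.tag)
        if match is None:
            # No child with this tag yet: append it as-is
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element: the text from the later document wins
            match.text = el.text
        else:
            # Nested element: merge its children recursively
            combine_element(match, el)


root = et.fromstring(FILE_1)
combine_element(root, et.fromstring(FILE_2))
merged = et.tostring(root, encoding="unicode")
print(merged)
```

Note that new elements are appended at the end of their parent, so element2 ends up after elements rather than before it as in the desired output of the question. The content is the same; only the sibling order differs.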
Thank you, but my problem was to merge while also taking the attributes into account. Here is the code after my patch:
import sys
from xml.etree import ElementTree as et


class hashabledict(dict):
    # a dict that can be used as (part of) a dictionary key
    def __hash__(self):
        return hash(tuple(sorted(self.items())))


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the merged tree as an ElementTree
        return et.ElementTree(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from (tag, attributes) to element, as that's what we are filtering with
        mapping = {(el.tag, hashabledict(el.attrib)): el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[(el.tag, hashabledict(el.attrib))].text = el.text
                except KeyError:
                    # An element with this tag and these attributes is not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[(el.tag, hashabledict(el.attrib))], el)
                except KeyError:
                    # Not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    # all arguments except the last are input files; the last one is the output file
    r = XMLCombiner(sys.argv[1:-1]).combine()
    print('-' * 20)
    print(et.tostring(r.getroot()))
    r.write(sys.argv[-1], encoding="iso-8859-1", xml_declaration=True)
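The effect of keying on both tag and attributes is easiest to see on siblings that share a tag. The sketch below is an assumption-laden variant, not the patch above: it swaps the hashabledict class for a frozenset of the attribute items, which is also hashable, and uses a hypothetical helper named element_key and made-up item elements.

```python
from xml.etree import ElementTree as et


def element_key(el):
    """Hashable identity of an element: its tag plus its attributes (hypothetical helper)."""
    return (el.tag, frozenset(el.attrib.items()))


def combine_element(one, other):
    """Merge `other` into `one`, matching children by tag and attributes."""
    mapping = {element_key(el): el for el in one}
    for el in other:
        key = element_key(el)
        match = mapping.get(key)
        if match is None:
            # Same tag with different attributes counts as a different element
            mapping[key] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf with identical tag and attributes: update the text
            match.text = el.text
        else:
            combine_element(match, el)


first = et.fromstring('<root><item id="1">a</item><item id="2">b</item></root>')
second = et.fromstring('<root><item id="2">B</item><item id="3">c</item></root>')
combine_element(first, second)
result = [(el.get("id"), el.text) for el in first]
print(result)
```

Only the item with id="2" is updated; the item with id="3" is appended as a new sibling. With a mapping keyed on the tag alone, all item elements would collapse onto a single entry.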
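Extending @jadkik94's answer to create a utility method that does not modify its arguments and also updates the attributes.

Note that the code only works in Python 2, because the copy() method of the Element class is not supported in Python 3.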
from xml.etree.ElementTree import Element


def combine_xmltree_element(element_1, element_2):
    """
    Recursively combines the given two xmltree elements. Common properties will be overridden by values of those
    properties in element_2.

    :param element_1: A xml Element
    :type element_1: L{Element}

    :param element_2: A xml Element
    :type element_2: L{Element}

    :return: A xml element with properties combined.
    """
    if element_1 is None:
        return element_2.copy()

    if element_2 is None:
        return element_1.copy()

    if element_1.tag != element_2.tag:
        raise TypeError(
            "The two XMLtree elements of type {t1} and {t2} cannot be combined".format(
                t1=element_1.tag,
                t2=element_2.tag
            )
        )

    combined_element = Element(tag=element_1.tag, attrib=element_1.attrib)
    combined_element.attrib.update(element_2.attrib)

    # Create a mapping from tag name to child element
    element_1_child_mapping = {child.tag: child for child in element_1}
    element_2_child_mapping = {child.tag: child for child in element_2}

    # Children that exist only in element_1 are copied over unchanged
    for child in element_1:
        if child.tag not in element_2_child_mapping:
            combined_element.append(child.copy())

    for child in element_2:
        if child.tag not in element_1_child_mapping:
            combined_element.append(child.copy())
        else:
            if len(child) == 0:  # Leaf element
                combined_child = element_1_child_mapping[child.tag].copy()
                combined_child.text = child.text
                combined_child.attrib.update(child.attrib)
            else:
                # Recursively process the element, and update it in the same way
                combined_child = combine_xmltree_element(element_1_child_mapping[child.tag], child)
            combined_element.append(combined_child)

    return combined_element
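For Python 3, one possible adaptation is to replace the Element.copy() calls with copy.deepcopy from the standard library. The following is a sketch under that assumption rather than the answer author's code. It keeps the same merge logic, and the version attribute in the demo is made up for illustration.

```python
import copy
from xml.etree.ElementTree import Element, fromstring, tostring


def combine_xmltree_element(element_1, element_2):
    """Return a new element combining both inputs; neither input is modified."""
    if element_1 is None:
        return copy.deepcopy(element_2)
    if element_2 is None:
        return copy.deepcopy(element_1)
    if element_1.tag != element_2.tag:
        raise TypeError("Cannot combine elements {0} and {1}".format(
            element_1.tag, element_2.tag))

    # Start from the attributes of element_1, then let element_2 override them
    combined = Element(element_1.tag, dict(element_1.attrib))
    combined.attrib.update(element_2.attrib)

    children_1 = {child.tag: child for child in element_1}
    children_2 = {child.tag: child for child in element_2}

    # Children found only in element_1 are copied unchanged
    for child in element_1:
        if child.tag not in children_2:
            combined.append(copy.deepcopy(child))

    for child in element_2:
        if child.tag not in children_1:
            combined.append(copy.deepcopy(child))
        elif len(child) == 0:
            # Leaf: copy the element_1 child, then apply text and attributes of element_2
            merged_child = copy.deepcopy(children_1[child.tag])
            merged_child.text = child.text
            merged_child.attrib.update(child.attrib)
            combined.append(merged_child)
        else:
            combined.append(combine_xmltree_element(children_1[child.tag], child))
    return combined


a = fromstring('<root version="1"><element1>textA</element1>'
               '<elements><nested1>text now</nested1></elements></root>')
b = fromstring('<root version="2"><element2>textB</element2>'
               '<elements><nested1>text after</nested1>'
               '<nested2>new text</nested2></elements></root>')
combined = combine_xmltree_element(a, b)
print(tostring(combined, encoding="unicode"))
```

Because every child is deep-copied, later changes to the combined tree do not leak back into either input, at the cost of copying whole subtrees.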
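Works perfectly, thanks. I had just started writing my own code. :)

Nice, thanks. We also needed to merge the attributes. This can be done by adding `one.attrib.update(other.attrib)` in `combine_element`, and `mapping[el.tag].attrib.update(el.attrib)` after the element text is replaced.

Any suggestions as to why I am getting an invalid syntax error on `mapping = {el.tag: el for el in one}`? The error points at the `for`. I am running Python 2.6.6.

@Adrian The error occurs because {} dict comprehensions are only supported in Python 2.7+. You should use `dict((el.tag, el) for el in one)`, which is equivalent.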
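The two suggestions from the comments can be put together in one sketch: the mapping is built with dict() over a generator of pairs, which Python 2.6 also accepts, and attributes are merged alongside the text. The exact placement of the two attrib.update calls is an assumption based on the comment's description, and the sample elements are invented.

```python
from xml.etree import ElementTree as et


def combine_element(one, other):
    """Merge `other` into `one` by tag, also carrying over attributes."""
    # From the comments: merge the attributes of the element itself
    one.attrib.update(other.attrib)
    # dict() over (key, value) pairs is equivalent to the dict comprehension
    mapping = dict((el.tag, el) for el in one)
    for el in other:
        match = mapping.get(el.tag)
        if match is None:
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            match.text = el.text
            # From the comments: update attributes after replacing the text
            match.attrib.update(el.attrib)
        else:
            combine_element(match, el)


one = et.fromstring('<root a="1"><leaf x="old">t1</leaf></root>')
other = et.fromstring('<root b="2"><leaf x="new" y="added">t2</leaf></root>')
combine_element(one, other)
leaf = one.find("leaf")
print(one.attrib, leaf.attrib, leaf.text)
```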