Python: merging elements with a common tag in an XML file
I created an XML file using ElementTree in Python. I am very new to Python, so please forgive any mistakes in terminology. I want to merge the contents of elements that have the same value for the file attribute:
<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
<DefLines>
<Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
</FileName>
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
</DefLines>
</FileName>
</Files>
</DefaultLines>
For example, the first and second FileName elements have the same file attribute, "emem_fifo_1c.vhd ". When file is the same, I want the elements inside those FileName elements to be merged into a single FileName.
My output XML should look like this:
<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
<DefLines>
<Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
<DefLines>
<Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
</DefLines>
</FileName>
</Files>
</DefaultLines>
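I really don't know how to do this with ElementTree in Python.
Update:
I managed to solve this with the help of 大兵寿. However, I am now facing another problem: the content inside the nodes is duplicated. I tried to remove the duplicates while adding them to the XML, but it does not work.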
<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-10-31" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'108'<Child>Expression</Child>Item 1 ((W_EN and not(fifo_full)) and not(SRESET))</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>108<Child>Expression</Child>Item 1 ((W_EN and not(fifo_full)) and not(SRESET))</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>109<Child>Expression</Child>Item 1 ((R_EN and not(fifo_empty)) and not(SRESET))</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<DefLines>
<Message>108<Child>Expression</Child>Item 1 ((W_EN and not(fifo_full)) and not(SRESET))</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>Row 4:<Child>Expression</Child>fifo_full_1 not SRESET &amp;&amp; W_EN</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>Row 6:<Child>Expression</Child>SRESET_1 (W_EN and not(fifo_full))</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>Row 4:<Child>Expression</Child>fifo_full_1 not SRESET &amp;&amp; W_EN</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>Row 6:<Child>Expression</Child>SRESET_1 (W_EN and not(fifo_full))</Message>
<Justification />
<Comment />
<Status />
</DefLines>
'108', '109', 'Row 4' and 'Row 6' are appended multiple times. Is it possible to keep only the first occurrence of each and remove the rest?
Update:
After removing the duplicates with that method, the XML nodes I get are incomplete:
<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-11-01" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>
'108'
<Child>Expression</Child>
Item 1 ((W_EN and not(fifo_full)) and not(SRESET))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'119'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'120'
<Child>Statement</Child>
w_addr <= (others => '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'135'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'136'
<Child>Statement</Child>
r_addr <= (others => '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'157'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'158'
<Child>Statement</Child>
fifo_empty <= '1';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'180'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'181', '182'
<Child>Statement</Child>
fifo_used <= (others => '0');
fifo_used_one <= '0';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'568', '569', '570', '571'
<Child>Statement</Child>
config_rd_fsm
<= '0';
axi4_lite_slave_rdata_ch_out <= AXI4LITE_RDATA32_S2M_DEF;
config_rd_fsm <= IDLE;
</Message>
<Justification />
<Comment />
<Status />
**<
<DefLines>**
<Message>
161
<Child>Condition</Child>
Item 1 (((r_en_valid = '1') and (fifo_used_one = '1')) and (w_en_valid = '0'))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
DefLines>
<DefLines>
<Message>
367
<Child>Branch</Child>
when others =>
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
**<Child>Bran**
<DefLines>
<Message>
Row 5:
<Child>Condition</Child>
(w_en_valid = '0')_0 ((r_en_valid = '1') and (fifo_used_one = '1'))
</Message>
<Justification />
<Comment />
<Status />
**</DefLines>sh</Child>**
All False Count
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
587
<Child>Branch</Child>
when others =>
</Message>
**<Justi>**
</FileName>
I have marked in bold the places where the tree is incomplete; because of this, I get errors when generating and parsing the XML tree.
This is how it can be done using lxml; I'll try to explain. The basic principle is that we pick the first FileName as the repository for the target information, paste the target information into it, and then delete the target's former parent.
from lxml import etree

# Raw string, so that "\r" in the Windows path is not read as a carriage return.
# "<" inside element text must be written as &lt; for the XML to be well-formed.
deflines = r"""<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
<DefLines>
<Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
</FileName>
<FileName file="some_other_name.text">
<DefLines>
<Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
</DefLines>
</FileName>
<FileName file="emem_fifo_1c.vhd " id="mushi">
<DefLines>
<Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
</DefLines>
</FileName>
</Files>
</DefaultLines>
"""
# I added another FileName which doesn't meet the requirements, just to demonstrate how it works
doc = etree.XML(deflines)
# The first FileName with this file attribute receives the merged DefLines
destination = doc.xpath('//Files/FileName[@file="emem_fifo_1c.vhd "]')[0]
# position() has to be > 1 to make sure we skip the destination element itself
for fn in doc.xpath('//Files/FileName[@file="emem_fifo_1c.vhd "][position()>1]'):
    for dl in fn.xpath('./DefLines'):
        destination.append(dl)  # in lxml, append() moves the element out of its old parent
    fn.getparent().remove(fn)  # the emptied duplicate FileName is no longer needed
print(etree.tostring(doc, xml_declaration=True).decode())
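Since the question asks about ElementTree specifically, the same merge can also be sketched with the standard library alone. This is only a minimal illustration, not the answerer's code: the helper name merge_filenames, the shortened sample document and the "other.vhd" entry are assumptions made for the example. It groups FileName elements by their file attribute within each Files parent, so the file name does not have to be hard-coded.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the question's layout; "other.vhd" is a made-up extra entry.
SAMPLE = r"""<DefaultLines>
<Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines><Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message></DefLines>
</FileName>
<FileName file="other.vhd">
<DefLines><Message>'7'<Child>Branch</Child>if (x = '1') then</Message></DefLines>
</FileName>
<FileName file="emem_fifo_1c.vhd ">
<DefLines><Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message></DefLines>
</FileName>
</Files>
</DefaultLines>"""


def merge_filenames(root):
    """Merge FileName elements that share a file attribute (hypothetical helper)."""
    for files in root.iter("Files"):
        first_seen = {}  # file attribute -> first FileName that carries it
        for fn in list(files.findall("FileName")):  # copy, because we remove while looping
            target = first_seen.setdefault(fn.get("file"), fn)
            if target is not fn:
                target.extend(list(fn))  # hand all children over to the first FileName
                files.remove(fn)         # drop the duplicate FileName
    return root


root = merge_filenames(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

ElementTree elements do not know their parent, which is why the loop walks each Files element and removes duplicates from there instead of calling something like lxml's getparent().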
Another approach, for your reference:
from simplified_scrapy import SimplifiedDoc, utils
xml = '''
<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
<DefLines>
<Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
</FileName>
<FileName file="some_other_name.text">
<DefLines>
<Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
</DefLines>
</FileName>
<FileName file="emem_fifo_1c.vhd " id="mushi">
<DefLines>
<Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
</DefLines>
</FileName>
</Files>
</DefaultLines>
'''
dic = {}
doc = SimplifiedDoc(xml)
nodes = doc.selects('Files>FileName')
for node in nodes:
    last = dic.get(node['file'])
    if last:
        last.appendChild(node.html)
        node.remove()
    else:
        dic[node['file']] = node
# print (doc.html)
# remove the duplicate items
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        key = n.select('Message').firstText()
        exist = dic.get(key)
        if exist:
            n.remove()
        else:
            dic[key] = True
# Sort
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        dic[n.select('Message').firstText()] = n.outerHtml  # Cache, replace it below.
    i = 0
    for key in sorted(dic):
        lst[i].replaceSelf(dic[key])  # Replace after sorting
        i = i + 1
# Save
utils.saveFile('test.xml', doc.html)
Result (the Name attribute reads "D: eport_byfile_detailed.txt" because \r in the non-raw string literal above is interpreted as a carriage return; writing the literal as a raw string, r'''...''', avoids this):
<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-10-23" Name="D: eport_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
</DefLines>
<DefLines>
<Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
<DefLines>
<Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
</DefLines>
</FileName>
<FileName file="some_other_name.text">
<DefLines>
<Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
</DefLines>
</FileName>
</Files>
</DefaultLines>
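The "remove the duplicate items" and "Sort" steps can also be sketched with the standard library's ElementTree, working on the tree rather than on strings, so the result stays well-formed. This is a rough illustration under stated assumptions, not the code from either answer: the helper names message_key, sort_key and dedupe_and_sort are made up, two Message elements are treated as duplicates when their text parts match after surrounding whitespace and quotes are stripped (so '120' and 120 count as the same label), and labels that do not start with a number, such as "Row 4:", are simply kept after the numbered ones.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample: the third DefLines repeats the first one without quotes.
SAMPLE = """<FileName file="emem_fifo_1c.vhd ">
<DefLines><Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message></DefLines>
<DefLines><Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message></DefLines>
<DefLines><Message>120<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message></DefLines>
<DefLines><Message>Row 4:<Child>Expression</Child>fifo_full_1</Message></DefLines>
</FileName>"""


def message_key(deflines):
    """Text parts of Message, stripped of outer whitespace and quotes (assumption)."""
    message = deflines.find("Message")
    return tuple(part.strip().strip("'") for part in message.itertext())


def sort_key(deflines):
    """Numeric labels first, ordered by number; everything else afterwards."""
    label = (deflines.find("Message").text or "").strip()
    match = re.match(r"'?(\d+)", label)
    return (0, int(match.group(1))) if match else (1, 0)


def dedupe_and_sort(filename_el):
    """Keep the first occurrence of each DefLines, then reorder them in place."""
    seen = set()
    kept = []
    for dl in filename_el.findall("DefLines"):
        key = message_key(dl)
        if key not in seen:
            seen.add(key)
            kept.append(dl)
    kept.sort(key=sort_key)  # stable sort: ties keep their original order
    for dl in filename_el.findall("DefLines"):
        filename_el.remove(dl)  # remove() matches by identity, not by content
    filename_el.extend(kept)


root = ET.fromstring(SAMPLE)
dedupe_and_sort(root)
print(ET.tostring(root, encoding="unicode"))
```

Because elements are only moved or dropped as whole objects and the serializer writes the tags, truncated fragments of the kind shown in the second update should not occur with this approach.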