Python: merging elements with a common tag in an XML file


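I created an XML file using ElementTree in Python. I am very new to Python, so please forgive any mistakes in terminology. I want to merge the contents of elements that have the same file attribute.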

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
        </FileName>
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
        </FileName>
    </Files>
</DefaultLines>

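For example, the first and second FileName elements have the same attribute value, "emem_fifo_1c.vhd". If "file" is the same, I want the elements inside those FileName elements to be merged into one.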

My output XML should look like this:

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
        </FileName>
    </Files>
</DefaultLines>

I really don't know how to do this with ElementTree in Python.

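Update: I was on the way to solving this with the help of 大兵寿. However, I am facing another problem: the content inside the node is duplicated. I tried to remove the duplicates while adding them to the XML, but it did not work.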

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-31" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd ">
            <DefLines>
                <Message>'108'<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>109<Child>Expression</Child>Item    1  ((R_EN and not(fifo_empty)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
            <DefLines>
                <Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   4:<Child>Expression</Child>fifo_full_1 not SRESET &amp;&amp; W_EN</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   6:<Child>Expression</Child>SRESET_1 (W_EN and not(fifo_full))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   4:<Child>Expression</Child>fifo_full_1           not SRESET &amp;&amp; W_EN</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   6:<Child>Expression</Child>SRESET_1              (W_EN and not(fifo_full))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>

"108", "109", "Row 4" and "Row 6" get appended multiple times. Is it possible to keep only the first occurrence and remove the rest?

Update: After removing the duplicates with that method, the XML nodes I get are incomplete:

<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-11-01" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>
'108'
<Child>Expression</Child>
Item    1  ((W_EN and not(fifo_full)) and not(SRESET))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'119'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'120'
<Child>Statement</Child>
w_addr &lt;= (others =&gt; '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'135'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>

<DefLines>
<Message>
'136'
<Child>Statement</Child>
r_addr &lt;= (others =&gt; '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'157'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'158'
<Child>Statement</Child>
fifo_empty &lt;= '1';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'180'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>

<DefLines>
<Message>
'181', '182'
<Child>Statement</Child>
fifo_used     &lt;= (others =&gt; '0');
fifo_used_one &lt;= '0';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'568', '569', '570', '571'
<Child>Statement</Child>
config_rd_fsm                     

    &lt;= '0';
axi4_lite_slave_rdata_ch_out &lt;= AXI4LITE_RDATA32_S2M_DEF;
config_rd_fsm                &lt;= IDLE;
</Message>
<Justification />
<Comment />
<Status />
**<

<DefLines>**
<Message>
161
<Child>Condition</Child>
Item    1  (((r_en_valid = '1') and (fifo_used_one = '1')) and (w_en_valid = '0'))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
DefLines>

<DefLines>
<Message>
367
<Child>Branch</Child>
when others =&gt;
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
**<Child>Bran**

<DefLines>
<Message>
Row   5:    
<Child>Condition</Child>
(w_en_valid = '0')_0     ((r_en_valid = '1') and (fifo_used_one = '1'))
</Message>
<Justification />
<Comment />
<Status />
**</DefLines>sh</Child>**
All False Count
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
587
<Child>Branch</Child>
when others =&gt;
</Message>
**<Justi>**


</FileName>

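I have tried to put in bold the places where the tree is incomplete; because of them I get errors when generating and parsing the XML tree.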

This is how it can be done with lxml; I'll try my best to explain.

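The basic principle is that we arbitrarily pick the first FileName as the repository for the target information, paste the target information into it, and then delete the target's parent.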

    from lxml import etree
    # raw string, so that "\r" in the Windows path is not read as a carriage return
    deflines = r"""<?xml version="1.0" ?>
    <DefaultLines>
        <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
            <FileName file="emem_fifo_1c.vhd  ">
                <DefLines>
                    <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
                </DefLines>
                <DefLines>
                    <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
                </DefLines>
            </FileName>
          <FileName file="some_other_name.text">
                <DefLines>
                    <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
                </DefLines>
            </FileName>
            <FileName file="emem_fifo_1c.vhd  " id="mushi">
                <DefLines>
                    <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
                </DefLines>
            </FileName>       
        </Files>
    </DefaultLines>
    """
    # I added another FileName which doesn't meet the requirements, just to demonstrate how it works
    
    doc = etree.XML(deflines)
    # the first FileName with this "file" attribute is the destination; every later one is merged into it
    matches = doc.xpath('//Files/FileName[@file="emem_fifo_1c.vhd  "]')
    destination = matches[0]
    for duplicate in matches[1:]:
        for dl in duplicate.findall('DefLines'):
            destination.append(dl)  # in lxml, append() moves the element out of its old parent
        duplicate.getparent().remove(duplicate)  # drop the now-empty duplicate FileName
    print(etree.tostring(doc, xml_declaration=True).decode())
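
Since the question asks about ElementTree specifically, the same idea can be sketched with the standard library alone. This is only a minimal sketch under a few assumptions: the helper name merge_filenames is made up for illustration, the sample data is shortened, and the file attribute is compared after stripping surrounding whitespace (the question shows values with differing trailing spaces).

```python
import xml.etree.ElementTree as ET

# Raw string so that "\r" in the Windows path is not read as a carriage return.
SAMPLE = r"""<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines><Message>'120'<Child>Statement</Child>w_addr</Message></DefLines>
            <DefLines><Message>'136'<Child>Statement</Child>r_addr</Message></DefLines>
        </FileName>
        <FileName file="some_other_name.text">
            <DefLines><Message>'xxxx'<Child>Branch</Child>other</Message></DefLines>
        </FileName>
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines><Message>'119'<Child>Branch</Child>if</Message></DefLines>
        </FileName>
    </Files>
</DefaultLines>"""


def merge_filenames(root):
    """Merge sibling FileName elements that share the same 'file' attribute.

    Hypothetical helper: the first FileName seen for each value is kept and
    the children of every later one are moved into it.
    """
    for files in root.iter("Files"):
        first_seen = {}  # stripped 'file' value -> first FileName with that value
        # Iterate over a copy, because duplicates are removed inside the loop.
        for file_name in list(files.findall("FileName")):
            key = file_name.get("file", "").strip()  # assumption: ignore padding
            keeper = first_seen.setdefault(key, file_name)
            if keeper is not file_name:
                keeper.extend(list(file_name))  # move all children across
                files.remove(file_name)         # drop the duplicate FileName
    return root


root = merge_filenames(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

Unlike the XPath version, this does not hard-code the file name, so it merges every group of FileName elements that share a value.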

Another approach, for your reference:

from simplified_scrapy import SimplifiedDoc, utils
xml = '''
<?xml version="1.0" ?>
<DefaultLines>
   <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
      <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
               <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
               <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      </FileName>
      <FileName file="some_other_name.text">
            <DefLines>
               <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
            </DefLines>
      </FileName>
      <FileName file="emem_fifo_1c.vhd  " id="mushi">
            <DefLines>
               <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
      </FileName>       
   </Files>
</DefaultLines>
'''

dic = {}
doc = SimplifiedDoc(xml)
nodes = doc.selects('Files>FileName')
for node in nodes:
   last = dic.get(node['file'])
   if last:
      last.appendChild(node.html)
      node.remove()
   else:
      dic[node['file']]=node
      
# print (doc.html)
# remove the duplicate items
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        key = n.select('Message').firstText()
        exist = dic.get(key)
        if exist:
            n.remove()
        else:
            dic[key] = True
# Sort
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        dic[n.select('Message').firstText()] = n.outerHtml # Cache, replace it below.

    i = 0
    for key in sorted(dic):
        lst[i].replaceSelf(dic[key]) # Replace after sorting
        i = i + 1
# Save
utils.saveFile('test.xml', doc.html)

Result (the Name attribute reads "D: eport_byfile_detailed.txt" most likely because "\r" in the non-raw Python string above is a carriage return; a raw string, r'''...''', would avoid that):

<?xml version="1.0" ?>
<DefaultLines>
   <Files Date="2020-10-23" Name="D: eport_byfile_detailed.txt">
      <FileName file="emem_fifo_1c.vhd ">
            <DefLines>
               <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
            <DefLines>
               <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      
            <DefLines>
               <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      </FileName>
      <FileName file="some_other_name.text">
            <DefLines>
               <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
            </DefLines>
      </FileName>       
   </Files>
</DefaultLines>
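
The de-duplication step from the first update (keep only the first occurrence of each message) can also be sketched with the standard library's ElementTree. This is a rough sketch, not a definitive implementation: message_key and drop_duplicate_deflines are hypothetical helper names, the sample is shortened, and the comparison key is an assumption. It ignores single quotes around the line label and collapses runs of whitespace, because the question shows both '108' and 108 as well as differently padded "Row 4:" lines that appear to be the same entry.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<FileName file="emem_fifo_1c.vhd ">
    <DefLines><Message>'108'<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message></DefLines>
    <DefLines><Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message></DefLines>
    <DefLines><Message>108<Child>Expression</Child>Item 1 ((W_EN and not(fifo_full)) and not(SRESET))</Message></DefLines>
    <DefLines><Message>109<Child>Expression</Child>Item    1  ((R_EN and not(fifo_empty)) and not(SRESET))</Message></DefLines>
</FileName>"""


def message_key(def_lines):
    """Build a comparison key for one DefLines element (hypothetical helper)."""
    message = def_lines.find("Message")
    if message is None:
        return None
    # Label before <Child>, with quotes dropped so that '108' and 108 compare equal.
    label = (message.text or "").replace("'", "").strip()
    # Everything after the label: the Child text plus the text that follows it.
    rest_parts = []
    for child in message:
        rest_parts.extend(child.itertext())
        rest_parts.append(child.tail or "")
    rest = " ".join(" ".join(rest_parts).split())  # collapse whitespace runs
    return (label, rest)


def drop_duplicate_deflines(file_name):
    """Remove every DefLines whose key was already seen, keeping the first."""
    seen = set()
    # Iterate over a copy, because elements are removed inside the loop.
    for def_lines in list(file_name.findall("DefLines")):
        key = message_key(def_lines)
        if key in seen:
            file_name.remove(def_lines)
        else:
            seen.add(key)
    return file_name


file_name = drop_duplicate_deflines(ET.fromstring(SAMPLE))
labels = [d.find("Message").text for d in file_name.findall("DefLines")]
print(labels)
```

Because whole elements are removed through the tree API rather than by cutting text, the serialized result should stay well-formed, which is the problem described in the second update.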