Python: export a DataFrame to XML with an attribute list structure

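My name is Pablo, and this is my first question in this group. After reviewing other related posts, I decided to ask: is there a way to do the following?

Suppose I have the following DataFrame structure: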

+----+---------+------------+------------+----------+
|    |   MRBTS | dest       | gw         |   length |
|----+---------+------------+------------+----------|
|  0 |   13004 | 10.104.0.0 | 10.48.0.0  |       16 |
|  1 |   13004 | 10.107.0.0 | 10.45.0.0  |       16 |
|  2 |   13005 | 10.104.0.0 | 10.130.0.0 |        8 |
|  3 |   13005 | 10.102.0.0 | 10.130.0.0 |        8 |
|  4 |   13005 | 0.0.0.0    | 10.110.0.0 |       16 |
+----+---------+------------+------------+----------+

I would like to export it to XML, grouping the list by MRBTS, as follows:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="plan" scope="all" name="iprt" id="PlanConfiguration( 7152069 )">
    <header>
      <log dateTime="2020-06-19T07:38:16.000-03:00" action="created" appInfo="PlanExporter">InternalValues are used</log>
    </header>
    <managedObject distName="MRBTS-13004">
      <list >
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.38.0.0</p>
        </item>
        <item>
          <p name="dest">10.107.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.45.0.0</p>
        </item>
      </list>
    </managedObject>
    <managedObject  distName="MRBTS-13005">
      <list >
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="length">8</p>
          <p name="gw">10.130.8.0</p>
        </item>
        <item>
          <p name="dest">10.102.0.0</p>
          <p name="length">8</p>
          <p name="gw">10.130.8.0</p>
        </item>
        <item>
          <p name="dest">0.0.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.110.0.0</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>

I got this code from another post, but I got stuck when trying to group by MRBTS:

import pandas as pd
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})

def func(row):
    xml = ['<list >']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</list>')
    return '\n'.join(xml)



print('\n'.join(df.apply(func, axis=1)))

This produces:

<list >
  <field name="MRBTS">13004</field>
  <field name="dest">10.104.0.0</field>
  <field name="gw">10.48.0.0</field>
  <field name="length">16</field>
</list>
<list >
  <field name="MRBTS">13004</field>
  <field name="dest">10.107.0.0</field>
  <field name="gw">10.45.0.0</field>
  <field name="length">16</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">10.104.0.0</field>
  <field name="gw">10.130.0.0</field>
  <field name="length">8</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">10.102.0.0</field>
  <field name="gw">10.130.0.0</field>
  <field name="length">8</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">0.0.0.0</field>
  <field name="gw">10.110.0.0</field>
  <field name="length">16</field>
</list>

Could you help me solve this?

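I think the key is to first restructure the data so that it is closer to the target XML representation:

  • Group by MRBTS.
  • Use a custom aggregation that returns the list of items with their attributes. I used a shorthand list comprehension to prepare the kwargs that are passed to agg().
  • The JSON/dict structure you now get from this DataFrame is comparable to your target requirement.
  • I'm very rusty on XML and haven't done anything with it for 15 years. There may be better libraries for converting JSON to XML, but this shows that the structure is basically there. A bit of XSLT could easily take it the rest of the way.
  • Output after passing it through json2xml (first record only):

    <?xml version="1.0" ?>
    <all>
        <item>
            <distName>13004</distName>
            <item>
                <item>
                    <dest>10.104.0.0</dest>
                    <length>16</length>
                    <gw>10.48.0.0</gw>
                </item>
                <item>
                    <dest>10.107.0.0</dest>
                    <length>16</length>
                    <gw>10.45.0.0</gw>
                </item>
            </item>
        </item>
    </all>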
    
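The grouping step described in the list above can be sketched roughly as follows. This is only an illustration under assumptions, not the answer's original code: it loops over the groups instead of calling agg(), the key names distName and items are chosen here for the example, and the json2xml conversion is left out.

```python
import json

import pandas as pd

df = pd.DataFrame({'MRBTS': ['13004', '13004', '13005', '13005', '13005'],
                   'dest': ['10.104.0.0', '10.107.0.0', '10.104.0.0', '10.102.0.0', '0.0.0.0'],
                   'gw': ['10.48.0.0', '10.45.0.0', '10.130.0.0', '10.130.0.0', '10.110.0.0'],
                   'length': ['16', '16', '8', '8', '16']})

# One dict per MRBTS value. "distName" and "items" are illustrative key names.
# Each group's remaining columns become a list of per-row dicts.
records = [
    {"distName": mrbts,
     "items": group[["dest", "length", "gw"]].to_dict("records")}
    for mrbts, group in df.groupby("MRBTS")
]

# Inspect the nested structure for the first MRBTS value.
print(json.dumps(records[0], indent=2))
```

The resulting list of nested dicts mirrors the managedObject / list / item nesting of the target XML, which is why a generic dict-to-XML converter gets close to the desired shape.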
    
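Because XML documents are not plain text documents, avoid building XML with string concatenation. Instead, consider a DOM approach that builds the tree (with slight modification) using the third-party lxml or the built-in xml.etree.ElementTree module. For the data, iterate over subsets of the DataFrame by the MRBTS field: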

    import lxml.etree as et
    import pandas as pd
    
    ### STATIC PART OF XML
    root = et.Element('raml', {"version": "2.0", "xmlns": "raml20.xsd"})
    
    cmData = et.SubElement(root, "cmData",
                           {"type":"plan", "scope":"all", "name":"iprt", "id":"PlanConfiguration( 7152069 )"})
    
    header = et.SubElement(cmData, "header")
    log = et.SubElement(header, "log",
                        {"dateTime":"2020-06-19T07:38:16.000-03:00", "action":"created", "appInfo":"PlanExporter"})
    log.text = "InternalValues are used"
    
    ### DYNAMIC PART OF XML
    df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                       'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                       'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                       'length':['16','16','8','8','16']})
    # SUBSET ITERATION                 
    for i, g in df.groupby("MRBTS"):
        managedObject = et.SubElement(cmData, "managedObject", {"distName": "MRBTS-" + i})
        # Named list_el so that the built-in list is not shadowed
        list_el = et.SubElement(managedObject, "list")

        # BUILD DICTIONARY OUT OF EACH ROW
        d = g.drop('MRBTS', axis='columns').to_dict('index')

        for ik, iv in d.items():
            item = et.SubElement(list_el, 'item')
            for k, v in iv.items():
                p = et.SubElement(item, 'p', {"name": k})
                p.text = v
    
    # OUTPUT TREE
    tree = et.ElementTree(root)
    # write() returns None, so its result is not assigned
    tree.write("Output.xml",
               xml_declaration=True,
               encoding="UTF-8",
               pretty_print=True,
               doctype="<!DOCTYPE raml SYSTEM 'raml20.dtd'>")
    
Output:

    <?xml version='1.0' encoding='UTF-8'?>
    <!DOCTYPE raml SYSTEM 'raml20.dtd'>
    <raml version="2.0" xmlns="raml20.xsd">
      <cmData id="PlanConfiguration( 7152069 )" name="iprt" scope="all" type="plan">
        <header>
          <log action="created" appInfo="PlanExporter" dateTime="2020-06-19T07:38:16.000-03:00">InternalValues are used</log>
        </header>
        <managedObject distName="MRBTS-13004">
          <list>
            <item>
              <p name="dest">10.104.0.0</p>
              <p name="gw">10.48.0.0</p>
              <p name="length">16</p>
            </item>
            <item>
              <p name="dest">10.107.0.0</p>
              <p name="gw">10.45.0.0</p>
              <p name="length">16</p>
            </item>
          </list>
        </managedObject>
        <managedObject distName="MRBTS-13005">
          <list>
            <item>
              <p name="dest">10.104.0.0</p>
              <p name="gw">10.130.0.0</p>
              <p name="length">8</p>
            </item>
            <item>
              <p name="dest">10.102.0.0</p>
              <p name="gw">10.130.0.0</p>
              <p name="length">8</p>
            </item>
            <item>
              <p name="dest">0.0.0.0</p>
              <p name="gw">10.110.0.0</p>
              <p name="length">16</p>
            </item>
          </list>
        </managedObject>
      </cmData>
    </raml>
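
The answer above mentions the built-in etree module as an alternative to lxml. A minimal sketch of that variant follows, under a few assumptions: the rows are plain dicts standing in for the DataFrame (roughly what df.to_dict('records') returns), the header/log elements are omitted for brevity, and ET.indent is only called where it exists (Python 3.9+). Unlike lxml, xml.etree.ElementTree's write() has no pretty_print or doctype parameter, so the declaration and DOCTYPE are prepended by hand here.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter

# Plain dict rows standing in for the DataFrame (an assumption of this sketch).
rows = [
    {"MRBTS": "13004", "dest": "10.104.0.0", "gw": "10.48.0.0", "length": "16"},
    {"MRBTS": "13004", "dest": "10.107.0.0", "gw": "10.45.0.0", "length": "16"},
    {"MRBTS": "13005", "dest": "10.104.0.0", "gw": "10.130.0.0", "length": "8"},
    {"MRBTS": "13005", "dest": "10.102.0.0", "gw": "10.130.0.0", "length": "8"},
    {"MRBTS": "13005", "dest": "0.0.0.0", "gw": "10.110.0.0", "length": "16"},
]

root = ET.Element("raml", {"version": "2.0", "xmlns": "raml20.xsd"})
cm_data = ET.SubElement(root, "cmData",
                        {"type": "plan", "scope": "all", "name": "iprt"})

# itertools.groupby only merges adjacent rows, so sort by the key first.
rows.sort(key=itemgetter("MRBTS"))
for mrbts, group in groupby(rows, key=itemgetter("MRBTS")):
    managed = ET.SubElement(cm_data, "managedObject",
                            {"distName": "MRBTS-" + mrbts})
    list_el = ET.SubElement(managed, "list")
    for row in group:
        item = ET.SubElement(list_el, "item")
        # Same parameter order as the target XML in the question.
        for name in ("dest", "length", "gw"):
            p = ET.SubElement(item, "p", {"name": name})
            p.text = row[name]

# Pretty-print in place where available (added in Python 3.9).
if hasattr(ET, "indent"):
    ET.indent(root)

body = ET.tostring(root, encoding="unicode")
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<!DOCTYPE raml SYSTEM 'raml20.dtd'>\n" + body + "\n"
)
print(xml_text)
```

Because the element text is set through the API rather than pasted into a string, any special characters in the values are escaped automatically when the tree is serialized.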