Convert an XML file to CSV via a Python DataFrame

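I recently asked a question that was closed, so I'm trying to make this one less broad. My problem is that I don't know where to start, so I can't really show what I've "already tried". I couldn't find anything helpful online.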

I have an open-source XML file in the following format:

<surnames>
    <cluster>
        <surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
        <surname lang="en" text="Ahern" anchor="Ahern"/>
        <surname lang="en" text="Aherne" anchor="Aherne"/>
        <surname lang="en" text="Ahearne" anchor="Ahearne"/>
    </cluster>
    <cluster>
        <surname lang="en" text="Achison" anchor="Achison"/>
        <surname lang="en" text="Atchison" anchor="Atchison"/>
    </cluster>
    <cluster>
        <surname lang="en" text="Adams" anchor="Adams"/>
        <surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
    </cluster>
    <cluster>
        <surname lang="ga" text="Ághas" anchor="Ághas"/>
        <surname lang="en" text="Ashe" anchor="Ashe"/>
        <surname lang="ga" text="Ás" anchor="Ás"/>
    </cluster>
    <cluster>
        <surname lang="en" text="Young" anchor="Young"/>
        <surname lang="ga" text="Ó Hógáin" anchor="Hógáin"/>
        <surname lang="ga" text="de Siún" anchor="Siún"/>
    </cluster>
</surnames>
I've never tried anything like this before, so even just a pointer in the right direction would be a huge help.

I'd like to convert it to a DataFrame first, and then to CSV.

I tried using this as a starting point, but I can't even get it to run; I think it fails at the objectify.parse step:

import csv
import pandas as pd
import xml.etree.ElementTree as ET

#%%

xml = objectify.parse('surnames_reduced.xml')
root = xml.getroot()

data=[]
for i in range(len(root.getchildren())):
    data.append([child.text for child in root.getchildren()[i].getchildren()])

df = pd.DataFrame(data).T
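The failure at `objectify.parse` is most likely a `NameError`: `objectify` comes from lxml (`from lxml import objectify`) and is never imported, since the script imports `xml.etree.ElementTree` as `ET` instead. Two further problems would follow: `Element.getchildren()` was removed from ElementTree in Python 3.9, and `child.text` is `None` here because every value sits in an attribute. Below is a minimal sketch of the DataFrame-then-CSV route using only ElementTree and pandas, assuming pandas is installed and the file looks like the sample above. The function names, the `cluster` column and the long one-row-per-surname layout are illustrative choices, not anything from the question.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def surnames_to_dataframe(root: ET.Element) -> pd.DataFrame:
    """Build one row per <surname>, tagged with the index of its <cluster>."""
    rows = []
    for cluster_id, cluster in enumerate(root.iter("cluster")):
        for surname in cluster.findall("surname"):
            # Every value is stored in an attribute, so read .attrib, not .text
            # (.text is None for these self-closing elements).
            rows.append({"cluster": cluster_id, **surname.attrib})
    return pd.DataFrame(rows, columns=["cluster", "lang", "text", "anchor"])


def xml_file_to_csv(xml_path: str, csv_path: str) -> pd.DataFrame:
    """Parse the XML file, convert it to a DataFrame, then write it as CSV."""
    root = ET.parse(xml_path).getroot()
    df = surnames_to_dataframe(root)
    df.to_csv(csv_path, index=False, encoding="utf-8")
    return df
```

For the file in the question this would be called as `xml_file_to_csv("surnames_reduced.xml", "surnames.csv")`. Keeping one row per surname, rather than one ragged row per cluster, means the `lang` and `anchor` attributes survive and the DataFrame has fixed columns.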

Using etree, collect the names as a list of lists, which can be written directly to CSV:

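import csv
import lxml.etree

# parse() reads the file as bytes and honours any encoding declaration,
# so it works whether or not the file starts with <?xml ...?>
tree = lxml.etree.parse('z.xml')
result = []
for cluster in tree.xpath('//cluster'):
    # one row per <cluster>: the "text" attribute of each <surname>
    names = [surname.get('text') for surname in cluster.findall('surname')]
    result.append(names)

# newline='' stops csv from writing blank lines on Windows;
# utf-8 keeps accented characters such as Á, ó and ú intact
with open('out.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(result)

with open('out.csv', encoding='utf-8') as f:
    print(f.read())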
Output:

Achaorainn,Ahern,Aherne,Ahearne
Achison,Atchison
Adams,Mac Conamha
Ághas,Ashe,Ás
Young,Ó Hógáin,de Siún
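This output keeps only the `text` attribute and mixes the Irish (`lang="ga"`) and English (`lang="en"`) spellings in one ragged row. If the language matters, one possible layout is a fixed two-column CSV per cluster. The sketch below assumes that layout, the `; ` separator and the function names are all hypothetical choices, and it uses the standard library's ElementTree rather than lxml so nothing needs installing.

```python
import csv
import io
import xml.etree.ElementTree as ET


def clusters_by_language(root: ET.Element) -> list[dict[str, str]]:
    """One record per <cluster>, with the Irish ("ga") and English ("en")
    spellings joined by "; " in separate columns."""
    records = []
    for cluster in root.iter("cluster"):
        names = {"ga": [], "en": []}
        for surname in cluster.findall("surname"):
            # Any other language code is collected here but not written out.
            names.setdefault(surname.get("lang"), []).append(surname.get("text"))
        records.append({"ga": "; ".join(names["ga"]), "en": "; ".join(names["en"])})
    return records


def write_csv(records: list[dict[str, str]], out) -> None:
    """Write the records to an open text file (or StringIO) as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=["ga", "en"])
    writer.writeheader()
    writer.writerows(records)
```

To write a real file, pass `open("out.csv", "w", newline="", encoding="utf-8")` as `out`; a cluster with no Irish spelling simply gets an empty `ga` field.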

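I get a parse error on line 4. My XML file looks exactly like what I pasted above; do I need something external?
Your XML file is invalid: the XML header is missing. I assume it got lost when you pasted it here.
Yes, it works fine now. Thank you very much; as you can see, my experience with XML is non-existent.

Using Python's built-in XML library, with no external libraries required: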
import xml.etree.ElementTree as ET

xml = '''<surnames>
    <cluster>
        <surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
        <surname lang="en" text="Ahern" anchor="Ahern"/>
        <surname lang="en" text="Aherne" anchor="Aherne"/>
        <surname lang="en" text="Ahearne" anchor="Ahearne"/>
    </cluster>
    <cluster>
        <surname lang="en" text="Achison" anchor="Achison"/>
        <surname lang="en" text="Atchison" anchor="Atchison"/>
    </cluster>
    <cluster>
        <surname lang="en" text="Adams" anchor="Adams"/>
        <surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
    </cluster>
    <cluster>
        <surname lang="ga" text="Ághas" anchor="Ághas"/>
        <surname lang="en" text="Ashe" anchor="Ashe"/>
        <surname lang="ga" text="Ás" anchor="Ás"/>
    </cluster>
    <cluster>
        <surname lang="en" text="Young" anchor="Young"/>
        <surname lang="ga" text="Ó Hógáin" anchor="Hógáin"/>
        <surname lang="ga" text="de Siún" anchor="Siún"/>
    </cluster>
</surnames>'''

root = ET.fromstring(xml)
data = []
for c in root.findall('.//cluster'):
    data.append([s.attrib['text'] for s in c.findall('./surname')])
for entry in data:
    print(','.join(entry))
Achaorainn,Ahern,Aherne,Ahearne
Achison,Atchison
Adams,Mac Conamha
Ághas,Ashe,Ás
Young,Ó Hógáin,de Siún
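`','.join(entry)` prints something that looks like CSV but does no quoting, so a name containing a comma or a double quote would shift the columns. A minimal sketch that keeps the ElementTree parsing from this answer but hands the rows to `csv.writer`, which quotes such fields; the two function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def cluster_rows(root: ET.Element) -> list[list[str]]:
    """One row per <cluster>: the "text" attribute of each <surname>."""
    return [[s.attrib["text"] for s in c.findall("./surname")]
            for c in root.findall(".//cluster")]


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows with csv.writer, which quotes fields containing commas or quotes."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

To save the result instead of printing it, write `rows_to_csv(cluster_rows(root))` to a file opened with `newline=""` and `encoding="utf-8"`, or pass that file object to `csv.writer` directly.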