Converting an XML file to CSV via a Python DataFrame
I recently asked a question that was closed, so I am trying to make it less broad. My problem is that I don't know where to start, so I can't really show what I have "already tried". I could not find anything helpful online. I have an open-source XML file in the following format:
<surnames>
<cluster>
<surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
<surname lang="en" text="Ahern" anchor="Ahern"/>
<surname lang="en" text="Aherne" anchor="Aherne"/>
<surname lang="en" text="Ahearne" anchor="Ahearne"/>
</cluster>
<cluster>
<surname lang="en" text="Achison" anchor="Achison"/>
<surname lang="en" text="Atchison" anchor="Atchison"/>
</cluster>
<cluster>
<surname lang="en" text="Adams" anchor="Adams"/>
<surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
</cluster>
<cluster>
<surname lang="ga" text="Ághas" anchor="Ághas"/>
<surname lang="en" text="Ashe" anchor="Ashe"/>
<surname lang="ga" text="Ás" anchor="Ás"/>
</cluster>
<cluster>
<surname lang="en" text="Young" anchor="Young"/>
<surname lang="ga" text="Ó Hógáin" anchor="Hógáin"/>
<surname lang="ga" text="de Siún" anchor="Siún"/>
</cluster>
</surnames>
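I have never attempted anything like this, so even a pointer in the right direction would be a huge help.
I want to convert it to a DataFrame first and then to CSV.
I tried the following as a starting point, but I can't even get it to run; I think it fails at the objectify.parse step: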
import csv
import pandas as pd
import xml.etree.ElementTree as ET
#%%
xml = objectify.parse('surnames_reduced.xml')
root = xml.getroot()
data=[]
for i in range(len(root.getchildren())):
    data.append([child.text for child in root.getchildren()[i].getchildren()])
df = pd.DataFrame(data).T
Using lxml's etree, collect the names into a list of lists that can be written directly to CSV:
import csv
import lxml.etree

# xml = lxml.etree.parse('z.xml')
# Parse the raw bytes; this works whether or not the file has an XML declaration.
with open('z.xml', 'rb') as f:
    xml = lxml.etree.fromstring(f.read())

result = []
for cluster in xml.xpath('//cluster'):
    names = []
    for child in cluster:
        names.append(child.get('text'))  # read the "text" attribute
    result.append(names)

# newline='' keeps the csv module from emitting blank lines on Windows
with open('out.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(result)

print(open('out.csv', encoding='utf-8').read())
Output:
Achaorainn,Ahern,Aherne,Ahearne
Achison,Atchison
Adams,Mac Conamha
Ághas,Ashe,Ás
Young,Ó Hógáin,de Siún
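The rows above have different lengths, which some CSV consumers handle poorly. As a hedged alternative, here is a minimal sketch that writes one row per surname and keeps the `lang` and `anchor` attributes. It uses the standard-library `xml.etree.ElementTree` rather than lxml, and the `cluster` column, the `SAMPLE` string, and the helper names `surname_rows` and `rows_to_csv` are assumptions made for illustration, not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two clusters copied from the sample XML in the question.
SAMPLE = """<surnames>
<cluster>
<surname lang="en" text="Adams" anchor="Adams"/>
<surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
</cluster>
<cluster>
<surname lang="ga" text="Ághas" anchor="Ághas"/>
<surname lang="en" text="Ashe" anchor="Ashe"/>
<surname lang="ga" text="Ás" anchor="Ás"/>
</cluster>
</surnames>"""


def surname_rows(xml_text):
    """Yield one dict per <surname>, tagged with its cluster's position."""
    root = ET.fromstring(xml_text)
    for cluster_id, cluster in enumerate(root.findall("cluster")):
        for surname in cluster.findall("surname"):
            # "cluster" is a hypothetical column name added for grouping.
            yield {"cluster": cluster_id, **surname.attrib}


def rows_to_csv(rows):
    """Serialise the row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["cluster", "lang", "text", "anchor"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(surname_rows(SAMPLE))
print(csv_text)
```

Every row then has the same four columns, and the original clusters can be recovered by grouping on the `cluster` column.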
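I get a parse error at line 4. My XML file looks exactly like what I pasted above; do I need an external object?
Your XML file is invalid: the XML header is missing. I assume it was lost when you pasted it here.
Yes, it works fine. Thank you very much; as you can see, my XML experience is non-existent.
Using Python's built-in XML library, no external library is needed: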
import xml.etree.ElementTree as ET
xml = '''<surnames>
<cluster>
<surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
<surname lang="en" text="Ahern" anchor="Ahern"/>
<surname lang="en" text="Aherne" anchor="Aherne"/>
<surname lang="en" text="Ahearne" anchor="Ahearne"/>
</cluster>
<cluster>
<surname lang="en" text="Achison" anchor="Achison"/>
<surname lang="en" text="Atchison" anchor="Atchison"/>
</cluster>
<cluster>
<surname lang="en" text="Adams" anchor="Adams"/>
<surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
</cluster>
<cluster>
<surname lang="ga" text="Ághas" anchor="Ághas"/>
<surname lang="en" text="Ashe" anchor="Ashe"/>
<surname lang="ga" text="Ás" anchor="Ás"/>
</cluster>
<cluster>
<surname lang="en" text="Young" anchor="Young"/>
<surname lang="ga" text="Ó Hógáin" anchor="Hógáin"/>
<surname lang="ga" text="de Siún" anchor="Siún"/>
</cluster>
</surnames>'''
root = ET.fromstring(xml)
data = []
for c in root.findall('.//cluster'):
    data.append([s.attrib['text'] for s in c.findall('./surname')])
for entry in data:
    print(','.join(entry))
Achaorainn,Ahern,Aherne,Ahearne
Achison,Atchison
Adams,Mac Conamha
Ághas,Ashe,Ás
Young,Ó Hógáin,de Siún
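Since the question asks for a DataFrame as the intermediate step, the same list of lists can be handed to pandas. This is only a sketch under a few assumptions: pandas is installed, the `SAMPLE` string stands in for the real file, and the `name_1` ... `name_N` column labels are made up for illustration. pandas pads the shorter clusters with missing values, which `to_csv` writes as empty fields.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Two clusters copied from the sample XML in the question.
SAMPLE = """<surnames>
<cluster>
<surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
<surname lang="en" text="Ahern" anchor="Ahern"/>
<surname lang="en" text="Aherne" anchor="Aherne"/>
<surname lang="en" text="Ahearne" anchor="Ahearne"/>
</cluster>
<cluster>
<surname lang="en" text="Achison" anchor="Achison"/>
<surname lang="en" text="Atchison" anchor="Atchison"/>
</cluster>
</surnames>"""

root = ET.fromstring(SAMPLE)

# One inner list per <cluster>, holding the "text" attribute of each <surname>.
data = [
    [s.attrib["text"] for s in c.findall("surname")]
    for c in root.findall("cluster")
]

# One row per cluster; rows shorter than the longest one are padded with missing values.
df = pd.DataFrame(data)
df.columns = [f"name_{i + 1}" for i in range(df.shape[1])]  # hypothetical labels

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path, for example `df.to_csv("out.csv", index=False)`.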