Trying to parse an XML file and put the text data into a dictionary with Python. KeyError: 0

I am parsing an XML file (shown below) with Python's ElementTree package:

<?xml version="1.0" encoding="Cp1252"?>
<CATALOG>
  <CD>
    <COLUMN NAME='TITLE'>Empire Burlesque</COLUMN>
    <COLUMN NAME='ARTIST'>Bob Dylan</COLUMN>
    <COLUMN NAME='COUNTRY'>USA</COLUMN>
    <COLUMN NAME='COMPANY'>Columbia</COLUMN>
    <COLUMN NAME='PRICE'>10.90</COLUMN>
    <COLUMN NAME='YEAR'>1985</COLUMN>
  </CD>
  <CD>
    <COLUMN NAME='TITLE'>Hide your heart</COLUMN>
    <COLUMN NAME='ARTIST'>Bonnie Tyler</COLUMN>
    <COLUMN NAME='COUNTRY'>UK</COLUMN>
    <COLUMN NAME='COMPANY'>CBS Records</COLUMN>
    <COLUMN NAME='PRICE'>9.90</COLUMN>
    <COLUMN NAME='YEAR'>1988</COLUMN>
  </CD>
  <CD>
    <COLUMN NAME='TITLE'>Greatest Hits</COLUMN>
    <COLUMN NAME='ARTIST'>Dolly Parton</COLUMN>
    <COLUMN NAME='COUNTRY'>USA</COLUMN>
    <COLUMN NAME='COMPANY'>RCA</COLUMN>
    <COLUMN NAME='PRICE'>9.90</COLUMN>
    <COLUMN NAME='YEAR'>1982</COLUMN>
  </CD>
</CATALOG>
Running my script fails with KeyError: 0:

$ python sample.py
Traceback (most recent call last):
  File "sample.py", line 30, in <module>
    k = tocsv[0].keys()
KeyError: 0

Is there a way to fix this and write the data to a CSV file without duplicates?

Using findall may simplify this a bit:

In [20]: x = """
    ...: <CATALOG>
    ...:   <CD>
    ...:     <COLUMN NAME='TITLE'>Empire Burlesque</COLUMN>
    ...:     <COLUMN NAME='ARTIST'>Bob Dylan</COLUMN>
    ...:     <COLUMN NAME='COUNTRY'>USA</COLUMN>
    ...:     <COLUMN NAME='COMPANY'>Columbia</COLUMN>
    ...:     <COLUMN NAME='PRICE'>10.90</COLUMN>
    ...:     <COLUMN NAME='YEAR'>1985</COLUMN>
    ...:   </CD>
    ...:   <CD>
    ...:     <COLUMN NAME='TITLE'>Hide your heart</COLUMN>
    ...:     <COLUMN NAME='ARTIST'>Bonnie Tyler</COLUMN>
    ...:     <COLUMN NAME='COUNTRY'>UK</COLUMN>
    ...:     <COLUMN NAME='COMPANY'>CBS Records</COLUMN>
    ...:     <COLUMN NAME='PRICE'>9.90</COLUMN>
    ...:     <COLUMN NAME='YEAR'>1988</COLUMN>
    ...:   </CD>
    ...:   <CD>
    ...:     <COLUMN NAME='TITLE'>Greatest Hits</COLUMN>
    ...:     <COLUMN NAME='ARTIST'>Dolly Parton</COLUMN>
    ...:     <COLUMN NAME='COUNTRY'>USA</COLUMN>
    ...:     <COLUMN NAME='COMPANY'>RCA</COLUMN>
    ...:     <COLUMN NAME='PRICE'>9.90</COLUMN>
    ...:     <COLUMN NAME='YEAR'>1982</COLUMN>
    ...:   </CD>
    ...: </CATALOG>"""

In [21]: from xml.etree.ElementTree import fromstring

In [22]: xdata = fromstring(x)

In [23]: results = []

In [24]: for cd in xdata.findall('.//CD'):
    ...:     each_result = {}
    ...:     for each in cd.findall('.//COLUMN'):
    ...:         each_result[each.attrib.get('NAME')] = each.text
    ...:     results.append(each_result)
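
To get from a list of dicts like results to a CSV file without duplicate rows, one option is the standard-library csv module. What follows is only a minimal sketch and not part of the original answer: the helper names parse_catalog and write_unique_rows, the column order in FIELDS, and the output file name used in the note after it are assumptions made for illustration.

```python
import csv
from xml.etree.ElementTree import fromstring

# Assumed column order for the output file (the NAME attribute values).
FIELDS = ["TITLE", "ARTIST", "COUNTRY", "COMPANY", "PRICE", "YEAR"]


def parse_catalog(xml_text):
    """Return one dict per <CD>, keyed by each COLUMN's NAME attribute."""
    rows = []
    for cd in fromstring(xml_text).findall(".//CD"):
        rows.append({col.attrib.get("NAME"): col.text for col in cd.findall("COLUMN")})
    return rows


def write_unique_rows(rows, path, fields=FIELDS):
    """Write rows to a CSV file, skipping rows that were already written."""
    seen = set()
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops keys that are not listed in fields.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()  # the header is written exactly once
        for row in rows:
            key = tuple(row.get(name) for name in fields)
            if key in seen:
                continue  # duplicate record, skip it
            seen.add(key)
            writer.writerow(row)
    return len(seen)  # number of distinct rows written
```

With a list built as in the session above, write_unique_rows(results, 'catalog.csv') would be expected to produce one header line followed by one line per distinct CD.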


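First, I think you meant orglist[0].keys() rather than tocsv[0].keys(). tocsv appears to be a single dict rather than a list, so tocsv[0] looks up a key named 0 and raises KeyError: 0, whereas orglist is the list of records. Changing it should fix your error.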
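
The difference can be reproduced in isolation. In this sketch the two records are made up for illustration, and the names orglist and tocsv simply mirror the ones used in the question: indexing a dict with 0 looks up a key equal to 0, while indexing a list with 0 returns its first element.

```python
# Two hypothetical records, shaped like the dicts built from the XML.
orglist = [
    {"TITLE": "Empire Burlesque", "ARTIST": "Bob Dylan"},
    {"TITLE": "Hide your heart", "ARTIST": "Bonnie Tyler"},
]
# A single dict, not a list (hypothetical stand-in for the question's tocsv).
tocsv = orglist[-1]

try:
    tocsv[0]  # looks for a key equal to 0, which the dict does not have
except KeyError as err:
    message = "KeyError: {}".format(err)

header = list(orglist[0].keys())  # first record of the list, then its keys
print(message)
print(header)
```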

As for your second question:

Is there a way to fix this and write the data to a CSV file without duplicates?

Yes. You can do it in three lines with pandas.DataFrame, as follows:

>>> import pandas as pd

>>> df = pd.DataFrame(orglist)
>>> df.drop_duplicates(inplace=True)
>>> print(df)
Edit: so your code should look like this:

import xml.etree.ElementTree as ET
import pandas as pd

# NAME attribute values to copy into each record
FIELDS = ('TITLE', 'ARTIST', 'COUNTRY', 'COMPANY', 'PRICE', 'YEAR')

tree = ET.parse('sample.xml')
root = tree.getroot()

orglist = []  # one dict per <CD>
for child in root:
    orgdata = {}
    for sub in child:
        name = sub.attrib.get('NAME')
        if name in FIELDS:
            orgdata[name] = sub.text
    orglist.append(orgdata)

# Build a table from the records and drop exact duplicate rows
df = pd.DataFrame(orglist)
df.drop_duplicates(inplace=True)
print(df)
This prints:

         ARTIST      COMPANY COUNTRY  PRICE             TITLE  YEAR
0     Bob Dylan     Columbia     USA  10.90  Empire Burlesque  1985
1  Bonnie Tyler  CBS Records      UK   9.90   Hide your heart  1988
2  Dolly Parton          RCA     USA   9.90     Greatest Hits  1982

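Thanks, the solution works. For the duplicates I tried pandas and it works well (better than my other solution), but it prints the header along with the values every time. I tried another solution, "[link]()", but it still doesn't work. Any suggestions?

I edited my answer... hope this answers your question :)

This solution works. I can remove the duplicates. However, I am not supposed to use findall. Can I do it with find or another function? Thanks a lot.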
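
On that last question, one possible approach without findall is Element.iter to walk the CD elements and Element.find with an attribute predicate to pick one COLUMN at a time. This is only a sketch under assumptions: the helper names cd_to_dict and catalog_to_records and the fixed FIELDS tuple are made up for illustration and do not come from the thread.

```python
import xml.etree.ElementTree as ET

# Assumed set of NAME attribute values to extract from each <CD>.
FIELDS = ("TITLE", "ARTIST", "COUNTRY", "COMPANY", "PRICE", "YEAR")


def cd_to_dict(cd, fields=FIELDS):
    """Build one record from a <CD> element using find() only."""
    record = {}
    for name in fields:
        # find() returns the first matching child, or None if there is none.
        column = cd.find("COLUMN[@NAME='{}']".format(name))
        if column is not None:
            record[name] = column.text
    return record


def catalog_to_records(root):
    """Collect distinct records, keeping their first-seen order."""
    records = []
    for cd in root.iter("CD"):  # iter() walks the tree without findall
        record = cd_to_dict(cd)
        if record not in records:  # skip exact duplicates
            records.append(record)
    return records
```

For the file in the question this could be called as catalog_to_records(ET.parse('sample.xml').getroot()). The linear duplicate check is fine for a small catalog; for large files a set of tuples would scale better.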