使用Python将XML转换为JSON?

使用Python将XML转换为JSON?,python,json,xml,converter,Python,Json,Xml,Converter,我在网上看到了相当一部分笨拙的XML->JSON代码,并且与Stack的用户进行了一段时间的互动,我相信这群人比谷歌搜索结果的前几页能提供更多的帮助 因此,我们正在解析一个天气提要,我们需要在许多网站上填充天气小部件。我们现在正在研究基于Python的解决方案 这是我们解析的一个很好的例子(我们实际的weather.com提要包含了额外的信息,因为我们与他们建立了合作关系) 简而言之,我们应该如何使用Python将XML转换为JSON?嗯,最简单的方法可能就是将XML解析为字典,然后用simpl

我在网上看到了相当一部分笨拙的XML->JSON代码,并且与Stack的用户进行了一段时间的互动,我相信这群人比谷歌搜索结果的前几页能提供更多的帮助

因此,我们正在解析一个天气提要,我们需要在许多网站上填充天气小部件。我们现在正在研究基于Python的解决方案

这是我们解析的一个很好的例子(我们实际的weather.com提要包含了额外的信息,因为我们与他们建立了合作关系)


简而言之,我们应该如何使用Python将XML转换为JSON?

嗯,最简单的方法可能就是将XML解析为字典,然后用simplejson序列化

XML和JSON之间没有“一对一”的映射,因此将两者转换为另一个必然需要对结果的处理有所了解

也就是说,Python的标准库有(包括DOM、SAX和ElementTree)。从Python2.6开始,支持将Python数据结构与JSON进行转换


因此,基础设施就在那里。

虽然用于XML解析的内置libs非常好,但我还是偏爱它

但对于解析RSS提要,我推荐,它也可以解析Atom。 它的主要优点是,它甚至可以消化大多数格式错误的提要

Python2.6已经包含了一个JSON解析器,但是更新的解析器也可以在下面的版本中使用


有了这些工具,构建你的应用程序应该不会那么困难。

你可能想看看。这个项目从一个大型XML文件库的XML到JSON转换开始。在转换过程中做了很多研究,并生成了最简单直观的XML->JSON映射(在文档的前面已经描述过)。总之,将所有内容转换为JSON对象,并将重复块作为对象列表

表示键/值对的对象(Python中的字典、Java中的hashmap、JavaScript中的对象)

没有映射回XML以获得相同的文档,原因是,不知道键/值对是属性还是
值,因此该信息丢失


如果你问我,属性是一个黑客开始;然后,它们同样适用于HTML。

或者如果您使用的是feedparser,您可以尝试使用一种方法将基于XML的标记传输为JSON,它允许无损地转换回其原始形式。看


它是JSON的一种XSLT。我希望您会觉得它很有用

我发现对于简单的XML片段,使用正则表达式可以省去麻烦。例如:

# <user><name>Happy Man</name>...</user>
import re
names = re.findall(r'<name>(\w+)<\/name>', xml_string)
# do some thing to names
#快乐的男人。。。
进口稀土
names=re.findall(r'(\w+),xml\u字符串)
#对名字做点什么
正如@Dan所说,要通过XML解析实现这一点,并没有一个适合所有人的解决方案,因为数据是不同的。我的建议是使用lxml。虽然还没有完成json,但给出了很好的结果:

>>> from lxml import objectify
>>> root = objectify.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...   <a attr1="foo" attr2="bar">1</a>
...   <a>1.2</a>
...   <b>1</b>
...   <b>true</b>
...   <c>what?</c>
...   <d xsi:nil="true"/>
... </root>
... """)

>>> print(str(root))
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * attr1 = 'foo'
      * attr2 = 'bar'
    a = 1.2 [FloatElement]
    b = 1 [IntElement]
    b = True [BoolElement]
    c = 'what?' [StringElement]
    d = None [NoneElement]
      * xsi:nil = 'true'
来自lxml导入objectify的
>
>>>root=objectify.fromstring(“”)
... 
1.
...   1.2
1.
……是的
…什么?
...   
... 
... """)
>>>打印(str(根))
根=无[目标删除]
a=1[单位]
*attr1='foo'
*attr2='bar'
a=1.2[1]
b=1[单位]
b=真[布尔值]
c=‘什么?’[StringElement]
d=无[非元素]
*xsi:nil='true'
(完全公开:我写的)可以帮助您将XML转换为dict+列表+字符串结构,如下所示。它是基于XML的,因此速度非常快,不需要在内存中加载整个XML树

拥有该数据结构后,可以将其序列化为JSON:

import xmltodict, json

o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'
导入xmltodict,json
o=xmltodict.parse('text-text')
json.dumps(o)#{“e”:{“a”:[“text”,“text”]}

我建议不要直接转换。将XML转换为对象,然后从对象转换为JSON

在我看来,这为XML和JSON的对应关系提供了更清晰的定义

这需要时间才能正确,您甚至可以编写工具来帮助您生成其中的一部分,但大致如下所示:

class Channel:
  def __init__(self)
    self.items = []
    self.title = ""

  def from_xml( self, xml_node ):
    self.title = xml_node.xpath("title/text()")[0]
    for x in xml_node.xpath("item"):
      item = Item()
      item.from_xml( x )
      self.items.append( item )

  def to_json( self ):
    retval = {}
    retval['title'] = title
    retval['items'] = []
    for x in items:
      retval.append( x.to_json() )
    return retval

class Item:
  def __init__(self):
    ...

  def from_xml( self, xml_node ):
    ...

  def to_json( self ):
    ...

这是我为此编写的代码。没有对内容的解析,只是简单的转换

from xml.dom import minidom
import simplejson as json
def parse_element(element):
    dict_data = dict()
    if element.nodeType == element.TEXT_NODE:
        dict_data['data'] = element.data
    if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_NODE, 
                                element.DOCUMENT_TYPE_NODE]:
        for item in element.attributes.items():
            dict_data[item[0]] = item[1]
    if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_TYPE_NODE]:
        for child in element.childNodes:
            child_name, child_dict = parse_element(child)
            if child_name in dict_data:
                try:
                    dict_data[child_name].append(child_dict)
                except AttributeError:
                    dict_data[child_name] = [dict_data[child_name], child_dict]
            else:
                dict_data[child_name] = child_dict 
    return element.nodeName, dict_data

if __name__ == '__main__':
    dom = minidom.parse('data.xml')
    f = open('data.json', 'w')
    f.write(json.dumps(parse_element(dom), sort_keys=True, indent=4))
    f.close()
from xml.etree import ElementTree as ET

xml    = ET.parse('FILE_NAME.xml')
parsed = parseXmlToJson(xml)


def parseXmlToJson(xml):
  response = {}

  for child in list(xml):
    if len(list(child)) > 0:
      response[child.tag] = parseXmlToJson(child)
    else:
      response[child.tag] = child.text or ''

    # one-liner equivalent
    # response[child.tag] = parseXmlToJson(child) if len(list(child)) > 0 else child.text or ''

  return response
您可以使用库使用不同的方法进行转换

例如,此XML:

<p id="1">text</p>
并通过导入此(不支持属性):

。。。并通过导入此(不支持属性):

可以使用相同的方法从XML转换为JSON,从JSON转换为XML 公约:

>>> import json, xmljson
>>> from lxml.etree import fromstring, tostring
>>> xml = fromstring('<p id="1">text</p>')
>>> json.dumps(xmljson.badgerfish.data(xml))
'{"p": {"@id": 1, "$": "text"}}'
>>> xmljson.parker.etree({'ul': {'li': [1, 2]}})
# Creates [<ul><li>1</li><li>2</li></ul>]
导入json,xmljson >>>从lxml.etree导入fromstring,tostring >>>xml=fromstring(“

text

”) >>>json.dumps(xmljson.badgerfish.data(xml)) “{”p:{@id:1,“$”:“text”}” >>>xmljson.parker.etree({'ul':{'li':[1,2]}) #创建[
  • 1
  • 2
    • ]

披露:我写了这个图书馆。希望它能对未来的搜索者有所帮助。

我的回答解决了一个特定的(而且有些常见的)情况,即您不需要将整个xml转换为json,但您需要的是遍历/访问xml的特定部分,并且需要它的快速简单(使用类似json/dict的操作)

方法 为此,需要注意的是,使用
lxml
将xml解析为etree非常快。在大多数其他答案中,最慢的部分是第二步:遍历etree结构(通常在python中),将其转换为json

这使我找到了最适合这种情况的方法:使用
lxml
解析xml,然后(惰性地)包装etree节点,为它们提供类似dict的接口

代码 代码如下:

from collections import Mapping
import lxml.etree

class ETreeDictWrapper(Mapping):

    def __init__(self, elem, attr_prefix = '@', list_tags = ()):
        self.elem = elem
        self.attr_prefix = attr_prefix
        self.list_tags = list_tags

    def _wrap(self, e):
        if isinstance(e, basestring):
            return e
        if len(e) == 0 and len(e.attrib) == 0:
            return e.text
        return type(self)(
            e,
            attr_prefix = self.attr_prefix,
            list_tags = self.list_tags,
        )

    def __getitem__(self, key):
        if key.startswith(self.attr_prefix):
            return self.elem.attrib[key[len(self.attr_prefix):]]
        else:
            subelems = [ e for e in self.elem.iterchildren() if e.tag == key ]
            if len(subelems) > 1 or key in self.list_tags:
                return [ self._wrap(x) for x in subelems ]
            elif len(subelems) == 1:
                return self._wrap(subelems[0])
            else:
                raise KeyError(key)

    def __iter__(self):
        return iter(set( k.tag for k in self.elem) |
                    set( self.attr_prefix + k for k in self.elem.attrib ))

    def __len__(self):
        return len(self.elem) + len(self.elem.attrib)

    # defining __contains__ is not necessary, but improves speed
    def __contains__(self, key):
        if key.startswith(self.attr_prefix):
            return key[len(self.attr_prefix):] in self.elem.attrib
        else:
            return any( e.tag == key for e in self.elem.iterchildren() )


def xml_to_dictlike(xmlstr, attr_prefix = '@', list_tags = ()):
    t = lxml.etree.fromstring(xmlstr)
    return ETreeDictWrapper(
        t,
        attr_prefix = '@',
        list_tags = set(list_tags),
    )
这个实现是不完整的,例如,它不完全支持一个元素同时具有文本和属性,或者同时具有文本和子元素的情况(只是因为我在编写它时不需要它…),但是改进它应该很容易

速度 就我而言
>>> import json, xmljson
>>> from lxml.etree import fromstring, tostring
>>> xml = fromstring('<p id="1">text</p>')
>>> json.dumps(xmljson.badgerfish.data(xml))
'{"p": {"@id": 1, "$": "text"}}'
>>> xmljson.parker.etree({'ul': {'li': [1, 2]}})
# Creates [<ul><li>1</li><li>2</li></ul>]
from collections import Mapping
import lxml.etree

class ETreeDictWrapper(Mapping):

    def __init__(self, elem, attr_prefix = '@', list_tags = ()):
        self.elem = elem
        self.attr_prefix = attr_prefix
        self.list_tags = list_tags

    def _wrap(self, e):
        if isinstance(e, basestring):
            return e
        if len(e) == 0 and len(e.attrib) == 0:
            return e.text
        return type(self)(
            e,
            attr_prefix = self.attr_prefix,
            list_tags = self.list_tags,
        )

    def __getitem__(self, key):
        if key.startswith(self.attr_prefix):
            return self.elem.attrib[key[len(self.attr_prefix):]]
        else:
            subelems = [ e for e in self.elem.iterchildren() if e.tag == key ]
            if len(subelems) > 1 or key in self.list_tags:
                return [ self._wrap(x) for x in subelems ]
            elif len(subelems) == 1:
                return self._wrap(subelems[0])
            else:
                raise KeyError(key)

    def __iter__(self):
        return iter(set( k.tag for k in self.elem) |
                    set( self.attr_prefix + k for k in self.elem.attrib ))

    def __len__(self):
        return len(self.elem) + len(self.elem.attrib)

    # defining __contains__ is not necessary, but improves speed
    def __contains__(self, key):
        if key.startswith(self.attr_prefix):
            return key[len(self.attr_prefix):] in self.elem.attrib
        else:
            return any( e.tag == key for e in self.elem.iterchildren() )


def xml_to_dictlike(xmlstr, attr_prefix = '@', list_tags = ()):
    t = lxml.etree.fromstring(xmlstr)
    return ETreeDictWrapper(
        t,
        attr_prefix = '@',
        list_tags = set(list_tags),
    )
def xml_to_json(xmlstr, **kwargs):
    x = xml_to_dictlike(xmlstr, **kwargs)
    return json.dumps(x)
from lxml import etree 
import json


class Element:
    '''
    Wrapper on the etree.Element class.  Extends functionality to output element
    as a dictionary.
    '''

    def __init__(self, element):
        '''
        :param: element a normal etree.Element instance
        '''
        self.element = element

    def toDict(self):
        '''
        Returns the element as a dictionary.  This includes all child elements.
        '''
        rval = {
            self.element.tag: {
                'attributes': dict(self.element.items()),
            },
        }
        for child in self.element:
            rval[self.element.tag].update(Element(child).toDict())
        return rval


class XmlDocument:
    '''
    Wraps lxml to provide:
        - cleaner access to some common lxml.etree functions
        - converter from XML to dict
        - converter from XML to json
    '''
    def __init__(self, xml = '<empty/>', filename=None):
        '''
        There are two ways to initialize the XmlDocument contents:
            - String
            - File

        You don't have to initialize the XmlDocument during instantiation
        though.  You can do it later with the 'set' method.  If you choose to
        initialize later XmlDocument will be initialized with "<empty/>".

        :param: xml Set this argument if you want to parse from a string.
        :param: filename Set this argument if you want to parse from a file.
        '''
        self.set(xml, filename) 

    def set(self, xml=None, filename=None):
        '''
        Use this to set or reset the contents of the XmlDocument.

        :param: xml Set this argument if you want to parse from a string.
        :param: filename Set this argument if you want to parse from a file.
        '''
        if filename is not None:
            self.tree = etree.parse(filename)
            self.root = self.tree.getroot()
        else:
            self.root = etree.fromstring(xml)
            self.tree = etree.ElementTree(self.root)


    def dump(self):
        etree.dump(self.root)

    def getXml(self):
        '''
        return document as a string
        '''
        return etree.tostring(self.root)

    def xpath(self, xpath):
        '''
        Return elements that match the given xpath.

        :param: xpath
        '''
        return self.tree.xpath(xpath);

    def nodes(self):
        '''
        Return all elements
        '''
        return self.root.iter('*')

    def toDict(self):
        '''
        Convert to a python dictionary
        '''
        return Element(self.root).toDict()

    def toJson(self, indent=None):
        '''
        Convert to JSON
        '''
        return json.dumps(self.toDict(), indent=indent)


if __name__ == "__main__":
    xml='''<system>
    <product>
        <demod>
            <frequency value='2.215' units='MHz'>
                <blah value='1'/>
            </frequency>
        </demod>
    </product>
</system>
'''
    doc = XmlDocument(xml)
    print doc.toJson(indent=4)
{
    "system": {
        "attributes": {}, 
        "product": {
            "attributes": {}, 
            "demod": {
                "attributes": {}, 
                "frequency": {
                    "attributes": {
                        "units": "MHz", 
                        "value": "2.215"
                    }, 
                    "blah": {
                        "attributes": {
                            "value": "1"
                        }
                    }
                }
            }
        }
    }
}
<system>
    <product>
        <demod>
            <frequency value='2.215' units='MHz'>
                <blah value='1'/>
            </frequency>
        </demod>
    </product>
</system>
from xml.etree import ElementTree as ET

xml    = ET.parse('FILE_NAME.xml')
parsed = parseXmlToJson(xml)


def parseXmlToJson(xml):
  response = {}

  for child in list(xml):
    if len(list(child)) > 0:
      response[child.tag] = parseXmlToJson(child)
    else:
      response[child.tag] = child.text or ''

    # one-liner equivalent
    # response[child.tag] = parseXmlToJson(child) if len(list(child)) > 0 else child.text or ''

  return response
import xmltodict

data = requests.get(url)
xpars = xmltodict.parse(data.text)
json = json.dumps(xpars)
print json 
[
  {
    "details": [
      {
        "@attributes": [
          {
            "class": "4b"
          },
          {
            "count": "1"
          },
          {
            "boy": ""
          }
        ]
      },
      {
        "$values": [
          {
            "name": [
              {
                "@attributes": [
                  {
                    "type": "firstname"
                  }
                ]
              },
              {
                "$values": "John"
              }
            ]
          },
          {
            "age": [
              {
                "@attributes": []
              },
              {
                "$values": "13"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Coin collection"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Stamp collection"
              }
            ]
          },
          {
            "address": [
              {
                "@attributes": []
              },
              {
                "$values": [
                  {
                    "country": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "USA"
                      }
                    ]
                  },
                  {
                    "state": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "CA"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": [
          {
            "empty": "True"
          }
        ]
      },
      {
        "$values": ""
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": []
      },
      {
        "$values": ""
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": [
          {
            "class": "4a"
          },
          {
            "count": "2"
          },
          {
            "girl": ""
          }
        ]
      },
      {
        "$values": [
          {
            "name": [
              {
                "@attributes": [
                  {
                    "type": "firstname"
                  }
                ]
              },
              {
                "$values": "Samantha"
              }
            ]
          },
          {
            "age": [
              {
                "@attributes": []
              },
              {
                "$values": "13"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Fishing"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Chess"
              }
            ]
          },
          {
            "address": [
              {
                "@attributes": [
                  {
                    "current": "no"
                  }
                ]
              },
              {
                "$values": [
                  {
                    "country": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "Australia"
                      }
                    ]
                  },
                  {
                    "state": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "NSW"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
python xml_to_json.py -x PurchaseOrder.xsd PurchaseOrder.xml

INFO - 2018-03-20 11:10:24 - Parsing XML Files..
INFO - 2018-03-20 11:10:24 - Processing 1 files
INFO - 2018-03-20 11:10:24 - Parsing files in the following order:
INFO - 2018-03-20 11:10:24 - ['PurchaseOrder.xml']
DEBUG - 2018-03-20 11:10:24 - Generating schema from PurchaseOrder.xsd
DEBUG - 2018-03-20 11:10:24 - Parsing PurchaseOrder.xml
DEBUG - 2018-03-20 11:10:24 - Writing to file PurchaseOrder.json
DEBUG - 2018-03-20 11:10:24 - Completed PurchaseOrder.xml