Python: Loop through all nested key-value pairs created by xmltodict


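Getting a specific value out of an XML file is easy when you know the file's layout. See:

But when I don't know the XML elements in advance, I can't recurse through the document, because xmltodict nests OrderedDicts inside OrderedDicts. Python reports these nested OrderedDicts as type 'unicode' rather than as OrderedDicts, so looping like this doesn't work: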

def myprint(d):
    for k, v in d.iteritems():
        if isinstance(v, list):
            myprint(v)
        else:
            print "Key :{0},  Value: {1}".format(k, v)
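I basically want to recurse through the whole XML file and display every key-value pair. When the value of a key is itself a list of key-value pairs, the function should recurse into it.

With this XML file as input: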

<?xml version="1.0" encoding="utf-8"?>
<session id="2934" name="Valves" docVersion="5.0.1">
    <docInfo>
        <field name="Employee" isMandotory="True">Jake Roberts</field>
        <field name="Section" isOpen="True" isMandotory="False">5</field>
        <field name="Location" isOpen="True" isMandotory="False">Munchen</field>
    </docInfo>
</session>

This is obviously not what I want.

If you come across a list in the data, you just need to call myprint on every element of the list:

def myprint(d):
    if isinstance(d,dict): #check if it's a dict before using .iteritems()
        for k, v in d.iteritems():
            if isinstance(v, (list,dict)): #check for either list or dict
                myprint(v)
            else:
                print "Key :{0},  Value: {1}".format(k, v)
    elif isinstance(d,list): #allow for list input too
        for item in d:
            myprint(item)
You will then get output like this:

...
Key :@name,  Value: Employee
Key :@isMandotory,  Value: True
Key :#text,  Value: Jake Roberts
Key :@name,  Value: Section
Key :@isOpen,  Value: True
Key :@isMandotory,  Value: False
Key :#text,  Value: 5
...
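For readers on Python 3, the same idea can be sketched as a generator. This is an adaptation, not the code from the answer above: the name iter_pairs is hypothetical, and the doc dictionary is a hand-written stand-in for what xmltodict would typically return for the sample file (attributes prefixed with @, element text under #text).

```python
def iter_pairs(node):
    """Yield (key, value) for every leaf found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                # Containers are descended into rather than reported.
                yield from iter_pairs(value)
            else:
                yield key, value
    elif isinstance(node, list):
        for item in node:
            yield from iter_pairs(item)


# Hand-written stand-in for the parsed sample file (an assumption).
doc = {
    "session": {
        "@id": "2934",
        "@name": "Valves",
        "@docVersion": "5.0.1",
        "docInfo": {
            "field": [
                {"@name": "Employee", "@isMandotory": "True",
                 "#text": "Jake Roberts"},
                {"@name": "Section", "@isOpen": "True",
                 "@isMandotory": "False", "#text": "5"},
                {"@name": "Location", "@isOpen": "True",
                 "@isMandotory": "False", "#text": "Munchen"},
            ]
        },
    }
}

pairs = list(iter_pairs(doc))
for key, value in pairs:
    print("Key :{0},  Value: {1}".format(key, value))
```

Returning pairs from a generator instead of printing inside the function leaves the caller free to print, filter, or collect them.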
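I'm not sure how useful this is, since you end up with many duplicate keys such as @name, so I'd like to offer a function I wrote a while ago for traversing nested JSON data made up of nested dicts and lists: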
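A minimal sketch of such a traverse generator, consistent with how it is called in this answer, might look like the following. The parameter names prev_path and path_repr come from the text; the default values, the body, and the Python 3 syntax (yield from) are assumptions.

```python
def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    """Yield (path, value) for every leaf in nested dicts and lists.

    prev_path is the path accumulated so far; path_repr(prev_path, key)
    builds the path for a child. Both defaults are assumptions.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list indices play the role of keys
    else:
        # Anything that is not a container is a leaf.
        yield prev_path, obj
        return
    for key, value in items:
        yield from traverse(value, path_repr(prev_path, key), path_repr)


# Small hand-written sample shaped like parsed XML (an assumption).
sample = {
    "session": {
        "@id": "2934",
        "docInfo": {
            "field": [
                {"@name": "Employee", "#text": "Jake Roberts"},
                {"@name": "Section", "#text": "5"},
            ]
        },
    }
}

string_paths = list(traverse(sample))
# Tuple paths: start from an empty tuple and append each key or index.
tuple_paths = dict(traverse(sample, (), lambda path, key: path + (key,)))
```

On Python 3 the default path strings are built with repr, so keys appear as 'session' rather than u'session'.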

You can then traverse the data with:

for path,value in traverse(doc):
    print("{} = {}".format(path,value))
With the default values of prev_path and path_repr, it gives output like this:

obj[u'session'][u'@id'] = 2934
obj[u'session'][u'@name'] = Valves
obj[u'session'][u'@docVersion'] = 5.0.1
obj[u'session'][u'docInfo'][u'field'][0][u'@name'] = Employee
obj[u'session'][u'docInfo'][u'field'][0][u'@isMandotory'] = True
obj[u'session'][u'docInfo'][u'field'][0]['#text'] = Jake Roberts
obj[u'session'][u'docInfo'][u'field'][1][u'@name'] = Section
obj[u'session'][u'docInfo'][u'field'][1][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][1][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][1]['#text'] = 5
obj[u'session'][u'docInfo'][u'field'][2][u'@name'] = Location
obj[u'session'][u'docInfo'][u'field'][2][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][2][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][2]['#text'] = Munchen
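You can also supply your own function for path_repr. It receives the current value of prev_path (itself built up by the recursive calls to path_repr) and the new key. For example, a function that takes a tuple and appends one more element to it yields pairs in (tuple of indices): elem form, which is ideal for passing to the dict constructor: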

def _tuple_concat(tup, idx):
    return (*tup, idx)   
def flatten_data(obj):
    """converts nested dict and list structure into a flat dictionary with tuple keys
    corresponding to the sequence of indices to reach particular element"""
    return dict(traverse(obj, (), _tuple_concat))

new_data = flatten_data(obj)
import pprint
pprint.pprint(new_data)
This gives you the data in the following dictionary format:

{('session', '@docVersion'): '5.0.1',
 ('session', '@id'): 2934,
 ('session', '@name'): 'Valves',
 ('session', 'docInfo', 'field', 0, '#text'): 'Jake Roberts',
 ('session', 'docInfo', 'field', 0, '@isMandotory'): True,
 ('session', 'docInfo', 'field', 0, '@name'): 'Employee',
 ('session', 'docInfo', 'field', 1, '#text'): 5,
 ('session', 'docInfo', 'field', 1, '@isMandotory'): False,
 ('session', 'docInfo', 'field', 1, '@isOpen'): True,
 ('session', 'docInfo', 'field', 1, '@name'): 'Section',
 ('session', 'docInfo', 'field', 2, '#text'): 'Munchen',
 ('session', 'docInfo', 'field', 2, '@isMandotory'): False,
 ('session', 'docInfo', 'field', 2, '@isOpen'): True,
 ('session', 'docInfo', 'field', 2, '@name'): 'Location'}

I find this particularly useful when dealing with JSON data, but I'm not sure what you want to do with the XML.
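One way to work with the flattened form despite the repeated @name keys is to filter or group on the tuple paths. The sketch below is only an illustration: the helper names values_for_key and fields_by_name are hypothetical, and flat is a hand-copied subset of the dictionary printed above.

```python
# Hand-copied subset of the flattened dictionary shown above.
flat = {
    ('session', '@id'): 2934,
    ('session', '@name'): 'Valves',
    ('session', 'docInfo', 'field', 0, '@name'): 'Employee',
    ('session', 'docInfo', 'field', 0, '#text'): 'Jake Roberts',
    ('session', 'docInfo', 'field', 1, '@name'): 'Section',
    ('session', 'docInfo', 'field', 1, '#text'): 5,
    ('session', 'docInfo', 'field', 2, '@name'): 'Location',
    ('session', 'docInfo', 'field', 2, '#text'): 'Munchen',
}


def values_for_key(flat, key):
    """Return every value whose path ends in the given key."""
    return [value for path, value in flat.items() if path[-1] == key]


def fields_by_name(flat):
    """Pair each element's @name with its #text, matched by shared path prefix."""
    names, texts = {}, {}
    for path, value in flat.items():
        prefix, leaf = path[:-1], path[-1]
        if leaf == '@name':
            names[prefix] = value
        elif leaf == '#text':
            texts[prefix] = value
    # Elements with a name but no text (such as session itself) are skipped.
    return {names[p]: texts[p] for p in names if p in texts}


all_texts = values_for_key(flat, '#text')
by_name = fields_by_name(flat)
```

Because the path prefix identifies the element, the duplicate leaf keys no longer collide.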

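Your else statement is indented so that it comes after the for loop, and I'm pretty sure that's not what you want. Also, why can't you just add a case for isinstance(v, list)?

Oh yes. Typo. Thanks.

Are you trying to parse the XML? Or where are you going with this?

Yes, I am trying to parse the XML. I just need to get all the values with their corresponding keys from the XML file, without naming the XML elements literally.

Awesome! Works like a charm. Especially the traverse function! Thank you very much!

This is so cool. I used it to create a pandas DataFrame so that I can compare JSON and XML: pd.DataFrame.from_records(data=[tup for tup in traverse(xml_dict, root_name)], columns=['key', 'value'], index='key')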