Python: counting JSON leaf nodes


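I want to count the leaf nodes in a JSON structure (that is, only the keys that have no further child elements).

I couldn't find any obvious way to do this, so I have been trying to write a function, but I'm struggling to come up with one that works without a global variable.

This is what I have so far: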

def count_leafs(nested):
  if isinstance(nested, Mapping):
    for k, v in nested.items():
      if isinstance(v, Mapping):
        for i_k, i_v in count_leafs(v):
          yield i_k, i_v
      elif isinstance(v, MutableSequence):
        for i_k in v:
          for i_i_k, i_i_v in i_k.items():
            count_leafs(i_i_v)
      else:
        yield k, v
  elif isinstance(nested, MutableSequence):
    for k in nested:
      count_leafs(k)


for k,v in count_leafs(json):
 leaf_count += 1

It doesn't really work: some non-leaf nodes get counted, and it doesn't recurse all the way down into some structures.

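Your pseudocode is overly complicated and flawed. I'd also suggest writing code that stays focused on the problem, both for your own sake and for others who read the code you write.

Anyway, as a test case, suppose you have some JSON data like this: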

json_data = {
    "glossary": {
        "title": "example glossary",
        "answer": 42,
        "boolean": True,
        "nada": None,
        "GlossDiv": {
            "GlossList": {
                "GlossEntry": {
                    "GlossDef": {
                        "GlossSeeAlso": [
                            "GML",
                            "XML"
                        ],
                        "para": "A meta-markup language, used to create markup "
                                "languages such as DocBook."
                    },
                    "GlossSee": "markup",
                    "Acronym": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "SortAs": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "ID": "SGML"
                }
            },
            "title": "S"
        }
    }
}

You can count the leaves recursively like this:

from collections.abc import Mapping, MutableSequence  # the ABCs live in collections.abc on Python 3

def count_leaves(json_obj):

    def leaf_iterator(json_obj):
        if isinstance(json_obj, Mapping):
            for v in json_obj.values():
                for obj in leaf_iterator(v):
                    yield obj
        elif isinstance(json_obj, MutableSequence):
            for v in json_obj:
                for obj in leaf_iterator(v):
                    yield obj
        else:
            yield json_obj

    return sum(1 for leaf in leaf_iterator(json_obj))

leaf_count = count_leaves(json_data)
print('leaf count: {}'.format(leaf_count))  # -> leaf count: 14

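I nested the leaf_iterator() generator inside the leaf-counting function, but it could also be defined outside it if that proves useful in a larger context. In Python 3, the code in it could be simplified further by using yield from, which was introduced in Python 3.3.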
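
As a rough sketch of that simplification (assuming Python 3.3 or later; the name iter_leaves and the sample data are introduced here for illustration and are not part of the answer above), the nested for/yield loops collapse into yield from:

```python
from collections.abc import Mapping, MutableSequence

def iter_leaves(obj):
    """Yield every leaf value of a nested JSON-like structure."""
    if isinstance(obj, Mapping):
        for value in obj.values():
            # Delegate to the recursive generator instead of re-yielding by hand.
            yield from iter_leaves(value)
    elif isinstance(obj, MutableSequence):
        for item in obj:
            yield from iter_leaves(item)
    else:
        yield obj

def count_leaves(obj):
    # Count the leaves without building an intermediate list.
    return sum(1 for _ in iter_leaves(obj))

# Hypothetical sample data, much smaller than the glossary above.
sample = {"a": 1, "b": [2, 3, {"c": None}], "d": {"e": "text"}}
print(count_leaves(sample))  # -> 5
```

Note that with this approach empty dicts and empty lists contribute nothing to the count, since they yield no leaves.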

In general, I prefer non-recursive solutions over recursive ones. My algorithm works like this:

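  • Initialize a queue and put the JSON object into it
  • Loop while the queue is not empty
  • Take a node from the queue
    • If it is a mapping, add all of its values to the queue for later processing
    • If it is a sequence or a set (note: strings are sequences too, so we need to test for that), add all of its elements to the queue for later processing
    • If it is none of the above, count it
  • The code is as follows: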

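    from collections import deque
    from collections.abc import Mapping, Sequence, Set

    def count_leaves(nested):
        # Breadth-first traversal using an explicit queue instead of recursion.
        queue = deque([nested])
        count = 0
        while queue:
            node = queue.popleft()
            if isinstance(node, Mapping):
                # Queue the values; the keys themselves are not leaves.
                queue.extend(node.values())
            elif isinstance(node, (Sequence, Set)) and not isinstance(node, str):
                # Strings are sequences too, so exclude them explicitly.
                queue.extend(node)
            else:
                count += 1

        return count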
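
To see why the string test in the list above matters, here is a small check written for Python 3. The helper name is_container is hypothetical and only used for illustration. A string is a Sequence, and iterating a one-character string yields that same string again, so queueing a string's elements would never terminate:

```python
from collections.abc import Mapping, Sequence, Set

def is_container(node):
    """True for nodes whose children should be queued (hypothetical helper)."""
    # Treat text and byte strings as leaves, even though they are sequences.
    if isinstance(node, (str, bytes)):
        return False
    return isinstance(node, (Mapping, Sequence, Set))

print(isinstance("GML", Sequence))   # -> True
print(list("G"))                     # -> ['G']
print(is_container("GML"))           # -> False
print(is_container(["GML", "XML"]))  # -> True
```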
    

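In the return statement, I believe you meant json_obj, not json_data.
@HaiVu: Yes, I did. Fixed. Thanks a lot. Good catch, although at that point they are always the same thing, so it isn't obvious and doesn't change the result (in the test code, anyway).
Replaced basestring with str on Python 3, thanks.
Cool. I learned something new.
Sorry... how is this off-topic?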