Gen fill rate for differently structured nested dictionaries in Python?

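I have a process that receives nested dictionaries in Python:

Example nested dict schema (pseudocode)

Example nested dict data (pseudocode)

I want to iterate over (scan) this dict and check whether every key has a value (not Null or blank; False/0 are OK). I need to scan a batch of identically structured dicts to get the overall "fill rate" for that set of dicts. Each run of the process sees sets of dicts in a different format, so the fill-rate report has to be generated automatically:

Example fill rate for the single nested example above (ideally a flat dict):

For example, if we scan ten dicts with the same structure, we might see a "fill rate" like this: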

key1: 5
key2: 6
key2-key3: 6
key2-key4: 4
key5: 3

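Questions

What is the most suitable way to scan differently structured dicts to generate fill rates? If I have to do this millions of times, is there a more efficient approach?

What is the simplest way to create a flat dict to store the counts, and how do I update it?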

OK, I think I've got it. I only did some very minimal testing, but I think this works:

def scan_dict(d):
    counts = {}
    for k, v in d.items():
        if isinstance(v, dict):
            subcounts = scan_dict(v)
            for subkey, subcount in subcounts.items():
                new_key = str(k) + "-" + str(subkey)
                count = counts.get(new_key, 0)
                counts[new_key] = count + subcount
        key = str(k)
        count = counts.get(key, 0)
        counts[key] = count + 1
    return counts

def scan_all_dicts(ds):
    total_counts = {}
    for d in ds:
        counts = scan_dict(d)
        for k, v in counts.items():
            count = total_counts.get(k, 0)
            total_counts[k] = count + v
    return total_counts

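Essentially, there is a recursive function that scans a single dictionary and counts everything in it and in any sub-dictionaries it finds.

The "driver" is the second function, which takes an iterable of dicts (a list, for example), runs each one through the first function, and returns the combined counts as a flat dict.

I did not check the values to make sure they are "not empty"; I'll leave that to you.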
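One way to add the emptiness check that the answer leaves open is sketched below. The names is_filled and scan_dict_filled are hypothetical and not part of the answer above. The rule that numbers and booleans always count as filled, while None, '' and empty containers do not, is an assumption based on the wording of the question.

```python
def is_filled(value):
    """Assumed rule: 0 and False count as filled; None, '' and empty containers do not."""
    if isinstance(value, (bool, int, float)):
        return True
    return bool(value)


def scan_dict_filled(d, prefix=""):
    """Return a flat {path: 0 or 1} dict for one nested dict (leaf keys only)."""
    counts = {}
    for k, v in d.items():
        key = prefix + "-" + str(k) if prefix else str(k)
        if isinstance(v, dict):
            # Descend and merge the sub-dict's flags under the longer path.
            counts.update(scan_dict_filled(v, key))
        else:
            counts[key] = int(is_filled(v))
    return counts


sample = {"key1": "value", "key2": {"key3": "", "key4": 0}, "key5": []}
print(scan_dict_filled(sample))
# {'key1': 1, 'key2-key3': 0, 'key2-key4': 1, 'key5': 0}
```

A driver that sums per-dict results, as the second function above does, could call this variant instead if empty values should not be counted.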

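Here is my take:

What is the most suitable way to scan differently structured dicts to generate fill rates?

Recursion. In particular, I return the result of traversing a subtree to the caller. The caller is responsible for merging the results of multiple subtrees into the result for its own tree.

If I have to do this millions of times, is there a more efficient approach?

Probably. Try a solution and see whether it is A) correct and B) fast enough. If it is both, don't bother looking for the most efficient one.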
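A quick way to check the "fast enough" part is to time a candidate on synthetic data. The sketch below uses the standard timeit module; the count_filled function and the generated dicts are placeholders made up for illustration, so substitute the real function and a realistic sample.

```python
import timeit


def count_filled(d, path=()):
    """Placeholder candidate: flat {path_tuple: 0 or 1} for one nested dict."""
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out.update(count_filled(v, path + (k,)))
        else:
            # Assumption: numbers/booleans always count; other values if truthy.
            out[path + (k,)] = int(isinstance(v, (bool, int, float)) or bool(v))
    return out


# Synthetic data: 10,000 small dicts with the same structure (an assumption).
dicts = [{"key1": i, "key2": {"key3": "", "key4": i}, "key5": []}
         for i in range(10_000)]


def run():
    """One full pass: sum the per-dict flags into overall counts."""
    totals = {}
    for d in dicts:
        for key, filled in count_filled(d).items():
            totals[key] = totals.get(key, 0) + filled
    return totals


seconds = timeit.timeit(run, number=5) / 5
print("average seconds per pass over {} dicts: {:.4f}".format(len(dicts), seconds))
```

Scaling the measured time to the expected volume gives a rough idea of whether further optimisation is worth the effort.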

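What is the simplest way to create a flat dict to store the counts, and how do I update it?

Use a library that ships with Python. In this case, collections.Counter(), and call its .update() function.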

from collections import Counter
from pprint import pprint

example1_dict = {
    'key1': 'value',
    'key2': {
        'key3': '',
        'key4': 12345,
    },
    'key5': list()
}

example2_dict = {
    'key1': 'value',
    'key7': {
        'key3': '',
        'key4': 12345,
    },
    'key5': [1]
}

def get_fill_rate(d, path=()):
    result = Counter()
    for k, v in d.items():
        if isinstance(v, dict):
            result[path+(k,)] += 1
            result.update(get_fill_rate(v, path+(k,)))
        elif v in (False, 0):
            result[path+(k,)] += 1
        elif v:
            result[path+(k,)] += 1
        else:
            result[path+(k,)] += 0
    return result

def get_fill_rates(l):
    result = Counter()
    for d in l:
        result.update(get_fill_rate(d))
    return dict(result)

result = get_fill_rates([example1_dict, example2_dict])

# Raw result
pprint(result)

# Formatted result
print('\n'.join(
    '-'.join(single_key for single_key in key) + ': ' + str(value)
    for key, value in sorted(result.items())))

Result:

{('key1',): 2,
 ('key2',): 1,
 ('key2', 'key3'): 0,
 ('key2', 'key4'): 1,
 ('key5',): 1,
 ('key7',): 1,
 ('key7', 'key3'): 0,
 ('key7', 'key4'): 1}
key1: 2
key2: 1
key2-key3: 0
key2-key4: 1
key5: 1
key7: 1
key7-key3: 0
key7-key4: 1
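The zero entries in the result (for example key2-key3: 0) survive because Counter.update() adds counts key by key and keeps keys whose total is zero. A minimal illustration with made-up keys, contrasted with the + operator, which discards non-positive counts:

```python
from collections import Counter

first = Counter()
first["key3"] += 0   # key was seen, but its value was empty
first["key4"] += 1   # key was seen and filled

total = Counter()
total.update(first)          # adds counts; the zero entry is kept
total.update({"key4": 1})    # adds to the existing count
print(dict(total))           # {'key3': 0, 'key4': 2}

summed = Counter() + first   # '+' keeps only positive counts
print(dict(summed))          # {'key4': 1}
```

This is why update() suits a fill-rate report: a key that is present but never filled still shows up with a count of 0.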

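Recursion is the most common way to solve this problem; this solution, however, uses a decorator to update a global dictionary that stores the overall fill rates. With collections.defaultdict, final_dict can be updated several times on each call to get_occurences: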
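from collections import defaultdict
import re

# Global tally: flattened key path -> number of dicts in which it was filled.
final_dict = defaultdict(int)

def fill(f):
    """Decorator: add the 0/1 flags returned by f to the global final_dict."""
    def update_count(structure):
        def update_final(d):
            for a, b in d.items():
                if isinstance(b, dict):
                    update_final(b)  # nested flags already carry their full path
                else:
                    final_dict[a] += b
        update_final(f(structure))
    return update_count

def flag_values(d, last=None):
    """Map each key (prefixed with its parent path) to 1 if truthy, else 0."""
    result = {}
    for a, b in d.items():
        key = "{}-{}".format(last, a) if last else a
        # Recurse through this undecorated helper so each dict is tallied once.
        # Note: bool() treats 0 and False as unfilled here.
        result[key] = flag_values(b, key) if isinstance(b, dict) else int(bool(b))
    return result

@fill
def get_occurences(d):
    return flag_values(d)

structures = [{'key1':'value', 'key2':{'key3':'', 'key4':12345}, 'key5':[]}, {'key1':18, 'key2':'value1', 'key3':['James', 'Bob', 'Bill']},{'key1':'value2', 'key2':{'key3':'233', 'key4':12345}, 'key5':100}]
for structure in structures:
    get_occurences(structure)

# Sort by the trailing number of each key, top-level keys before nested ones.
def sort_key(item):
    name = item[0]
    return int(re.findall(r'\d+$', name)[0]), '-' in name

for i in sorted(final_dict.items(), key=sort_key):
    print("{}: {}".format(*i))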

Output (the first line shows the contents of final_dict):
{'key2-key3': 1, 'key2-key4': 2, 'key1': 3, 'key2': 1, 'key5': 1, 'key3': 1}

key1: 3
key2: 1
key3: 1
key2-key3: 1
key2-key4: 2
key5: 1

You could try something like this:
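example1_dict = {
    'key1': 'value',
    'key2': {
        'key3': '',
        'key4': 12345,
    },
    'key5': list()
}

def is_filled(value):
    # Numbers and booleans (including 0 and False) count as filled;
    # anything else counts only if it is truthy (not None, '' or empty).
    return isinstance(value, (bool, int, float)) or bool(value)

def hi(fg, example, prefix=''):
    # Record 1 or 0 for every non-dict value under its "parent-child" path.
    for i, k in fg.items():
        track = "{}-{}".format(prefix, i) if prefix else i
        if isinstance(k, dict):
            hi(k, example, track)  # descend; only leaf keys are recorded
        else:
            example[track] = example.get(track, 0) + int(is_filled(k))
    return example

example = hi(example1_dict, {})
print(example)

Output:
{'key1': 1, 'key2-key3': 0, 'key2-key4': 1, 'key5': 0}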

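Just to clarify: by "fill rate", do you mean you want to see how many times (for example) key1 was specified across all the dictionaries you end up seeing? Your example isn't clear.

I've updated the question, hopefully it is clearer now. "Gen" means: check each field and count it as "filled" if it is set and not Null or blank. By fill rate, I mean how often each key-value pair is not null or empty across a set of dicts with the same structure.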
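If the fill rate is wanted as a fraction rather than a raw count, the counts can be divided by the number of dicts scanned. A small sketch using the example counts from the question (ten dicts); the variable names are illustrative only:

```python
# Example counts from the question, observed over ten dicts.
counts = {"key1": 5, "key2": 6, "key2-key3": 6, "key2-key4": 4, "key5": 3}
n_dicts = 10

# Fraction of dicts in which each key was filled.
fill_rates = {key: count / n_dicts for key, count in counts.items()}

for key, rate in fill_rates.items():
    print("{}: {:.0%}".format(key, rate))
```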