Python: How to count the sizes of lists in a dict? (Python, List, Python 2.7, Dictionary) - Fatal编程技术网



If I have a dict of lists, such as:

{
    'id1': ['a', 'b', 'c'],
    'id2': ['a', 'b'],
    # etc.
}
I would like to count the sizes of the lists, i.e. the number of IDs whose list has >0, >1, >2 elements, and so on.

Is there a simpler way than nested for loops like this:

def countListSizes(userIdDict):  # wrapper name is hypothetical; the original showed only the body
    dictOfOutputs = {}
    for x in range(1, 11):
        count = 0
        for agentId in userIdDict:
            if len(userIdDict[agentId]) > x:
                count += 1
        dictOfOutputs[x] = count
    return dictOfOutputs
I would use a collections.Counter to collect the lengths, then accumulate the sums:

from collections import Counter

lengths = Counter(len(v) for v in userIdDict.values())
total = 0
accumulated = {}
for length in range(max(lengths), -1, -1):
    count = lengths.get(length, 0)
    total += count
    accumulated[length] = total
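This collects a count for each length, then builds a dictionary of cumulative counts, so accumulated[n] is the number of lists with at least n elements. It is an O(N) algorithm: you loop over all the values once, then add a few smaller linear loops (for max() and the accumulation loop).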
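As a quick check, here is a minimal sketch that wraps the Counter approach in a function and runs it on the sample data from the question. The names count_at_least, sample and more_than are assumptions made for illustration, as is the guard for an empty dict. Because the accumulated counts mean "at least n elements", the question's "more than x" figure is read from key x + 1.

```python
from collections import Counter


def count_at_least(user_id_dict):
    """Map each length n to the number of lists with at least n elements."""
    lengths = Counter(len(v) for v in user_id_dict.values())
    if not lengths:
        # max() would raise on an empty Counter (assumed edge case)
        return {}
    total = 0
    accumulated = {}
    # Walk from the longest length down to 0, keeping a running total
    for length in range(max(lengths), -1, -1):
        total += lengths.get(length, 0)
        accumulated[length] = total
    return accumulated


sample = {'id1': ['a', 'b', 'c'], 'id2': ['a', 'b']}
accumulated = count_at_least(sample)

# "more than x elements" is the same as "at least x + 1 elements"
more_than = {x: accumulated.get(x + 1, 0) for x in range(3)}
```

With this sample, more_than[2] counts only id1, since it is the only list with more than two elements.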


Yes, there is a better way.

First, index the IDs by the length of their data:

my_dict = {
    'id1': ['a', 'b', 'c'],
    'id2': ['a', 'b'],
}

from collections import defaultdict
ids_by_data_len = defaultdict(list)

for id, data in my_dict.items():
    # append to the index, not to my_dict itself
    ids_by_data_len[len(data)].append(id)
Now, build your dict:

output_dict = {}
accumulator = 0
# note: the end of a range is non-inclusive!
for data_len in reversed(range(1, max(ids_by_data_len.keys()) + 1)):
    accumulator += len(ids_by_data_len.get(data_len, []))
    output_dict[data_len-1] = accumulator
This has O(n) complexity rather than O(n²), so it is also much faster for large data sets.
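The two steps above can be combined into one function. This is only a sketch: the name count_longer_than, the empty-input guard and the brute-force comparison are assumptions added for illustration, not part of the original answer.

```python
from collections import defaultdict


def count_longer_than(my_dict):
    """Map each x to the number of IDs whose list has more than x elements."""
    # Step 1: index the IDs by the length of their data
    ids_by_data_len = defaultdict(list)
    for key, data in my_dict.items():
        ids_by_data_len[len(data)].append(key)

    # Step 2: walk from the longest length down to 1, keeping a running total
    output_dict = {}
    accumulator = 0
    longest = max(ids_by_data_len) if ids_by_data_len else 0  # assumed guard
    for data_len in range(longest, 0, -1):
        accumulator += len(ids_by_data_len.get(data_len, []))
        output_dict[data_len - 1] = accumulator
    return output_dict


my_dict = {'id1': ['a', 'b', 'c'], 'id2': ['a', 'b']}
result = count_longer_than(my_dict)

# Cross-check against the straightforward nested count
brute_force = {
    x: sum(1 for data in my_dict.values() if len(data) > x)
    for x in range(3)
}
```

Here result and brute_force agree on the sample data; the single pass plus one short accumulation loop avoids rescanning every list for each threshold.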
