Python: sorting by Counter, reorganizing by frequency

My code looks like this:

import collections
import json

with open('toy_two.json', 'rb') as inpt:

    dict_hash_gas = list()
    for line in inpt:
        resource = json.loads(line)
        dict_hash_gas.append({resource['first']:resource['second']})

# Count up the values
counts = collections.Counter(v for d in dict_hash_gas for v in d.values())

# Apply a threshold
counts = {k:v for k,v in counts.iteritems() if v > 1}

print(counts)

Here is the data:

{"first":"A","second":"1","third":"2"} 
{"first":"B","second":"1","third":"2"} 
{"first":"C","second":"2","third":"2"} 
{"first":"D","second":"3","third":"2"} 
{"first":"E","second":"3","third":"2"} 
{"first":"F","second":"3","third":"2"} 
{"first":"G","second":"3","third":"2"} 
{"first":"H","second":"4","third":"2"} 
{"first":"I","second":"4","third":"2"} 
{"first":"J","second":"0","third":"2"} 
{"first":"K","second":"0","third":"2"} 
{"first":"L","second":"0","third":"2"} 
{"first":"M","second":"0","third":"2"} 
{"first":"N","second":"0","third":"2"}

The corresponding output:

{u'1': 2, u'0': 5, u'3': 4, u'4': 2}
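For readers on Python 3, here is a minimal sketch of the same counting step. It makes two assumptions. First, a small in-memory sample (io.StringIO, named SAMPLE here) stands in for toy_two.json. Second, it counts the "second" field directly instead of building the intermediate {first: second} dicts, which gives the same tally of values. The helper name count_second is hypothetical. Note that .iteritems() exists only in Python 2, while .items() works in both.

```python
import collections
import io
import json

# Hypothetical stand-in for toy_two.json: one JSON object per line.
SAMPLE = io.StringIO(
    '{"first":"A","second":"1","third":"2"}\n'
    '{"first":"B","second":"1","third":"2"}\n'
    '{"first":"C","second":"2","third":"2"}\n'
    '{"first":"D","second":"3","third":"2"}\n'
    '{"first":"E","second":"3","third":"2"}\n'
)


def count_second(lines):
    """Count how often each 'second' value occurs across JSON lines (hypothetical helper)."""
    return collections.Counter(
        json.loads(line)["second"] for line in lines if line.strip()
    )


counts = count_second(SAMPLE)
# Apply a threshold; .items() works on Python 2 and 3, .iteritems() only on Python 2.
counts = {k: v for k, v in counts.items() if v > 1}
print(counts)
```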

What I want is to sort the output so that it reads:

{u'0': 5, u'3': 4, u'4': 2, u'1': 2}

So far I have tried

counts = counts.most_common()

but it did not work. I got the following error:

AttributeError: 'dict' object has no attribute 'most_common'
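A minimal reproduction of this error follows. It uses made-up sample values and the Python 3 .items() method. The dict comprehension used for the threshold always builds a plain dict, so the Counter methods are gone afterwards. Rebuilding a Counter from the filtered pairs is one way to get most_common() back.

```python
from collections import Counter

c = Counter(["1", "1", "2", "3", "3", "3", "0", "0", "0", "0", "0"])

# The threshold step: a dict comprehension returns a plain dict, not a Counter.
plain = {k: v for k, v in c.items() if v > 1}

try:
    plain.most_common()
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'most_common'

# Wrapping the filtered pairs in Counter(...) restores most_common().
filtered = Counter(plain)
print(filtered.most_common())  # [('0', 5), ('3', 3), ('1', 2)]
```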

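counts used to be a Counter instance, which understands the most_common method. After the threshold step, counts is now a plain dict, which does not understand most_common.

You just need to apply most_common first, and then apply the threshold: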
data = [{"first":"A","second":"1","third":"2"} ,
    {"first":"B","second":"1","third":"2"} ,
    {"first":"C","second":"2","third":"2"} ,
    {"first":"D","second":"3","third":"2"} ,
    {"first":"E","second":"3","third":"2"} ,
    {"first":"F","second":"3","third":"2"} ,
    {"first":"G","second":"3","third":"2"} ,
    {"first":"H","second":"4","third":"2"} ,
    {"first":"I","second":"4","third":"2"} ,
    {"first":"J","second":"0","third":"2"} ,
    {"first":"K","second":"0","third":"2"} ,
    {"first":"L","second":"0","third":"2"} ,
    {"first":"M","second":"0","third":"2"} ,
    {"first":"N","second":"0","third":"2"}]

from collections import Counter
c = Counter(int(d["second"]) for d in data)
print(c)
# Counter({0: 5, 3: 4, 1: 2, 4: 2, 2: 1})
print(c.most_common())
# [(0, 5), (3, 4), (1, 2), (4, 2), (2, 1)]
print([(value, count) for value, count in c.most_common() if count > 1])
# [(0, 5), (3, 4), (1, 2), (4, 2)]
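If a dict-like result in that order is wanted (as in the desired output in the question), one option is to feed the filtered most_common() pairs into collections.OrderedDict. That keeps the order even on Python versions where a plain dict does not guarantee insertion order. This is a sketch using the question's "second" values as strings. Values with equal counts (here '1' and '4') keep their first-seen order, so the tie may come out differently from the exact order shown in the question.

```python
from collections import Counter, OrderedDict

# The "second" values from the question's data, in file order.
values = ["1", "1", "2", "3", "3", "3", "3", "4", "4", "0", "0", "0", "0", "0"]
c = Counter(values)

# most_common() yields (value, count) pairs sorted by count, highest first;
# OrderedDict preserves that order, and the condition applies the threshold.
ordered = OrderedDict((value, count) for value, count in c.most_common() if count > 1)
print(list(ordered.items()))  # [('0', 5), ('3', 4), ('1', 2), ('4', 2)]
```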

Ah, I see, thank you. But now when I reorder them I get a new error: AttributeError: 'list' object has no attribute 'iteritems'
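This error is expected: most_common() returns a list of (value, count) tuples, and lists have no .iteritems() (or .items()). A sketch of two ways to threshold that list, with a hypothetical threshold of 1, follows. The first unpacks each tuple. The second uses itertools.takewhile, which relies on the list already being sorted by count in descending order, so it can stop at the first pair at or below the threshold.

```python
from collections import Counter
from itertools import takewhile

c = Counter({"0": 5, "3": 4, "1": 2, "4": 2, "2": 1})
pairs = c.most_common()  # a list of (value, count) tuples, not a dict
threshold = 1  # hypothetical threshold

# Option 1: unpack each tuple and keep the ones above the threshold.
kept = [(value, count) for value, count in pairs if count > threshold]

# Option 2: the list is sorted by count (descending), so stop at the first
# pair whose count is not above the threshold.
kept_early = list(takewhile(lambda pair: pair[1] > threshold, pairs))

print(kept == kept_early)  # True
```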
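Cool, that's a good one. I was also able to apply the threshold with this crazy thing: counts = [list(group) for val, group in itertools.groupby(counts, lambda x: x[1] > threshold) if val], but yours is clean. Do you know the best way to visualize this as a histogram/bar chart?

It's simple, so any library should do. Matplotlib is well documented.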
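Matplotlib is the suggestion above. To keep the example free of dependencies, here is a plain-text bar chart instead, which is a swapped-in technique and not Matplotlib. It draws one row per value, most common first, with a bar length equal to the count. The function name text_bar_chart is hypothetical.

```python
from collections import Counter


def text_bar_chart(counter, char="#"):
    """Render a Counter as text: one row per value, most common first (hypothetical helper)."""
    pairs = counter.most_common()
    if not pairs:
        return ""
    # Right-align the labels so all bars start in the same column.
    width = max(len(str(value)) for value, _ in pairs)
    return "\n".join(
        f"{str(value).rjust(width)} | {char * count} {count}" for value, count in pairs
    )


c = Counter({"0": 5, "3": 4, "1": 2, "4": 2})
print(text_bar_chart(c))
```

With Matplotlib installed, the same (value, count) pairs could instead be split into labels and heights and passed to matplotlib.pyplot.bar(x, height).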