Python: How to get a Series of JSON/dicts from a DataFrame groupby object


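I have a DataFrame with more than two columns (Col1, Col2, etc.), and I want to produce a Series whose index is Col1 and whose values are dictionaries. In each dictionary the keys are Col2 values and the values are the number of times the pair (Col1, Col2) occurs.

Suppose the DataFrame looks like this: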

    Col1 Col2 Col3 ...
 0    A    b   ... 
 1    B    e   ... 
 2    A    a   ... 
 3    C    a   ... 
 4    A    b   ... 
 5    B    c   ... 
 6    A    e   ... 
 7    B    c   ...

The output I want is:

A {'a':1,'b':2,'e':1}
B {'c':2,'e':1}
C {'a':1}

I managed to solve it with this loop:

for t in my_df['Col1'].unique(): 
  my_series.loc[t] = my_df[my_df['Col1'] == t].groupby('Col2').size().to_json()

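But I would like to know whether there is a more efficient way to do this with pandas methods, without iterating.

I also tried a groupby on both columns: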

   my_df.groupby(['Col1','Col2']).size() 
   >
   Col1  Col2
    A     a     1
          b     2
          e     1
    B     c     2
          e     1
    C     a     1

But I could not find the next step to convert that result into the Series of dicts shown above.

What you need is a defaultdict:

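import collections
import pprint

resul = collections.defaultdict(dict)
# Each item is ((Col1, Col2), count); .items() replaces .iteritems(),
# which was removed in pandas 2.0
for (col1, col2), count in my_df.groupby(['Col1', 'Col2']).size().items():
    resul[col1][col2] = count

pprint.pprint(resul)

This gives the expected result: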

defaultdict(<class 'dict'>,
            {'A': {'a': 1, 'b': 2, 'e': 1},
             'B': {'c': 2, 'e': 1},
             'C': {'a': 1}})
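The question asks for a Series rather than a plain dict. As a minimal sketch, assuming the sample rows from the question (only Col1 and Col2 are reconstructed here, and the name `my_series` is borrowed from the question's loop), the nested dict can be passed to `pd.Series`, which uses the outer keys as the index and keeps each inner dict as a value:

```python
import collections

import pandas as pd

# Sample data mirroring the Col1/Col2 values shown in the question
my_df = pd.DataFrame({
    'Col1': list('ABACABAB'),
    'Col2': list('beaabcec'),
})

resul = collections.defaultdict(dict)
for (col1, col2), count in my_df.groupby(['Col1', 'Col2']).size().items():
    resul[col1][col2] = count

# Outer keys become the index; each inner dict is stored as one value
my_series = pd.Series(dict(resul))
print(my_series)
```

If JSON strings are wanted instead of dicts, as in the question's loop, each value could be passed through `json.dumps` afterwards.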


But there is still an iteration here. I would like to know whether there is a way to do it in one line.
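One possible way to write this as a single expression is sketched below. It assumes the sample data from the question and still iterates over the groups inside a dict comprehension, so it is more compact than the loop but not loop-free. It combines `groupby` with `Series.value_counts` and `Series.to_dict`:

```python
import pandas as pd

# Sample data mirroring the Col1/Col2 values shown in the question
my_df = pd.DataFrame({
    'Col1': list('ABACABAB'),
    'Col2': list('beaabcec'),
})

# For each Col1 group, count the Col2 values and store the counts as a dict
my_series = pd.Series({
    key: group.value_counts().to_dict()
    for key, group in my_df.groupby('Col1')['Col2']
})
print(my_series)
```

Building the dict first and then wrapping it in `pd.Series` keeps each inner dict intact as a single value, which matches the shape of the output the question asks for.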