Python: Include missing group keys as NaN in GroupBy output

I have a pandas DataFrame:

import numpy as np
import pandas as pd

test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'],
                       'transaction': ['aa', 'bb', 'cc', 'aa', 'bb', 'bb'],
                       'ccy': ['USD', 'EUR', 'EUR', 'USD', 'USD', 'USD'],
                       'amt': np.random.random(6)})
test_df:

date         transaction  ccy       amt
2018-12-28   aa           USD  0.323439
2018-12-28   bb           EUR  0.048948
2018-12-29   cc           EUR  0.793263
2018-12-29   aa           USD  0.013865
2018-12-30   bb           USD  0.658571
2018-12-30   bb           USD  0.224951
The following code produces the output below:

grouper = test_df.groupby([pd.Grouper('date'), 'transaction', 'ccy'])
grp_transactions = grouper['amt'].sum().unstack()
Output:

ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.323439
           bb           0.048948       NaN
2018-12-29 aa                NaN  0.013865
           cc           0.793263       NaN
2018-12-30 bb                NaN  0.883523
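I believe this is expected: groupby groups the values in the listed columns, in the order given above, and sums them accordingly, without creating new rows for transactions that do not appear in the DataFrame.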

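Is there a way in pandas to include NaN values when using groupby if a transaction did not occur on a particular date? For example, if my DataFrame has no transaction cc on 2018-12-28, the output for that row should be NaN for both ccy values.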

Expected output:

ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.323439
           bb           0.048948       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.013865
           bb                NaN       NaN
           cc           0.793263       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.883523
           cc                NaN       NaN

Any help would be appreciated. Thanks.

This is easy if you convert transaction to a categorical column before grouping:

df['transaction'] = pd.Categorical(df['transaction'])
# observed=False keeps category values that never occur within a group
df.groupby(['date', 'transaction', 'ccy'], observed=False).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.404488
           bb           0.459295       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.439354
           bb                NaN       NaN
           cc           0.429269       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  1.542451
           cc                NaN       NaN
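Categories missing from the data are represented by NaN in the output. This is generally possible when performing numeric aggregations.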

If you do not want to modify df, you can do this instead:

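# Build the categorical key as a separate Series so df itself is left unchanged
u = pd.Series(pd.Categorical(df['transaction']), name='transaction')
# observed=False keeps category values that never occur within a group
df.groupby(['date', u, 'ccy'], observed=False).sum().unstack(2)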

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.429134
           bb           0.852355       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.541576
           bb                NaN       NaN
           cc           0.994095       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.744587
           cc                NaN       NaN
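
For comparison, here is a minimal sketch of a different route that does not rely on categoricals: aggregate as in the question, then reindex the result against every (date, transaction) combination. This is not the approach from the answer above, only an alternative. It may be useful because the value reported for unobserved categories under sum can differ between pandas versions (NaN in some, 0 in others), whereas reindex always fills rows that are absent with NaN. The fixed amt values and the names full_index and result are assumptions made for illustration.

```python
import pandas as pd

# Same shape as the question's data, but with fixed amounts so the result is reproducible
test_df = pd.DataFrame({
    'date': ['2018-12-28', '2018-12-28', '2018-12-29',
             '2018-12-29', '2018-12-30', '2018-12-30'],
    'transaction': ['aa', 'bb', 'cc', 'aa', 'bb', 'bb'],
    'ccy': ['USD', 'EUR', 'EUR', 'USD', 'USD', 'USD'],
    'amt': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Sum per (date, transaction, ccy), then pivot ccy into columns
grp = test_df.groupby(['date', 'transaction', 'ccy'])['amt'].sum().unstack()

# Every date paired with every transaction, including pairs absent from the data
full_index = pd.MultiIndex.from_product(
    [sorted(test_df['date'].unique()), sorted(test_df['transaction'].unique())],
    names=['date', 'transaction'],
)

# Rows that did not exist in grp are added and filled with NaN
result = grp.reindex(full_index)
print(result)
```

The result has one row per date and transaction pair (9 rows here), and pairs such as 2018-12-28 / cc are NaN in both ccy columns.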

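I tried the solution you suggested, but the resulting DataFrame contains only the amt column and none of the other columns. How can I get all the columns back using unstack and stack?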