Python pandas groupby result has unexpected shape


python pandas


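I have time-series data in a "stacked" format, and I want to compute a rolling function based on two columns. However, as the example below shows, groupby concatenates my results horizontally instead of vertically. I could apply stack at the end to get back to the tall format. However, I think the correct behavior would be to concatenate vertically, so that the result can be assigned back to the original DataFrame (similar to x['res'] = df.groupby(…).apply(func)). Does anyone know why groupby doesn't behave as expected, or what I am doing wrong?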

x
Out[52]: 
    group      month         a         b
0   18527 2014-09-01  0.534152  0.973451
1   18527 2014-10-01  0.079879  0.354498
2   18527 2014-11-01  0.032298  0.203997
3   18527 2014-12-01  0.148435  0.352703
4   18527 2015-01-01  0.879930  0.819328
5   18527 2015-02-01  0.475297  0.693203
6   18527 2015-03-01  0.223759  0.731594
7   18527 2015-04-01  0.391933  0.332801
8   18671 2014-09-01  0.740621  0.305298
9   18671 2014-10-01  0.230585  0.772569
10  18671 2014-11-01  0.664834  0.755219
11  18671 2014-12-01  0.987118  0.896310
12  18671 2015-01-01  0.228804  0.058641
13  18671 2015-02-01  0.415715  0.182683
14  18671 2015-03-01  0.574570  0.144686
15  18671 2015-04-01  0.488804  0.545102

x.dtypes
Out[53]: 
group             int64
month    datetime64[ns]
a               float64
b               float64
dtype: object

def func(s):
    # pd.rolling_sum(x, 3) was removed from later pandas; x.rolling(3).sum() is the equivalent
    return s.a.rolling(3).sum() / s.b.rolling(3).sum()


x.set_index('month').groupby('group').apply(func)
Out[55]: 
month  2014-09-01  2014-10-01  2014-11-01  2014-12-01  2015-01-01  2015-02-01
group
18527         NaN         NaN    0.421900    0.286010    0.770814    0.806152   
18671         NaN         NaN    0.892505    0.776593    1.099748    1.434238   

month  2015-03-01  2015-04-01  
group                          
18527    0.703609    0.620728  
18671    3.158185    1.695287  

x.set_index('month').groupby('group').apply(func).stack()
Out[56]: 
group  month     
18527  2014-11-01    0.421900
       2014-12-01    0.286010
       2015-01-01    0.770814
       2015-02-01    0.806152
       2015-03-01    0.703609
       2015-04-01    0.620728
18671  2014-11-01    0.892505
       2014-12-01    0.776593
       2015-01-01    1.099748
       2015-02-01    1.434238
       2015-03-01    3.158185
       2015-04-01    1.695287
dtype: float64
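The behavior above can be reproduced with current pandas. A minimal sketch, assuming the frame is rebuilt from the 6-decimal values printed for x (so results match the output above only to about 1e-4), that pd.rolling_sum is replaced by .rolling(3).sum(), and that the [["a", "b"]] column selection (not in the question) is used so the grouping column is not passed to func, which newer pandas warns about:

```python
import pandas as pd

# Rebuild x from the printed output (values are rounded to 6 decimals there).
months = pd.date_range("2014-09-01", periods=8, freq="MS")
x = pd.DataFrame({
    "group": [18527] * 8 + [18671] * 8,
    "month": list(months) * 2,
    "a": [0.534152, 0.079879, 0.032298, 0.148435, 0.879930, 0.475297, 0.223759, 0.391933,
          0.740621, 0.230585, 0.664834, 0.987118, 0.228804, 0.415715, 0.574570, 0.488804],
    "b": [0.973451, 0.354498, 0.203997, 0.352703, 0.819328, 0.693203, 0.731594, 0.332801,
          0.305298, 0.772569, 0.755219, 0.896310, 0.058641, 0.182683, 0.144686, 0.545102],
})

def func(s):
    # 3-row rolling sum of a divided by 3-row rolling sum of b, within one group
    return s.a.rolling(3).sum() / s.b.rolling(3).sum()

# Both groups return a Series indexed by the same eight months, so pandas
# lays them out as rows: one row per group, one column per month.
wide = x.set_index("month").groupby("group")[["a", "b"]].apply(func)
print(wide.shape)

# stack() moves the month columns back into an index level. dropna() removes the
# two leading NaNs per group (older stack() drops them itself, newer keeps them).
tall = wide.stack().dropna()
print(tall)
```

The dropna() call is there only so the tall result has the same 12 rows as the output above regardless of pandas version.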

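You can convert the result to a DataFrame inside func():

def func(s):
    # pd.rolling_sum(x, 3) was removed from later pandas; x.rolling(3).sum() is the equivalent
    return (s.a.rolling(3).sum() / s.b.rolling(3).sum()).dropna().to_frame()

df.groupby('group').apply(func)

Thanks, that seems to work. Can you explain why the conversion to a DataFrame is needed? Simply returning a Series works most of the time.

If all the Series returned by func have the same index, pandas makes them the rows of the resulting DataFrame. It works most of the time because the Series usually have different indexes.

That's interesting. Thanks!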