Python: How do I set the rolling window size based on the size of each group?


I have a dataframe that looks like this:

>df

ID    Value
---------------
1       1.0
1       2.0
1       3.0
1       4.0
2       6.0
2       7.0
2       8.0
3       2.0

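For each group, I want to compute min/max/sum/mean/var over the "Value" field of the last int(group size / 2) records, rather than over a fixed number of records.

For ID=1, apply min/max/sum/mean/var to the "Value" field of the last 4/2 = 2 records.
For ID=2, apply min/max/sum/mean/var to the "Value" field of the last int(3/2) = 1 record.
For ID=3, apply min/max/sum/mean/var to the "Value" field of the last 1 record, because the group contains only one record.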

So the output should be:

             Value
ID    min   max  sum  mean  var
----------------------------------
1     3.0   4.0  7.0  3.5    0.5 # the last 4/2 = 2 rows of the group with ID = 1
2     8.0   8.0  8.0  8.0    NaN # the last int(3/2) = 1 row of the group with ID = 2
3     2.0   2.0  2.0  2.0    NaN # the last 1 row of the group with ID = 3
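The window rule described above can be sketched as a small helper. The name tail_size is hypothetical, and the floor of 1 row is an assumption taken from the ID=3 case in the question, where a one-row group still uses its single record.

```python
def tail_size(group_len: int) -> int:
    """Number of trailing rows to keep: int(group_len / 2), but never fewer than 1."""
    return max(1, group_len // 2)

# Group sizes from the example dataframe: ID -> number of rows
group_sizes = {1: 4, 2: 3, 3: 1}
windows = {gid: tail_size(n) for gid, n in group_sizes.items()}
print(windows)  # {1: 2, 2: 1, 3: 1}
```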

I was thinking of using the rolling function, as follows:
df_group = (
    df.groupby('ID')
      .apply(lambda x: x
             .sort_values(by=['ID'])
             .rolling(window=int(x.size/2), min_periods=1)
             .agg({'Value': ['min', 'max', 'sum', 'mean', 'var']})
             .tail(1))
)

But the result is as follows:
                Value
        min max sum    mean  var
ID                      
------------------------------------------------
1   3   1.0 4.0 10.0    2.5 1.666667
2   6   6.0 8.0 21.0    7.0 1.000000
3   7   2.0 2.0 2.0     2.0 NaN

It looks like x.size does not work at all.
Is there a way to set the rolling window size based on the group size?
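One likely reason for the result above: DataFrame.size counts cells (rows times columns), not rows. The sketch below assumes each group frame carries both the ID and Value columns, as it does when iterating over a groupby. In that case x.size is twice the row count, so int(x.size/2) ends up being the full group length.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 3],
                   "Value": [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 2.0]})

# For each group, compare the cell count (.size) with the row count (len)
counts = {int(gid): (int(g.size), len(g)) for gid, g in df.groupby("ID")}
print(counts)  # {1: (8, 4), 2: (6, 3), 3: (2, 1)}
```

Under that assumption, using len(x) in place of x.size gives the intended row count per group.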

A possible solution:
import pandas as pd
df = pd.DataFrame(dict(ID=[1,1,1,1,2,2,2,3],
                      Value=[1,2,3,4,6,7,8,2]))

print(df)
##
   ID  Value
0   1      1
1   1      2
2   1      3
3   1      4
4   2      6
5   2      7
6   2      8
7   3      2

Loop over the groups as follows:
#Object to store the result
stats = []

#Group over ID
for ID, Values in df.groupby('ID'):
    # tail: keep the last n rows, with n = max(1, int(group length / 2))
    # describe: compute the summary statistics for those rows
    _stat = Values.tail(max(1,int(len(Values)/2)))['Value'].describe()
    #Add group ID to the result
    _stat.loc['ID'] = ID
    #Store the result
    stats.append(_stat)

#Create the new dataframe
pd.DataFrame(stats).set_index('ID')

Result:
     count  mean       std  min   25%  50%   75%  max
ID                                                   
1.0    2.0   3.5  0.707107  3.0  3.25  3.5  3.75  4.0
2.0    1.0   8.0       NaN  8.0  8.00  8.0  8.00  8.0
3.0    1.0   2.0       NaN  2.0  2.00  2.0  2.00  2.0
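describe() returns count, std and quartiles rather than the sum and var asked for in the question. As a variation on the loop above (not part of the original answer), the sketch below computes exactly min/max/sum/mean/var on the tail of each group. The helper name tail_stats is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 3],
                   "Value": [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 2.0]})

def tail_stats(values: pd.Series) -> pd.Series:
    """Aggregate the last max(1, len // 2) entries of one group's values."""
    n = max(1, len(values) // 2)
    return values.tail(n).agg(["min", "max", "sum", "mean", "var"])

# One Series of statistics per group, assembled into a frame indexed by ID
rows = {gid: tail_stats(g["Value"]) for gid, g in df.groupby("ID")}
result = pd.DataFrame(rows).T
result.index.name = "ID"
print(result)
```

For groups whose tail holds a single row, var comes out as NaN, because the sample variance of one value is undefined.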


Comments:

Hi, could you share what you have tried and the expected result (dataframe or otherwise)?

I have updated the question with what I did and the expected output. Any hints?

Not sure why you need rolling on the dataframe; see the answer for a possible solution.