Python: pandas rolling behavior
This is my DataFrame:
import pandas as pd

df = pd.DataFrame({
    'location': ['USA','USA','USA','USA', 'France','France','France','France'],
    'date': ['2020-11-20','2020-11-21','2020-11-22','2020-11-23', '2020-11-20','2020-11-21','2020-11-22','2020-11-23'],
    'dm': [5.,4.,2.,2.,17.,3.,3.,7.]
})
For each location (hence the groupby), I need the mean of dm over a 2-day window. If I use this:
df['rolling']=df.groupby('location').dm.rolling(2).mean().values
I get this wrong answer:
  location        date    dm  rolling
0      USA  2020-11-20   5.0      NaN
1      USA  2020-11-21   4.0     10.0
2      USA  2020-11-22   2.0      3.0
3      USA  2020-11-23   2.0      5.0
4   France  2020-11-20  17.0      NaN
5   France  2020-11-21   3.0      4.5
6   France  2020-11-22   3.0      3.0
7   France  2020-11-23   7.0      2.0
whereas it should be:
  location        date    dm  rolling
0      USA  2020-11-20   5.0      NaN
1      USA  2020-11-21   4.0      4.5
2      USA  2020-11-22   2.0      3.0
3      USA  2020-11-23   2.0      2.0
4   France  2020-11-20  17.0      NaN
5   France  2020-11-21   3.0     10.0
6   France  2020-11-22   3.0      3.0
7   France  2020-11-23   7.0      5.0
Two questions:
- What is my syntax actually doing?
- What is the correct way to do this?
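groupby followed by rolling adds the group key as a new first level of a MultiIndex, and the groups come out in sorted key order (France before USA). Assigning .values ignores the index entirely and fills the column by position, so France's results land in the USA rows and vice versa. To match the original index values, drop the added level with reset_index(level=0, drop=True), so that the assignment aligns by index: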
df['rolling']=df.groupby('location').dm.rolling(2).mean().reset_index(level=0, drop=True)
print(df)
  location        date    dm  rolling
0      USA  2020-11-20   5.0      NaN
1      USA  2020-11-21   4.0      4.5
2      USA  2020-11-22   2.0      3.0
3      USA  2020-11-23   2.0      2.0
4   France  2020-11-20  17.0      NaN
5   France  2020-11-21   3.0     10.0
6   France  2020-11-22   3.0      3.0
7   France  2020-11-23   7.0      5.0
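As an alternative that is not part of the original answer, the same result can probably be had with transform, which returns a Series that already carries the original index, so no level needs to be dropped. This is only a sketch on the question's data; the lambda is an illustrative choice and may be slower than the built-in groupby rolling on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['USA', 'USA', 'USA', 'USA', 'France', 'France', 'France', 'France'],
    'date': ['2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23',
             '2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23'],
    'dm': [5., 4., 2., 2., 17., 3., 3., 7.],
})

# transform applies the function to each group and returns the results
# indexed like the original frame, so plain assignment lines up row by row.
df['rolling'] = df.groupby('location')['dm'].transform(
    lambda s: s.rolling(2).mean()
)

print(df)
```

The first row of each location stays NaN because a window of 2 has only one observation there.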
Details:
print(df.groupby('location').dm.rolling(2).mean())
location   
France    4     NaN
          5    10.0
          6     3.0
          7     5.0
USA       0     NaN
          1     4.5
          2     3.0
          3     2.0
Name: dm, dtype: float64
print(df.groupby('location').dm.rolling(2).mean().reset_index(level=0, drop=True))
4     NaN
5    10.0
6     3.0
7     5.0
0     NaN
1     4.5
2     3.0
3     2.0
Name: dm, dtype: float64
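To make the difference between the two assignments concrete, here is a small sketch that puts the positional result (what .values does) next to the label-aligned one. The names rolled, flat, by_position and by_label are illustrative only, and reindex is used here just to spell out the alignment that a plain column assignment would do implicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['USA', 'USA', 'USA', 'USA', 'France', 'France', 'France', 'France'],
    'date': ['2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23',
             '2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23'],
    'dm': [5., 4., 2., 2., 17., 3., 3., 7.],
})

# Result indexed by (location, original row label).
rolled = df.groupby('location')['dm'].rolling(2).mean()

# Drop the group level so only the original row labels remain.
flat = rolled.reset_index(level=0, drop=True)

# Positional: values are pasted in whatever order the groups were emitted.
by_position = pd.Series(rolled.values, index=df.index)

# Label-aligned: each value goes back to the row it was computed from.
by_label = flat.reindex(df.index)

print(pd.DataFrame({'dm': df['dm'],
                    'by_position': by_position,
                    'by_label': by_label}))
```

The by_label column should match the expected output in the question; by_position depends on the order in which the groups are returned, which is why it cannot be relied on.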