Python: how to get a rolling-window mean after grouping by multiple columns
First, I want to group by the columns name, group and place.
Then, for each group, I want the mean of y over every two adjacent months.
Finally, I want to add that mean back to the original DataFrame as a new column.
Original DataFrame:
import pandas as pd
df = pd.DataFrame({"name":["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob"],
"group":[1, 1, 1, 1, 1, 1, 2, 2, 2],
"place":['a', 'a', "a", 'b', 'b', 'b', 'b', 'b', 'b' ],
"yearmonth": ["2019-01", "2019-02", "2019-03", "2019-01", "2019-02", "2019-03", "2019-01", "2019-02", "2019-03"],
"y":[1, 2, 3, 1, 2, 0, 2, 0, 0]
})
print(df)
name group place yearmonth y
0 Amy 1 a 2019-01 1
1 Amy 1 a 2019-02 2
2 Amy 1 a 2019-03 3
3 Bob 1 b 2019-01 1
4 Bob 1 b 2019-02 2
5 Bob 1 b 2019-03 0
6 Bob 2 b 2019-01 2
7 Bob 2 b 2019-02 0
8 Bob 2 b 2019-03 0
Expected result:
name group place yearmonth y average_2months
0 Amy 1 a 2019-01 1 nan
1 Amy 1 a 2019-02 2 1.5
2 Amy 1 a 2019-03 3 2.5
3 Bob 1 b 2019-01 1 nan
4 Bob 1 b 2019-02 2 1.5
5 Bob 1 b 2019-03 0 1.0
6 Bob 2 b 2019-01 2 nan
7 Bob 2 b 2019-02 0 1.0
8 Bob 2 b 2019-03 0 0.0
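For comparison, here is a minimal sketch of a different approach from the one used in this thread: groupby(...).transform returns a Series that already carries the original row index, so the rolling mean can be assigned directly without touching a MultiIndex. It assumes the rows within each group are already in chronological order, as they are in this example, and should reproduce the expected result above.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({
    "name": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob"],
    "group": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "place": ["a", "a", "a", "b", "b", "b", "b", "b", "b"],
    "yearmonth": ["2019-01", "2019-02", "2019-03"] * 3,
    "y": [1, 2, 3, 1, 2, 0, 2, 0, 0],
})

# transform applies the rolling mean to each group's y values separately and
# returns a Series indexed like df, so it can be assigned as a column directly.
# Assumes rows inside each group are already ordered by yearmonth.
df["average_2months"] = (
    df.groupby(["name", "group", "place"])["y"]
      .transform(lambda s: s.rolling(2).mean())
)

print(df)
```

The first row of each group is NaN because a window of 2 has only one value there.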
What I tried:
I now know how to get the mean over two adjacent months, but I don't know how to add it back to the original DataFrame:
tmp = df.groupby(['name', 'group', 'place'])['y'].rolling(2).mean()
print(tmp)
tmp:
name group place
Amy 1 a 0 NaN
1 1.5
2 2.5
Bob 1 b 3 NaN
4 1.5
5 1.0
2 b 6 NaN
7 1.0
8 0.0
Name: y, dtype: float64
The fourth index level is your original index, so drop the first three levels and assign:
df['new'] = tmp.reset_index(level=[0, 1, 2], drop=True)
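A self-contained version of this answer, in case you want to run it end to end. The DataFrame is the one from the question; droplevel is only mentioned in a comment as an equivalent way of dropping the group-key levels.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({
    "name": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob"],
    "group": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "place": ["a", "a", "a", "b", "b", "b", "b", "b", "b"],
    "yearmonth": ["2019-01", "2019-02", "2019-03"] * 3,
    "y": [1, 2, 3, 1, 2, 0, 2, 0, 0],
})

tmp = df.groupby(["name", "group", "place"])["y"].rolling(2).mean()

# tmp is indexed by (name, group, place, original row label).
print(tmp.index.nlevels)

# Drop the three group-key levels; what remains is df's own index, so the
# assignment aligns each value with its original row by label.
df["new"] = tmp.reset_index(level=[0, 1, 2], drop=True)
# tmp.droplevel([0, 1, 2]) would produce the same Series.

print(df)
```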
Yes. I found that in some cases it is better to add
df = df.sort_values(by=['name', 'group', 'place'])
first; otherwise the columns new and y do not line up.
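A minimal sketch of sorting before computing the rolling mean, under the assumption that the input rows might not already be in chronological order. The sketch also adds yearmonth to the sort keys. This goes beyond the comment above and is an assumption on my part: it makes consecutive rows within a group correspond to consecutive months. 'YYYY-MM' strings sort chronologically. The shuffled copy is there only to simulate unordered input.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({
    "name": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob"],
    "group": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "place": ["a", "a", "a", "b", "b", "b", "b", "b", "b"],
    "yearmonth": ["2019-01", "2019-02", "2019-03"] * 3,
    "y": [1, 2, 3, 1, 2, 0, 2, 0, 0],
})

# Simulate rows arriving out of order (illustration only).
shuffled = df.sample(frac=1, random_state=0)

# Sort by the group keys and, as an added assumption, by yearmonth, so the
# 2-row window always covers two adjacent months within a group.
shuffled = shuffled.sort_values(by=["name", "group", "place", "yearmonth"])

tmp = shuffled.groupby(["name", "group", "place"])["y"].rolling(2).mean()

# Assignment aligns on the original row labels kept in tmp's last level.
shuffled["new"] = tmp.reset_index(level=[0, 1, 2], drop=True)

# Optionally restore the original row order.
result = shuffled.sort_index()
print(result)
```

Because the assignment aligns on index labels, result.sort_index() gives back the original row order with new matched to y.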