Python: groupby and weighted average
I have a DataFrame:
import pandas as pd
import numpy as np
# DataFrame.from_items was removed in pandas 1.0; a dict gives the same columns
df = pd.DataFrame({'STAND_ID': [1,1,2,3,3,3],
                   'Species': ['Conifer','Broadleaves','Conifer','Broadleaves','Conifer','Conifer'],
                   'Height': [20,19,13,24,25,18], 'Stems': [1500,2000,1000,1200,1700,1000],
                   'Volume': [200,100,300,50,100,10]})
   STAND_ID      Species  Height  Stems  Volume
0         1      Conifer      20   1500     200
1         1  Broadleaves      19   2000     100
2         2      Conifer      13   1000     300
3         3  Broadleaves      24   1200      50
4         3      Conifer      25   1700     100
5         3      Conifer      18   1000      10
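I want to group by STAND_ID and Species, compute the weighted average of Height and Stems using Volume as the weight, and then unstack the result.
So I tried: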
newdf=df.groupby(['STAND_ID','Species']).agg({'Height':lambda x: np.average(x['Height'],weights=x['Volume']),
'Stems':lambda x: np.average(x['Stems'],weights=x['Volume'])}).unstack()
This gives me an error:
builtins.KeyError: 'Height'
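How can I fix this?
Your error occurs because agg cannot perform an operation that involves more than one Series/column. agg passes one Series/column at a time to the function, so x['Height'] and x['Volume'] are not available inside the lambda. Let's use apply and pd.concat instead: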
g = df.groupby(['STAND_ID','Species'])
newdf = pd.concat([g.apply(lambda x: np.average(x['Height'],weights=x['Volume'])),
g.apply(lambda x: np.average(x['Stems'],weights=x['Volume']))],
axis=1, keys=['Height','Stems']).unstack()
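To make the point about agg concrete, here is a minimal sketch showing what the function given to agg actually receives. It also shows a common workaround that keeps agg by looking the weights up in the original frame through the group's index. The names probe, seen, weighted_by_volume and via_agg are hypothetical and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'STAND_ID': [1, 1, 2, 3, 3, 3],
    'Species': ['Conifer', 'Broadleaves', 'Conifer',
                'Broadleaves', 'Conifer', 'Conifer'],
    'Height': [20, 19, 13, 24, 25, 18],
    'Stems': [1500, 2000, 1000, 1200, 1700, 1000],
    'Volume': [200, 100, 300, 50, 100, 10],
})

g = df.groupby(['STAND_ID', 'Species'])

# Record the type of object that agg hands to the function (hypothetical helper).
seen = []

def probe(x):
    seen.append(type(x).__name__)
    return x.mean()

g.agg({'Height': probe})
# Each call gets a single column of a single group, so x['Height'] would be
# a label lookup on that Series' index, which is what raises the KeyError.
print(set(seen))

# Workaround (an assumption, not the answer's method): the Series keeps the
# original row labels, so the matching weights can be fetched from df.
def weighted_by_volume(x):
    return np.average(x, weights=df.loc[x.index, 'Volume'])

via_agg = g.agg({'Height': weighted_by_volume,
                 'Stems': weighted_by_volume}).unstack()
print(via_agg)
```

This relies on df keeping its original index between building the groupby and calling agg. If the frame is filtered or re-indexed in between, the apply-based approach is the safer choice.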
Edit: a better solution:
Output:
              Height                  Stems
Species  Broadleaves    Conifer Broadleaves      Conifer
STAND_ID
1               19.0  20.000000      2000.0  1500.000000
2                NaN  13.000000         NaN  1000.000000
3               24.0  24.363636      1200.0  1636.363636
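One way to get the same table with a single apply, instead of two apply calls joined by pd.concat, is to return a pd.Series with one entry per output column. The sketch below is an assumption about what such a solution could look like, not a reconstruction of the answer's own code; weighted_means is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'STAND_ID': [1, 1, 2, 3, 3, 3],
    'Species': ['Conifer', 'Broadleaves', 'Conifer',
                'Broadleaves', 'Conifer', 'Conifer'],
    'Height': [20, 19, 13, 24, 25, 18],
    'Stems': [1500, 2000, 1000, 1200, 1700, 1000],
    'Volume': [200, 100, 300, 50, 100, 10],
})

def weighted_means(group):
    # group is the sub-DataFrame for one (STAND_ID, Species) pair,
    # so all three columns are available at once.
    w = group['Volume']
    return pd.Series({
        'Height': np.average(group['Height'], weights=w),
        'Stems': np.average(group['Stems'], weights=w),
    })

newdf = (
    df.groupby(['STAND_ID', 'Species'])[['Height', 'Stems', 'Volume']]
      .apply(weighted_means)   # one row per group, columns Height and Stems
      .unstack()               # move Species from the row index to the columns
)
print(newdf)
```

As a check by hand, stand 3 has two Conifer rows with heights 25 and 18 and volumes 100 and 10, so the weighted height is (25*100 + 18*10) / 110 = 24.3636..., which matches the table above.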
There also seems to be an answer here: