Pandas groupby total amount
I have a dataframe as shown below, which gives the area usage of a whole city, say Bangalore.
Sector Plot Usage Status Area
A 1 Villa Constructed 40
A 2 Residential Constructed 50
A 3 Substation Not_Constructed 120
A 4 Villa Not_Constructed 60
A 5 Residential Not_Constructed 30
A 6 Substation Constructed 100
B 1 Villa Constructed 80
B 2 Residential Constructed 60
B 3 Substation Not_Constructed 40
B 4 Villa Not_Constructed 80
B 5 Residential Not_Constructed 100
B 6 Substation Constructed 40
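To reproduce the question, the whitespace-separated table above can be read straight into a DataFrame. This is a minimal sketch, assuming the rows are pasted verbatim into a string; the variable name `data` is only for illustration.

```python
import io

import pandas as pd

# Rows copied verbatim from the table above (whitespace-separated)
data = """\
Sector Plot Usage Status Area
A 1 Villa Constructed 40
A 2 Residential Constructed 50
A 3 Substation Not_Constructed 120
A 4 Villa Not_Constructed 60
A 5 Residential Not_Constructed 30
A 6 Substation Constructed 100
B 1 Villa Constructed 80
B 2 Residential Constructed 60
B 3 Substation Not_Constructed 40
B 4 Villa Not_Constructed 80
B 5 Residential Not_Constructed 100
B 6 Substation Constructed 40
"""

# sep=r"\s+" splits each line on any run of whitespace
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df)
```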
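Bangalore consists of sectors A and B.
From the above, I would like to calculate the total area of Bangalore and its usage distribution.
Expected output: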
City Total_Area %_Villa %_Resid %_Substation %_Constructed %_Not_Constructed
Bangalore(A+B) 800 32.5 30 37.5 46.25 53.75
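Each expected percentage is the area of the matching rows divided by the 800 total (for example, Villa is 40 + 60 + 80 + 80 = 260, and 260 / 800 = 32.5%). A quick plain-Python check of the whole expected row, with no pandas involved and the rows copied from the question:

```python
# (Sector, Usage, Status, Area) rows copied from the question's table
rows = [
    ("A", "Villa", "Constructed", 40),
    ("A", "Residential", "Constructed", 50),
    ("A", "Substation", "Not_Constructed", 120),
    ("A", "Villa", "Not_Constructed", 60),
    ("A", "Residential", "Not_Constructed", 30),
    ("A", "Substation", "Constructed", 100),
    ("B", "Villa", "Constructed", 80),
    ("B", "Residential", "Constructed", 60),
    ("B", "Substation", "Not_Constructed", 40),
    ("B", "Villa", "Not_Constructed", 80),
    ("B", "Residential", "Not_Constructed", 100),
    ("B", "Substation", "Constructed", 40),
]
USAGE, STATUS, AREA = 1, 2, 3  # tuple positions

total = sum(r[AREA] for r in rows)

def pct(field, value):
    """Percentage of the total area covered by rows whose `field` equals `value`."""
    return 100 * sum(r[AREA] for r in rows if r[field] == value) / total

print(total, pct(USAGE, "Villa"), pct(USAGE, "Residential"), pct(USAGE, "Substation"),
      pct(STATUS, "Constructed"), pct(STATUS, "Not_Constructed"))
# 800 32.5 30.0 37.5 46.25 53.75
```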
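A simple pivot table can help you.
1. One-line solution: 80% of the work done
import pandas as pd
pv = df.pivot_table(values='Area', aggfunc='sum', index=['Status'], columns=['Usage'], margins=True, margins_name='Total', fill_value=0).unstack()
2. Now format as percentages: 90% of the work done
total = float(pv['Total']['Total'])
ans = pd.DataFrame([[pv['Villa']['Total'] / total,
                     pv['Residential']['Total'] / total,
                     pv['Substation']['Total'] / total,
                     pv['Total']['Constructed'] / total,
                     pv['Total']['Not_Constructed'] / total]],
                   index=['Bangalore(A+B)']) * 100
3. Add the total column: 99% of the work done
ans['Total'] = pv['Total']['Total']
4. Rename the columns and put them in the expected order: done
ans.columns = ['%_Villa', '%_Resid', '%_Substation', '%_Constructed', '%_Not_Constructed', 'Total_Area']
ans = ans[['Total_Area', '%_Villa', '%_Resid', '%_Substation', '%_Constructed', '%_Not_Constructed']]
Alternatively, with groupby: I think you need to set a scalar city value in the Sector column before applying the solution (if there are only sectors A and B):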
import pandas as pd
# treat every row as one city (assumes df contains only sectors A and B)
df['Sector'] = 'Bangalore(A+B)'
# sum Area per (Sector, Usage) pair
df1 = df.groupby(['Sector', 'Usage'])['Area'].sum()
# divide by each Sector's total, move Usage values into columns, convert to %
# (groupby(level=0).sum() replaces Series.sum(level=0), which was removed in pandas 2.0)
df1 = df1.div(df1.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
# same for (Sector, Status)
df2 = df.groupby(['Sector', 'Status'])['Area'].sum()
df2 = df2.div(df2.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
# total Area per Sector
s = df.groupby('Sector')['Area'].sum().rename('Total_area')
# join all together
dfA = pd.concat([s, df1, df2], axis=1).reset_index()
print (dfA)
Sector Total_area %_Residential %_Substation %_Villa \
0 Bangalore(A+B) 800 30.0 37.5 32.5
%_Constructed %_Not_Constructed
0 46.25 53.75
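Overwriting Sector with one label only works while every row belongs to Bangalore. If the frame ever held sectors from several cities, one option is to map sectors to cities first and group by that column instead. This is a minimal sketch under that assumption; the `sector_to_city` dict, the `City` column and the `shares` helper are hypothetical names for illustration, and the data is the question's own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Sector": ["A"] * 6 + ["B"] * 6,
    "Usage": ["Villa", "Residential", "Substation"] * 4,
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Not_Constructed", "Not_Constructed", "Constructed"] * 2,
    "Area": [40, 50, 120, 60, 30, 100, 80, 60, 40, 80, 100, 40],
})

# Hypothetical lookup: add further sectors/cities here as needed
sector_to_city = {"A": "Bangalore(A+B)", "B": "Bangalore(A+B)"}
df["City"] = df["Sector"].map(sector_to_city)

def shares(col):
    """Area per (City, col) as a percentage of each city's total area."""
    area = df.pivot_table(index="City", columns=col, values="Area",
                          aggfunc="sum", fill_value=0)
    return area.div(area.sum(axis=1), axis=0).mul(100).add_prefix("%_")

total = df.groupby("City")["Area"].sum().rename("Total_Area")
result = pd.concat([total, shares("Usage"), shares("Status")], axis=1).reset_index()
print(result)
```

Using `pivot_table` with `aggfunc="sum"` does the group-and-unstack in one call, so the same helper serves both the Usage and the Status breakdown.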