Pandas groupby sum, count based on a condition, and concatenate in order

I have a dataframe as shown below:

Sector Property_ID  Unit_ID  Unit_usage   Property_Usage  Rent_Unit_Status  Unit_Area
SE1    1            1        Shop         Commercial      Rented            200
SE1    1            2        Resid        Commercial      Rented            200
SE1    1            3        Shop         Commercial      Vacant            100
SE1    2            1        Shop         Residential     Vacant            200
SE1    2            2        Apartment    Residential     Rented            100
SE2    1            1        Resid        Commercial      Rented            400
SE2    1            2        Shop         Commercial      Vacant            100
SE2    2            1        Apartment    Residential     Vacant            500
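
To make the sample reproducible, the table above can be built as a DataFrame. This is only a minimal sketch: the values are copied from the table, and the variable name `df` is chosen to match the name the answers use.

```python
import pandas as pd

# Sample data copied row by row from the table in the question.
df = pd.DataFrame({
    'Sector': ['SE1', 'SE1', 'SE1', 'SE1', 'SE1', 'SE2', 'SE2', 'SE2'],
    'Property_ID': [1, 1, 1, 2, 2, 1, 1, 2],
    'Unit_ID': [1, 2, 3, 1, 2, 1, 2, 1],
    'Unit_usage': ['Shop', 'Resid', 'Shop', 'Shop', 'Apartment',
                   'Resid', 'Shop', 'Apartment'],
    'Property_Usage': ['Commercial', 'Commercial', 'Commercial',
                       'Residential', 'Residential',
                       'Commercial', 'Commercial', 'Residential'],
    'Rent_Unit_Status': ['Rented', 'Rented', 'Vacant', 'Vacant', 'Rented',
                         'Rented', 'Vacant', 'Vacant'],
    'Unit_Area': [200, 200, 100, 200, 100, 400, 100, 500],
})
print(df)
```
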
From the above dataframe, I would like to prepare the dataframe below:

Sector  No_of_Properties  No_of_Units  Total_area  %_Vacant   %_Rented  %_Shop  %_Apartment
SE1     2                 5            800         37.5       62.5      62.5    12.5
SE2     2                 3            1000        60         40        10      50
This needs a dictionary of aggregation functions passed to agg, here nunique, size and sum:

import pandas as pd

#aggregate sum per 2 columns Sector and Usage
df1 = df.groupby(['Sector', 'Unit_usage'])['Unit_Area'].sum()
#percentage by division of total per Sector
#(groupby(level=0).sum() is used because sum(level=0) was removed in pandas 2.0)
df1 = df1.div(df1.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
#aggregate sum per 2 columns Sector and Status
df2 = df.groupby(['Sector', 'Rent_Unit_Status'])['Unit_Area'].sum()
df2 = df2.div(df2.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
#aggregations: distinct properties, number of units, total area per Sector
s = df.groupby('Sector').agg({'Property_ID':'nunique','Unit_ID':'size', 'Unit_Area':'sum'})
s = s.rename(columns={'Property_ID':'No_of_Properties','Unit_ID':'No_of_Units',
                      'Unit_Area':'Total_area'})
#join all together
df = pd.concat([s, df1, df2], axis=1).reset_index()
print(df)
  Sector  No_of_Properties  No_of_Units  Total_area  %_Apartment  %_Resid  \
0    SE1                 2            5         800         12.5     25.0   
1    SE2                 2            3        1000         50.0     40.0   

   %_Shop  %_Rented  %_Vacant  
0    62.5      62.5      37.5  
1    10.0      40.0      60.0  
Solution for pandas 0.25+ (named aggregation):

import pandas as pd

#aggregate sum per 2 columns Sector and Usage
df1 = df.groupby(['Sector', 'Unit_usage'])['Unit_Area'].sum()
#percentage by division of total per Sector
#(groupby(level=0).sum() is used because sum(level=0) was removed in pandas 2.0)
df1 = df1.div(df1.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
#aggregate sum per 2 columns Sector and Status
df2 = df.groupby(['Sector', 'Rent_Unit_Status'])['Unit_Area'].sum()
df2 = df2.div(df2.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
#aggregations with named aggregation: new_column=(source_column, function)
s = df.groupby('Sector').agg(No_of_Properties=('Property_ID','nunique'),
                             No_of_Units=('Unit_ID','size'),
                             Total_area=('Unit_Area','sum'))
#join all together
df = pd.concat([s, df1, df2], axis=1).reset_index()
print(df)

  Sector  No_of_Properties  No_of_Units  Total_area  %_Apartment  %_Resid  \
0    SE1                 2            5         800         12.5     25.0   
1    SE2                 2            3        1000         50.0     40.0   

   %_Shop  %_Rented  %_Vacant  
0    62.5      62.5      37.5  
1    10.0      40.0      60.0  
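
As a possible alternative to the groupby/div/unstack steps above, the per-sector area shares can also be sketched with `pivot_table`. This is not the answer's method, only an illustration of the same arithmetic. The helper name `area_share` is hypothetical, and the sample data is re-created from the question so the block runs on its own.

```python
import pandas as pd

# Sample data copied from the question (only the columns used here).
df = pd.DataFrame({
    'Sector': ['SE1', 'SE1', 'SE1', 'SE1', 'SE1', 'SE2', 'SE2', 'SE2'],
    'Unit_usage': ['Shop', 'Resid', 'Shop', 'Shop', 'Apartment',
                   'Resid', 'Shop', 'Apartment'],
    'Rent_Unit_Status': ['Rented', 'Rented', 'Vacant', 'Vacant', 'Rented',
                         'Rented', 'Vacant', 'Vacant'],
    'Unit_Area': [200, 200, 100, 200, 100, 400, 100, 500],
})

def area_share(frame, column):
    """Percentage of each sector's Unit_Area per category of `column`.

    Hypothetical helper, not part of pandas.
    """
    # Sum of area per (Sector, category); missing combinations become 0.
    table = frame.pivot_table(index='Sector', columns=column,
                              values='Unit_Area', aggfunc='sum', fill_value=0)
    # Divide each row by its own total so every sector sums to 100.
    return table.div(table.sum(axis=1), axis=0).mul(100).add_prefix('%_')

usage = area_share(df, 'Unit_usage')
status = area_share(df, 'Rent_Unit_Status')
print(pd.concat([usage, status], axis=1))
```

Because each row is divided by its own total, the usage columns and the status columns each add up to 100 per sector.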

Update: the percentages are now calculated from the total area.

You can use groupby(...).apply for this:

import pandas as pd

def summarise(group):
    # group holds the rows of one Sector; an explicit dtype avoids the
    # empty-Series dtype warning in newer pandas
    output = pd.Series(dtype=float)
    output['No_of_Properties'] = group['Property_ID'].nunique()
    output['No_of_Units'] = group['Unit_ID'].size
    output['Total_area'] = group['Unit_Area'].sum()
    # share of the sector's total area that matches each condition
    output['%_Rented'] = (group['Unit_Area'].loc[group['Rent_Unit_Status'] == 'Rented'].sum() / output['Total_area']) * 100
    output['%_Shop'] = (group['Unit_Area'].loc[group['Unit_usage'] == 'Shop'].sum() / output['Total_area']) * 100
    output['%_Apartment'] = (group['Unit_Area'].loc[group['Unit_usage'] == 'Apartment'].sum() / output['Total_area']) * 100

    return output

print(df.groupby('Sector').apply(summarise))
Output:

        No_of_Properties  No_of_Units  Total_area  %_Rented  %_Shop  \
Sector                                                                
SE1                  2.0          5.0       800.0      62.5    62.5   
SE2                  2.0          3.0      1000.0      40.0    10.0   

        %_Apartment  
Sector               
SE1            12.5  
SE2            50.0  

What is your pandas version?
Why is %_Rented 62.5% for the first sector? Shouldn't it be 60%?
@jezrael My pandas version is 0.25.1
@mrzo (200+200+100)/800 * 100 = 62.5
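
The arithmetic in the last comment can be checked directly. The sketch below contrasts a share by unit count with the share by area that the answers compute. That the 60% expectation comes from counting units (3 of the 5 units in SE1 are rented) is an assumption, not something stated in the thread.

```python
import pandas as pd

# SE1 rows only, copied from the question's table.
se1 = pd.DataFrame({
    'Rent_Unit_Status': ['Rented', 'Rented', 'Vacant', 'Vacant', 'Rented'],
    'Unit_Area': [200, 200, 100, 200, 100],
})

rented = se1['Rent_Unit_Status'] == 'Rented'

# Share of units that are rented: 3 of 5 units.
pct_by_count = rented.mean() * 100

# Share of area that is rented: (200 + 200 + 100) / 800.
pct_by_area = se1.loc[rented, 'Unit_Area'].sum() / se1['Unit_Area'].sum() * 100

print(pct_by_count, pct_by_area)
```

The two figures differ because the rented units are not all the same size, so weighting by Unit_Area gives a different result than counting rows.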