Pandas: groupby counts based on multiple conditions and multiple columns


I have a dataframe as shown below:

ID  Ownwer_ID   Building   Nationality  Age   Sector
1   2           Villa      India        24    SE1
2   2           Villa      India        28    SE1
3   4           Apartment  USA          82    SE2
4   4           Apartment  USA          68    SE2
5   7           Villa      UK           32    SE2
6   7           Villa      UK           28    SE2
7   7           Villa      UK            4    SE2
8   8           LabourCamp Pakistan     27    SE3
9   2           Villa      India        1     SE1
10  10          LabourCamp India        23    SE2
11  11          Apartment  Germany      34    SE3

In the data above, ID is unique and represents one person.

From this dataframe, I would like to prepare the dataframe below:

Sector   #Age_0-12  #Agemore70   #Asians  #Europe  #USA  #Asians_LabourCamp #USA_Apartment
SE1      1          0            3        0        0     0                  0
SE2      1          1            1        3        2     1                  2
SE3      0          0            1        1        0     1                  0

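I treat a person as Asian if their Nationality is India or Pakistan. Europe = Nationality is UK or Germany.

#Age_0-12 = number of people aged between 0 and 12 (inclusive)

#Agemore70 = number of people aged 70 or older

Similarly, each of the remaining columns is the number of people described by its name.

I tried the following code: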

d = {'India': 'Asians', 'Pakistan': 'Asians', 'UK': 'Europe', 'Germany': 'Europe',
'USA': 'USA'}
df['natinality_Group'] = df['Nationality'].map(d)

bins = [-1, 12, 21, 50, 100]
df['binned_age'] = pd.cut(df['Age'], bins)

After that I am stuck. If you have a solution, could you please help?

Let's try this: use pd.cut to get the age groups, then pd.get_dummies together with groupby to get the count of each value in the selected columns:

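import numpy as np
import pandas as pd

# right=False gives left-closed bins [0, 13), [13, 70), [70, inf), so ages 0-12
# (inclusive) and ages of 70 or more fall where the question asks (whole-number ages).
df['Age Group'] = pd.cut(df['Age'], [0, 13, 70, np.inf], right=False,
                         labels=['Age_0-12', 'Age_12-70', 'Agemore70'])

# One indicator column per category, then sum the indicators within each Sector.
df_out = pd.get_dummies(df[['Sector', 'Building', 'Age Group', 'Nationality']],
                        columns=['Age Group', 'Building', 'Nationality'],
                        prefix='#', prefix_sep='').groupby('Sector').sum()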

Output:

       #Age_0-12  #Age_12-70  #Agemore70  #Apartment  #LabourCamp  #Villa  \
Sector                                                                       
SE1             1           2           0           0            0       3   
SE2             1           4           1           2            1       3   
SE3             0           2           0           1            1       0   

        #Germany  #India  #Pakistan  #UK  #USA  
Sector                                          
SE1            0       3          0    0     0  
SE2            0       1          0    3     2  
SE3            1       0          1    0     0
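The output above counts each nationality and building type separately. It does not yet contain the grouped columns (#Asians, #Europe) or the combined ones (#Asians_LabourCamp, #USA_Apartment) that the question asks for. One possible way to get those, offered as a sketch and not as part of the original answer, is to build one boolean column per requested count and sum them per Sector. The names group, flags and result are illustrative, and the sample frame is retyped from the question.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "ID": range(1, 12),
    "Ownwer_ID": [2, 2, 4, 4, 7, 7, 7, 8, 2, 10, 11],
    "Building": ["Villa", "Villa", "Apartment", "Apartment", "Villa", "Villa",
                 "Villa", "LabourCamp", "Villa", "LabourCamp", "Apartment"],
    "Nationality": ["India", "India", "USA", "USA", "UK", "UK", "UK",
                    "Pakistan", "India", "India", "Germany"],
    "Age": [24, 28, 82, 68, 32, 28, 4, 27, 1, 23, 34],
    "Sector": ["SE1", "SE1", "SE2", "SE2", "SE2", "SE2", "SE2",
               "SE3", "SE1", "SE2", "SE3"],
})

# Nationality groups as defined in the question.
group = df["Nationality"].map({"India": "Asians", "Pakistan": "Asians",
                               "UK": "Europe", "Germany": "Europe",
                               "USA": "USA"})

# One boolean column per requested count (True means the person qualifies).
flags = pd.DataFrame({
    "#Age_0-12": df["Age"].between(0, 12),   # inclusive on both ends
    "#Agemore70": df["Age"] >= 70,
    "#Asians": group == "Asians",
    "#Europe": group == "Europe",
    "#USA": group == "USA",
    "#Asians_LabourCamp": (group == "Asians") & (df["Building"] == "LabourCamp"),
    "#USA_Apartment": (group == "USA") & (df["Building"] == "Apartment"),
})

# Summing booleans counts the True values within each Sector.
result = flags.groupby(df["Sector"]).sum().astype(int)
print(result)
```

Because every condition is an ordinary boolean expression, further combined columns can be added by extending the flags dictionary, without changing the groupby step.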

Have you looked into pivot?
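The comment does not say how a pivot would be used. One reading, assuming it means a pivot-style count table, is pd.crosstab, which tabulates how often each pair of values occurs. The sketch below covers only the nationality-group columns; the names d and counts are illustrative, and the data is retyped from the question.

```python
import pandas as pd

# Only the two columns needed for this table, retyped from the question.
df = pd.DataFrame({
    "Sector": ["SE1", "SE1", "SE2", "SE2", "SE2", "SE2", "SE2",
               "SE3", "SE1", "SE2", "SE3"],
    "Nationality": ["India", "India", "USA", "USA", "UK", "UK", "UK",
                    "Pakistan", "India", "India", "Germany"],
})

# Nationality groups as defined in the question.
d = {"India": "Asians", "Pakistan": "Asians",
     "UK": "Europe", "Germany": "Europe", "USA": "USA"}

# Rows are sectors, columns are nationality groups, cells are person counts.
counts = pd.crosstab(df["Sector"], df["Nationality"].map(d)).add_prefix("#")
print(counts)
```

A crosstab handles one categorical column at a time, so the age and combined columns would still need separate tables, or boolean conditions, joined on Sector.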