Python pandas: how to apply a function to each subgroup

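I have a simple DataFrame with nationality, occupation, and age columns. Nationality is encoded as an integer: 0 for European, 1 for American, 2 for Asian.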

For each occupation, I want to find the percentage of each nationality. For example: 67% of doctors are European and 33% are Asian.

import pandas as pd
import numpy as np
# create the DataFrame: one column of random nationality codes, one of random ages
df = pd.DataFrame(np.concatenate((np.random.randint(low=0, high=3, size=(10, 1)), np.random.randint(low=24, high=70, size=(10, 1))), axis=1))
df.columns=['nationality','age']
df['occupation']=['teacher']*2+['engineer']*3+['doctor']*3+['lawyer']*2


  nationality   age occupation
0   0   65  teacher
1   0   31  teacher
2   0   30  engineer
3   2   63  engineer
4   0   28  engineer
5   1   27  doctor
6   0   52  doctor
7   0   60  doctor
8   0   33  lawyer
9   0   38  lawyer

df.groupby(['occupation','nationality']).count()

def iseuropean(x):
    if x==0:
        return 1
    else:
        return 0
def isamerican(x):
    if x==1:
        return 1
    else:
        return 0
def isasian(x):
    if x==2:
        return 1
    else:
        return 0
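With groupby I can get the counts, but I want to apply a function to each occupation group that works out the percentages. I haven't figured out how to do that yet.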


Any help would be greatly appreciated.

I think you're looking for the percentage of each nationality within each occupation:

In [11]: c = df.groupby(['occupation','nationality'])["age"].count().rename("count")

In [12]: c
Out[12]:
occupation  nationality
doctor      0              2
            1              1
engineer    0              2
            2              1
lawyer      0              2
teacher     0              2
Name: count, dtype: int64

In [13]: c / c.sum()  # proportion of each, maybe not very useful
Out[13]:
occupation  nationality
doctor      0              0.2
            1              0.1
engineer    0              0.2
            2              0.1
lawyer      0              0.2
teacher     0              0.2
Name: count, dtype: float64

In [14]: c / c.groupby(level=0).sum()  # share of each nationality within its occupation
Out[14]:
occupation  nationality
doctor      0              0.666667
            1              0.333333
engineer    0              0.666667
            2              0.333333
lawyer      0              1.000000
teacher     0              1.000000
Name: count, dtype: float64
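
If you literally want to "apply a function to each occupation group", here is a minimal sketch of two other ways to reach the same numbers. It is not part of the answer above: the frame is hard-coded from the rows printed in the question so the result is reproducible, and `nationality_shares` is a hypothetical helper name used only for illustration.

```python
import pandas as pd

# Same ten rows as the frame printed in the question (hard-coded, not random).
df = pd.DataFrame({
    "nationality": [0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    "age": [65, 31, 30, 63, 28, 27, 52, 60, 33, 38],
    "occupation": ["teacher"] * 2 + ["engineer"] * 3 + ["doctor"] * 3 + ["lawyer"] * 2,
})

# Option 1: count each (occupation, nationality) pair, then divide by the
# occupation total. transform("sum") repeats each occupation's total on every
# row of that occupation, so the two Series share the same index.
counts = df.groupby(["occupation", "nationality"]).size()
shares = counts / counts.groupby(level="occupation").transform("sum")


# Option 2: apply a function to each occupation group.
# nationality_shares is a hypothetical helper, not a pandas API.
def nationality_shares(s):
    # s holds the nationality codes of one occupation;
    # normalize=True returns fractions instead of raw counts.
    return s.value_counts(normalize=True)


shares_apply = df.groupby("occupation")["nationality"].apply(nationality_shares)

print(shares)
print(shares_apply)
```

Both results are indexed by (occupation, nationality code), so something like shares.loc[("doctor", 0)] picks out a single share. Multiply by 100 if you want percentages rather than fractions.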

Beyond that, you may want to use a Categorical built from the codes, rather than the is_XXX functions:

In [21]: pd.Categorical.from_codes(df.nationality, ["european", "american", "asian"])
Out[21]:
[european, european, european, asian, european, american, european, european, european, european]
Categories (3, object): [european, american, asian]

In [22]: df.nationality = pd.Categorical.from_codes(df.nationality, ["european", "american", "asian"])

In [23]: df
Out[23]:
  nationality  age occupation
0    european   65    teacher
1    european   31    teacher
2    european   30   engineer
3       asian   63   engineer
4    european   28   engineer
5    american   27     doctor
6    european   52     doctor
7    european   60     doctor
8    european   33     lawyer
9    european   38     lawyer
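
Once the column holds labels, the percentages can also be laid out as a table with one row per occupation. The sketch below swaps in pd.crosstab, which the answer above does not use, and again hard-codes the rows printed in the question; treat it as one possible approach under those assumptions.

```python
import pandas as pd

# Nationality codes and occupations copied from the frame printed in the question.
df = pd.DataFrame({
    "nationality": [0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    "occupation": ["teacher"] * 2 + ["engineer"] * 3 + ["doctor"] * 3 + ["lawyer"] * 2,
})

# Map the integer codes to labels, as in the answer above.
df["nationality"] = pd.Categorical.from_codes(
    df["nationality"], ["european", "american", "asian"]
)

# normalize="index" divides every row by its own total, so each occupation's
# row sums to 1; multiplying by 100 turns the fractions into percentages.
pct = pd.crosstab(df["occupation"], df["nationality"], normalize="index") * 100

print(pct.round(1))
```

Combinations that never occur, such as an Asian doctor in this sample, show up as 0 rather than being dropped, which can make the table easier to read than the stacked Series.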

Thank you so much, Andy, this works great! Thanks as well for the note about Categoricals, it's really useful. Thanks again :-)