Python pandas: finding the average of string occurrences

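I am working with a DataFrame and trying to compute averages, but I got stuck when I tried to turn the value counts of a grouped DataFrame into averages. Here is the code: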

df2 = df.groupby(['school', 'Race/Ethnicity']).size()

school          Race/Ethnicity                        
school1         African American/Black                     15
                American Indian/Alaska Native               1
                Bi-racial/Multi-racial                      4
                Latino/a                                   53
                Other - Write In (Required)                 1
                White                                       2
school2         African American/Black                      1
                American Indian/Alaska Native               5
                Asian                                       1
                Bi-Racial/Multi-Racial                      1
                Latino/a                                   26

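I have many different schools. Rather than the group sizes, I want the average (proportion) for each race within each school. How can I iterate over the groups to find each group's sum and then divide each row by its group's sum?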

Use the normalize parameter of value_counts:

df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True)

school   Race/Ethnicity               
school1  Latino/a                         0.697368
         African American/Black           0.197368
         Bi-racial/Multi-racial           0.052632
         White                            0.026316
         American Indian/Alaska Native    0.013158
         Other - Write In (Required)      0.013158
school2  Latino/a                         0.764706
         American Indian/Alaska Native    0.147059
         African American/Black           0.029412
         Asian                            0.029412
         Bi-Racial/Multi-Racial           0.029412
Name: Race/Ethnicity, dtype: float64

You can also skip the sorting:

df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True, sort=False)

school   Race/Ethnicity               
school1  African American/Black           0.197368
         American Indian/Alaska Native    0.013158
         Bi-racial/Multi-racial           0.052632
         Latino/a                         0.697368
         Other - Write In (Required)      0.013158
         White                            0.026316
school2  African American/Black           0.029412
         American Indian/Alaska Native    0.147059
         Asian                            0.029412
         Bi-Racial/Multi-Racial           0.029412
         Latino/a                         0.764706
Name: Race/Ethnicity, dtype: float64
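
If a side-by-side comparison of schools is easier to read, the normalized result can be pivoted into a table with one row per school. The following is only a sketch: the `counts` list is a hypothetical, shortened stand-in for the question's data, and `wide` is a name introduced here for illustration. Note also that in pandas 2.0 and later the result of value_counts(normalize=True) is named proportion rather than Race/Ethnicity, so the Name line in the output above may differ on your version.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's data:
# (school, group, number of rows).
counts = [
    ("school1", "African American/Black", 15),
    ("school1", "Latino/a", 53),
    ("school2", "Asian", 1),
    ("school2", "Latino/a", 26),
]
rows = [[school, group] for school, group, n in counts for _ in range(n)]
df = pd.DataFrame(rows, columns=["school", "Race/Ethnicity"])

# Per-school proportions, as in the answer above.
shares = df.groupby("school")["Race/Ethnicity"].value_counts(normalize=True)

# One row per school, one column per group; groups a school
# does not have are filled with 0 instead of NaN.
wide = shares.unstack(fill_value=0)
print(wide)
```

Each row of `wide` should sum to 1, which is a quick check that the normalization was done per school rather than over the whole DataFrame.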

Setup

import pandas as pd

df = pd.DataFrame(
    [['school1', 'African American/Black']] * 15 +
    [['school1', 'American Indian/Alaska Native']] * 1 + 
    [['school1', 'Bi-racial/Multi-racial']] * 4 +
    [['school1', 'Latino/a']] * 53 +
    [['school1', 'Other - Write In (Required)']] * 1 +
    [['school1', 'White']] * 2 +
    [['school2', 'African American/Black']] * 1 +
    [['school2', 'American Indian/Alaska Native']] * 5 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Bi-Racial/Multi-Racial']] * 1 +
    [['school2', 'Latino/a']] * 26,
    columns=['school', 'Race/Ethnicity']
)

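It would help to see sample data, but it sounds like you could divide df2 by df.groupby('school').size()
@AndrewL Thanks, that is exactly what I needed! I know I was making things harder than they needed to be.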
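
The division suggested in the comment can be sketched as follows. This is one possible way to write it, not necessarily what the commenter had in mind: the `counts` list is a hypothetical, shortened stand-in for the question's data, and `totals`, `shares`, and `shares_alt` are names introduced here for illustration. The `level` argument of Series.div lines up the per-school totals with the 'school' level of df2's MultiIndex.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's data:
# (school, group, number of rows).
counts = [
    ("school1", "African American/Black", 15),
    ("school1", "Latino/a", 53),
    ("school2", "Asian", 1),
    ("school2", "Latino/a", 26),
]
rows = [[school, group] for school, group, n in counts for _ in range(n)]
df = pd.DataFrame(rows, columns=["school", "Race/Ethnicity"])

# Counts per (school, group), as in the question.
df2 = df.groupby(["school", "Race/Ethnicity"]).size()

# Total number of rows per school.
totals = df.groupby("school").size()

# Divide each count by its school's total, matching on the 'school' level.
shares = df2.div(totals, level="school")

# Alternative that starts from df2 alone: broadcast each school's
# sum back onto its rows, then divide.
shares_alt = df2 / df2.groupby(level="school").transform("sum")

print(shares)
```

Both variants avoid an explicit loop over the groups and should give the same proportions as value_counts(normalize=True).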