Python pandas: find the average of string occurrences
I am working with a DataFrame and trying to find an average. I am stuck trying to turn the value counts of a grouped df into averages. The code is below:
df2 = df.groupby(['school', 'Race/Ethnicity']).size()
school Race/Ethnicity
school1 African American/Black 15
American Indian/Alaska Native 1
Bi-racial/Multi-racial 4
Latino/a 53
Other - Write In (Required) 1
White 2
school2 African American/Black 1
American Indian/Alaska Native 5
Asian 1
Bi-Racial/Multi-Racial 1
Latino/a 26
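I have many different schools, and instead of the size I want to find the average for each race within each school (that is, each race's share of its school's total). How do I iterate over the groups to find each group's sum and then divide each row by its group's sum?
Use the normalize parameter of value_counts: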
df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True)
school Race/Ethnicity
school1 Latino/a 0.697368
African American/Black 0.197368
Bi-racial/Multi-racial 0.052632
White 0.026316
American Indian/Alaska Native 0.013158
Other - Write In (Required) 0.013158
school2 Latino/a 0.764706
American Indian/Alaska Native 0.147059
African American/Black 0.029412
Asian 0.029412
Bi-Racial/Multi-Racial 0.029412
Name: Race/Ethnicity, dtype: float64
You can also skip the sorting:
df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True, sort=False)
school Race/Ethnicity
school1 African American/Black 0.197368
American Indian/Alaska Native 0.013158
Bi-racial/Multi-racial 0.052632
Latino/a 0.697368
Other - Write In (Required) 0.013158
White 0.026316
school2 African American/Black 0.029412
American Indian/Alaska Native 0.147059
Asian 0.029412
Bi-Racial/Multi-Racial 0.029412
Latino/a 0.764706
Name: Race/Ethnicity, dtype: float64
Setup
import pandas as pd
df = pd.DataFrame(
[['school1', 'African American/Black']] * 15 +
[['school1', 'American Indian/Alaska Native']] * 1 +
[['school1', 'Bi-racial/Multi-racial']] * 4 +
[['school1', 'Latino/a']] * 53 +
[['school1', 'Other - Write In (Required)']] * 1 +
[['school1', 'White']] * 2 +
[['school2', 'African American/Black']] * 1 +
[['school2', 'American Indian/Alaska Native']] * 5 +
[['school2', 'Asian']] * 1 +
[['school2', 'Bi-Racial/Multi-Racial']] * 1 +
[['school2', 'Latino/a']] * 26,
columns=['school', 'Race/Ethnicity']
)
It would help to see sample data, but it sounds like you can divide df2
by df.groupby('school').size()
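The division suggested in this comment can be sketched as follows. This is a minimal sketch on a small made-up sample, not the asker's real data. It uses Series.div with level='school' instead of the plain / operator so that the alignment on the school level is explicit; the names totals, shares and shares_alt are introduced here only for illustration.

```python
import pandas as pd

# Small hypothetical sample (assumption: not the asker's real data).
df = pd.DataFrame(
    [['school1', 'Latino/a']] * 3 +
    [['school1', 'White']] * 1 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Latino/a']] * 4,
    columns=['school', 'Race/Ethnicity']
)

# Counts per (school, race) pair, as in the question.
df2 = df.groupby(['school', 'Race/Ethnicity']).size()

# Number of rows per school.
totals = df.groupby('school').size()

# Divide each count by its school's total, matching on the 'school' index level.
shares = df2.div(totals, level='school')

# Alternative without a separate totals Series: broadcast each school's sum
# back onto the rows of df2 and divide.
shares_alt = df2 / df2.groupby(level='school').transform('sum')

print(shares)
```

Both variants should give the same proportions as value_counts(normalize=True), up to row order.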
@AndrewL Thanks, this is exactly what I needed! I knew I was making things harder than they needed to be.