Pandas conditional sum on a DataFrame
I am trying to aggregate and sum values in a pandas DataFrame based on the values in the 'Gender' column. Here is a sample of the dataset I am working with:
import pandas as pd

df_genders = pd.DataFrame({'Country': ['US','US','US','US','US','India','India','India','UK','UK','UK','UK'],
                           'Gender': ['Man','Woman', 'Non-Binary,Genderqueer', 'Non-Binary', 'Non-Binary,Genderqueer,Non-Conforming',
                                      'Man','Woman','Non-Binary','Man','Woman', 'Non-Binary,Genderqueer', 'Non-Binary,Genderqueer,Non-Conforming'],
                           'Count': [7996,915,11,34,153,3857,287,47,2566,272,72,99]})
df_genders
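Because the gender values are not very consistent, I want to group them and total their counts, so that for each country I get sums for Man, Woman, and Non-Binary (Non-Binary being anything that is neither 'Man' nor 'Woman').
I could not write code for a conditional group-by and sum, so my approach was to compute each country's total and then subtract the Man + Woman sum from it, which leaves the Non-Binary sum: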
df_genders.groupby('Country')['Count'].sum() - df_genders[(df_genders['Gender']=='Man') | (df_genders['Gender']=='Woman')].groupby('Country')['Count'].sum()
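Do you know a better way to solve this, or, more generally, a way to do conditional aggregation (group by and sum)?
Thanks, everyone!
You can do the following directly to get the Non-Binary total across all countries: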
res = df_genders[~df_genders['Gender'].isin(('Man', 'Woman'))]['Count'].sum()
print(res)
Output:
416
But I think it is better to create a new column containing the categories you are looking for. One way, for example:
df_genders['grouped-genders'] = df_genders['Gender'].map({ 'Man' : 'Man', 'Woman' : 'Woman' }).fillna('Non-Binary')
print(df_genders)
Output:
    Country  Gender                                 Count  grouped-genders
0   US       Man                                    7996   Man
1   US       Woman                                  915    Woman
2   US       Non-Binary,Genderqueer                 11     Non-Binary
3   US       Non-Binary                             34     Non-Binary
4   US       Non-Binary,Genderqueer,Non-Conforming  153    Non-Binary
5   India    Man                                    3857   Man
6   India    Woman                                  287    Woman
7   India    Non-Binary                             47     Non-Binary
8   UK       Man                                    2566   Man
9   UK       Woman                                  272    Woman
10  UK       Non-Binary,Genderqueer                 72     Non-Binary
11  UK       Non-Binary,Genderqueer,Non-Conforming  99     Non-Binary
Then group by the new column to get the counts for all genders:
res = df_genders.groupby('grouped-genders')['Count'].sum().reset_index()
print(res)
Output:
grouped-genders Count
0 Man 14419
1 Non-Binary 416
2 Woman 1474
Thank you, Danny! I also added the 'Country' column to the groupby, and the result is exactly what I wanted :)
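
The follow-up comment mentions adding 'Country' to the groupby but does not show the code. A minimal sketch of what that might look like, assuming the same sample data and the `grouped-genders` column from the answer. The `unstack()` step and the name `per_country` are not from the thread and are only here for illustration:

```python
import pandas as pd

# Sample data from the question.
df_genders = pd.DataFrame({
    'Country': ['US', 'US', 'US', 'US', 'US', 'India', 'India', 'India',
                'UK', 'UK', 'UK', 'UK'],
    'Gender': ['Man', 'Woman', 'Non-Binary,Genderqueer', 'Non-Binary',
               'Non-Binary,Genderqueer,Non-Conforming',
               'Man', 'Woman', 'Non-Binary', 'Man', 'Woman',
               'Non-Binary,Genderqueer',
               'Non-Binary,Genderqueer,Non-Conforming'],
    'Count': [7996, 915, 11, 34, 153, 3857, 287, 47, 2566, 272, 72, 99],
})

# Keep 'Man' and 'Woman' as they are; every other value maps to NaN
# and is then filled with 'Non-Binary'.
df_genders['grouped-genders'] = (
    df_genders['Gender']
    .map({'Man': 'Man', 'Woman': 'Woman'})
    .fillna('Non-Binary')
)

# Group by country and by the new category, then sum the counts.
# unstack() (an illustrative extra) pivots the categories into columns.
per_country = (
    df_genders
    .groupby(['Country', 'grouped-genders'])['Count']
    .sum()
    .unstack()
)
print(per_country)
```

This gives one row per country and one column per category, so the subtraction workaround from the question is no longer needed. Dropping `.unstack()` and calling `.reset_index()` instead would keep the result in long form, matching the style of the answer above.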