Filtering a DataFrame in Python pandas
How do I filter or subset a specific group in a DataFrame (for example, selecting one group from the DataFrame below)? I am trying to summarize admission/rejection rates by gender. This DataFrame is small, but what if it were much larger, say tens of thousands of rows, so that indexing individual values by hand is not feasible?
Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317
To filter the data, you can use the very flexible query method:
from pandas import DataFrame

# Test data (the first seven rows of the table in the question)
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
                'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
                'Freq': [512, 313, 89, 19, 353, 207, 17],
                'Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})
# Keep only the rows for admitted females
df.query('Admit == "Admitted" and Gender == "Female"')
      Admit  Gender  Freq Dept
2  Admitted  Female    89    A
6  Admitted  Female    17    B
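For comparison, the same filter can be written as a boolean mask, which is the other common way to subset rows and scales to large frames just as well. The sketch below rebuilds the test data so it runs on its own; the variable names (mask, admitted_females, status, gender) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same seven test rows as in the answer
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (df['Admit'] == 'Admitted') & (df['Gender'] == 'Female')
admitted_females = df[mask]
print(admitted_females)

# query() can also read Python variables through the @ prefix,
# which avoids building the condition string by hand.
status, gender = 'Admitted', 'Female'
same_rows = df.query('Admit == @status and Gender == @gender')
```

Both forms select the rows with index labels 2 and 6; which one to use is mostly a matter of readability.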
To summarize the data, use groupby:
# Select Freq explicitly so that only the numeric column is summed
group = df.groupby(['Admit', 'Gender'])[['Freq']].sum()
print(group)
                 Freq
Admit    Gender
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520
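Since the question asks for admission/rejection rates by gender, the grouped sums can be turned into proportions. This is only a sketch on the seven test rows above, not on the full table from the question; the names counts and rates are illustrative.

```python
import pandas as pd

# Same seven test rows as in the answer
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Sum Freq per (Gender, Admit) pair, then move Admit into the columns
# so that each row is one gender.
counts = df.groupby(['Gender', 'Admit'])['Freq'].sum().unstack('Admit')

# Divide every row by its own total to get per-gender proportions.
rates = counts.div(counts.sum(axis=1), axis=0)
print(rates.round(3))
```

Each row of rates sums to 1, so the Admitted column is the admission rate for that gender within the test data.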
You can then filter the result simply by subsetting on the MultiIndex that groupby created:
group.loc[('Admitted', 'Female')]
Freq 106
Name: (Admitted, Female), dtype: int64
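A few other ways of subsetting on the MultiIndex may be useful when more than one group is needed. The sketch below rebuilds the grouped frame so it runs standalone; the names admitted, females and admitted_female_total are illustrative.

```python
import pandas as pd

# Same seven test rows as in the answer
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})
group = df.groupby(['Admit', 'Gender'])[['Freq']].sum()

# Partial indexing on the outer level: both genders for 'Admitted'
admitted = group.loc['Admitted']

# Cross-section on the inner level: both Admit statuses for females
females = group.xs('Female', level='Gender')

# Full key plus column name returns a single scalar value
admitted_female_total = group.loc[('Admitted', 'Female'), 'Freq']
print(admitted_female_total)
```

The same pattern covers the other combinations (admitted males, rejected males, and so on) by changing the key.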
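Take a look at groupby.
Ayhan, thank you for editing the question. Ami, if this is a duplicate, please point me to the original post.
@ahmedawaji Do you want to keep only the admitted females, or to total Freq by gender and admission status? For the former, see [link]. For the latter, could you search for groupby + sum?
@AmiTavory For now, the frequency of admitted females. But I would also like to be able to do the same for the other groups (admitted males, rejected males, and so on).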
Nice answer; it shows how to filter groups using a MultiIndex.