Filtering a DataFrame in pandas (Python)


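How can I filter or subset a specific group in a DataFrame (for example, the admitted females in the DataFrame below)? I am trying to summarize admission/rejection rates by gender. This DataFrame is small, but what if it were much larger, say tens of thousands of rows, where indexing individual values by hand is not feasible?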

      Admit  Gender Dept  Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted  Female    A    89
3   Rejected  Female    A    19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted  Female    B    17
7   Rejected  Female    B     8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted  Female    C   202
11  Rejected  Female    C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted  Female    D   131
15  Rejected  Female    D   244
16  Admitted    Male    E    53
17  Rejected    Male    E   138
18  Admitted  Female    E    94
19  Rejected  Female    E   299
20  Admitted    Male    F    22
21  Rejected    Male    F   351
22  Admitted  Female    F    24
23  Rejected  Female    F   317

To filter the data, you can use the very versatile query method:

from pandas import DataFrame

# Test data (first seven rows of the question's table)
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
        'Freq': [512, 313, 89, 19, 353, 207, 17],
        'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})

# Keep only the rows where both conditions hold
df.query('Admit == "Admitted" and Gender == "Female"')

      Admit  Freq  Gender Gender Dept
2  Admitted    89  Female           A
6  Admitted    17  Female           B
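
As a point of comparison, the same filter can be written with boolean masks instead of a query string. This is only a sketch on the test data above; the names mask, admitted_females and via_query are illustrative and not part of the original answer.

```python
import pandas as pd

# Same test data as in the answer above.
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Build one boolean mask per condition, then combine them with &.
mask = (df['Admit'] == 'Admitted') & (df['Gender'] == 'Female')
admitted_females = df[mask]

# query() with the same conditions selects the same rows.
via_query = df.query('Admit == "Admitted" and Gender == "Female"')

print(admitted_females)
```

Both forms are vectorized, so neither requires indexing individual rows by hand, whatever the size of the DataFrame.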
To summarize the data, use groupby:

# Select Freq explicitly so that only the numeric column is summed
group = df.groupby(['Admit', 'Gender'])[['Freq']].sum()
print(group)

                 Freq
Admit    Gender      
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520
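
Since the question asks about admission/rejection rates by gender, one possible way to turn grouped totals into rates is sketched below. It assumes the same seven-row test data, so the numbers describe that sample only; the names counts and rates are illustrative.

```python
import pandas as pd

# Same test data as in the answer above.
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Total frequency per (Gender, Admit) pair, with Admit spread into columns.
counts = df.groupby(['Gender', 'Admit'])['Freq'].sum().unstack('Admit')

# Divide each row by its row total to get the admitted/rejected share per gender.
rates = counts.div(counts.sum(axis=1), axis=0)

print(rates)
```

Each row of rates sums to 1, so the Admitted column can be read directly as the admission rate for that gender.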
Simply subset on the resulting MultiIndex to filter the result:

group.loc[('Admitted', 'Female')]

Freq    106
Name: (Admitted, Female), dtype: int64
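
To repeat this for the other groups, a few other ways of slicing a MultiIndex may help. The sketch below rebuilds the grouped frame from the same test data so that it runs on its own; the names admitted, females and admitted_female_freq are illustrative.

```python
import pandas as pd

# Same test data as in the answer above.
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Grouped totals indexed by (Admit, Gender).
group = df.groupby(['Admit', 'Gender'])[['Freq']].sum()

# All rows for one value of the outer level (Admit).
admitted = group.loc['Admitted']

# All rows for one value of the inner level (Gender), via a cross-section.
females = group.xs('Female', level='Gender')

# A single cell as a scalar: row label tuple plus column name.
admitted_female_freq = group.loc[('Admitted', 'Female'), 'Freq']

print(admitted)
print(females)
print(admitted_female_freq)
```

Selecting on the outer level works with plain loc, while xs is a convenient choice when the value of interest sits on an inner level.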

Check out groupby.
Ayhan, thanks for editing the question. Ami, if this is a duplicate, please point me to the original post.
@ahmedawaji Do you want to keep only the admitted females, or to total Freq by Gender and Admit? For the former, see the link. For the latter, could you search for groupby + sum?
@AmiTavory For now, the frequency of admitted females. But I would also like to be able to do the same for the other frequencies (admitted males, rejected males, and so on).
Nice answer; it shows how to filter groups with a MultiIndex.