Python pandas groupby filter: dropping some groups


I have a groupby object:

grouped = df.groupby('name')
for k, group in grouped:
    print(group)

There are 3 groups: bar, foo and foobar.

  name  time  
2  bar     5  
3  bar     6  


  name  time  
0  foo     5  
1  foo     2  

  name      time  
4  foobar     20  
5  foobar     1

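I need to filter these groups and drop every group whose time never exceeds 5. In my example, the foo group should be dropped. I am trying to do this with the filter() function:

grouped.filter(lambda x: (x.max()['time']>5))

But x is apparently not simply the group in DataFrame form.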

Assuming the last line of your code should be >5 rather than >20, you can do something like this:

grouped.filter(lambda x: (x.time > 5).any())

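As you correctly found, x is in fact a DataFrame: it holds all the rows whose name column matches the key k from your for loop. So if you want to filter on whether the time column contains any value greater than 5, you can use (x.time > 5).any() as above to test for it.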
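To make it concrete what the filter function receives, here is a minimal sketch. It rebuilds the example frame from the tables in the question; the helper name has_time_above_5 is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Example data taken from the tables in the question
df = pd.DataFrame({
    "name": ["foo", "foo", "bar", "bar", "foobar", "foobar"],
    "time": [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby("name")

def has_time_above_5(x):
    # x is the sub-DataFrame holding the rows of a single group
    assert isinstance(x, pd.DataFrame)
    # Keep the group if at least one of its times is greater than 5
    return bool((x["time"] > 5).any())

# filter() returns the rows of every group for which the function is True
kept = grouped.filter(has_time_above_5)
print(kept)
```

Only the rows of bar and foobar should remain, since foo's largest time is exactly 5.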

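I am not yet used to python, numpy or pandas. But I was working on a solution to a similar problem, so let me report my answer using this question as the example.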

import pandas as pd

df = pd.DataFrame()
df['name'] = ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar']
df['time'] = [5, 2, 5, 6, 20, 1]

grouped = df.groupby('name')
for k, group in grouped:
    print(group)

My answer 1:

indexes_should_drop = grouped.filter(lambda x: (x['time'].max() <= 5)).index
result1 = df.drop(index=indexes_should_drop)

My answer 2:

filter_time_max = grouped['time'].max() > 5
groups_should_keep = filter_time_max.loc[filter_time_max].index
result2 = df.loc[df['name'].isin(groups_should_keep)]

My answer 3:

filter_time_max = grouped['time'].max() <= 5
groups_should_drop = filter_time_max.loc[filter_time_max].index
result3 = df.drop(df[df['name'].isin(groups_should_drop)].index)

Result:

     name  time
2     bar     5
3     bar     6
4  foobar    20
5  foobar     1

Points:

My answer 1 does not use the group names to drop the groups. If you need the group names, you can get them with:

df.loc[indexes_should_drop].name.unique()

In my answer 3, filter_time_max is a boolean Series indexed by group name:

name
bar       False
foo        True
foobar    False
Name: time, dtype: bool

Comment: grouped['time'].max() gives one value per group, while the result of grouped.filter(...) is a plain DataFrame, so I have to call groupby('name') on it again, right?

grouped.filter(lambda x: (x.time > 5).any()).groupby('name')
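One more variation, not taken from the answers above: a sketch that uses transform('max') to broadcast each group's maximum back onto its rows, so the mask lines up with df directly. The name result4 is just a continuation of the numbering used here. The last loop shows that a filtered result is an ordinary DataFrame and can be grouped again.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["foo", "foo", "bar", "bar", "foobar", "foobar"],
    "time": [5, 2, 5, 6, 20, 1],
})

# For every row, the maximum time within that row's group
group_max = df.groupby("name")["time"].transform("max")

# Keep only the rows whose group maximum is greater than 5
result4 = df[group_max > 5]
print(result4)

# The filtered frame is a plain DataFrame, so group it again if needed
for name, group in result4.groupby("name"):
    print(name, group["time"].tolist())
```

Because the mask has the same index as df, no intermediate list of group names or row indexes is needed.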