Python: drop all rows of a DataFrame after the first occurrence of a condition


Using the following example:

import pandas as pd
df = pd.DataFrame({"Person":[1,1,2,2,2,2,3,3,3], "Bank":["OPEN","OPEN","OPEN","OPEN","CLOSED","OPEN","OPEN","CLOSED","CLOSED"]})

   Person   Bank
0       1   OPEN
1       1   OPEN
2       2   OPEN
3       2   OPEN
4       2   CLOSED
5       2   OPEN
6       3   OPEN
7       3   CLOSED
8       3   CLOSED
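
I want to produce an output that keeps, for each Person group, all rows up to and including the first occurrence of CLOSED. So it should look like: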

   Person   Bank
0       1   OPEN
1       1   OPEN
2       2   OPEN
3       2   OPEN
4       2   CLOSED
6       3   OPEN
7       3   CLOSED

I was able to build an output that comes close using:

mask = (df['Bank']
    .where(df['Bank'] == 'OPEN')
    .groupby(df['Person'])
    .ffill(limit=1)
)
df[mask.notnull()]

# The above produces this
   Person   Bank
0       1   OPEN
1       1   OPEN
2       2   OPEN
3       2   OPEN
4       2   CLOSED
5       2   OPEN
6       3   OPEN
7       3   CLOSED

So my current code cannot handle the case where a group goes from CLOSED back to OPEN. Is there a good way to do this that is not too slow?

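You can create the mask with groupby. It takes two operations, cummax + shift: cummax flags every row from the first CLOSED onward, and shift moves that flag down one row so the first CLOSED row itself is kept. The straightforward approach is therefore the slower apply, but with many groups, two separate groupby calls that use the built-in operations perform better.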
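
To see what the two operations do, here is a minimal sketch on a single group, with no groupby involved. The Series `closed` is a made-up stand-in for one person's `Bank == 'CLOSED'` flags, and `shift(fill_value=False)` is an assumption used here in place of `shift().fillna(False)` so the result stays boolean.

```python
import pandas as pd

# Hypothetical flags for one person: OPEN, OPEN, CLOSED, OPEN
closed = pd.Series([False, False, True, False])

seen = closed.cummax()                # True from the first CLOSED onward
after = seen.shift(fill_value=False)  # moved down one row: True only after it
keep = ~after                         # rows to keep

print(keep.tolist())  # [True, True, True, False]
```

The first CLOSED row survives because the shift delays the flag by exactly one row.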

m = (df['Bank'].eq('CLOSED')
       .groupby(df['Person'])
       .apply(lambda x: ~x.cummax().shift().fillna(False)))

# or
m = ~(df['Bank'].eq('CLOSED')
        .groupby(df['Person']).cummax()
        .groupby(df['Person']).shift()
        .fillna(False))

df[m]
   Person    Bank
0       1    OPEN
1       1    OPEN
2       2    OPEN
3       2    OPEN
4       2  CLOSED
6       3    OPEN
7       3  CLOSED
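
As a hedged alternative, the same mask can be sketched with a per-group running count instead of cummax + shift. This is a different technique from the one above, not a restatement of it: a row is kept when no CLOSED row precedes it in its group. The names `closed`, `closed_before`, and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Bank": ["OPEN", "OPEN", "OPEN", "OPEN", "CLOSED",
             "OPEN", "OPEN", "CLOSED", "CLOSED"],
})

# 1 where the row is CLOSED, 0 otherwise
closed = df["Bank"].eq("CLOSED").astype(int)

# Running count of CLOSED rows per person, minus the current row,
# gives the number of CLOSED rows strictly before each row
closed_before = closed.groupby(df["Person"]).cumsum() - closed

# Keep rows that have no earlier CLOSED row in their group
result = df[closed_before.eq(0)]
print(result)
```

Working on integers sidesteps the missing value that a plain `shift()` introduces in the first row of each group.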
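
df = pd.DataFrame({"Person":[1,1,2,2,2,2,3,3,3], "Bank":["OPEN","OPEN","OPEN","OPEN","CLOSED","OPEN","OPEN","CLOSED","CLOSED"]})
for i in range(df.shape[0]):
    try:
        # Drops only the single row directly after a CLOSED row of the same person
        if df.iloc[i]["Bank"] == "CLOSED" and df.iloc[i + 1]["Person"] == df.iloc[i]["Person"]:
            df.drop(df.index[i + 1], axis=0, inplace=True)
    except IndexError:
        # The frame shrinks while looping, so later positions can run past its end
        pass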
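
If an explicit loop is preferred, a single pass that remembers which persons have already hit CLOSED avoids mutating the frame while iterating. This is only a sketch under that assumption. `seen_closed`, `keep`, and `result` are hypothetical names, and it will be slower than the vectorized masks on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Bank": ["OPEN", "OPEN", "OPEN", "OPEN", "CLOSED",
             "OPEN", "OPEN", "CLOSED", "CLOSED"],
})

seen_closed = set()  # persons whose first CLOSED row has been passed
keep = []            # one boolean per row

for person, bank in zip(df["Person"], df["Bank"]):
    # Keep the row only if this person has not been CLOSED earlier
    keep.append(person not in seen_closed)
    if bank == "CLOSED":
        seen_closed.add(person)

result = df[keep]
print(result)
```

Because the flag is set only after the current row is recorded, the first CLOSED row of each person is kept and every later row of that person is dropped, however many there are.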

This may not be the most elegant solution, but it seems to work for my test scenarios:

Code:

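import pandas as pd

df1 = pd.DataFrame({"Person":[1,1,2,2,2,2,2,3,3,3], "Bank":["OPEN","OPEN","OPEN","OPEN","CLOSED","OPEN","CLOSED","OPEN","CLOSED","CLOSED"]})
df2 = pd.DataFrame({"Person":[1,1,2,2,2,2,2,3,3,3], "Bank":["CLOSED","OPEN","OPEN","OPEN","CLOSED","OPEN","CLOSED","OPEN","CLOSED","CLOSED"]})

def process_data(df):
    persons = df['Person'].unique()
    idx_arr = []
    for p in persons:
        mask1 = df['Person'] == p                # rows of this person
        mask2 = (df['Bank'] == 'CLOSED') & mask1  # CLOSED rows of this person
        # From the person's first row through the first CLOSED row,
        # or through the person's last row if there is no CLOSED row
        idx_arr += range(list(mask1).index(True), 1 + list(mask1 & mask2).index(True) if any(mask2) else
                         len(mask1) - 1 - list(mask1)[::-1].index(True) + 1)
    return df.iloc[idx_arr]

print(process_data(df1))
print(process_data(df2))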

Input 1:

   Person    Bank
0       1    OPEN
1       1    OPEN
2       2    OPEN
3       2    OPEN
4       2  CLOSED
5       2    OPEN
6       2  CLOSED
7       3    OPEN
8       3  CLOSED
9       3  CLOSED

Output 1:

   Person    Bank
0       1    OPEN
1       1    OPEN
2       2    OPEN
3       2    OPEN
4       2  CLOSED
7       3    OPEN
8       3  CLOSED

Input 2:

   Person    Bank
0       1  CLOSED
1       1    OPEN
2       2    OPEN
3       2    OPEN
4       2  CLOSED
5       2    OPEN
6       2  CLOSED
7       3    OPEN
8       3  CLOSED
9       3  CLOSED

Output 2:

   Person    Bank
0       1  CLOSED
2       2    OPEN
3       2    OPEN
4       2  CLOSED
7       3    OPEN
8       3  CLOSED