Python: drop all rows of a DataFrame after the first occurrence of a condition
Using the following example:
df = pd.DataFrame({"Person":[1,1,2,2,2,2,3,3,3], "Bank":["OPEN","OPEN","OPEN","OPEN","CLOSED","OPEN","OPEN","CLOSED","CLOSED"]})
Person Bank
0 1 OPEN
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
5 2 OPEN
6 3 OPEN
7 3 CLOSED
8 3 CLOSED
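I want to produce an output that, for each Person group, keeps all rows up to and including the first occurrence of CLOSED. So it should look like: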
Person Bank
0 1 OPEN
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
6 3 OPEN
7 3 CLOSED
I was able to build an output that comes close using:
mask = (df['Bank']
.where(df['Bank'] == 'OPEN')
.groupby(df['Person'])
.ffill(limit=1)
)
df[mask.notnull()]
# The above produces this
Person Bank
0 1 OPEN
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
5 2 OPEN
6 3 OPEN
7 3 CLOSED
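So my current code does not handle the case where Bank goes from CLOSED back to OPEN. Is there a good way to do this that is not too slow?

You can use groupby to create the mask. It takes 2 operations, cummax + shift: cummax marks every row from the first CLOSED onward within each Person, and shift moves that marker down one row so the first CLOSED row itself is kept. The straightforward approach is therefore a slower apply, but with many groups, 2 separate groupby calls that use built-in operations will perform better.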
# group_keys=False keeps the original index; fill_value=False keeps the mask boolean
m = (df['Bank'].eq('CLOSED')
       .groupby(df['Person'], group_keys=False)
       .apply(lambda x: ~x.cummax().shift(fill_value=False)))
# or
m = ~(df['Bank'].eq('CLOSED')
        .groupby(df['Person']).cummax()
        .groupby(df['Person']).shift(fill_value=False))
df[m]
Person Bank
0 1 OPEN
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
6 3 OPEN
7 3 CLOSED
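To make the cummax + shift idea easier to follow, here is a minimal sketch that stores each intermediate step as its own column. The column names `closed`, `seen_closed`, `closed_before` and `keep` are only illustrative and do not come from the answer above; the trailing `astype(bool)` is a defensive assumption in case the shifted column does not come back as a boolean dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Bank": ["OPEN", "OPEN", "OPEN", "OPEN", "CLOSED",
             "OPEN", "OPEN", "CLOSED", "CLOSED"],
})

# Step 1: flag the CLOSED rows
closed = df["Bank"].eq("CLOSED")
# Step 2: within each Person, stay True from the first CLOSED row onward
seen_closed = closed.groupby(df["Person"]).cummax()
# Step 3: shift one row down within each Person, so the first CLOSED row
# itself is not flagged; the first row of each group is filled with False
closed_before = (seen_closed.groupby(df["Person"])
                 .shift(fill_value=False)
                 .astype(bool))

steps = df.assign(closed=closed, seen_closed=seen_closed,
                  closed_before=closed_before, keep=~closed_before)
result = df[~closed_before]

print(steps)
print(result)
```

Rows 5 and 8 are the only ones where `closed_before` is True, so they are the only rows dropped.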
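df = pd.DataFrame({"Person":[1,1,2,2,2,2,3,3,3], "Bank":["OPEN","OPEN","OPEN","OPEN","CLOSED","OPEN","OPEN","CLOSED","CLOSED"]})
i = 0
while i < len(df) - 1:
    row, nxt = df.iloc[i], df.iloc[i + 1]
    if row["Bank"] == "CLOSED" and nxt["Person"] == row["Person"]:
        # Drop the row that follows a CLOSED row of the same Person;
        # stay on i so that later rows of that Person are checked as well
        df.drop(df.index[i + 1], inplace=True)
    else:
        i += 1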
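The same row-by-row idea can also be written without modifying the frame while iterating over it. The following is only a sketch: the function name `keep_until_first_closed` and its parameters are hypothetical, and it is an explicit Python loop, so it will likely be slower than a groupby mask on large frames.

```python
import pandas as pd

def keep_until_first_closed(df, group_col="Person", flag_col="Bank", stop="CLOSED"):
    """Keep each group's rows up to and including its first `stop` row."""
    finished = set()  # groups whose first `stop` row has already been seen
    keep = []
    for group, flag in zip(df[group_col], df[flag_col]):
        # A row is kept only if its group has not hit `stop` on an earlier row
        keep.append(group not in finished)
        if flag == stop:
            finished.add(group)
    return df[keep]

df = pd.DataFrame({
    "Person": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Bank": ["OPEN", "OPEN", "OPEN", "OPEN", "CLOSED",
             "OPEN", "OPEN", "CLOSED", "CLOSED"],
})
result = keep_until_first_closed(df)
print(result)

# Several rows after the first CLOSED are all removed
tail = pd.DataFrame({"Person": [1, 1, 1, 1],
                     "Bank": ["OPEN", "CLOSED", "OPEN", "OPEN"]})
tail_result = keep_until_first_closed(tail)
print(tail_result)
```

Because the finished groups are tracked in a set, this version does not rely on the rows of a Person being adjacent.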
This may not be the most elegant solution, but it seems to work for my test scenarios:
Code:
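import pandas as pd

df1 = pd.DataFrame({"Person": [1,1,2,2,2,2,2,3,3,3], "Bank": ["OPEN","OPEN","OPEN","OPEN","CLOSED","OPEN","CLOSED","OPEN","CLOSED","CLOSED"]})
df2 = pd.DataFrame({"Person": [1,1,2,2,2,2,2,3,3,3], "Bank": ["CLOSED","OPEN","OPEN","OPEN","CLOSED","OPEN","CLOSED","OPEN","CLOSED","CLOSED"]})

def process_data(df):
    persons = df['Person'].unique()
    idx_arr = []
    for p in persons:
        mask1 = df['Person'] == p                 # rows of this person
        mask2 = (df['Bank'] == 'CLOSED') & mask1  # CLOSED rows of this person
        # Positions from the person's first row through their first CLOSED row,
        # or through their last row if they have no CLOSED row
        idx_arr += range(list(mask1).index(True), 1 + list(mask2).index(True) if any(mask2) else
                         len(mask1) - list(mask1)[::-1].index(True))
    return df.iloc[idx_arr]

print(process_data(df1))
print(process_data(df2))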
Input 1:
Person Bank
0 1 OPEN
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
5 2 OPEN
6 2 CLOSED
7 3 OPEN
8 3 CLOSED
9 3 CLOSED
Output 1:
Person Bank
0 1 OPEN
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
7 3 OPEN
8 3 CLOSED
Input 2:
Person Bank
0 1 CLOSED
1 1 OPEN
2 2 OPEN
3 2 OPEN
4 2 CLOSED
5 2 OPEN
6 2 CLOSED
7 3 OPEN
8 3 CLOSED
9 3 CLOSED
Output 2:
Person Bank
0 1 CLOSED
2 2 OPEN
3 2 OPEN
4 2 CLOSED
7 3 OPEN
8 3 CLOSED
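As a cross-check, the two inputs above can be run through a mask-based version and compared against the listed outputs. This sketch uses a per-group cumsum rather than cummax + shift; the function name `keep_through_first_closed` is hypothetical, and the frames are rebuilt here from the Input 1 and Input 2 tables.

```python
import pandas as pd

persons = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
df1 = pd.DataFrame({"Person": persons,
                    "Bank": ["OPEN", "OPEN", "OPEN", "OPEN", "CLOSED",
                             "OPEN", "CLOSED", "OPEN", "CLOSED", "CLOSED"]})
df2 = pd.DataFrame({"Person": persons,
                    "Bank": ["CLOSED", "OPEN", "OPEN", "OPEN", "CLOSED",
                             "OPEN", "CLOSED", "OPEN", "CLOSED", "CLOSED"]})

def keep_through_first_closed(df):
    closed = df["Bank"].eq("CLOSED").astype(int)
    # CLOSED rows seen so far within each Person, including the current row
    n_closed = closed.groupby(df["Person"]).cumsum()
    # Subtracting the current row's flag gives the CLOSED rows strictly before it;
    # a row is kept when no CLOSED row precedes it in its group
    return df[(n_closed - closed) == 0]

out1 = keep_through_first_closed(df1)
out2 = keep_through_first_closed(df2)
print(out1)
print(out2)
```

The kept index labels agree with Output 1 and Output 2 above.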