Python: Remove rows after a specific value in a DataFrame (for loop?)

python pandas dataframe session

I have a DataFrame that holds the history of users in an online store. For example:

import pandas as pd

a = pd.DataFrame([[1, 'view', 'a'], [1, 'cart', 'b'], [2, 'cart', 'b'], [2, 'cart', 'c'], [2, 'view', 'd'],
                  [2, 'purchase', 'd'], [2, 'view', 'e'], [2, 'cart', 'e']],
                 columns=['user_session', 'event_type', 'product_id'])

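A single user session can contain more than one purchase. I need to delete all rows in a session that come after its first purchase. A partial solution I found here is: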

df.loc[:(df['event_type'] == 'purchase').idxmax()]
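A small sketch of why this is only a partial solution, assuming the example frame `a` from above stands in for `df`. On the whole frame the slice stops at the first purchase overall rather than per session, and on a session with no purchase at all `idxmax()` falls back to the first label, so almost the entire session is lost. The names `first_purchase`, `partial`, `session_1` and `cut` are only illustrative.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 'view', 'a'], [1, 'cart', 'b'], [2, 'cart', 'b'], [2, 'cart', 'c'],
     [2, 'view', 'd'], [2, 'purchase', 'd'], [2, 'view', 'e'], [2, 'cart', 'e']],
    columns=['user_session', 'event_type', 'product_id'])

# idxmax() returns the label of the first True, i.e. the first purchase overall
first_purchase = (a['event_type'] == 'purchase').idxmax()

# .loc slicing is label-based and inclusive, so the purchase row itself is kept
partial = a.loc[:first_purchase]

# Pitfall: a session without any purchase has an all-False condition,
# so idxmax() returns the first label and the slice keeps a single row
session_1 = a[a['user_session'] == 1]
cut = (session_1['event_type'] == 'purchase').idxmax()
truncated = session_1.loc[:cut]
```

This is why a per-session approach is needed rather than one global slice.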

But I need to go through a huge dataset with millions of rows. Is a for loop a good idea here? Or is there perhaps a better option?

Another approach might be to build a list of the index labels of the rows I want to delete.

But is there any other way?

Thank you very much.

You can check the condition and then use cummax to set it to True from its first occurrence onward within each group. Then slice the DataFrame:

mask = ~(a['event_type'].eq('purchase').groupby(a['user_session']).cummax())

a[mask]
#   user_session event_type product_id
#0             1       view          a
#1             1       cart          b
#2             2       cart          b
#3             2       cart          c
#4             2       view          d
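To make the one-liner easier to follow, here is a sketch that breaks it into its intermediate steps on the example frame from the question. The variable names `is_purchase` and `seen_purchase` are only illustrative.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 'view', 'a'], [1, 'cart', 'b'], [2, 'cart', 'b'], [2, 'cart', 'c'],
     [2, 'view', 'd'], [2, 'purchase', 'd'], [2, 'view', 'e'], [2, 'cart', 'e']],
    columns=['user_session', 'event_type', 'product_id'])

# Step 1: True only on the purchase rows
is_purchase = a['event_type'].eq('purchase')

# Step 2: within each session, the flag stays True from the first purchase onward
seen_purchase = is_purchase.groupby(a['user_session']).cummax()

# Step 3: keep only the rows before the first purchase of their session
result = a[~seen_purchase]
```

Because both steps are vectorized, no explicit Python loop over sessions is needed.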

Or, if you also need to keep the purchase row, use two groupbys, the second one with shift:

mask = ~(a['event_type'].eq('purchase')
          .groupby(a['user_session']).cummax()
          .groupby(a['user_session']).shift()
          .fillna(False)
          .astype(bool))  # shift() introduces NaN; cast back to bool before applying ~

a[mask]
#   user_session event_type product_id
#0             1       view          a
#1             1       cart          b
#2             2       cart          b
#3             2       cart          c
#4             2       view          d
#5             2   purchase          d
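Both variants can be wrapped in one function. The sketch below is not part of the answer: the helper name `drop_after_first_purchase` and its `keep_purchase` parameter are hypothetical, and the column names are those of the example frame.

```python
import pandas as pd


def drop_after_first_purchase(df, keep_purchase=True):
    """Hypothetical helper: drop the rows that follow the first purchase of each session."""
    sessions = df['user_session']
    # True from the first purchase onward within each session
    seen = df['event_type'].eq('purchase').groupby(sessions).cummax()
    if keep_purchase:
        # Shift by one row per session so the purchase row itself is not flagged;
        # astype(bool) guards against the shift changing the dtype
        seen = seen.groupby(sessions).shift(fill_value=False).astype(bool)
    return df[~seen]


a = pd.DataFrame(
    [[1, 'view', 'a'], [1, 'cart', 'b'], [2, 'cart', 'b'], [2, 'cart', 'c'],
     [2, 'view', 'd'], [2, 'purchase', 'd'], [2, 'view', 'e'], [2, 'cart', 'e']],
    columns=['user_session', 'event_type', 'product_id'])

with_purchase = drop_after_first_purchase(a, keep_purchase=True)
without_purchase = drop_after_first_purchase(a, keep_purchase=False)
```

The second groupby matters: a plain shift on the whole column would let the last flag of one session leak into the first row of the next.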

Try:

to_remove = (a['event_type'].eq('purchase')
                .groupby(a['user_session'], group_keys=False)
                .apply(lambda x: x.shift(fill_value=False).cumsum())
            )
a[to_remove == 0]

Output:

   user_session event_type product_id
0             1       view          a
1             1       cart          b
2             2       cart          b
3             2       cart          c
4             2       view          d
5             2   purchase          d

If you do not want to keep the first purchase event, replace apply(lambda… with .cumsum().

This works really fast. Thank you very much!
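For reference, the cumsum variant mentioned in the last answer, which also drops the first purchase row, might look like the sketch below. The explicit `astype(int)` is an addition made here so that the cumulative sum runs on integers, and the name `purchases_so_far` is only illustrative.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 'view', 'a'], [1, 'cart', 'b'], [2, 'cart', 'b'], [2, 'cart', 'c'],
     [2, 'view', 'd'], [2, 'purchase', 'd'], [2, 'view', 'e'], [2, 'cart', 'e']],
    columns=['user_session', 'event_type', 'product_id'])

# Running count of purchases within each session, the current row included
purchases_so_far = (a['event_type'].eq('purchase')
                    .astype(int)  # assumption: cast first so cumsum works on integers
                    .groupby(a['user_session'])
                    .cumsum())

# A count of 0 means no purchase has happened yet in this session
before_first_purchase = a[purchases_so_far == 0]
```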