A solution for counting consecutive events with Python pandas


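I have the sample data below, and I want a count of each attempted action, for example action (page 2) and action (page 3), where available. I have attached the result I have in mind. I want to use a pandas DataFrame, but I don't know how to implement this.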

import pandas as pd
df = pd.DataFrame({'action': ['enter', 'next', 'prev', 'error', 'exit'], 
                   'page_number': [0, 1, 2, 3]})
For example, we have this data:

action     page_number
enter      
next        1
prev        2
next        1
next        2
exit        3
enter       
next        1
error       
next        1
error
error
error
next        2
prev        3
prev        2
next        1
prev        2
prev        1
exit        0
enter 
exit
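
For reference, the table above can be reproduced roughly as follows. This is only a sketch: the snippet as posted passes lists of different lengths to pd.DataFrame and would raise a ValueError, so the rows here are copied from the table instead. Storing blank page numbers as missing values in a nullable Int64 column is an assumption about the intended dtype, not something the post states.

```python
import pandas as pd

# Rows copied from the table above; None marks a blank page_number.
rows = [
    ("enter", None), ("next", 1), ("prev", 2), ("next", 1), ("next", 2), ("exit", 3),
    ("enter", None), ("next", 1), ("error", None), ("next", 1),
    ("error", None), ("error", None), ("error", None),
    ("next", 2), ("prev", 3), ("prev", 2), ("next", 1), ("prev", 2), ("prev", 1), ("exit", 0),
    ("enter", None), ("exit", None),
]
df = pd.DataFrame(rows, columns=["action", "page_number"])

# Assumption: a nullable integer dtype keeps page numbers as 1, 2, 3
# instead of the floats 1.0, 2.0, 3.0 that NaN would otherwise force.
df["page_number"] = df["page_number"].astype("Int64")

print(df.shape)  # (22, 2)
```
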
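The result I want to achieve (for session 1) is:

next (page number = 1) occurs twice in the first session, but I only want to count it once. Even if it had occurred 3 times in the first session, I would still want to count it only once. I want this rule to apply to every action in every session: each action is counted only once per session.

Thanks in advance for any guidance.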

First, create a new column extended_action, then group by that new column:

df['extended_action'] = (df.action + ' (page number = ' + df.page_number.astype(str) + ')').where(df.action.isin(('next', 'prev')), df.action)
df.groupby('extended_action').extended_action.count()
Result (for the whole sample DataFrame):


You can rename the columns after the groupby to get the output in the requested format:

df.groupby('extended_action',as_index=False).action.count().rename(columns={'extended_action': 'action', 'action': 'count'})
Result:

                   action  count
0                   enter      3
1                   error      4
2                    exit      3
3  next (page number = 1)      5
4  next (page number = 2)      2
5  prev (page number = 1)      1
6  prev (page number = 2)      3
7  prev (page number = 3)      1
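
The steps so far can be put together in one self-contained sketch. It rebuilds the sample rows by hand and assumes a nullable Int64 page_number column so that the labels read "page number = 1" rather than "page number = 1.0"; that dtype choice is an assumption, not something stated in the answer. It also uses size() with reset_index(name=...) as one possible alternative to count() followed by rename.

```python
import pandas as pd

actions = ["enter", "next", "prev", "next", "next", "exit",
           "enter", "next", "error", "next", "error", "error", "error",
           "next", "prev", "prev", "next", "prev", "prev", "exit",
           "enter", "exit"]
pages = [None, 1, 2, 1, 2, 3,
         None, 1, None, 1, None, None, None,
         2, 3, 2, 1, 2, 1, 0,
         None, None]
df = pd.DataFrame({"action": actions,
                   "page_number": pd.array(pages, dtype="Int64")})

# Label next/prev rows with their page number; keep every other action unchanged.
label = df["action"] + " (page number = " + df["page_number"].astype(str) + ")"
df["extended_action"] = label.where(df["action"].isin(["next", "prev"]), df["action"])

# One row per extended action with the number of occurrences in the whole frame.
counts = (
    df.groupby("extended_action")
      .size()
      .reset_index(name="count")
      .rename(columns={"extended_action": "action"})
)
print(counts)
```
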

Update based on the comments below:

If you need counts per session (a session runs from enter to exit), insert a session number column and group by session and extended_action:

df['session'] = df.action.eq('enter').cumsum()
df.groupby(['session','extended_action']).extended_action.count()
Result:

session  extended_action       
1        enter                     1
         exit                      1
         next (page number = 1)    2
         next (page number = 2)    1
         prev (page number = 2)    1
2        enter                     1
         error                     4
         exit                      1
         next (page number = 1)    3
         next (page number = 2)    1
         prev (page number = 1)    1
         prev (page number = 2)    2
         prev (page number = 3)    1
3        enter                     1
         exit                      1
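
To see why the cumulative sum yields a session number, here is a small sketch on a toy sequence of actions (the sequence is made up for illustration). Each enter row is flagged True, and the running total of those flags increases by one at every enter, so all rows up to the next enter share the same number.

```python
import pandas as pd

# Hypothetical toy sequence: three sessions, each starting with 'enter'.
actions = pd.Series(["enter", "next", "exit",
                     "enter", "error", "exit",
                     "enter", "exit"])

# True on each 'enter' row; the running total of the flags is the session number.
is_enter = actions.eq("enter")
session = is_enter.cumsum()

print(session.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]
```

Any rows that appear before the first enter would get session number 0 with this approach.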
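Update 2, based on the changed question and the comments below: if each event should be counted only once per session, simply drop duplicates before grouping: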

df.drop_duplicates().groupby(['session','extended_action']).extended_action.count()
Result:

session  extended_action       
1        enter                     1
         exit                      1
         next (page number = 1)    1
         next (page number = 2)    1
         prev (page number = 2)    1
2        enter                     1
         error                     1
         exit                      1
         next (page number = 1)    1
         next (page number = 2)    1
         prev (page number = 1)    1
         prev (page number = 2)    1
         prev (page number = 3)    1
3        enter                     1
         exit                      1
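
If what is wanted is a single table showing in how many sessions each action occurred, one possible variation is to count distinct sessions per action with nunique. This is a sketch under assumptions: the reading of the request is inferred from the comment that next (page number = 1) should give 2 rather than 5, and the nullable Int64 dtype for page_number is chosen here only to keep the labels free of ".0".

```python
import pandas as pd

actions = ["enter", "next", "prev", "next", "next", "exit",
           "enter", "next", "error", "next", "error", "error", "error",
           "next", "prev", "prev", "next", "prev", "prev", "exit",
           "enter", "exit"]
pages = [None, 1, 2, 1, 2, 3,
         None, 1, None, 1, None, None, None,
         2, 3, 2, 1, 2, 1, 0,
         None, None]
df = pd.DataFrame({"action": actions,
                   "page_number": pd.array(pages, dtype="Int64")})

label = df["action"] + " (page number = " + df["page_number"].astype(str) + ")"
df["extended_action"] = label.where(df["action"].isin(["next", "prev"]), df["action"])
df["session"] = df["action"].eq("enter").cumsum()

# Number of distinct sessions in which each action appears,
# i.e. every action contributes at most once per session.
sessions_per_action = df.groupby("extended_action")["session"].nunique()
print(sessions_per_action)
```

With the sample data, next (page number = 1) appears in sessions 1 and 2, so its value here is 2.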

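Thanks for your answer. But for action = next (page number = 1), the number in the expected result is 2, not 5. In other words, I want to count it only once per session (between the enter and exit actions).

Oh, you didn't mention that in the question, and your sample data doesn't match the code that generates it, nor does the desired result match the sample data. I updated my answer to count per session, based on the sample data shown in your question.

Adding a session column is a great idea. Thank you very much. But I have edited the result in the question and added some explanation. Please take a look.

By the way, thanks. When I run the code above, I don't see the extended_action counts. @Stef

I updated my answer as requested. I'm afraid I don't quite understand what you mean by "I don't see the extended_action counts".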