Python pandas: a solution for counting consecutive events
I have the sample data below, and I want to count action (page 2) and action (page 3) once per attempt (if available). I have attached the result I mean. I would like to use a pandas DataFrame, but I don't know how to do it.
import pandas as pd
# Note: the two lists differ in length (5 vs 4), so pandas raises a ValueError here as written.
df = pd.DataFrame({'action': ['enter', 'next', 'prev', 'error', 'exit'],
                   'page_number': [0, 1, 2, 3]})
For example, we have this data:
action page_number
enter
next 1
prev 2
next 1
next 2
exit 3
enter
next 1
error
next 1
error
error
error
next 2
prev 3
prev 2
next 1
prev 2
prev 1
exit 0
enter
exit
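To experiment with the table above, it can be transcribed into a DataFrame directly. This is only a minimal sketch: it assumes the blank page_number cells are missing values, and the `rows` name and the nullable `Int64` dtype are my own choices, not part of the question.

```python
import pandas as pd

# Rows transcribed from the table as (action, page_number) pairs;
# None marks a blank page_number cell (an assumption about the data).
rows = [
    ("enter", None), ("next", 1), ("prev", 2), ("next", 1), ("next", 2), ("exit", 3),
    ("enter", None), ("next", 1), ("error", None), ("next", 1), ("error", None),
    ("error", None), ("error", None), ("next", 2), ("prev", 3), ("prev", 2),
    ("next", 1), ("prev", 2), ("prev", 1), ("exit", 0),
    ("enter", None), ("exit", None),
]

df = pd.DataFrame(rows, columns=["action", "page_number"])
# The nullable integer dtype keeps page numbers as integers despite the blanks.
df["page_number"] = df["page_number"].astype("Int64")
print(df)
```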
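The result I want to achieve is (session 1):
next (page number = 1) occurs twice in the first session, but I want to count it only once. Even if it occurred 3 times in the first session, I would still want to count it only once. I want this rule applied to all actions in all sessions: each action is counted only once per session.
Thanks in advance for any guidance.
First, we create a new column extended_action, then group by this new column: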
df['extended_action'] = (df.action + ' (page number = ' + df.page_number.astype(str) + ')').where(df.action.isin(('next', 'prev')), df.action)
df.groupby('extended_action').extended_action.count()
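The two lines above can be tried on a small frame. The sketch below uses made-up rows (not the question's full data) and assumes page numbers are stored with the nullable `Int64` dtype so that they render as `1` rather than `1.0` when converted to strings.

```python
import pandas as pd

# Hypothetical mini data set in the same shape as the question's table.
df = pd.DataFrame({
    "action": ["enter", "next", "prev", "next", "error", "exit"],
    "page_number": pd.array([None, 1, 2, 1, None, 3], dtype="Int64"),
})

# Build a label with the page number for every row ...
label = df["action"] + " (page number = " + df["page_number"].astype(str) + ")"
# ... but keep it only for 'next'/'prev'; all other rows fall back to the plain action.
df["extended_action"] = label.where(df["action"].isin(["next", "prev"]), df["action"])

counts = df.groupby("extended_action")["extended_action"].count()
print(counts)
```

Here `where` keeps the label where the condition is true and substitutes `df["action"]` elsewhere, so enter, error and exit are counted without a page number.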
Result (for the whole sample DataFrame):
You can rename the columns after the groupby to get the output in the requested format:
df.groupby('extended_action', as_index=False).action.count().rename(columns={'extended_action': 'action', 'action': 'count'})
Result:
action count
0 enter 3
1 error 4
2 exit 3
3 next (page number = 1) 5
4 next (page number = 2) 2
5 prev (page number = 1) 1
6 prev (page number = 2) 3
7 prev (page number = 3) 1
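Update based on the comments below: if you need the counts per session (a session runs from enter to exit), you need to insert a session number column and group by session and extended_action: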
df['session'] = df.action.eq('enter').cumsum()
df.groupby(['session','extended_action']).extended_action.count()
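The session numbering relies on a running total of the enter rows. A minimal sketch with a made-up action sequence (the `actions` and `session` names are only for illustration):

```python
import pandas as pd

# Hypothetical action log containing three sessions.
actions = pd.Series(["enter", "next", "exit", "enter", "error", "exit", "enter", "exit"])

# eq('enter') is True at each session start; the cumulative sum of those
# flags increases by one per 'enter', which numbers the sessions 1, 2, 3, ...
session = actions.eq("enter").cumsum()
print(session.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]
```

This assumes every session begins with an enter row; rows before the first enter would end up in session 0.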
Result:
session extended_action
1 enter 1
exit 1
next (page number = 1) 2
next (page number = 2) 1
prev (page number = 2) 1
2 enter 1
error 4
exit 1
next (page number = 1) 3
next (page number = 2) 1
prev (page number = 1) 1
prev (page number = 2) 2
prev (page number = 3) 1
3 enter 1
exit 1
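Update 2, based on the changed question and the comments below:
If you want to count each event only once per session, just drop the duplicates before grouping: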
df.drop_duplicates().groupby(['session','extended_action']).extended_action.count()
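Called without arguments, `drop_duplicates()` compares all columns. Restricting it to the two grouping columns states the intent more explicitly. The sketch below uses made-up rows; the final sum over sessions is my own addition, offered on the assumption that a single total per action (the number of sessions in which it occurred) is what is ultimately wanted.

```python
import pandas as pd

# Hypothetical rows that already carry the session and extended_action columns.
df = pd.DataFrame({
    "session": [1, 1, 1, 1, 2, 2, 2],
    "extended_action": [
        "enter", "next (page number = 1)", "next (page number = 1)", "exit",
        "enter", "error", "error",
    ],
})

# Keep one row per (session, extended_action) pair, then count per pair.
once = (
    df.drop_duplicates(subset=["session", "extended_action"])
      .groupby(["session", "extended_action"])["extended_action"]
      .count()
)
print(once)

# Optional: add the per-session ones up to get, for each action,
# the number of sessions in which it occurred at least once.
per_action = once.groupby(level="extended_action").sum()
print(per_action)
```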
Result:
session extended_action
1 enter 1
exit 1
next (page number = 1) 1
next (page number = 2) 1
prev (page number = 2) 1
2 enter 1
error 1
exit 1
next (page number = 1) 1
next (page number = 2) 1
prev (page number = 1) 1
prev (page number = 2) 1
prev (page number = 3) 1
3 enter 1
exit 1
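Thanks for your answer. But for action = next (page number = 1), the number in the sample result is 2, not 5. In other words, I want to count it only once per session (between the enter and exit actions).
Oh, you didn't mention that in your question, and your sample data doesn't match the code that generates it, nor does the desired result match the sample data. I updated my answer to count per session, based on the sample data shown in your question.
The idea of adding a session column is very good. Thank you very much. But I edited the result in the question and added some explanation. Please read it. By the way, when I run the code above, I can't see the count of extended_action. @Stef
I updated my answer as requested. I'm afraid I don't quite understand what you mean by "I can't see the count of extended_action".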