Python: combining a conditional count across dataframes with a datetime comparison condition
Background
I have two dataframes. The first, df_player_snapshot_master, holds snapshot data for individual poker players at different times during a tournament, like this:
snapshotDateTime playerID deactivationRecord
0 2021-05-06 09:28:24.987995 1679613 False
1 2021-05-06 09:28:24.987995 2660567 False
2 2021-05-06 09:28:24.987995 2668394 False
3 2021-05-06 09:28:24.987995 2280604 False
4 2021-05-06 09:28:24.987995 2018271 False
The second dataframe, df_tournament_summary_master, summarizes this data at the tournament level, one row per snapshot interval, like this:
intervalStartDateTime intervalEndDateTime playersDeactivatedThisInterval
0 2021-05-06 09:28:24.987995 2021-05-06 09:28:38.605930 NaN
1 2021-05-06 09:28:38.605930 2021-05-06 09:28:47.860595 NaN
2 2021-05-06 09:28:47.860595 2021-05-06 09:28:57.187734 NaN
3 2021-05-06 09:28:57.187734 2021-05-06 09:29:07.187734 NaN
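My goal is to populate df_tournament_summary_master['playersDeactivatedThisInterval'] by counting the records in df_player_snapshot_master where df_player_snapshot_master['snapshotDateTime'] equals df_tournament_summary_master['intervalStartDateTime'] and df_player_snapshot_master['deactivationRecord'] is True. So it is essentially a COUNTIFS exercise across two dataframes, with multiple conditions.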
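In pandas, a COUNTIFS like this can be expressed as boolean conditions combined with `&` and then summed, since each `True` counts as 1. A minimal sketch for a single interval start, on a few made-up rows shaped like df_player_snapshot_master (the values are illustrative, not the real data):

```python
import pandas as pd

# Made-up rows shaped like df_player_snapshot_master (illustrative values only)
df_player_snapshot_master = pd.DataFrame({
    "snapshotDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
    ]),
    "playerID": ["1679613", "2660567", "2668394"],
    "deactivationRecord": [True, False, True],
})

interval_start = pd.Timestamp("2021-05-06 09:28:24.987995")

# Each condition is a boolean Series; & combines them row by row
mask = (
    (df_player_snapshot_master["snapshotDateTime"] == interval_start)
    & df_player_snapshot_master["deactivationRecord"]
)

# Summing a boolean Series counts the True rows, like a COUNTIFS
count = int(mask.sum())
print(count)  # 1
```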
What I tried
I adapted a solution from an article to my case, which produced the following code:
tournament_info = df_tournament_summary_master[['intervalStartDateTime']].values
deacs = []
for iSDT in tournament_info:
    deacs.append(len(df_player_snapshot_master[(df_player_snapshot_master['snapshotDateTime']==iSDT) &
                                               (df_player_snapshot_master['deactivationRecord']==True)])
                 )
df_tournament_summary_master['playersDeactivatedThisInterval'] = deacs
But it keeps throwing an error:
ValueError: ('Lengths must match to compare', (332,), (1,))
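The article I modeled my code on did mention the need to convert to datetime, which seemed like it could be the problem, but when I check the types of both the array and the dataframe, they both already appear to be datetimes: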
In:
display(tournament_info)
Out:
array([['2021-05-06T09:28:24.987995000'],
['2021-05-06T09:28:38.605930000'],
['2021-05-06T09:28:47.860595000'],
['2021-05-06T09:28:57.187734000']], dtype='datetime64[ns]')
In:
df_player_snapshot_master.dtypes
Out:
snapshotDateTime datetime64[ns]
playerID object
deactivationRecord bool
dtype: object
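The outputs above point to the likely cause: the dtypes are fine, but the shape is not. Selecting with double brackets, df_tournament_summary_master[['intervalStartDateTime']], returns a one-column DataFrame, so `.values` is a 2-D array of shape (n, 1) and each `iSDT` in the loop is a length-1 array rather than a single timestamp. Comparing the 332-row `snapshotDateTime` Series with a length-1 array is an element-wise comparison between mismatched lengths, which is what `Lengths must match to compare', (332,), (1,)` reports. A sketch of the same loop with single brackets, on made-up stand-in data (illustrative values only):

```python
import pandas as pd

# Made-up stand-ins for the two dataframes in the question
df_player_snapshot_master = pd.DataFrame({
    "snapshotDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
    ]),
    "deactivationRecord": [True, False, True],
})
df_tournament_summary_master = pd.DataFrame({
    "intervalStartDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:47.860595",
    ]),
})

# Double brackets: a 2-D array, so iterating it yields length-1 arrays
shape_2d = df_tournament_summary_master[["intervalStartDateTime"]].values.shape
print(shape_2d)  # (3, 1)

# Single brackets: a Series, so each iSDT is one scalar Timestamp
deacs = []
for iSDT in df_tournament_summary_master["intervalStartDateTime"]:
    mask = (
        (df_player_snapshot_master["snapshotDateTime"] == iSDT)
        & df_player_snapshot_master["deactivationRecord"]
    )
    deacs.append(int(mask.sum()))  # number of True rows for this interval

df_tournament_summary_master["playersDeactivatedThisInterval"] = deacs
print(deacs)  # [1, 1, 0]
```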
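I have, of course, looked at other posts about filtering dataframes by comparing dates, but they all seem to use solutions other than the for loop I'd like to use, so I'm not sure how to repurpose and adapt their suggestions.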
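Questions
What is causing the ValueError I'm getting?
How can I populate df_tournament_summary_master['playersDeactivatedThisInterval'] with a conditional count that uses a date comparison condition?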
You can do this by summing deactivationRecord after a groupby on snapshotDateTime, and then merging the dataframes. The result on the sample data isn't very exciting, but it should work:
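import pandas as pd
# Number of deactivation records per snapshot time (each True sums as 1)
time_groups = df_player_snapshot_master.groupby('snapshotDateTime')['deactivationRecord'].sum()
# Attach those counts to the summary rows whose interval start matches a snapshot time
df = pd.merge(df_tournament_summary_master, time_groups, how='left', left_on=['intervalStartDateTime'], right_on=['snapshotDateTime'])
# Replace the empty placeholder column with the merged counts
df = df.drop(['playersDeactivatedThisInterval'], axis=1).rename(columns={"deactivationRecord": "playersDeactivatedThisInterval"})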
Result:
intervalStartDateTime intervalEndDateTime playersDeactivatedThisInterval
0 2021-05-06 09:28:24.987995 2021-05-06 09:28:38.605930 0
1 2021-05-06 09:28:38.605930 2021-05-06 09:28:47.860595 NaN
2 2021-05-06 09:28:47.860595 2021-05-06 09:28:57.187734 NaN
3 2021-05-06 09:28:57.187734 2021-05-06 09:29:07.187734 NaN
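For comparison, a self-contained sketch of the same groupby-then-join idea on made-up data (not the question's real frames). It swaps `pd.merge` for `Series.map`, which looks up each interval start in the per-snapshot counts, and adds `fillna(0)` so intervals with no matching snapshot show 0 rather than NaN. Both are alternative choices, not part of the answer above; whether 0 or NaN is preferable depends on how the summary is used.

```python
import pandas as pd

# Made-up stand-ins for the two dataframes (illustrative values only)
df_player_snapshot_master = pd.DataFrame({
    "snapshotDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
    ]),
    "deactivationRecord": [True, False, True],
})
df_tournament_summary_master = pd.DataFrame({
    "intervalStartDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:47.860595",
    ]),
})

# Count deactivations per snapshot time: summing booleans counts True values
time_groups = (
    df_player_snapshot_master
    .groupby("snapshotDateTime")["deactivationRecord"]
    .sum()
)

# Look up each interval start in the per-snapshot counts; starts with no
# matching snapshot become NaN, which fillna(0) turns into a count of zero
df_tournament_summary_master["playersDeactivatedThisInterval"] = (
    df_tournament_summary_master["intervalStartDateTime"]
    .map(time_groups)
    .fillna(0)
    .astype(int)
)
print(df_tournament_summary_master["playersDeactivatedThisInterval"].tolist())  # [1, 1, 0]
```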
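Thanks for your answer. This does solve the problem. However, I have other similar applications where I'm doing conditional counts, and I'm not sure how to generalize this to one condition, three conditions, or conditions on non-boolean data types. That's why the for-loop format in the link I referenced was so appealing. Does this generalize easily to other, similar scenarios?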
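One way to keep the flexibility of the loop without writing the loop: build each condition as a boolean Series, combine them with `&`, then sum the combined mask per group. Conditions can test any dtype, for example equality on strings or `Series.between` on numbers or datetimes, and the number of conditions is arbitrary. This is a sketch under assumptions: the helper name `countifs` and the columns `tableID` and `chipCount` are hypothetical, invented for illustration.

```python
import pandas as pd


def countifs(df, key, *conditions):
    """Hypothetical helper: for each value of df[key], count the rows where
    every boolean Series in `conditions` is True (an AND of all of them)."""
    mask = pd.Series(True, index=df.index)
    for cond in conditions:
        mask = mask & cond
    # Grouping the mask (not a filtered frame) keeps keys with zero matches
    return mask.groupby(df[key]).sum()


snapshots = pd.DataFrame({
    "snapshotDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
    ]),
    "deactivationRecord": [True, False, True],
    "tableID": ["A", "B", "A"],      # hypothetical non-boolean column
    "chipCount": [0, 1500, 5000],    # hypothetical numeric column
})

# One condition
one = countifs(snapshots, "snapshotDateTime", snapshots["deactivationRecord"])

# Three conditions, two of them on non-boolean columns
three = countifs(
    snapshots,
    "snapshotDateTime",
    snapshots["deactivationRecord"],
    snapshots["tableID"] == "A",
    snapshots["chipCount"].between(0, 1000),
)

# Attach the counts to a summary frame, as in the answer above
summary = pd.DataFrame({
    "intervalStartDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:47.860595",
    ]),
})
summary["deacsOnTableA"] = (
    summary["intervalStartDateTime"].map(three).fillna(0).astype(int)
)
print(one.tolist(), three.tolist(), summary["deacsOnTableA"].tolist())
```

Because every condition is just a boolean Series, adding, removing, or swapping one changes only the argument list, not the counting logic.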