Python: combining a DataFrame conditional count with a datetime comparison condition

Background


I have two dataframes. The first, df_player_snapshot_master, holds snapshot data for individual poker players at different times during a tournament, as shown below:

    snapshotDateTime            playerID    deactivationRecord
0   2021-05-06 09:28:24.987995  1679613     False
1   2021-05-06 09:28:24.987995  2660567     False
2   2021-05-06 09:28:24.987995  2668394     False
3   2021-05-06 09:28:24.987995  2280604     False
4   2021-05-06 09:28:24.987995  2018271     False

The second dataframe, df_tournament_summary_master, summarizes this data at the tournament level by snapshot, as shown below:

    intervalStartDateTime       intervalEndDateTime         playersDeactivatedThisInterval
0   2021-05-06 09:28:24.987995  2021-05-06 09:28:38.605930  NaN
1   2021-05-06 09:28:38.605930  2021-05-06 09:28:47.860595  NaN
2   2021-05-06 09:28:47.860595  2021-05-06 09:28:57.187734  NaN
3   2021-05-06 09:28:57.187734  2021-05-06 09:29:07.187734  NaN

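My goal is to populate df_tournament_summary_master['playersDeactivatedThisInterval'] by counting the records in df_player_snapshot_master where df_player_snapshot_master['snapshotDateTime'] equals df_tournament_summary_master['intervalStartDateTime'] and df_player_snapshot_master['deactivationRecord'] is True. So it is essentially a COUNTIFS exercise between two dataframes, with multiple conditions.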

What I've tried

I adapted an existing solution to my situation, which produced the following code:

tournament_info = df_tournament_summary_master[['intervalStartDateTime']].values

deacs = []

for iSDT in tournament_info:
    deacs.append(len(df_player_snapshot_master[(df_player_snapshot_master['snapshotDateTime']==iSDT) &
                                  (df_player_snapshot_master['deactivationRecord']==True)])
                          )

df_tournament_summary_master['playersDeactivatedThisInterval'] = deacs

But it keeps throwing an error:

ValueError: ('Lengths must match to compare', (332,), (1,))
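
The article I modeled my code on does mention the need to convert to datetime, which seemed like it could be the problem, but when I check the types in both dataframes/series, they already appear to be datetime: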

In:
display(tournament_info)

Out:
array([['2021-05-06T09:28:24.987995000'],
       ['2021-05-06T09:28:38.605930000'],
       ['2021-05-06T09:28:47.860595000'],
       ['2021-05-06T09:28:57.187734000']], dtype='datetime64[ns]')

In:
df_player_snapshot_master.dtypes

Out:
snapshotDateTime           datetime64[ns]
playerID                           object
deactivationRecord                   bool
dtype: object
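
I have of course looked at other posts that filter dataframes by comparing dates, but they all seem to use solutions other than the for loop I would like to use, so I'm not sure how to adapt their suggestions.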

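Questions
  • What is causing the ValueError I'm getting?
  • How can I populate df_tournament_summary_master['playersDeactivatedThisInterval'] with a conditional count that uses a date comparison condition?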

  • You can do this by summing deactivationRecord after a groupby on snapshotDateTime and then merging the dataframes. The result for the sample data isn't very exciting, but it should work:

    import pandas as pd

    # Summing the boolean column counts the True values per snapshot time.
    time_groups = df_player_snapshot_master.groupby('snapshotDateTime')['deactivationRecord'].sum()
    df = pd.merge(df_tournament_summary_master, time_groups, how='left', left_on=['intervalStartDateTime'], right_on=['snapshotDateTime'])
    df = df.drop(['playersDeactivatedThisInterval'], axis=1).rename(columns={"deactivationRecord": "playersDeactivatedThisInterval"})
    
    Result:

        intervalStartDateTime       intervalEndDateTime         playersDeactivatedThisInterval
    0   2021-05-06 09:28:24.987995  2021-05-06 09:28:38.605930  0
    1   2021-05-06 09:28:38.605930  2021-05-06 09:28:47.860595  NaN
    2   2021-05-06 09:28:47.860595  2021-05-06 09:28:57.187734  NaN
    3   2021-05-06 09:28:57.187734  2021-05-06 09:29:07.187734  NaN
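
    For a self-contained check of the groupby-and-merge approach, here is a minimal sketch on made-up data. The frame names players and summary, the deactivated player at 09:28:38, and the final fillna(0) step (which assumes an interval with no matching snapshot should count as zero) are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Hypothetical sample data; the deactivation at 09:28:38 is invented for illustration.
players = pd.DataFrame({
    "snapshotDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:38.605930",
    ]),
    "playerID": ["1679613", "2660567", "1679613", "2660567"],
    "deactivationRecord": [False, False, True, False],
})
summary = pd.DataFrame({
    "intervalStartDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:47.860595",
    ]),
})

# Summing a boolean column counts its True values within each group.
counts = (
    players.groupby("snapshotDateTime")["deactivationRecord"]
    .sum()
    .rename("playersDeactivatedThisInterval")
    .reset_index()
)

# Left merge keeps every interval; intervals without a snapshot get NaN.
result = summary.merge(
    counts,
    how="left",
    left_on="intervalStartDateTime",
    right_on="snapshotDateTime",
).drop(columns="snapshotDateTime")

# Assumption: no matching snapshot means zero deactivations.
result["playersDeactivatedThisInterval"] = (
    result["playersDeactivatedThisInterval"].fillna(0).astype(int)
)
print(result)
```

    Skipping the fillna(0) step leaves NaN for intervals without a snapshot, which may be preferable if "no data" should stay distinguishable from "zero deactivations".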
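
    Thanks for the answer. This does solve the problem. I have other similar applications where I'm doing conditional counts, but I'm not sure how to generalize this to one condition, three conditions, or conditions with non-boolean data types. That is why the for loop format in the link I referenced is so appealing. Does this generalize easily to other similar scenarios?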
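
    On the ValueError and the loop format: a likely cause is the double brackets in df_tournament_summary_master[['intervalStartDateTime']].values. They select a one-column DataFrame, so .values is a 2-D array and each loop item is a length-1 array rather than a single timestamp, which would match the (332,) versus (1,) shapes in the error. The sketch below uses made-up data and hypothetical names (players, summary, count_rows, the excluded player ID). It iterates over the column as a Series instead and shows how extra conditions, boolean or not, might be combined with &.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
players = pd.DataFrame({
    "snapshotDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:38.605930",
    ]),
    "playerID": ["1679613", "2660567", "1679613", "2660567"],
    "deactivationRecord": [False, False, True, False],
})
summary = pd.DataFrame({
    "intervalStartDateTime": pd.to_datetime([
        "2021-05-06 09:28:24.987995",
        "2021-05-06 09:28:38.605930",
        "2021-05-06 09:28:47.860595",
    ]),
})

# Double brackets give a 2-D array, so each loop item would be a length-1 array.
two_d = summary[["intervalStartDateTime"]].values
print(two_d.shape)  # one row per interval, one column

def count_rows(start):
    """Count player rows matching every condition for one interval start."""
    mask = (
        (players["snapshotDateTime"] == start)   # datetime equality
        & players["deactivationRecord"]          # boolean flag
        & players["playerID"].ne("2660567")      # hypothetical non-boolean condition
    )
    return int(mask.sum())

# Single brackets give a Series, so each loop item is one scalar Timestamp.
deacs = [count_rows(start) for start in summary["intervalStartDateTime"]]
summary["playersDeactivatedThisInterval"] = deacs

# Loop-free variant: filter on the row-level conditions, then count per time.
cond = players["deactivationRecord"] & players["playerID"].ne("2660567")
per_time = players[cond].groupby("snapshotDateTime").size()
vectorized = summary["intervalStartDateTime"].map(per_time).fillna(0).astype(int)
print(summary)
```

    Adding or removing a line inside the mask covers the one-condition and three-condition cases. The loop-free variant should give the same counts and tends to scale better on large frames, since it avoids rescanning the player table once per interval.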