Python 3.x: difference in days between two dates until a condition is met

I'm struggling to compute the number of consecutive days until a condition is met. In the table below, Gap done is the messy result I get with the solution I adapted, and Expected Gap is the output I want.
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
| Player | Result | Date | Gap done | Expected Gap |
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
| K2000 | Lose | 2015-11-13 13:42:00 | Nan | Nan/0 |
| K2000 | Lose | 2016-03-23 16:40:00 | 131.0 | 131.0 |
| K2000 | Lose | 2016-05-16 19:17:00 | 54.0 | 185.0 |
| K2000 | Win | 2016-06-09 19:36:00 | 54.0 | 239.0 #he always lose before |
| K2000 | Win | 2016-06-30 14:05:00 | 54.0 | 54.0 #because he won last time, it's 54 days btw this current date and the last time he won. |
| K2000 | Lose | 2016-07-29 16:20:00 | 29.0 | 29.0 |
| K2000 | Win | 2016-10-08 17:48:00 | 29.0 | 58.0 |
| Kssis | Lose | 2007-02-25 15:05:00 | Nan | Nan/0 |
| Kssis | Lose | 2007-04-25 16:07:00 | 59.0 | 59.0 |
| Kssis | Not ranked | 2007-06-01 16:54:00 | 37.0 | 96.0 |
| Kssis | Lose | 2007-09-09 14:33:00 | 99.0 | 195.0 |
| Kssis | Lose | 2008-04-06 16:27:00 | 210.0 | 405.0 |
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
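The problem with that solution is that it does not actually compute anything from the dates: in its example the dates happened to always be 1 day apart.
Of course, I adapted it: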
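    def sum_days_in_row_with_condition(g):
        # Sort first, then diff the sorted dates (column named 'Date' as in the table above).
        sorted_g = g.sort_values(by='Date', ascending=True)
        condition = sorted_g['Result'] == 'Win'
        # Day gap to the previous row; rows that are wins take the last non-win gap instead.
        sorted_g['days-in-a-row'] = sorted_g['Date'].diff().dt.days.where(~condition).ffill()
        return sorted_g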
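But as I showed you, the result is messy.
So I thought of another solution, but it needs global variables (defined outside the function), which is a bit fiddly.
Can anyone solve this in a simpler way?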
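Since the concern above is that the gaps should come from the dates themselves, here is a minimal sketch of computing the raw day gap per player with `groupby(...).diff()`. The small frame reuses a few rows from the table; the column name `raw_gap` is made up for illustration and is not part of the question.

```python
import pandas as pd

# A few rows taken from the question's table.
df = pd.DataFrame({
    "Player": ["K2000", "K2000", "K2000", "Kssis", "Kssis"],
    "Date": pd.to_datetime([
        "2015-11-13 13:42:00",
        "2016-03-23 16:40:00",
        "2016-05-16 19:17:00",
        "2007-02-25 15:05:00",
        "2007-04-25 16:07:00",
    ]),
})

df = df.sort_values(["Player", "Date"]).reset_index(drop=True)

# Whole days between each game and the same player's previous game.
# The first game of every player has no predecessor, so it stays NaN.
df["raw_gap"] = df.groupby("Player")["Date"].diff().dt.days

print(df)
```

Grouping before `diff()` keeps the first row of one player from being compared with the last row of the previous player.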
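Pandas version: 0.23.4. Python version: 3.7.4.

IIUC, you need a boolean mask m1 that is true where the current row is a Win and the previous row is also a Win. From m1, create group IDs s that separate the Win groups. Then group by Player and s and take the cumsum of Gap done: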
    m = df.Result.eq('Win')
    m1 = m & m.shift()
    s = m1.ne(m1.shift()).cumsum()
    df['Expected Gap'] = df.groupby(['Player', s])['Gap done'].cumsum()

    Out[808]:
        Player      Result                 Date  Gap done  Expected Gap
     0   K2000        Lose  2015-11-13 13:42:00       NaN           NaN
     1   K2000        Lose  2016-03-23 16:40:00     131.0         131.0
     2   K2000        Lose  2016-05-16 19:17:00      54.0         185.0
     3   K2000         Win  2016-06-09 19:36:00      54.0         239.0
     4   K2000         Win  2016-06-30 14:05:00      54.0          54.0
     5   K2000        Lose  2016-07-29 16:20:00      29.0          29.0
     6   K2000         Win  2016-10-08 17:48:00      29.0          58.0
     7   Kssis        Lose  2007-02-25 15:05:00       NaN           NaN
     8   Kssis        Lose   2007-04-25 6:07:00      59.0          59.0
     9   Kssis  Not-ranked  2007-06-01 16:54:00      37.0          96.0
    10   Kssis        Lose  2007-09-09 14:33:00      99.0         195.0
    11   Kssis        Lose  2008-04-06 16:27:00     210.0         405.0
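For anyone who wants to run the answer end to end, the following is a self-contained sketch that rebuilds the Player, Result and Gap done columns from the question's table and applies the same mask, group ID and cumsum steps. One deliberate difference: it uses `shift(fill_value=False)`, which assumes pandas 0.24 or later, so the shifted mask stays boolean; the answer's plain `m.shift()` is the original form.

```python
import numpy as np
import pandas as pd

# Columns copied from the question's table (Date is not needed for this step).
df = pd.DataFrame({
    "Player": ["K2000"] * 7 + ["Kssis"] * 5,
    "Result": ["Lose", "Lose", "Lose", "Win", "Win", "Lose", "Win",
               "Lose", "Lose", "Not ranked", "Lose", "Lose"],
    "Gap done": [np.nan, 131, 54, 54, 54, 29, 29,
                 np.nan, 59, 37, 99, 210],
})

# True where the row is a win.
m = df["Result"].eq("Win")
# True where the row AND the row before it are wins.
# Assumption: fill_value requires pandas >= 0.24.
m1 = m & m.shift(fill_value=False)
# A new group ID starts every time m1 changes value.
s = m1.ne(m1.shift()).cumsum()
# Running total of the gaps inside each (player, group) block.
df["Expected Gap"] = df.groupby(["Player", s])["Gap done"].cumsum()

print(df)
```

The cumulative sum skips NaN, so the first row of each player stays NaN and the running total starts at the next row.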
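Thanks a lot. I'd like to add one point. The second and third lines of your code gave unexpected results. I'm not saying your calculation doesn't match the example in my question; I mean it gave unexpected results on the real data I'm using. So I replaced them and kept only two lines: m = df.Result.eq('Win') and s = m.shift().cumsum(). After that, to be honest, some odd values such as -1 showed up in the first row of the column, but that can be corrected. Either way, it was a great lead. Thanks again.

You're welcome. Interesting. Glad you found the answer :)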