Python 3.x: number of days between two dates until a condition is met

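I'm having trouble counting the consecutive days until a condition is met. In the table below, Gap done is the messy column I got with the solution shown further down, and Expected Gap is the output I want to get.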

+--------+------------+---------------------+----------+--------------+
| Player |   Result   |        Date         | Gap done | Expected Gap |
+--------+------------+---------------------+----------+--------------+
| K2000  | Lose       | 2015-11-13 13:42:00 | NaN      | NaN/0        |
| K2000  | Lose       | 2016-03-23 16:40:00 | 131.0    | 131.0        |
| K2000  | Lose       | 2016-05-16 19:17:00 | 54.0     | 185.0        |
| K2000  | Win        | 2016-06-09 19:36:00 | 54.0     | 239.0        |
| K2000  | Win        | 2016-06-30 14:05:00 | 54.0     | 54.0         |
| K2000  | Lose       | 2016-07-29 16:20:00 | 29.0     | 29.0         |
| K2000  | Win        | 2016-10-08 17:48:00 | 29.0     | 58.0         |
| Kssis  | Lose       | 2007-02-25 15:05:00 | NaN      | NaN/0        |
| Kssis  | Lose       | 2007-04-25 16:07:00 | 59.0     | 59.0         |
| Kssis  | Not ranked | 2007-06-01 16:54:00 | 37.0     | 96.0         |
| Kssis  | Lose       | 2007-09-09 14:33:00 | 99.0     | 195.0        |
| Kssis  | Lose       | 2008-04-06 16:27:00 | 210.0    | 405.0        |
+--------+------------+---------------------+----------+--------------+
Notes on Expected Gap: 239.0 because he always lost before; 54.0 on 2016-06-30 because he won last time, so it's the 54 days between this date and the last time he won.
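The problem with that solution is that it doesn't actually count the days between the dates. The dates in its original example may simply have always been one day apart.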
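To make the problem concrete, here is a sketch that rebuilds the sample table (typed in by hand from the question; the column name Raw gap is hypothetical) and computes the real whole-day gap between each player's consecutive games from Date. On the Win rows the real gaps are 24, 20 and 71 days, whereas Gap done repeats the previous row's value there (54, 54, 29).

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Lose', 'Win', 'Win', 'Lose', 'Win',
               'Lose', 'Lose', 'Not ranked', 'Lose', 'Lose'],
    'Date': pd.to_datetime([
        '2015-11-13 13:42:00', '2016-03-23 16:40:00', '2016-05-16 19:17:00',
        '2016-06-09 19:36:00', '2016-06-30 14:05:00', '2016-07-29 16:20:00',
        '2016-10-08 17:48:00', '2007-02-25 15:05:00', '2007-04-25 16:07:00',
        '2007-06-01 16:54:00', '2007-09-09 14:33:00', '2008-04-06 16:27:00',
    ]),
})

# Whole days since the same player's previous game (NaN for each first game)
df['Raw gap'] = df.groupby('Player')['Date'].diff().dt.days
print(df)
```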

Of course, I adapted it:

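def sum_days_in_row_with_condition(g):
    # Sort each player's games chronologically
    sorted_g = g.sort_values(by='Date', ascending=True)
    condition = sorted_g['Result'] == 'Win'
    # Day gap to the previous game; Win rows are blanked out and forward-filled,
    # so they repeat the previous gap (this is what produces "Gap done")
    sorted_g['days-in-a-row'] = sorted_g['Date'].diff().dt.days.where(~condition).ffill()
    return sorted_g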
But, as the Gap done column shows, the result is messy.

So I thought of another solution, but it needs a global variable (outside the function), which is a bit fiddly.
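The global variable is not strictly needed: the running state can live inside a function. A minimal pure-Python sketch, assuming rows are already sorted by player and then by date and that Gap done holds each row's gap; running_gap is a hypothetical helper name. It restarts the total for each new player and on the row right after a Win, which reproduces Expected Gap for this sample (with 0 instead of NaN in each player's first row).

```python
import math


def running_gap(players, results, gaps):
    """Hypothetical helper: running total of `gaps` that restarts for each
    new player and on the row right after a 'Win'. Assumes rows are sorted
    by player, then by date."""
    out = []
    total = 0.0
    prev_player = None
    prev_was_win = False
    for player, result, gap in zip(players, results, gaps):
        # Restart the count for a new player or after the previous row's win
        if player != prev_player or prev_was_win:
            total = 0.0
        # A missing gap (a player's first game) adds nothing
        if not math.isnan(gap):
            total += gap
        out.append(total)
        prev_player, prev_was_win = player, result == 'Win'
    return out


players = ['K2000'] * 7 + ['Kssis'] * 5
results = ['Lose', 'Lose', 'Lose', 'Win', 'Win', 'Lose', 'Win',
           'Lose', 'Lose', 'Not ranked', 'Lose', 'Lose']
gap_done = [math.nan, 131.0, 54.0, 54.0, 54.0, 29.0, 29.0,
            math.nan, 59.0, 37.0, 99.0, 210.0]
expected_gap = running_gap(players, results, gap_done)
print(expected_gap)
```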

Can anyone suggest a simpler way to solve this?
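If the gap should come from the dates themselves rather than from accumulating Gap done, one option is sketched below. The rule is an assumption, not something stated in the question: days from the player's most recent earlier Win, or from the player's first game if he has not won yet. Because it uses the real dates, it gives 209, 20 and 100 instead of 239, 54 and 58 on the K2000 rows where Gap done was forward-filled, and 406 instead of 405 on Kssis's last row, since the days are floored once over the whole span rather than once per game. The column name Days since win is hypothetical.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Lose', 'Win', 'Win', 'Lose', 'Win',
               'Lose', 'Lose', 'Not ranked', 'Lose', 'Lose'],
    'Date': pd.to_datetime([
        '2015-11-13 13:42:00', '2016-03-23 16:40:00', '2016-05-16 19:17:00',
        '2016-06-09 19:36:00', '2016-06-30 14:05:00', '2016-07-29 16:20:00',
        '2016-10-08 17:48:00', '2007-02-25 15:05:00', '2007-04-25 16:07:00',
        '2007-06-01 16:54:00', '2007-09-09 14:33:00', '2008-04-06 16:27:00',
    ]),
})
df = df.sort_values(['Player', 'Date'])

is_win = df['Result'].eq('Win')
# Date of each Win, NaT on every other row
win_date = df['Date'].where(is_win)
# Most recent Win strictly before each row, within the same player
prev_win = win_date.groupby(df['Player']).shift()
prev_win = prev_win.groupby(df['Player']).ffill()
# No earlier Win yet: count from the player's first game instead
first_game = df.groupby('Player')['Date'].transform('first')
anchor = prev_win.fillna(first_game)

df['Days since win'] = (df['Date'] - anchor).dt.days
print(df)
```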



Pandas version: 0.23.4, Python version: 3.7.4

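IIUC, you need a boolean mask m1 that marks the rows where the result is Win and the previous row is also Win. From m1, create a group ID s that splits the rows into groups around those wins. Then group by Player and s and take the cumsum: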

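# Rows that are wins
m = df.Result.eq('Win')
# Wins whose previous row is also a win
m1 = m & m.shift()
# Group ID that changes every time m1 flips
s = m1.ne(m1.shift()).cumsum()
# Running total of Gap done within each player and group
df['Expected Gap'] = df.groupby(['Player', s])['Gap done'].cumsum()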

Out[808]:
   Player      Result                 Date  Gap done  Expected Gap
0   K2000        Lose  2015-11-13 13:42:00      NaN           NaN
1   K2000        Lose  2016-03-23 16:40:00    131.0         131.0
2   K2000        Lose  2016-05-16 19:17:00     54.0         185.0
3   K2000         Win  2016-06-09 19:36:00     54.0         239.0
4   K2000         Win  2016-06-30 14:05:00     54.0          54.0
5   K2000        Lose  2016-07-29 16:20:00     29.0          29.0
6   K2000         Win  2016-10-08 17:48:00     29.0          58.0
7   Kssis        Lose  2007-02-25 15:05:00      NaN           NaN
8   Kssis        Lose  2007-04-25 16:07:00     59.0          59.0
9   Kssis  Not-ranked  2007-06-01 16:54:00     37.0          96.0
10  Kssis        Lose  2007-09-09 14:33:00     99.0         195.0
11  Kssis        Lose  2008-04-06 16:27:00    210.0         405.0
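To see how the grouping works, here is a sketch that runs the same steps on the sample (typed in by hand from the question) and prints the intermediate masks. One small change from the answer: m.shift(fill_value=False) is used so the first row is False rather than NaN; fill_value needs pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Lose', 'Win', 'Win', 'Lose', 'Win',
               'Lose', 'Lose', 'Not ranked', 'Lose', 'Lose'],
    'Gap done': [np.nan, 131, 54, 54, 54, 29, 29, np.nan, 59, 37, 99, 210],
})

m = df['Result'].eq('Win')
# Win whose previous row is also a Win (fill_value avoids NaN in row 0)
m1 = m & m.shift(fill_value=False)
# New group ID whenever m1 changes value
s = m1.ne(m1.shift()).cumsum()
df['Expected Gap'] = df.groupby(['Player', s])['Gap done'].cumsum()

print(df.assign(m=m, m1=m1, s=s))
```

Here m1 is True only on row 4 (the second Win in a row), so s splits K2000 into rows 0-3, row 4 and rows 5-6. Because m1 only flags a Win that follows another Win, a single Win followed by losses does not start a new group.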

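Thanks a lot, everyone. I'd like to add one thing: the second and third lines gave unexpected results. I'm not saying your calculation doesn't match the example in my question; I mean it gave unexpected results on the real data I use. So I replaced them, and these are the two lines I now use:
m = df.Result.eq('Win')
s = m.shift().cumsum()
Then, to be honest, some odd values such as -1 appeared in the first row of the column, but that can be corrected. Anyway, this was a great lead, thanks again.

You're welcome. Interesting. Glad you found the answer :)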
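The asker's two-line replacement can be sketched on the sample like this (illustrative only; the real data is not shown in the thread). A plain m.shift() leaves NaN in the first row, so this version passes fill_value=False (pandas 0.24 or later) to keep s an integer group ID.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Lose', 'Win', 'Win', 'Lose', 'Win',
               'Lose', 'Lose', 'Not ranked', 'Lose', 'Lose'],
    'Gap done': [np.nan, 131, 54, 54, 54, 29, 29, np.nan, 59, 37, 99, 210],
})

m = df['Result'].eq('Win')
# Number of wins strictly before each row; fill_value=False gives row 0 the
# ID 0 instead of the NaN a plain m.shift() would leave there
s = m.shift(fill_value=False).cumsum()
df['Expected Gap'] = df.groupby(['Player', s])['Gap done'].cumsum()
print(df.assign(s=s))
```

Here s counts the wins before each row, so every Win starts a new group on the next row. With a Lose, Win, Lose, Lose sequence, for example, this s restarts the running total on the third row, while the answer's m1-based s keeps all four rows in one group, which may explain why the two versions differed on the real data.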