Python: conditionally backfilling a pandas column


I have the following DataFrame:

     DATE       ID      STATUS
0  2014-01-01   1  INPROGRESS
1  2013-03-01   1       ENDED
2  2015-05-01   2  INPROGRESS
3  2012-05-01   1     STARTED
4  2011-05-01   2     STARTED
5  2011-03-01   3     STARTED
6  2011-04-01   3       ENDED
7  2011-06-01   3  INPROGRESS
8  2011-09-01   3     STARTED
Here is the code to build it:

>>> import pandas as pd
>>> df1 = pd.DataFrame(columns=["DATE", "ID", "STATUS"])
>>> df1["DATE"] = ['2014-01-01', '2013-03-01', '2015-05-01', '2012-05-01', '2011-05-01', '2011-03-01', '2011-04-01', '2011-06-01', '2011-09-01']
>>> df1["ID"] = [1,1,2,1,2,3,3,3,3]
>>> df1["STATUS"] = ['INPROGRESS', 'ENDED', 'INPROGRESS', 'STARTED', 'STARTED', 'STARTED','ENDED', 'INPROGRESS', 'STARTED']
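Within each ID group, the STATUS column records the stage a task is in:

STARTED, INPROGRESS, or ENDED

in exactly this chronological order (STARTED should not appear after ENDED, and so on).

Filtering on ID 3 and sorting by date gives: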

df1[df1['ID']==3].sort_values('DATE')

     DATE        ID      STATUS
 5  2011-03-01   3     STARTED
 6  2011-04-01   3       ENDED
 7  2011-06-01   3  INPROGRESS
 8  2011-09-01   3     STARTED
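Now I need to "fix" the STATUS column so that it follows the order defined above, based on the last status. For ID 3 the last status is STARTED, so everything should be backfilled to STARTED, like this: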

     DATE        ID      STATUS
 5  2011-03-01   3     STARTED
 6  2011-04-01   3     STARTED
 7  2011-06-01   3     STARTED
 8  2011-09-01   3     STARTED
For ID 1:

df1[df1['ID']==1].sort_values('DATE')
     DATE  ID      STATUS
3  2012-05-01   1     STARTED
1  2013-03-01   1       ENDED
0  2014-01-01   1  INPROGRESS
The last two statuses should both become INPROGRESS, while the first stays STARTED, giving the desired result:

     DATE  ID      STATUS
3  2012-05-01   1     STARTED
1  2013-03-01   1  INPROGRESS
0  2014-01-01   1  INPROGRESS
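ID 2 is already in the correct order.

Any idea how I can do this with pandas? I am trying to group by ID, and I am thinking of backfilling based on the last status, but I don't know how to stop the backfill at the right point.


Thanks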

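A classic approach is to forget that your statuses are labels and treat them as strictly increasing numbers instead, for example STARTED 0, INPROGRESS 1 and ENDED 2. With such a column you can check whether these numbers are monotonic within each group, and then backfill until you hit a break in monotonicity.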
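The monotonicity check described above can be sketched as follows. This is a minimal illustration, not part of the original answer: the names `order`, `codes` and `is_ordered` are hypothetical, and the example assumes the dates are ISO-formatted strings so that sorting them as text matches chronological order.

```python
import pandas as pd

# Rebuild the question's data so the sketch is self-contained.
df1 = pd.DataFrame({
    "DATE": ['2014-01-01', '2013-03-01', '2015-05-01', '2012-05-01', '2011-05-01',
             '2011-03-01', '2011-04-01', '2011-06-01', '2011-09-01'],
    "ID": [1, 1, 2, 1, 2, 3, 3, 3, 3],
    "STATUS": ['INPROGRESS', 'ENDED', 'INPROGRESS', 'STARTED', 'STARTED',
               'STARTED', 'ENDED', 'INPROGRESS', 'STARTED'],
})

# Hypothetical mapping from labels to increasing codes.
order = {'STARTED': 0, 'INPROGRESS': 1, 'ENDED': 2}

# Sort chronologically within each ID, then attach the numeric codes.
codes = df1.sort_values(['ID', 'DATE'])
codes = codes.assign(STATUS_ID=codes['STATUS'].map(order))

# For each ID, check whether the codes never decrease over time.
is_ordered = codes.groupby('ID')['STATUS_ID'].apply(
    lambda s: s.is_monotonic_increasing
)
print(is_ordered)
```

Groups flagged `False` are the ones that need fixing; with this data only ID 2 is expected to pass.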

Prepare the DataFrame:

     DATE       ID      STATUS
0  2014-01-01   1  INPROGRESS
1  2013-03-01   1       ENDED
2  2015-05-01   2  INPROGRESS
3  2012-05-01   1     STARTED
4  2011-05-01   2     STARTED
5  2011-03-01   3     STARTED
6  2011-04-01   3       ENDED
7  2011-06-01   3  INPROGRESS
8  2011-09-01   3     STARTED
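df = df1.copy()  # work on a copy of the DataFrame built in the question
keymapping = {'STARTED':0, 'INPROGRESS':1, 'ENDED':2}
df['STATUS_ID'] = df.STATUS.map(keymapping)  # labels -> increasing codes
df.set_index(['ID', 'DATE'], inplace=True)
df.sort_index(inplace=True)  # chronological order within each ID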
Now group by ID and use transform to broadcast each group's last value across the whole index, so that you can assign it to the DataFrame as a new column:

df['STATUS_LAST'] = df.groupby(level=0).STATUS_ID.transform('last')

df
Out[63]: 
                   STATUS  STATUS_ID  STATUS_LAST
ID DATE                                          
1  2012-05-01     STARTED          0            1
   2013-03-01       ENDED          2            1
   2014-01-01  INPROGRESS          1            1
2  2011-05-01     STARTED          0            1
   2015-05-01  INPROGRESS          1            1
3  2011-03-01     STARTED          0            0
   2011-04-01       ENDED          2            0
   2011-06-01  INPROGRESS          1            0
   2011-09-01     STARTED          0            0
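To see why `transform` is used here rather than a plain aggregation, the small sketch below compares the two on a toy Series. The data and the names `s`, `per_group` and `broadcast` are made up for illustration only.

```python
import pandas as pd

# Toy data: two groups identified by the index level 'ID'.
s = pd.Series([0, 2, 1, 0, 2], index=pd.Index([1, 1, 1, 2, 2], name='ID'))

# Aggregation: one value per group, indexed by ID.
per_group = s.groupby(level=0).last()

# Transform: the group's last value repeated for every original row,
# so the result lines up with the original index and can be assigned as a column.
broadcast = s.groupby(level=0).transform('last')

print(per_group)
print(broadcast)
```

Because the transformed result keeps the original index, it can be compared with `STATUS_ID` row by row in the next step.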
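Finally, apply the backfill using the increasing monotonicity of STATUS_ID relative to the last value: a STATUS_ID value is valid when it is less than or equal to STATUS_LAST, and is replaced by STATUS_LAST otherwise: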

df.STATUS_ID = df.STATUS_ID.where(df.STATUS_ID <= df.STATUS_LAST, df.STATUS_LAST)
df.STATUS_ID
Out[65]: 
ID  DATE      
1   2012-05-01    0
    2013-03-01    1
    2014-01-01    1
2   2011-05-01    0
    2015-05-01    1
3   2011-03-01    0
    2011-04-01    0
    2011-06-01    0
    2011-09-01    0
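The answer stops at the numeric codes. As a hedged sketch of how the whole pipeline might be put together and mapped back to labels, the block below repeats the steps on a flat DataFrame. The names `inverse`, `clamped`, `rev_cummin` and `STATUS_FIXED` are hypothetical. It also shows a second variant, a reversed cumulative minimum per group, which is not in the original answer: it forces the codes to be non-decreasing across the whole group rather than only capping them at the last value. Whether that stricter behaviour is wanted is an assumption; on this data both variants agree.

```python
import pandas as pd

df = pd.DataFrame({
    "DATE": ['2014-01-01', '2013-03-01', '2015-05-01', '2012-05-01', '2011-05-01',
             '2011-03-01', '2011-04-01', '2011-06-01', '2011-09-01'],
    "ID": [1, 1, 2, 1, 2, 3, 3, 3, 3],
    "STATUS": ['INPROGRESS', 'ENDED', 'INPROGRESS', 'STARTED', 'STARTED',
               'STARTED', 'ENDED', 'INPROGRESS', 'STARTED'],
})

keymapping = {'STARTED': 0, 'INPROGRESS': 1, 'ENDED': 2}
inverse = {code: label for label, code in keymapping.items()}

# Chronological order within each ID (ISO date strings sort correctly as text).
df = df.sort_values(['ID', 'DATE']).reset_index(drop=True)
status_id = df['STATUS'].map(keymapping)

# Variant A (the answer's approach): cap every code at the group's last code.
last = status_id.groupby(df['ID']).transform('last')
clamped = status_id.where(status_id <= last, last)

# Variant B (an alternative, not from the answer): walk each group backwards
# and take the running minimum, so no earlier code exceeds any later one.
rev_cummin = (
    status_id.iloc[::-1]
    .groupby(df['ID'].iloc[::-1])
    .cummin()
    .sort_index()
)

# Map the fixed codes back to their labels.
df['STATUS_FIXED'] = clamped.map(inverse)
print(df)
```

The two variants would differ on a group such as INPROGRESS, STARTED, ENDED: capping at the last value leaves it unchanged, while the reversed cumulative minimum turns the first entry into STARTED.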