Pandas 通过比较表中另一列的先前值来创建新列

Pandas 通过比较表中另一列的先前值来创建新列,pandas,pandas-groupby,Pandas,Pandas Groupby,我有一个如下所示的数据框 Doctor Start B_ID Session Finish A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00 A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00 A 2020-01-18 13:00:00 3 S1

我有一个如下所示的数据框

Doctor   Start                B_ID  Session      Finish
    A   2020-01-18 12:00:00     1    S1         2020-01-18 12:33:00
    A   2020-01-18 12:30:00     2    S1         2020-01-18 12:52:00
    A   2020-01-18 13:00:00     3    S1         2020-01-18 13:23:00
    A   2020-01-18 13:00:00     4    S1         2020-01-18 13:37:00
    A   2020-01-18 13:30:00     5    S1         2020-01-18 13:56:00
    A   2020-01-18 14:00:00     6    S1         2020-01-18 14:15:00
    A   2020-01-18 14:00:00     7    S1         2020-01-18 14:28:00
    A   2020-01-18 14:00:00     8    S1         2020-01-18 14:40:00
    A   2020-01-18 14:00:00     9    S1         2020-01-18 15:01:00
    A   2020-01-19 12:00:00    12    S2         2020-01-19 12:20:00
    A   2020-01-19 12:30:00    13    S2         2020-01-19 12:40:00 
    A   2020-01-19 13:00:00    14    S2         2020-01-19 13:20:00
    A   2020-01-19 13:30:00    15    S2         2020-01-19 13:40:00
    A   2020-01-19 14:00:00    16    S2         2020-01-19 14:10:00
    A   2020-01-19 14:00:00    17    S2         2020-01-19 14:20:00
    A   2020-01-19 14:00:00    19    S2         2020-01-19 14:40:00
从上面的数据框中,我想基于同一会话的上一次完成和当前开始时间创建一个名为“实际开始时间”的列

台阶

if previous finish > current start:
   df['Actual_start'] = df['previous finish']
else:
    df['Actual_start'] = df['Start']
预期产出:

Doctor   Start                B_ID  Session      Finish                 Actual_start
    A   2020-01-18 12:00:00     1    S1         2020-01-18 12:33:00   2020-01-18 12:00:00
    A   2020-01-18 12:30:00     2    S1         2020-01-18 12:52:00   2020-01-18 12:33:00
    A   2020-01-18 13:00:00     3    S1         2020-01-18 13:23:00   2020-01-18 13:00:00
    A   2020-01-18 13:00:00     4    S1         2020-01-18 13:37:00   2020-01-18 13:23:00
    A   2020-01-18 13:30:00     5    S1         2020-01-18 13:56:00   2020-01-18 13:37:00
    A   2020-01-18 14:00:00     6    S1         2020-01-18 14:15:00   2020-01-18 14:00:00
    A   2020-01-18 14:00:00     7    S1         2020-01-18 14:28:00   2020-01-18 14:15:00
    A   2020-01-18 14:00:00     8    S1         2020-01-18 14:40:00   2020-01-18 14:28:00
    A   2020-01-18 14:00:00     9    S1         2020-01-18 15:01:00   2020-01-18 14:40:00
    A   2020-01-19 12:00:00    12    S2         2020-01-19 12:20:00   2020-01-19 12:00:00
    A   2020-01-19 12:30:00    13    S2         2020-01-19 12:40:00   2020-01-19 12:30:00
    A   2020-01-19 13:00:00    14    S2         2020-01-19 13:20:00   2020-01-19 13:00:00
    A   2020-01-19 13:30:00    15    S2         2020-01-19 13:40:00   2020-01-19 13:30:00
    A   2020-01-19 14:00:00    16    S2         2020-01-19 14:10:00   2020-01-19 14:00:00
    A   2020-01-19 14:00:00    17    S2         2020-01-19 14:20:00   2020-01-19 14:10:00
    A   2020-01-19 14:00:00    19    S2         2020-01-19 14:40:00   2020-01-19 14:20:00
使用group by列的第一个值,并将其替换为:



@阿里-你认为最后3个值是多少?我对它进行了测试,结果似乎是正确的,因为在最后3个值中移位的值不像
start
列,所以不是掩码中的
True
。你能核实一下吗?让我们看看。
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])

s = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(s.gt(df['Start']), s, df['Start'])
print (df)
   Doctor               Start  B_ID Session              Finish  \
0       A 2020-01-18 12:00:00     1      S1 2020-01-18 12:33:00   
1       A 2020-01-18 12:30:00     2      S1 2020-01-18 12:52:00   
2       A 2020-01-18 13:00:00     3      S1 2020-01-18 13:23:00   
3       A 2020-01-18 13:00:00     4      S1 2020-01-18 13:37:00   
4       A 2020-01-18 13:30:00     5      S1 2020-01-18 13:56:00   
5       A 2020-01-18 14:00:00     6      S1 2020-01-18 14:15:00   
6       A 2020-01-18 14:00:00     7      S1 2020-01-18 14:28:00   
7       A 2020-01-18 14:00:00     8      S1 2020-01-18 14:40:00   
8       A 2020-01-18 14:00:00     9      S1 2020-01-18 15:01:00   
9       A 2020-01-19 12:00:00    12      S2 2020-01-19 12:20:00   
10      A 2020-01-19 12:30:00    13      S2 2020-01-19 12:40:00   
11      A 2020-01-19 13:00:00    14      S2 2020-01-19 13:20:00   
12      A 2020-01-19 13:30:00    15      S2 2020-01-19 13:40:00   
13      A 2020-01-19 14:00:00    16      S2 2020-01-19 14:10:00   
14      A 2020-01-19 14:00:00    17      S2 2020-01-19 14:20:00   
15      A 2020-01-19 14:00:00    19      S2 2020-01-19 14:40:00   

          Actual_start  
0  2020-01-18 12:00:00  
1  2020-01-18 12:33:00  
2  2020-01-18 13:00:00  
3  2020-01-18 13:23:00  
4  2020-01-18 13:37:00  
5  2020-01-18 14:00:00  
6  2020-01-18 14:15:00  
7  2020-01-18 14:28:00  
8  2020-01-18 14:40:00  
9  2020-01-19 12:00:00  
10 2020-01-19 12:30:00  
11 2020-01-19 13:00:00  
12 2020-01-19 13:30:00  
13 2020-01-19 14:00:00  
14 2020-01-19 14:10:00  
15 2020-01-19 14:20:00