How to compute shifted columns within groups in Python
I have the following DataFrame:
Circuit-ID DATETIME LATE?
78899 07/06/2018 15:30 1
78899 08/06/2018 17:30 0
78899 09/06/2018 20:30 1
23544 12/07/2017 23:30 1
23544 13/07/2017 19:30 0
23544 14/07/2017 20:30 1
I need to compute shifted (previous-row) values of the DATETIME and LATE? columns within each circuit, to get the following result:
Circuit DATETIME LATE? DATETIME-1 LATE-1
78899 07/06/2018 15:30 1 NA NA
78899 08/06/2018 17:30 0 07/06/2018 15:30 1
78899 09/06/2018 20:30 1 08/06/2018 17:30 0
23544 12/07/2017 23:30 1 NA NA
23544 13/07/2017 19:30 0 12/07/2017 23:30 1
23544 14/07/2017 20:30 1 13/07/2017 19:30 0
I tried the following code:
df.groupby(['Circuit-ID', 'DATETIME', 'LATE?']) \
.apply(lambda x: x.sort_values(by=['Circuit-ID', 'DATETIME', 'LATE?'], ascending=[True, True, True]))['LATE?'] \
.transform(lambda x: x.shift()) \
.reset_index(name='LATE-1')
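But I keep getting wrong results on some rows, where the first shifted value in a group is not NaN.
Can you point me to a cleaner way to get the desired result?

Use groupby and shift, then join the result back onto the original DataFrame: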
df.join(df.groupby('Circuit-ID').shift().add_suffix('-1'))
Circuit-ID DATETIME LATE? DATETIME-1 LATE?-1
0 78899 07/06/2018 15:30 1 NaN NaN
1 78899 08/06/2018 17:30 0 07/06/2018 15:30 1.0
2 78899 09/06/2018 20:30 1 08/06/2018 17:30 0.0
3 23544 12/07/2017 23:30 1 NaN NaN
4 23544 13/07/2017 19:30 0 12/07/2017 23:30 1.0
5 23544 14/07/2017 20:30 1 13/07/2017 19:30 0.0
A similar solution uses concat to combine the two frames column-wise:
pd.concat([df, df.groupby('Circuit-ID').shift().add_suffix('-1')], axis=1)
Circuit-ID DATETIME LATE? DATETIME-1 LATE?-1
0 78899 07/06/2018 15:30 1 NaN NaN
1 78899 08/06/2018 17:30 0 07/06/2018 15:30 1.0
2 78899 09/06/2018 20:30 1 08/06/2018 17:30 0.0
3 23544 12/07/2017 23:30 1 NaN NaN
4 23544 13/07/2017 19:30 0 12/07/2017 23:30 1.0
5 23544 14/07/2017 20:30 1 13/07/2017 19:30 0.0