Python: how to drop duplicates and keep the last timestamp

python pandas dataframe

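I want to drop duplicates and keep the row with the latest timestamp. Rows count as duplicates when they share the same customer_id and var_name. Here is my data: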

    customer_id  value   var_name     timestamp
    1            1       apple        2018-03-22 00:00:00.000        
    2            3       apple        2018-03-23 08:00:00.000
    2            4       apple        2018-03-24 08:00:00.000
    1            1       orange       2018-03-22 08:00:00.000
    2            3       orange       2018-03-24 08:00:00.000
    2            5       orange       2018-03-23 08:00:00.000

So the result would be:

    customer_id  value   var_name     timestamp
    1            1       apple        2018-03-22 00:00:00.000        
    2            4       apple        2018-03-24 08:00:00.000
    1            1       orange       2018-03-22 08:00:00.000
    2            3       orange       2018-03-24 08:00:00.000

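I think you need the following if you don't want to sort the rows (that is, the original row order matters): for each customer_id and var_name group, pick the row whose timestamp is the largest: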

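    # Assumes df['timestamp'] is already a datetime column (convert with pd.to_datetime otherwise).
    # idxmax returns, per group, the index label of the row with the latest timestamp;
    # sort=False keeps the groups in order of first appearance.
    df = df.loc[df.groupby(['customer_id', 'var_name'], sort=False)['timestamp'].idxmax()]
    print(df)
       customer_id  value var_name           timestamp
    0            1      1    apple 2018-03-22 00:00:00
    2            2      4    apple 2018-03-24 08:00:00
    3            1      1   orange 2018-03-22 08:00:00
    4            2      3   orange 2018-03-24 08:00:00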
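
If reordering the rows is acceptable, sorting by timestamp first is another option. The following is only a minimal sketch, assuming that means `sort_values` followed by `drop_duplicates(keep='last')`. The sample frame from the question is rebuilt here so the snippet runs on its own, and parsing the timestamps with `pd.to_datetime` is an assumption about the column's dtype.

```python
import pandas as pd

# Sample data from the question; converting to datetime is an assumption about the dtype.
df = pd.DataFrame({
    'customer_id': [1, 2, 2, 1, 2, 2],
    'value': [1, 3, 4, 1, 3, 5],
    'var_name': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange'],
    'timestamp': pd.to_datetime([
        '2018-03-22 00:00:00', '2018-03-23 08:00:00', '2018-03-24 08:00:00',
        '2018-03-22 08:00:00', '2018-03-24 08:00:00', '2018-03-23 08:00:00',
    ]),
})

# Sort so the latest timestamp comes last, then keep the last row of each key pair.
by_sort = (
    df.sort_values('timestamp')
      .drop_duplicates(subset=['customer_id', 'var_name'], keep='last')
)

# The result is ordered by timestamp; sort_index() restores the original row order.
print(by_sort.sort_index())
```

After `sort_index()`, this should select the same rows as the `idxmax` version.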

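I didn't think of sorting by timestamp :( Is the complexity of solution 2 higher than that of solution 1? I would expect the second solution to be a bit slower. In my case it is much slower.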
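
Relative speed likely depends on the number of rows and groups, so it may be worth measuring on your own data. The following is a rough benchmark sketch, not a definitive comparison: the synthetic data, its sizes, and the helper names `by_sort` and `by_idxmax` are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; sizes and values are arbitrary assumptions for illustration.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    'customer_id': rng.integers(0, 1_000, size=n),
    'var_name': rng.choice(['apple', 'orange', 'pear'], size=n),
    'value': rng.integers(0, 10, size=n),
    # Unique timestamps in shuffled order, so each group has exactly one latest row.
    'timestamp': pd.Timestamp('2018-03-22') + pd.to_timedelta(rng.permutation(n), unit='s'),
})
keys = ['customer_id', 'var_name']


def by_sort(frame):
    # Hypothetical helper: sort by timestamp, keep the last row per key pair.
    return frame.sort_values('timestamp').drop_duplicates(subset=keys, keep='last')


def by_idxmax(frame):
    # Hypothetical helper: pick the row with the largest timestamp per group.
    return frame.loc[frame.groupby(keys, sort=False)['timestamp'].idxmax()]


# Both approaches should select the same rows (compared after restoring index order).
assert by_sort(df).sort_index().equals(by_idxmax(df).sort_index())

for fn in (by_sort, by_idxmax):
    seconds = timeit.timeit(lambda: fn(df), number=5) / 5
    print(f'{fn.__name__}: {seconds:.4f} s per call')
```

The timings printed will vary by machine and pandas version, so treat them as indicative only.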
