Python: How to drop duplicates and keep the last timestamp


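I want to drop duplicate rows and keep the one with the latest timestamp. Rows count as duplicates when they share the same customer_id and var_name. Here is my data: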

    customer_id  value   var_name     timestamp
    1            1       apple        2018-03-22 00:00:00.000        
    2            3       apple        2018-03-23 08:00:00.000
    2            4       apple        2018-03-24 08:00:00.000
    1            1       orange       2018-03-22 08:00:00.000
    2            3       orange       2018-03-24 08:00:00.000
    2            5       orange       2018-03-23 08:00:00.000
So the result would be:

    customer_id  value   var_name     timestamp
    1            1       apple        2018-03-22 00:00:00.000        
    2            4       apple        2018-03-24 08:00:00.000
    1            1       orange       2018-03-22 08:00:00.000
    2            3       orange       2018-03-24 08:00:00.000
I think you need the following if you do not want to sort and the original row order is important:

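    # For each (customer_id, var_name) group, keep the row with the latest timestamp.
    # sort=False keeps the groups in order of first appearance.
    df = df.loc[df.groupby(['customer_id','var_name'], sort=False)['timestamp'].idxmax()]
    print(df)
       customer_id  value var_name           timestamp
    0            1      1    apple 2018-03-22 00:00:00
    2            2      4    apple 2018-03-24 08:00:00
    3            1      1   orange 2018-03-22 08:00:00
    4            2      3   orange 2018-03-24 08:00:00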
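To try the idxmax approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample data in the question, and converting the timestamp column with pd.to_datetime is an assumption about how the column is stored. The names latest_idx and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 1, 2, 2],
    "value": [1, 3, 4, 1, 3, 5],
    "var_name": ["apple", "apple", "apple", "orange", "orange", "orange"],
    # Assumption: timestamps are parsed to datetime so that idxmax compares them as times.
    "timestamp": pd.to_datetime([
        "2018-03-22 00:00:00", "2018-03-23 08:00:00", "2018-03-24 08:00:00",
        "2018-03-22 08:00:00", "2018-03-24 08:00:00", "2018-03-23 08:00:00",
    ]),
})

# Index label of the latest timestamp within each (customer_id, var_name) group.
latest_idx = df.groupby(["customer_id", "var_name"], sort=False)["timestamp"].idxmax()

# Select those rows; the original index labels are kept.
result = df.loc[latest_idx]
print(result)
```

Because df.loc selects by index label, this relies on the index being unique; calling reset_index(drop=True) first is one way to guarantee that.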

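I had not thought about sorting by timestamp :( Is the complexity of solution 2 higher than that of solution 1?
I think the second solution should be a bit slower.
In my case, it is much slower.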
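Sorting by timestamp, as this comment mentions, could look like the sketch below. It assumes the sort-based approach means sort_values followed by drop_duplicates with keep='last'; that pairing is an assumption. The sample data is rebuilt from the question, and the final sort_index call is an optional extra that restores the original row order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 1, 2, 2],
    "value": [1, 3, 4, 1, 3, 5],
    "var_name": ["apple", "apple", "apple", "orange", "orange", "orange"],
    "timestamp": pd.to_datetime([
        "2018-03-22 00:00:00", "2018-03-23 08:00:00", "2018-03-24 08:00:00",
        "2018-03-22 08:00:00", "2018-03-24 08:00:00", "2018-03-23 08:00:00",
    ]),
})

# Sort ascending by timestamp so the latest row of each group comes last,
# then keep only that last row per (customer_id, var_name) pair.
result = (
    df.sort_values("timestamp")
      .drop_duplicates(["customer_id", "var_name"], keep="last")
      .sort_index()  # optional: back to the original row order
)
print(result)
```

Which variant is faster likely depends on the number of rows and groups, so timing both on your own data is the safest way to decide.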