Python: drop duplicates on one column, breaking ties with another column

I have the following DataFrame:

import pandas as pd

x = pd.DataFrame({
    "item" : ["a", "a", "a", "b", "c", "c"],
    "vote" : [1, 0, 1, 1, 0, 0],
    "timestamp" : ["2020-06-07 11:04:26", "2020-06-07 11:03:37", "2020-06-07 11:09:18", "2020-06-07 11:04:40", "2020-06-07 11:09:11", "2020-06-07 11:09:23"]
})

item   vote   timestamp
a      1      2020-06-07 11:04:26
a      0      2020-06-07 11:03:37
a      1      2020-06-07 11:09:18
b      1      2020-06-07 11:04:40      
c      0      2020-06-07 11:09:11
c      0      2020-06-07 11:09:23
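How can I drop duplicates in the item column, using the timestamp column as the tie-breaker (keep the most recent row)? The final DataFrame should look like this: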

item   vote   timestamp
a      1      2020-06-07 11:09:18
b      1      2020-06-07 11:04:40      
c      0      2020-06-07 11:09:23

You can call sort_values on 'item' and 'timestamp' before dropping the duplicates:

x.sort_values(['item', 'timestamp']).drop_duplicates('item', keep='last')

  item  vote            timestamp
2    a     1  2020-06-07 11:09:18
3    b     1  2020-06-07 11:04:40
5    c     0  2020-06-07 11:09:23
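Specifying keep='last' means that every row except the last one for each item is dropped. This works because the previous step sorted by timestamp, so the last row for each item is the most recent one.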
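The timestamps in the question are strings, and they sort correctly as text only because they share a fixed, zero-padded year-month-day format. As a minimal sketch, assuming you would rather sort on real datetimes, the same idea could be written as follows. The variable name latest is just an illustrative choice, not part of the original answer.

```python
import pandas as pd

x = pd.DataFrame({
    "item": ["a", "a", "a", "b", "c", "c"],
    "vote": [1, 0, 1, 1, 0, 0],
    "timestamp": ["2020-06-07 11:04:26", "2020-06-07 11:03:37", "2020-06-07 11:09:18",
                  "2020-06-07 11:04:40", "2020-06-07 11:09:11", "2020-06-07 11:09:23"],
})

# Assumption: parse the strings so the sort is chronological, not lexicographic
x["timestamp"] = pd.to_datetime(x["timestamp"])

latest = (
    x.sort_values("timestamp")                     # oldest first, newest last
     .drop_duplicates(subset="item", keep="last")  # keep the newest row per item
     .sort_values("item")                          # restore a readable item order
)
print(latest)
```

The original row labels are preserved, so you can still see which input rows survived.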


Another way:

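  x['timestamp'] = pd.to_datetime(x['timestamp'])  # coerce timestamp to datetime
  x.set_index('timestamp', inplace=True)  # set timestamp as index
  x.sort_index(inplace=True)  # sort by time so that 'last' is the most recent row
  x2 = x.groupby([x.index.date, x['item']])['vote'].agg(vote='last').reset_index()  # last vote per (date, item)
  x2.columns = ['timestamp', 'item', 'vote']  # note: 'timestamp' now holds only the date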
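The groupby approach above groups by calendar date and returns only the date part of the timestamp. If the full row is wanted, one possible sketch (not from the original answers) uses groupby with idxmax to look up the row holding the latest timestamp for each item. The names latest_idx and result are illustrative.

```python
import pandas as pd

x = pd.DataFrame({
    "item": ["a", "a", "a", "b", "c", "c"],
    "vote": [1, 0, 1, 1, 0, 0],
    "timestamp": ["2020-06-07 11:04:26", "2020-06-07 11:03:37", "2020-06-07 11:09:18",
                  "2020-06-07 11:04:40", "2020-06-07 11:09:11", "2020-06-07 11:09:23"],
})
x["timestamp"] = pd.to_datetime(x["timestamp"])

# For each item, find the index label of the row with the largest timestamp
latest_idx = x.groupby("item")["timestamp"].idxmax()

# Select those rows; this needs no sorting and keeps the full timestamp
result = x.loc[latest_idx].reset_index(drop=True)
print(result)
```

This variant assumes the DataFrame has a unique index, since idxmax returns index labels that are then passed to loc.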

You need to sort by timestamp, then call drop_duplicates with subset set to item.
@YOBEN_S Thank you, my friend, you too ;-) Stay safe and healthy~