Python: drop duplicates on one column, breaking ties with another column


python python-3.x pandas


I have the following DataFrame:

import pandas as pd

x = pd.DataFrame({
    "item" : ["a", "a", "a", "b", "c", "c"],
    "vote" : [1, 0, 1, 1, 0, 0],
    "timestamp" : ["2020-06-07 11:04:26", "2020-06-07 11:03:37", "2020-06-07 11:09:18", "2020-06-07 11:04:40", "2020-06-07 11:09:11", "2020-06-07 11:09:23"]
})

item   vote   timestamp
a      1      2020-06-07 11:04:26
a      0      2020-06-07 11:03:37
a      1      2020-06-07 11:09:18
b      1      2020-06-07 11:04:40      
c      0      2020-06-07 11:09:11
c      0      2020-06-07 11:09:23

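How can I drop duplicates in the item column, using the timestamp column as the tie-breaker so that the most recent row is kept? The final DataFrame should look like this: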

item   vote   timestamp
a      1      2020-06-07 11:09:18
b      1      2020-06-07 11:04:40      
c      0      2020-06-07 11:09:23

You can call sort_values on "item" and "timestamp" before dropping the duplicates:

x.sort_values(['item', 'timestamp']).drop_duplicates('item', keep='last')

  item  vote            timestamp
2    a     1  2020-06-07 11:09:18
3    b     1  2020-06-07 11:04:40
5    c     0  2020-06-07 11:09:23

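Specifying keep='last' means that, for each item, every row except the last one is dropped. This keeps the most recent row because the previous step sorted the rows by timestamp.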
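The sort above compares the timestamps as strings, which gives chronological order here only because they are zero-padded and written year first. A minimal sketch, assuming you would rather compare real datetimes, converts the column before sorting. The helper name `keep_latest` is hypothetical and not part of the original answer:

```python
import pandas as pd

x = pd.DataFrame({
    "item": ["a", "a", "a", "b", "c", "c"],
    "vote": [1, 0, 1, 1, 0, 0],
    "timestamp": ["2020-06-07 11:04:26", "2020-06-07 11:03:37",
                  "2020-06-07 11:09:18", "2020-06-07 11:04:40",
                  "2020-06-07 11:09:11", "2020-06-07 11:09:23"],
})


def keep_latest(df):
    """Hypothetical helper: return one row per item, the one with the latest timestamp."""
    out = df.copy()  # leave the caller's frame untouched
    out["timestamp"] = pd.to_datetime(out["timestamp"])  # compare as datetimes, not strings
    return (
        out.sort_values(["item", "timestamp"])      # latest row ends up last within each item
           .drop_duplicates("item", keep="last")    # keep only that last row per item
           .reset_index(drop=True)                  # optional: renumber the rows 0..n-1
    )


result = keep_latest(x)
print(result)
```

Dropping `reset_index(drop=True)` would keep the original row labels (2, 3, 5), as in the output shown above.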

Another approach:

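x['timestamp'] = pd.to_datetime(x['timestamp'])  # coerce timestamp strings to datetime
x = x.set_index('timestamp').sort_index()  # index by timestamp, sorted so that 'last' means latest
# group by calendar date and item, keeping the last vote in each group
x2 = x.groupby([x.index.date, x['item']])['vote'].agg(vote='last').reset_index()
x2.columns = ['timestamp', 'item', 'vote']  # note: 'timestamp' now holds only the date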
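The groupby version above reduces the timestamp to a calendar date. If the full timestamp should survive, one possible variant (a different technique from either answer, sketched here under the assumption that row labels are unique) looks up the label of the latest row per item with `idxmax` and selects those rows:

```python
import pandas as pd

x = pd.DataFrame({
    "item": ["a", "a", "a", "b", "c", "c"],
    "vote": [1, 0, 1, 1, 0, 0],
    "timestamp": ["2020-06-07 11:04:26", "2020-06-07 11:03:37",
                  "2020-06-07 11:09:18", "2020-06-07 11:04:40",
                  "2020-06-07 11:09:11", "2020-06-07 11:09:23"],
})

# Parse the strings so that "largest" means "most recent".
ts = pd.to_datetime(x["timestamp"])

# For each item, idxmax gives the index label of the row with the largest timestamp.
latest_idx = ts.groupby(x["item"]).idxmax()

# Select those rows from the original frame; all columns are kept as they were.
latest = x.loc[latest_idx]
print(latest)
```

No sorting is needed here, and the original `timestamp` column is left as strings because only the temporary `ts` Series is converted.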

You need to sort by timestamp, then call drop_duplicates with subset set to item.

@YOBEN_S Thank you, my friend, you too ;-) Stay safe and healthy~
