Python DataFrame: removing duplicate values?

python-3.x pandas

I have a program that runs correctly, but because of how the underlying data is structured it returns duplicate rows. The result looks like this:

   Date      Amount   Source   Type
  7/16/2019  10        A       B
  7/17/2019  10        A       B
  7/15/2019  10        A       B
  7/15/2019  10        B       B

I'd like to return:
   Date      Amount   Source   Type
  7/17/2019   10        A       B
  7/15/2019   10        B       B

7/17/2019 is chosen because it is the last date on which we received 10 from Source A with Type B.

Here is what I tried:

df.drop_duplicates(subset='a','b','date', keep="last")

But it doesn't quite work. Is there a better way?

This works:

df[df.Date.eq(df.groupby(['Source','Type'])['Date'].transform('max'))]
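To make the one-liner above easier to follow, here is a minimal sketch that rebuilds the sample table from the question and applies the same filter step by step. It assumes the Date column arrives as text and is parsed with `pd.to_datetime` so that `max` compares dates chronologically; the variable names `latest` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Date": ["7/16/2019", "7/17/2019", "7/15/2019", "7/15/2019"],
        "Amount": [10, 10, 10, 10],
        "Source": ["A", "A", "A", "B"],
        "Type": ["B", "B", "B", "B"],
    }
)

# Assumption: Date is stored as text, so parse it to get chronological ordering.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# For every row, the latest Date within its (Source, Type) group.
latest = df.groupby(["Source", "Type"])["Date"].transform("max")

# Keep only the rows whose Date equals that group maximum.
result = df[df["Date"].eq(latest)]
print(result)
```

Note that if two rows in the same group share the latest date, this filter keeps both of them.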

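As the documentation describes,

df.index.duplicated(keep='first')

returns a boolean array with one entry per index label: True if the label repeats an earlier one, False otherwise. Then

~df.index.duplicated(keep='first')

is True wherever the label is not a repeat.

Finally,

df.loc[non_duplicate_index]

is a slicing operation that returns the rows of df for which non_duplicate_index is True.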
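The three steps above can be sketched on a small frame. The frame `df_dup` and its values are hypothetical, chosen only so that one index label appears twice; note that this approach deduplicates by index label, not by column values.

```python
import pandas as pd

# Hypothetical frame in which index label 0 appears twice.
df_dup = pd.DataFrame({"Amount": [10, 20, 30]}, index=[0, 0, 1])

# True where a label repeats an earlier one.
is_repeat = df_dup.index.duplicated(keep="first")

# Invert the mask: True for the first occurrence of each label.
non_duplicate_index = ~is_repeat

# Boolean selection keeps only the rows where the mask is True.
deduped = df_dup.loc[non_duplicate_index]
print(deduped)
```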

drop_duplicates also works:
df.sort_values('Date').drop_duplicates(subset=['Source','Type'], keep="last") 
Out[566]: 
        Date  Amount Source Type
3 2019-07-15      10      B    B
1 2019-07-17      10      A    B
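For completeness, a self-contained sketch of the same approach on the question's data. It assumes Date is parsed to datetime before sorting; the name `latest_rows` is illustrative. It also shows why the attempt in the question fails: `subset` has to be passed as a single list, whereas separate positional strings after a keyword argument are a Python syntax error.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Date": ["7/16/2019", "7/17/2019", "7/15/2019", "7/15/2019"],
        "Amount": [10, 10, 10, 10],
        "Source": ["A", "A", "A", "B"],
        "Type": ["B", "B", "B", "B"],
    }
)

# Assumption: parse Date so that sorting is chronological, not alphabetical.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Sort oldest to newest, then keep the last (newest) row of each (Source, Type) pair.
latest_rows = df.sort_values("Date").drop_duplicates(
    subset=["Source", "Type"],  # one list, not separate positional arguments
    keep="last",
)
print(latest_rows)
```

Unlike the `transform('max')` filter, this keeps exactly one row per group even when several rows share the latest date.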

df[df.Date.eq(df.groupby(['Source','Type'])['Date'].transform('max'))]
Both throw an error I haven't dealt with much: ValueError: cannot reindex from a duplicate axis

@user2679225
df[df.Date.eq(df.groupby(['Source','Type'])['Date'].transform('max').values)]
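The `.values` suggestion in the comment sidesteps index alignment by comparing position by position. Below is a rough sketch of that idea on a frame with repeated index labels. The repeated labels are an assumption about the commenter's data, and this sketch does not try to reproduce the error itself, which may depend on the pandas version and on the data; `to_numpy()` is used here as the current equivalent of `.values`.

```python
import pandas as pd

# The question's data, but with hypothetical repeated index labels.
df = pd.DataFrame(
    {
        "Date": ["7/16/2019", "7/17/2019", "7/15/2019", "7/15/2019"],
        "Amount": [10, 10, 10, 10],
        "Source": ["A", "A", "A", "B"],
        "Type": ["B", "B", "B", "B"],
    },
    index=[0, 0, 1, 1],
)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Latest Date per (Source, Type) group, one value per row.
latest = df.groupby(["Source", "Type"])["Date"].transform("max")

# Compare the raw arrays so no index alignment is involved.
mask = df["Date"].to_numpy() == latest.to_numpy()
result = df[mask]
print(result)
```

Another option is to call `df.reset_index(drop=True)` first, which replaces the repeated labels with a unique range index.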