Removing duplicates from a pandas DataFrame

python pandas


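I have a DataFrame with some duplicates that need to be removed. In the DataFrame below, if month, year, and type are all the same, only the row with the highest sale should be kept. For example: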

import pandas as pd

df = pd.DataFrame({'month': [1, 1, 7, 10],
                   'year': [2012, 2012, 2013, 2014],
                   'type': ['C', 'C', 'S', 'C'],
                   'sale': [55, 40, 84, 31]})

After removing the duplicates and keeping the highest value in the "sale" column, it should look like this:

df_2 = pd.DataFrame({'month': [1, 7, 10],
                     'year': [2012, 2013, 2014],
                     'type': ['C', 'S', 'C'],
                     'sale': [55, 84, 31]})

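You can sort by sale in descending order and then drop the duplicates, which keeps the first (highest-sale) row for each combination of month, year, and type; sort_index() then restores the original row order: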

(df.sort_values('sale',ascending=False)
   .drop_duplicates(['month','year','type']).sort_index())
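
As a minimal, self-contained sketch of this approach on the question's sample data (the variable name result is only for illustration): drop_duplicates keeps the first occurrence by default, so sorting by sale in descending order first means the highest sale in each group is the row that survives.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 7, 10],
                   'year': [2012, 2012, 2013, 2014],
                   'type': ['C', 'C', 'S', 'C'],
                   'sale': [55, 40, 84, 31]})

# Highest sale first, so the default keep='first' retains it per group
result = (df.sort_values('sale', ascending=False)
            .drop_duplicates(['month', 'year', 'type'])
            .sort_index())  # back to the original row order

print(result)
```

The surviving rows keep their original index labels (0, 2, 3); append .reset_index(drop=True) if a fresh 0..n-1 index is wanted.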

You can group by month, year, and type and take the maximum sale:

df.groupby(['month', 'year', 'type']).max().reset_index()
    month   year    type    sale
0      1    2012      C      55
1      7    2013      S      84
2      10   2014      C      31

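If you have another column, such as "other", you need to specify which column to take the max of; otherwise max is computed for every remaining column independently. For example: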

df.groupby(['month', 'year', 'type'])[['sale']].max().reset_index()
    month   year    type    sale
0      1    2012      C      55
1      7    2013      S      84
2      10   2014      C      31
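
Selecting only sale drops any other columns from the result. If the whole row with the highest sale is needed, one alternative (a different technique from the groupby max above, shown here as a sketch) is to look up the row labels with idxmax and select those rows. The "other" column and its values are hypothetical, added only to illustrate the point.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 7, 10],
                   'year': [2012, 2012, 2013, 2014],
                   'type': ['C', 'C', 'S', 'C'],
                   'sale': [55, 40, 84, 31],
                   'other': ['a', 'b', 'c', 'd']})  # hypothetical extra column

# Index label of the highest-sale row within each (month, year, type) group
idx = df.groupby(['month', 'year', 'type'])['sale'].idxmax()

# Select those full rows, so 'other' comes from the same row as the max sale
best_rows = df.loc[idx].sort_index().reset_index(drop=True)

print(best_rows)
```

Taking max over all columns would instead pair sale 55 with other 'b', which comes from a different row, because each column's maximum is computed separately.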

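You can also call drop_duplicates with a subset. Note that keep='first' keeps the first occurrence in the current row order, so this retains the highest sale only if the rows are already ordered with the highest sale first within each group:

df.drop_duplicates(subset=['month', 'year', 'type'], keep='first')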
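
To illustrate how keep='first' depends on row order, here is a small sketch using a reordered variant of the sample data in which the lower sale comes first (the reordering and the variable names are assumptions for illustration only):

```python
import pandas as pd

# Same data as the question, but the lower sale (40) now precedes the higher one (55)
df = pd.DataFrame({'month': [1, 1, 7, 10],
                   'year': [2012, 2012, 2013, 2014],
                   'type': ['C', 'C', 'S', 'C'],
                   'sale': [40, 55, 84, 31]})

keys = ['month', 'year', 'type']

# Without sorting, the first occurrence (sale 40) is kept
unsorted_result = df.drop_duplicates(subset=keys, keep='first')

# Sorting by sale in descending order first makes the kept row the maximum
sorted_result = (df.sort_values('sale', ascending=False)
                   .drop_duplicates(subset=keys, keep='first')
                   .sort_index())

print(unsorted_result['sale'].tolist())
print(sorted_result['sale'].tolist())
```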