Python: index is lost after GroupBy when removing duplicates


python pandas dataframe


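I am trying to keep all rows except the duplicate rows whose amount is not the maximum, so that in the end every name appears only once.

Input:

Expected output:

My groupby attempt gives me a Series that no longer carries the index values of df.

How can I get the expected output?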

You can avoid groupby and use sort_values and drop_duplicates to preserve the index:
df.sort_values('amount', ascending=False).drop_duplicates('name').sort_index()


   name  amount
2     a    5000
4     b    2000
5     c    3000
6     d    4000
7     e    5000
8     f    6000
9     g    7000
11    h   10000
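As a minimal sketch of why this keeps the index while a groupby aggregation does not, the snippet below builds a hypothetical input frame. The question's actual data is not shown here, so the amounts for the discarded rows are assumptions chosen only to be consistent with the output above.

```python
import pandas as pd

# Hypothetical input: only the surviving rows (labels 2, 4, 5, ..., 11) match
# the output shown in the answer; the other amounts are made up.
df = pd.DataFrame({
    "name": list("aaabbcdefghh"),
    "amount": [1000, 3000, 5000, 1000, 2000, 3000,
               4000, 5000, 6000, 7000, 9000, 10000],
})

# A groupby aggregation returns a Series indexed by 'name',
# so the original row labels are no longer available.
by_group = df.groupby("name")["amount"].max()

# Sorting by amount (largest first) and keeping the first row per name
# keeps each surviving row's original label; sort_index restores row order.
deduped = (
    df.sort_values("amount", ascending=False)
      .drop_duplicates("name")
      .sort_index()
)

print(by_group.index.tolist())
print(deduped)
```

The key point is that drop_duplicates only filters rows, whereas the aggregation builds a new result keyed by group.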

You can avoid the final call to sort_index as follows:
df[~df.sort_values('amount', ascending=False).name.duplicated()]

   name  amount
2     a    5000
4     b    2000
5     c    3000
6     d    4000
7     e    5000
8     f    6000
9     g    7000
11    h   10000

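This works because boolean indexing realigns the mask to the DataFrame's index. You will have to accept a UserWarning, though:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.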
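If the warning is unwanted, one option (not part of the original answer) is to align the mask with the frame explicitly before indexing. The sketch below again uses a hypothetical input frame and assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical input, consistent with the output shown in the answer.
df = pd.DataFrame({
    "name": list("aaabbcdefghh"),
    "amount": [1000, 3000, 5000, 1000, 2000, 3000,
               4000, 5000, 6000, 7000, 9000, 10000],
})

# In descending order of amount, duplicated() flags every row after the
# first occurrence of a name, i.e. every non-maximum row. The mask keeps
# the original labels but is ordered by amount.
mask = ~df.sort_values("amount", ascending=False)["name"].duplicated()

# Reorder the mask to df's own index so the key already matches the frame
# (assumes unique index labels).
result = df[mask.reindex(df.index)]

print(result)
```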

Special case

Because your data appears to be sorted already, you can simply do the following:
df[~df.duplicated('name', keep='last')]

   name  amount
2     a    5000
4     b    2000
5     c    3000
6     d    4000
7     e    5000
8     f    6000
9     g    7000
11    h   10000

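However, this does not work in general, because it relies on the largest amount being the last row for each name.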
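A small counter-example illustrates the limitation. The data here is hypothetical and deliberately unsorted: for name a the largest amount comes first, so keep='last' picks the wrong row.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data.
unsorted = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "amount": [5000, 1000, 1000, 2000],
})

# Keeps the last row per name regardless of amount.
last_rows = unsorted[~unsorted.duplicated("name", keep="last")]

# Keeps the row with the largest amount per name.
max_rows = (
    unsorted.sort_values("amount", ascending=False)
            .drop_duplicates("name")
            .sort_index()
)

print(last_rows)  # row 1 (a, 1000) is kept, which is not the maximum for a
print(max_rows)   # row 0 (a, 5000) is kept
```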
Alternatively, you could look at idxmax:

df.loc[df.groupby('name').amount.idxmax()]
   name  amount
2     a    5000
4     b    2000
5     c    3000
6     d    4000
7     e    5000
8     f    6000
9     g    7000
11    h   10000
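One detail worth noting about the idxmax approach: groupby sorts the group keys by default, so the selected rows come back in group-key order, which need not be the original row order. The following sketch uses hypothetical data where the two orders differ and adds a sort_index call to restore row order.

```python
import pandas as pd

# Hypothetical data in which name order and row order disagree.
df2 = pd.DataFrame({
    "name": ["b", "a", "a", "b"],
    "amount": [2000, 1000, 5000, 1000],
})

# idxmax returns, per group, the index label of the row with the largest amount.
labels = df2.groupby("name")["amount"].idxmax()

# Rows come back in group-key order (a, then b).
picked = df2.loc[labels]

# sort_index puts them back in original row order.
restored = picked.sort_index()

print(picked)
print(restored)
```

In the answer's own output the two orders happen to coincide, so the extra sort is only needed when names are interleaved.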

A better solution is not to use groupby: df.sort_values('amount', ascending=False).drop_duplicates('name')