Python: keeping the original index when using pandas groupby

python pandas

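I have a DataFrame with a date index and a high column. I want to group by year and return the maximum value for each year, while keeping the original index values.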

With pandas groupby I can group the rows by year, but I don't get the dates I want:

func = lambda x: x.year
df["high"].groupby(func).max()

# date    high
# 2019    150
# 2020    100

The output I want from pandas groupby is:

 # NOTE : the date index is like the original

 # date         high
 # 2019-04-01   150
 # 2020-01-01   100
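The original frame is not reproduced here, so the sketch below builds a small stand-in to reproduce the behaviour. Only the two yearly maxima (150 on 2019-04-01 and 100 on 2020-01-01) come from the outputs shown above; every other row is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the two yearly maxima match the question.
df = pd.DataFrame(
    {"high": [120, 150, 90, 100, 80]},
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-04-01", "2019-09-01", "2020-01-01", "2020-02-01"],
        name="date",
    ),
)

# The function is called on each index label (a Timestamp) and returns its year.
func = lambda x: x.year
by_year = df["high"].groupby(func).max()

# The aggregated result is labelled by the group key (the year),
# so the dates on which the maxima occurred are no longer available.
print(by_year)
```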

Use sort_values, then groupby with tail:

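# Group with a function so the year is read from the sorted frame's own index.
# An Index such as df.index.year is matched by position, not by label, so it
# would be paired with the wrong rows once the frame has been reordered.
df.sort_values('high').groupby(lambda d: d.year).tail(1)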
            high
date            
2020-01-01   100
2019-04-01   150

When you run df["high"].groupby(func).max(), you are aggregating a Series groupby rather than selecting rows from the DataFrame, so the result is indexed by the group key (the year) and does not carry the DataFrame's original index. tail(1) returns rows of the frame itself, so their index labels are kept.

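To make the difference concrete, here is a minimal sketch that compares the index of the aggregated result with the index of the rows selected by tail. The sample frame is hypothetical (only the two yearly maxima are taken from the question), and the final sort_index call is an optional addition that restores chronological order.

```python
import pandas as pd

# Hypothetical sample data: only the two yearly maxima match the question.
df = pd.DataFrame(
    {"high": [120, 150, 90, 100, 80]},
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-04-01", "2019-09-01", "2020-01-01", "2020-02-01"],
        name="date",
    ),
)

# A function grouper is evaluated on the index of whatever object is grouped,
# so it stays correct after the rows have been reordered by sort_values.
by_year = lambda d: d.year

# Aggregation: one value per group, labelled by the group key (the year).
aggregated = df["high"].groupby(by_year).max()

# Selection: the last row of each group after sorting by 'high' is the row
# holding the group's maximum, and it keeps its original date label.
selected = df.sort_values("high").groupby(by_year).tail(1).sort_index()

print(aggregated.index.tolist())
print(selected)
```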
Another approach is to use idxmax together with loc:
df.loc[df.groupby(df.index.year).high.idxmax()]

Output:
            high
date            
2019-04-01   150
2020-01-01   100
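The two steps can be separated to see what each one contributes. This is a sketch on a hypothetical sample frame (only the two yearly maxima come from the question). It assumes the date index is unique: with duplicate dates, loc would return every row sharing a selected label. If a year has tied maxima, idxmax returns the label of the first occurrence.

```python
import pandas as pd

# Hypothetical sample data: only the two yearly maxima match the question.
df = pd.DataFrame(
    {"high": [120, 150, 90, 100, 80]},
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-04-01", "2019-09-01", "2020-01-01", "2020-02-01"],
        name="date",
    ),
)

# Step 1: for each year, find the index label (the date) of the maximum 'high'.
# df is in its original order here, so the positional key df.index.year lines up.
labels = df.groupby(df.index.year)["high"].idxmax()

# Step 2: use those labels to pull the matching rows out of the original frame,
# which keeps the original date index.
result = df.loc[labels]

print(labels)
print(result)
```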

You can also use nlargest and droplevel:

func = lambda x: x.year

df["high"].groupby(func).nlargest(1).droplevel(0)

Out[7]:
date
2019-04-01    150
2020-01-01    100
Name: high, dtype: int64
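The intermediate result shows why droplevel is needed. In this sketch, again on a hypothetical sample frame, nlargest(1) on the grouped Series returns a two-level index of (year, original date), and dropping level 0 leaves only the original dates.

```python
import pandas as pd

# Hypothetical sample data: only the two yearly maxima match the question.
df = pd.DataFrame(
    {"high": [120, 150, 90, 100, 80]},
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-04-01", "2019-09-01", "2020-01-01", "2020-02-01"],
        name="date",
    ),
)

func = lambda x: x.year

# nlargest keeps the original label of each selected value and prepends
# the group key, giving a two-level index: (year, date).
step = df["high"].groupby(func).nlargest(1)

# Dropping the outer level (the year) leaves the original date index.
result = step.droplevel(0)

print(step)
print(result)
```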

Why does this work? What does the groupby object represent?

@thomas.mac When you run func = lambda x: x.year and then df["high"].groupby(func).max(), you are working with a Series groupby rather than a DataFrame groupby, so it does not carry over the DataFrame index.