Select the row with the highest value for each group using Pandas in Python

python pandas

With pandas, given the following dataset (columns: author, category, value):

author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00

I would like to get the row(s) with the highest value for each author, keeping ties:

author1,category2,15.00
author2,category4,9.00
author3,category1,7.00
author3,category3,7.00

Sorry, I'm a beginner at this.

import pandas as pd

df = pd.read_csv("in.csv", names=("Author","Cat","Val"))

print(df.groupby(['Author'])['Val'].max())

To get the matching rows as a DataFrame, do the following:

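# Mark every row whose Val equals the maximum Val within its Author group
inds = df.groupby(['Author'])['Val'].transform('max') == df['Val']
# Keep only those rows (ties are retained) and renumber the index
df = df[inds]
df.reset_index(drop=True, inplace=True)
print(df)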
    Author        Cat  Val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
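To make it clearer why the boolean mask selects the right rows, here is a minimal sketch that prints the intermediate step. It assumes the sample data is built inline rather than read from in.csv, and the names group_max, mask and result are illustrative only. transform('max') returns a Series aligned with the original rows in which every entry holds the maximum of its own group, so comparing it with Val flags every row that reaches the group maximum, ties included.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: no in.csv needed)
df = pd.DataFrame({
    "Author": ["author1"] * 3 + ["author2"] * 4 + ["author3"] * 3,
    "Cat": ["category1", "category2", "category3",
            "category1", "category2", "category3", "category4",
            "category1", "category2", "category3"],
    "Val": [10.0, 15.0, 12.0, 5.0, 6.0, 4.0, 9.0, 7.0, 4.0, 7.0],
})

# One entry per original row: the maximum Val of that row's Author group
group_max = df.groupby("Author")["Val"].transform("max")
print(group_max.tolist())

# True where the row's own Val equals its group's maximum
mask = group_max == df["Val"]

# Boolean indexing keeps only the flagged rows
result = df[mask].reset_index(drop=True)
print(result)
```

Because the comparison is done row by row, both author3 rows with value 7 pass the mask.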

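Since you also want to retrieve the category column, a standard .agg on the val column will not give you what you want. Also, because author3 has two rows with the value 7, the approach by @Padraic Cunningham using .max will return only one instance rather than both. You can define a custom apply function to do the job: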

import pandas as pd

# your data, assuming the column names are: author, cat, val
# ===============================
print(df)


    author        cat  val
0  author1  category1   10
1  author1  category2   15
2  author1  category3   12
3  author2  category1    5
4  author2  category2    6
5  author2  category3    4
6  author2  category4    9
7  author3  category1    7
8  author3  category2    4
9  author3  category3    7

# processing
# ====================================
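def func(group):
    # Keep every row of the group whose val equals the group's maximum (ties included)
    return group.loc[group['val'] == group['val'].max()]

# Apply func to each author's rows and flatten the result back into one DataFrame
df.groupby('author', as_index=False).apply(func).reset_index(drop=True)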


    author        cat  val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
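As a rough illustration of the point about ties, the sketch below contrasts two selections on the same sample data. It is not taken from either answer: the idxmax variant and the names first_only and with_ties are assumptions added for comparison. idxmax gives the index label of the first maximum in each group, so it yields one row per author, while comparing against the group maximum keeps every tied row.

```python
import pandas as pd

# Sample data from the question, built inline for a self-contained example
df = pd.DataFrame({
    "author": ["author1"] * 3 + ["author2"] * 4 + ["author3"] * 3,
    "cat": ["category1", "category2", "category3",
            "category1", "category2", "category3", "category4",
            "category1", "category2", "category3"],
    "val": [10, 15, 12, 5, 6, 4, 9, 7, 4, 7],
})

# One row per author: idxmax returns the label of the FIRST maximum only
first_only = df.loc[df.groupby("author")["val"].idxmax()].reset_index(drop=True)

# All rows that reach the group maximum: ties are kept
group_max = df.groupby("author")["val"].transform("max")
with_ties = df[df["val"] == group_max].reset_index(drop=True)

print(first_only)
print(with_ties)
```

Which variant is appropriate depends on whether tied rows such as author3's two categories should both appear in the result.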

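Thanks a lot, this does work, but it seems more cryptic to me than what Jianxun Li proposed: I don't understand why it manages to get the df bits.

Thank you, Jianxun Li. I need to ask a follow-up question about how to get only the rows for authors who have just one most expensive category.

@Mike By most expensive, do you mean the category with the highest value? If so, I think you just need to select the author and cat columns from the last DataFrame.

I posted my other question here: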