Python: select the maximum value for each group

I have a pandas DataFrame with several numeric columns plus an id column:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']
df
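I want to group by the id column and, for each group, select the row with the highest probability. The result might look like this: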

id   highest_prob   technique
CA   B               three 
FL   C               one
I tried something like this, but it only gets me halfway there:

df.groupby('id', as_index=False)[['A','B','C','D']].max() 
Does anyone have suggestions for how to get the desired result?

Setup

import numpy as np
import pandas as pd

np.random.seed(0)  # Seed the generator so the results are reproducible.
df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']
You can melt the frame, sort it with sort_values, and remove duplicates with drop_duplicates:

(df.melt(['id', 'technique'])
   .sort_values(['id', 'value'], ascending=[True, False])
   .drop_duplicates('id')        # keeps the first (largest) row per id
   .drop(columns='value')        # positional axis argument was removed in pandas 2.0
   .reset_index(drop=True)
   .rename({'variable': 'highest_prob'}, axis=1))

   id technique highest_prob
0  CA       one            D
1  FL       two            A
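
To see why this chain works, it can help to look at the intermediate long-format frame. The following is a minimal sketch that assumes the seeded setup above; the names long_df and top are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']

# Long format: one row per (id, technique, original column) combination.
# melt names the new columns 'variable' and 'value' by default.
long_df = df.melt(id_vars=['id', 'technique'])
print(long_df.shape)
print(list(long_df.columns))

# Sort so the largest value comes first within each id;
# drop_duplicates then keeps only that first row per id.
top = (long_df.sort_values(['id', 'value'], ascending=[True, False])
              .drop_duplicates('id'))
print(top)
```

Each id ends up with exactly one row, and its variable column holds the name of the original column that contained the group maximum.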

Another solution uses melt and groupby:

v = df.melt(['id', 'technique'])
# idxmax returns index labels, so select with loc rather than iloc.
(v.loc[v.groupby('id')['value'].idxmax()]
  .drop(columns='value')
  .reset_index(drop=True)
  .rename({'variable': 'highest_prob'}, axis=1))

   id technique highest_prob
0  CA       one            D
1  FL       two            A
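
For comparison, the same result can probably be reached without melt, by taking the row-wise maximum first and then picking the best row per id. This is a sketch of an alternative, not part of the original answer; the names num, best_rows, and result are hypothetical, and it assumes the seeded setup above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']

num = df[['A', 'B', 'C', 'D']]

# For each id, the index label of the row whose largest value is the group maximum.
best_rows = num.max(axis=1).groupby(df['id']).idxmax().tolist()

result = pd.DataFrame({
    'id': df.loc[best_rows, 'id'].to_numpy(),
    # Column name holding the maximum within each selected row.
    'highest_prob': num.loc[best_rows].idxmax(axis=1).to_numpy(),
    'technique': df.loc[best_rows, 'technique'].to_numpy(),
})
print(result)
```

This avoids building the long-format frame, at the cost of two idxmax passes; which version reads better is largely a matter of taste.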