Python 3.x Pandas: group by, sort values, and get the top 3 per group with unique ranking?
I have a DataFrame like this:
Val1 Val2
0 a 1.0
1 a 1.0
2 a 0.98
3 a 0.78
4 a 0.70
5 b 0.97
6 b 0.67
7 b 0.75
8 b 1.0
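I want to group by Val1, sort Val2 in descending order, and get the top 3 unique values for each group, keeping every row that holds one of those values.
Like this: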
Val1 Val2
0 a 1.0 ----------- top1 of a
1 a 1.0 ----------- top1 of a
2 a 0.98 ----------- top2 of a
3 a 0.78 ------------ top3 of a
5 b 0.97
7 b 0.75
6 b 0.67
So if values in the column are equal, they should share the same rank (for example, both rows with 1.0 count as top 1).
I tried this:
result_CI.sort_values(['Val2'],ascending=False).groupby('Val1').head(3)
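But it does not give the expected result because, as far as I know, head simply takes the first 3 rows of each group. I also tried nlargest, which did not give me the expected result either.

You can do: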
df[df.groupby('Val1')['Val2'].rank(method='dense',ascending=False)<=3]
#or df[df.groupby('Val1')['Val2'].apply(lambda x: x.rank(method='dense',ascending=False)<=3)]
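
To make the difference concrete, here is a minimal sketch comparing the head(3) attempt with the dense-rank filter on the sample data from the question. The variable names (naive, rank, top3) are only for illustration, and the final sort_values call is an addition of mine to show the result in descending order per group.

```python
import pandas as pd

# Sample data as listed in the question.
df = pd.DataFrame({
    "Val1": ["a", "a", "a", "a", "a", "b", "b", "b", "b"],
    "Val2": [1.0, 1.0, 0.98, 0.78, 0.70, 0.97, 0.67, 0.75, 1.0],
})

# The attempt from the question: head(3) keeps the first 3 rows per group,
# so the two tied 1.0 rows in group "a" use up two of the three slots.
naive = df.sort_values("Val2", ascending=False).groupby("Val1").head(3)

# Dense rank within each group: tied values share a rank and no rank is skipped.
rank = df.groupby("Val1")["Val2"].rank(method="dense", ascending=False)

# Keep every row whose value is among the 3 largest distinct values of its group.
top3 = df[rank <= 3].sort_values(["Val1", "Val2"], ascending=[True, False])

print(naive)
print(top3)
```

With the data as listed, group a keeps 1.0, 1.0, 0.98 and 0.78, and group b keeps 1.0, 0.97 and 0.75, while the head(3) version drops the 0.78 row of group a.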

There is already an answer, but here is another approach:
import pandas as pd
import numpy as np
c = ['Val1','Val2']
v = [
['b',1.0],
['a',1.0],
['a',1.0],
['a',0.98],
['a',0.78],
['a',0.70],
['b',0.97],
['b',0.67],
['b',0.75],
]
df = pd.DataFrame(v,columns=c)
print(df)
##### Output ####
Val1 Val2
0 b 1.00
1 a 1.00
2 a 1.00
3 a 0.98
4 a 0.78
5 a 0.70
6 b 0.97
7 b 0.67
8 b 0.75
# Sort the rows of each Val1 group by Val2 in descending order
k = df.groupby(['Val1']).apply(pd.DataFrame.sort_values, 'Val2',ascending=False)
print(k)
##### Output ####
Val1 Val2
Val1
a 1 a 1.00
2 a 1.00
3 a 0.98
4 a 0.78
5 a 0.70
b 0 b 1.00
6 b 0.97
8 b 0.75
7 b 0.67
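
The snippet above only sorts within each group; it does not yet cut the result down to the top 3 unique values. One possible way to finish the job, sketched here under my own assumptions (the threshold variable and the lambda are not part of the original answer), is to compute the third-largest distinct value per group and keep the rows at or above it.

```python
import pandas as pd

# Same data as in the answer above.
df = pd.DataFrame(
    [["b", 1.0], ["a", 1.0], ["a", 1.0], ["a", 0.98], ["a", 0.78],
     ["a", 0.70], ["b", 0.97], ["b", 0.67], ["b", 0.75]],
    columns=["Val1", "Val2"],
)

# For each row, the smallest of the 3 largest distinct Val2 values in its group.
# transform broadcasts the scalar returned by the lambda back to every row.
threshold = df.groupby("Val1")["Val2"].transform(
    lambda s: s.drop_duplicates().nlargest(3).min()
)

# Keep rows at or above their group's threshold, sorted descending per group.
result = df[df["Val2"] >= threshold].sort_values(
    ["Val1", "Val2"], ascending=[True, False]
)
print(result)
```

This keeps the same rows as the dense-rank filter; which one to use is mostly a matter of taste.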