Python: ranking based on a groupby of two columns, where each column varies for each input
I have the following dataframe:
Signature Genes Labels Scores Annotation
CELF1 AARS 0 -5.439356884 EMPTY
CELF1 AATF 0 -5.882719549 EMPTY
CELF1 ABCF1 0 -6.011462342 EMPTY
HNRNPC AARS 0 -6.166240409 EMPTY
HNRNPC AATF 0 -6.432658981 EMPTY
HNRNPC ABCF1 0 -6.476526092 EMPTY
FUS AARS 0 -5.646015964 EMPTY
FUS AATF 0 -6.224914841 EMPTY
FUS ABCF1 0 -6.395334389 EMPTY
I want to rank the values in the 'Scores' column within each 'Signature' group, that is, rank the 'Genes' by their scores, so that:
Signature Genes Labels Scores Annotation Rank
CELF1 AARS 0 -5.439356884 EMPTY 1
CELF1 AATF 0 -5.882719549 EMPTY 2
CELF1 ABCF1 0 -6.011462342 EMPTY 3
HNRNPC AARS 0 -6.166240409 EMPTY 1
HNRNPC AATF 0 -6.432658981 EMPTY 2
HNRNPC ABCF1 0 -6.476526092 EMPTY 3
FUS AARS 0 -5.646015964 EMPTY 1
FUS AATF 0 -6.224914841 EMPTY 2
FUS ABCF1 0 -6.395334389 EMPTY 3
I followed this post. My code looks like this:
import pandas as pd
data = pd.read_csv("trial1.csv", sep='\t')
data['max_score'] = data.groupby(['Signature', 'Genes'])['Scores'].transform('max').astype(float)
data['rank'] = data.groupby('Signature')['max_score'].rank()
However, my scores appear to be ranked by absolute value, as shown below:
Signature Genes Labels Scores Annotation Rank
CELF1 ABCF1 0 -6.011462342 EMPTY 1
CELF1 AATF 0 -5.882719549 EMPTY 2
CELF1 AARS 0 -5.439356884 EMPTY 3
HNRNPC ABCF1 0 -6.476526092 EMPTY 1
HNRNPC AATF 0 -6.432658981 EMPTY 2
HNRNPC AARS 0 -6.166240409 EMPTY 3
FUS ABCF1 0 -6.395334389 EMPTY 1
FUS AATF 0 -6.224914841 EMPTY 2
FUS AARS 0 -5.646015964 EMPTY 3
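The ranking is not by absolute value. rank() ranks in ascending order by default, so within each group the smallest (most negative) score gets rank 1. You just need to change the call from rank() to rank(ascending=False).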
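As a minimal sketch of this fix, the snippet below rebuilds the sample data inline instead of reading trial1.csv, leaving out the Labels and Annotation columns for brevity. It ranks Scores directly within each Signature, skipping the intermediate max_score column, because each (Signature, Genes) pair occurs only once in the sample. The method="min" tie-breaking and the cast to int are assumptions, not part of the original question.

```python
import pandas as pd

# Sample data copied from the question (Labels/Annotation omitted for brevity).
data = pd.DataFrame({
    "Signature": ["CELF1"] * 3 + ["HNRNPC"] * 3 + ["FUS"] * 3,
    "Genes": ["AARS", "AATF", "ABCF1"] * 3,
    "Scores": [
        -5.439356884, -5.882719549, -6.011462342,
        -6.166240409, -6.432658981, -6.476526092,
        -5.646015964, -6.224914841, -6.395334389,
    ],
})

# Rank within each Signature so that the highest score gets rank 1.
# method="min" (an assumption) gives tied scores the same, lowest rank.
data["Rank"] = (
    data.groupby("Signature")["Scores"]
    .rank(ascending=False, method="min")
    .astype(int)  # safe here because there are no missing scores
)

print(data)
```

If a gene can appear several times under the same signature, the max_score step from the question can be kept and rank(ascending=False) applied to that column instead.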