R: Why doesn't this function give the top rank to the highest value of the score column within each cluster group?


I have a data frame `dt` that looks like this:

            kmeans  sd1         sd2         score       gene
B4GALNT1    1       1.138399    0.9302788   0.59238585  B4GALNT1
GATA2       1       1.31817     0.9869005   0.70160114  GATA2
KBTBD8      1       0.2799195   0.25295     2.56658313  KBTBD8
LYPD6       1       0.5885738   0.5277333   1.1797581   LYPD6
MSX1        1       0.2846179   0.5276349   1.31276755  MSX1
NAP1L2      1       0.5778767   0.5252137   1.29646305  NAP1L2
PLA2G4C     1       1.545634    0.3505845   1.02694161  PLA2G4C
SLC6A15     1       3.6862153   1.7656347   0.31940624  SLC6A15
SNORA9      1       49.5847239  23.059789   0.01679016  SNORA9
STX1A       1       4.753248    2.3649298   0.17053974  STX1A
TRNP1       1       54.1230886  19.7797807  0.01907904  TRNP1
AKAP6       2       2.7115279   0.1346139   1.12646609  AKAP6
C1QL3       2       3.1646016   0.3646613   0.78840387  C1QL3
CAMK2N1     2       48.4399203  3.628805    0.05655038  CAMK2N1
CDK5R1      2       3.3858407   0.2249831   0.6292364   CDK5R1
CLSTN2      2       1.0131585   0.162797    1.96050927  CLSTN2
CNTN1       2       3.7191809   0.253088    0.83650197  CNTN1
DGKG        2       0.4607949   0.2333855   1.70445926  DGKG
DPF1        2       1.6369965   0.1873143   1.07265653  DPF1
FAM131A     2       8.7092498   1.763698    0.11250896  FAM131A
I want to rank the rows within each `kmeans` group by the `score` column and add the result as a new `rank` column, where rank 1 is the highest score in that group.

Expected output:

            kmeans  sd1         sd2         score       gene        rank
B4GALNT1    1       1.138399    0.9302788   0.59238585  B4GALNT1    7
GATA2       1       1.31817     0.9869005   0.70160114  GATA2       6
KBTBD8      1       0.2799195   0.25295     2.56658313  KBTBD8      1
LYPD6       1       0.5885738   0.5277333   1.1797581   LYPD6       4
MSX1        1       0.2846179   0.5276349   1.31276755  MSX1        2
NAP1L2      1       0.5778767   0.5252137   1.29646305  NAP1L2      3
PLA2G4C     1       1.545634    0.3505845   1.02694161  PLA2G4C     5
SLC6A15     1       3.6862153   1.7656347   0.31940624  SLC6A15     8 
SNORA9      1       49.5847239  23.059789   0.01679016  SNORA9      11
STX1A       1       4.753248    2.3649298   0.17053974  STX1A       9
TRNP1       1       54.1230886  19.7797807  0.01907904  TRNP1       10
AKAP6       2       2.7115279   0.1346139   1.12646609  AKAP6       3
C1QL3       2       3.1646016   0.3646613   0.78840387  C1QL3       6
CAMK2N1     2       48.4399203  3.628805    0.05655038  CAMK2N1     9
CDK5R1      2       3.3858407   0.2249831   0.6292364   CDK5R1      7
CLSTN2      2       1.0131585   0.162797    1.96050927  CLSTN2      1
CNTN1       2       3.7191809   0.253088    0.83650197  CNTN1       5
DGKG        2       0.4607949   0.2333855   1.70445926  DGKG        2
DPF1        2       1.6369965   0.1873143   1.07265653  DPF1        4
FAM131A     2       8.7092498   1.763698    0.11250896  FAM131A     8
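One way to double-check hand-made ranks like the ones above is to sort by group, then by decreasing score, and number the rows within each group. A minimal base-R sketch, using a hypothetical toy data frame `toy` (not the question's data):

```r
# Hypothetical toy data: two kmeans groups
toy <- data.frame(kmeans = c(1, 1, 1, 2, 2),
                  score  = c(0.5, 2.0, 1.2, 0.1, 1.7),
                  gene   = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)

# Sort by group first, then by decreasing score within each group.
# order() leaves unresolved ties in their original row order.
sorted <- toy[order(toy$kmeans, -toy$score), ]

# Number the rows 1, 2, ... within each run of equal kmeans values;
# after the sort, rank 1 is the highest score in each group
sorted$rank <- sequence(rle(sorted$kmeans)$lengths)
sorted
```

This reorders the rows, which is fine for checking a table by eye, but it differs from the `ave()` approach, which leaves the rows in place.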
But the code below does not give me the expected output:

dt$rank <- unlist(with(dt, tapply(score, kmeans, function(x) rank(x,ties.method= "first"))))

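dt$rank

We can do this with `ave` instead of `tapply`. The advantage of `ave` is that it keeps the original row order in its output, whereas `tapply` returns the results grouped by `kmeans`. Because `rank` ranks in ascending order, the scores are negated so that the highest score in each group gets rank 1.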

dt$rank <- with(dt, ave(-score, kmeans, FUN = function(x) rank(x, ties.method = "first")))
dt$rank
#[1]  7  6  1  4  2  3  5  8 11  9 10  3  6  9  7  1  5  2  4  8
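A small sketch of why the `tapply` + `unlist` attempt can go wrong and what `ave` does differently. The data frame `toy` is hypothetical, with the groups deliberately interleaved so the row-order difference shows up. In the question's `dt` the rows are already sorted by `kmeans`, so there the main problem is the ascending `rank()`.

```r
# Hypothetical toy data with the kmeans groups interleaved (not sorted)
toy <- data.frame(kmeans = c(1, 2, 1, 2, 1),
                  score  = c(2.0, 3.0, 0.5, 1.0, 1.5))

# rank() is ascending: the lowest value gets rank 1.
# Negating the scores makes the highest score get rank 1 instead.
rank(c(0.5, 2.0))   # returns 1 2
rank(-c(0.5, 2.0))  # returns 2 1

# tapply + unlist: results come back grouped (all of group 1, then group 2),
# so they only line up with the rows if the data is already sorted by kmeans
via_tapply <- unlist(tapply(-toy$score, toy$kmeans, rank, ties.method = "first"))

# ave: one value per row, in the original row order
via_ave <- ave(-toy$score, toy$kmeans,
               FUN = function(x) rank(x, ties.method = "first"))

# Roughly what ave does internally: split by group, rank each piece,
# then write the pieces back into their original row positions
via_unsplit <- unsplit(lapply(split(-toy$score, toy$kmeans), rank,
                              ties.method = "first"),
                       toy$kmeans)

unname(via_tapply)  # 1 3 2 1 2 (misaligned with the rows)
via_ave             # 1 1 3 2 2
via_unsplit         # 1 1 3 2 2
```

The `unsplit()` version is only there to show the mechanism; `ave()` is the shorter way to write the same thing.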
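I think some of the ranks in your expected output are incorrect, for example ranks 9 and 8 in kmeans 2. Could you double-check them?

The score column is fine; the ranks I gave may be wrong, but the idea is to rank by the score column within each kmeans group, where rank 1 is the highest score in that group.

I mean the ranks CAMK2N1 8 and FAM131A 9. It should be the other way around.

Yes, sorry, I will reorder them. Thanks @akrun, and very sorry that my expected output was a mess, but I think I got my point across. I have edited it; I ranked them by hand only to show my intention.

`ave` does not rank in the order I am looking for. Ideally it should rank the rows with kmeans = 1 by the score column, with the highest score getting rank 1, and do the same for kmeans = 2, 3, ..., but the `ave` version does not do that.

@vchris__ngs I don't know why it is different for you, because I get the expected output, and I don't see why it wouldn't work.

Actually my data frame is bigger, with more columns, and it has row names, which are the same as the `gene` column, but that shouldn't break the ordering, right?

I can subset the columns and try it. Let me check; mine is 3.3.1. Not sure what is happening.

I'm digging into it, but thanks for your input. In this case your code is correct; I'll accept it.

I renamed the column. In my actual `df`, the column to group by (the kmeans column) was named `obj.down.4$kmeans$cluster`. I changed it to `kmeans`, ran the code, and it worked perfectly. I'm not sure whether the name was what broke it; that may have been the problem, but I'm not certain.

Data: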
dt <- structure(list(kmeans = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), sd1 = c(1.138399, 
1.31817, 0.2799195, 0.5885738, 0.2846179, 0.5778767, 1.545634, 
3.6862153, 49.5847239, 4.753248, 54.1230886, 2.7115279, 3.1646016, 
48.4399203, 3.3858407, 1.0131585, 3.7191809, 0.4607949, 1.6369965, 
8.7092498), sd2 = c(0.9302788, 0.9869005, 0.25295, 0.5277333, 
0.5276349, 0.5252137, 0.3505845, 1.7656347, 23.059789, 2.3649298, 
19.7797807, 0.1346139, 0.3646613, 3.628805, 0.2249831, 0.162797, 
0.253088, 0.2333855, 0.1873143, 1.763698), score = c(0.59238585, 
0.70160114, 2.56658313, 1.1797581, 1.31276755, 1.29646305, 1.02694161, 
0.31940624, 0.01679016, 0.17053974, 0.01907904, 1.12646609, 0.78840387, 
0.05655038, 0.6292364, 1.96050927, 0.83650197, 1.70445926, 1.07265653, 
0.11250896), gene = c("B4GALNT1", "GATA2", "KBTBD8", "LYPD6", 
"MSX1", "NAP1L2", "PLA2G4C", "SLC6A15", "SNORA9", "STX1A", "TRNP1", 
"AKAP6", "C1QL3", "CAMK2N1", "CDK5R1", "CLSTN2", "CNTN1", "DGKG", 
"DPF1", "FAM131A")), .Names = c("kmeans", "sd1", "sd2", "score", 
"gene"), class = "data.frame", row.names = c("B4GALNT1", "GATA2", 
"KBTBD8", "LYPD6", "MSX1", "NAP1L2", "PLA2G4C", "SLC6A15", "SNORA9", 
"STX1A", "TRNP1", "AKAP6", "C1QL3", "CAMK2N1", "CDK5R1", "CLSTN2", 
"CNTN1", "DGKG", "DPF1", "FAM131A"))