R: Add a row index to a data frame, but handle ties with the minimum rank


I successfully used the answer in this thread, but I need to handle the case where two (or more) rows can be tied.

df <- data.frame(
season = c(2014,2014,2014,2014,2014,2014, 2014, 2014), 
week = c(1,1,1,1,2,2,2,2), 
player.name = c("Matt Ryan","Peyton Manning","Cam Newton","Matthew Stafford","Carson Palmer","Andrew Luck", "Aaron Rodgers", "Chad Henne"), 
fant.pts.passing = c(28,19,29,28,18,22,29,22)
)

df <- df[order(-df$season, df$week, -df$fant.pts.passing),]

df$Index <- ave( 1:nrow(df), df$season, df$week, FUN=function(x) 1:length(x) )

df

Assuming you want to rank within each season and week, this is easy to do with dplyr's min_rank:
library(dplyr)

df %>% group_by(season, week) %>%
  mutate(indx = min_rank(desc(fant.pts.passing)))

#   season week      player.name fant.pts.passing Index indx
# 1   2014    1       Cam Newton               29     1    1
# 2   2014    1        Matt Ryan               28     2    2
# 3   2014    1 Matthew Stafford               28     3    2
# 4   2014    1   Peyton Manning               19     4    4
# 5   2014    2    Aaron Rodgers               29     1    1
# 6   2014    2      Andrew Luck               22     2    2
# 7   2014    2       Chad Henne               22     3    2
# 8   2014    2    Carson Palmer               18     4    4
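For comparison, here is a small sketch of how dplyr's ranking helpers treat the tied week 1 scores. The vector `pts` is copied from the example data; `row_number()` and `dense_rank()` are shown only for contrast and are not part of the answer above.

```r
library(dplyr)

# Week 1 scores from the example data
pts <- c(29, 28, 28, 19)

row_number(desc(pts))  # 1 2 3 4: ties broken by position, like the original Index
min_rank(desc(pts))    # 1 2 2 4: ties share the lowest rank, the next rank is skipped
dense_rank(desc(pts))  # 1 2 2 3: ties share a rank, no gap afterwards
```

`min_rank()` matches what the question asks for: both players with 28 points get index 2, and the next player gets 4.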

You may want to use the rank function with ties.method="min" inside your ave call:
df$Index <- ave(-df$fant.pts.passing, df$season, df$week,
                FUN=function(x) rank(x, ties.method="min"))
df
#   season week      player.name fant.pts.passing Index
# 3   2014    1       Cam Newton               29     1
# 1   2014    1        Matt Ryan               28     2
# 4   2014    1 Matthew Stafford               28     2
# 2   2014    1   Peyton Manning               19     4
# 7   2014    2    Aaron Rodgers               29     1
# 6   2014    2      Andrew Luck               22     2
# 8   2014    2       Chad Henne               22     2
# 5   2014    2    Carson Palmer               18     4
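The `ties.method` argument is what decides how tied scores are numbered. A quick check on the week 1 scores (copied from the data above) shows the difference; the methods other than "min" are listed only for contrast.

```r
# Week 1 scores from the example data
pts <- c(29, 28, 28, 19)

# Negate the scores so the highest score gets rank 1
rank(-pts, ties.method = "min")      # 1 2 2 4
rank(-pts, ties.method = "first")    # 1 2 3 4
rank(-pts, ties.method = "average")  # 1.0 2.5 2.5 4.0
rank(-pts, ties.method = "max")      # 1 3 3 4
```

With "first" the result is the same sequential index the question started with, which is why "min" is the one to use here.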

You can use the faster frank from data.table and assign the column by reference (:=):
library(data.table)#v1.9.5+
setDT(df)[, indx := frank(-fant.pts.passing, ties.method='min'), .(season, week)]
 #   season week      player.name fant.pts.passing indx
 #1:   2014    1       Cam Newton               29    1
 #2:   2014    1        Matt Ryan               28    2
 #3:   2014    1 Matthew Stafford               28    2
 #4:   2014    1   Peyton Manning               19    4
 #5:   2014    2    Aaron Rodgers               29    1
 #6:   2014    2      Andrew Luck               22    2
 #7:   2014    2       Chad Henne               22    2
 #8:   2014    2    Carson Palmer               18    4
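Note that `setDT()` converts `df` in place, so the original data frame becomes a data.table. If that side effect is unwanted, one option is to work on a copy. This is a minimal sketch, assuming a data.table version that provides `frank` (the answer notes v1.9.5+); the names `df1` and `dt` and the cut-down week 1 data are illustrative only.

```r
library(data.table)

# Cut-down copy of the example data (week 1 only); df1 is a hypothetical name
df1 <- data.frame(
  season = 2014, week = 1,
  player.name = c("Matt Ryan", "Peyton Manning", "Cam Newton", "Matthew Stafford"),
  fant.pts.passing = c(28, 19, 29, 28)
)

# as.data.table() returns a new object, so df1 itself stays a plain data.frame
dt <- as.data.table(df1)
dt[, indx := frank(-fant.pts.passing, ties.method = "min"), by = .(season, week)]

# Show the rows ordered by the new index
dt[order(indx)]
```

The ranking itself is the same as in the `setDT()` version; only the handling of the original object differs.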

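Thanks. I haven't dug into dplyr yet (I know), but it's on my to-do list.

Thanks @akrun, but the datasets are all <50k rows, so speed isn't an issue. Good to know for future projects, though.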