R: Assigning ranks to values within groups that contain NAs
I have a data frame (df) like this; it is just a sample:
group value
1 12.1
1 10.3
1 NA
1 11.0
1 13.5
2 11.7
2 NA
2 10.4
2 9.7
i.e.
df<-data.frame(group=c(1,1,1,1,1,2,2,2,2), value=c(12.1, 10.3, NA, 11.0, 13.5, 11.7, NA, 10.4, 9.7))
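That is, I want to find
- the rank of "value", starting from the smallest value
- within each "group"
How can I do this in R? Thank you very much for your help.

We can use ave from base R to create the rank column ("order1") of "value" by "group". If the rank should be NA wherever "value" is NA, that can be set afterwards (df$order1[is.na(df$value)] <- NA).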
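As a variation on the ave approach described above, the two steps can probably be folded into one by passing na.last = "keep" to rank(), which leaves missing values with a rank of NA. This is a minimal sketch and not part of the original answer; the column name order1 follows the answer, and the data frame is the sample from the question.

```r
# Sample data from the question
df <- data.frame(group = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                 value = c(12.1, 10.3, NA, 11.0, 13.5, 11.7, NA, 10.4, 9.7))

# Rank "value" within each "group"; na.last = "keep" gives NA values a rank of NA
df$order1 <- ave(df$value, df$group,
                 FUN = function(x) rank(x, na.last = "keep"))

print(df)
```

With this option there is no need for a separate assignment to blank out the ranks of the NA rows.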
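You can get the desired result by applying the rank() function to one group at a time. My solution is to write a small helper function and call it in a for loop. I'm sure there are more elegant ways to do this with various R libraries, but here is a solution that uses only base R.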
df <- read.table('~/Desktop/stack_overflow28283818.csv', sep = ',', header = TRUE)

# Helper function: rank the values of a single group
rankByGroup <- function(df = NULL, grp = 1)
{
  rank(df[df$group == grp, 'value'])
}

# Separate the rows with missing values from the complete rows
df.na <- df[is.na(df$value), ]
df.0 <- df[!is.na(df$value), ]

# Loop over the groups and store each group's ranks
for (grp in unique(df.0$group))
{
  df.0[df.0$group == grp, 'order'] <- rankByGroup(df.0, grp)
  print(grp)  # show which group was just processed
}

# Append the NA rows, giving them an NA rank
df.na$order <- NA
df.out <- rbind(df.0, df.na)

# Restore the row order given in the OP (probably not really required)
df.out <- df.out[order(as.numeric(rownames(df.out))), ]
Thank you, 艾尔·阿克伦. Both pieces of code work very well. I really appreciate it.
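# Base R: rank "value" within each "group", then set the rank to NA where "value" is NA
df$order1 <- with(df, ave(value, group, FUN = rank))
df$order1[is.na(df$value)] <- NA

# data.table: NA^TRUE is NA and NA^FALSE is 1, so multiplying blanks out the NA rows
library(data.table)
setDT(df)[, order1 := rank(value) * NA^(is.na(value)), by = group][]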
# group value order1
#1: 1 12.1 3
#2: 1 10.3 1
#3: 1 NA NA
#4: 1 11.0 2
#5: 1 13.5 4
#6: 2 11.7 3
#7: 2 NA NA
#8: 2 10.4 2
#9: 2 9.7 1
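The NA^(is.na(value)) factor in the data.table call can look cryptic, so here is a small base R sketch of what it does on the values of group 1. The variable names mask and ranks are only illustrative and do not appear in the answer.

```r
# Values of group 1 from the question
value <- c(12.1, 10.3, NA, 11.0, 13.5)

# is.na(value) is FALSE/TRUE, which acts as the exponent 0/1:
# NA^0 is 1 and NA^1 is NA, so mask is 1 for present values and NA for missing ones
mask <- NA^is.na(value)

# rank() places the NA last by default; multiplying by mask turns its rank into NA
ranks <- rank(value) * mask

print(mask)
print(ranks)
```

The multiplication leaves the ranks of the non-missing values untouched, because rank() puts NAs after all other values by default.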
> df.out
group value order
1 1 12.1 3
2 1 10.3 1
3 1 NA NA
4 1 11.0 2
5 1 13.5 4
6 2 11.7 3
7 2 NA NA
8 2 10.4 2
9 2 9.7 1