R data frame + logical operations


I have the following question. Suppose I have two columns, one with an ID and one with a value.

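What I need is to create an additional column (value_ok) with the following logic: among the IDs that share the same letter, the value of a higher-numbered ID should not be greater than the value of a lower-numbered ID. If that happens, the lower-numbered ID's value should be replaced with that larger value.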

db<-data.frame(id=c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1","C_2","C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
            value=c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40),
         value_ok=c(10,9,8,7,9,9,9,5,30,30,20,20,20,20,20,20,40,40))
Can someone help me with this task?

Thanks
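The rule can be restated as two properties of value_ok: within each letter it never increases as the number after the underscore grows, and it is never smaller than the original value. A minimal base R check of both properties, assuming the rows of each letter are already sorted by that number (as in the example); the helper name check_value_ok is hypothetical:

```r
# Example data from the question
db <- data.frame(
  id       = c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1","C_2",
               "C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
  value    = c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40),
  value_ok = c(10,9,8,7,9,9,9,5,30,30,20,20,20,20,20,20,40,40)
)

# Hypothetical helper: TRUE if `ok` never increases within each group
# and is never below `value`. Assumes rows within a group are already in
# increasing order of the number after the underscore.
check_value_ok <- function(value, ok, group) {
  non_increasing <- all(tapply(ok, group, function(x) all(diff(x) <= 0)))
  not_below <- all(ok >= value)
  non_increasing && not_below
}

letter <- sub("_.*$", "", db$id)  # "A_1" -> "A"
check_value_ok(db$value, db$value_ok, letter)  # TRUE
check_value_ok(db$value, db$value, letter)     # FALSE (B increases 7 -> 8 -> 9)
```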

You can do:

library(data.table)
setDT(db)

db[.N:1, v := cummax(value), by=sub("^(.+)_(.+)$", "\\1", id)]

     id value value_ok  v
 1: A_1    10       10 10
 2: A_2     9        9  9
 3: A_3     8        8  8
 4: A_4     7        7  7
 5: B_1     7        9  9
 6: B_2     8        9  9
 7: B_3     9        9  9
 8: B_4     5        5  5
 9: C_1    15       30 30
10: C_2    30       30 30
11: C_3    14       20 20
12: C_4    20       20 20
13: D_1    10       20 20
14: D_2    10       20 20
15: D_3    10       20 20
16: D_4    20       20 20
17: E_1    30       40 40
18: E_4    40       40 40
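- .N:1 temporarily orders the table from the last row to the first
- by= groups the rows
- v := cummax(value) creates a new column containing the cumulative maximum within each group

Because the rows are visited from last to first, each row receives the largest value at or after it within its letter group, which is exactly value_ok.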
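The same "reverse, running maximum, reverse back" idea can also be written in base R without data.table. A minimal sketch using ave(), assuming as above that the rows within each letter are already sorted by the number after the underscore:

```r
db <- data.frame(
  id    = c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1","C_2",
            "C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
  value = c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40)
)

# Letter part of each id: "B_3" -> "B"
letter <- sub("_.*$", "", db$id)

# Within each letter: reverse the values (last to first, like .N:1),
# take the running maximum, then reverse back to the original order.
db$v <- ave(db$value, letter, FUN = function(x) rev(cummax(rev(x))))

db$v
# values: 10 9 8 7 9 9 9 5 30 30 20 20 20 20 20 20 40 40
```

ave() returns its result in the original row order, so no re-sorting is needed afterwards.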

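The rather ugly expression after by= is needed only because important information (the letter) is embedded inside a string. I recommend never storing data that way; it is better to keep the letter and the number in separate columns.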
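One way to get that format is to split id into separate letter and number columns with data.table's tstrsplit() and then group on the letter column. A minimal sketch; the column names letter and num are my own choice, and sorting by num explicitly removes the assumption that the rows are already in order:

```r
library(data.table)

db <- data.table(
  id    = c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1","C_2",
            "C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
  value = c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40)
)

# Split "B_3" into letter "B" and number 3 (type.convert turns "3" into a number)
db[, c("letter", "num") := tstrsplit(id, "_", fixed = TRUE, type.convert = TRUE)]

# Sort by letter, then by number, so "last to first" within a group is well defined
setorder(db, letter, num)

# Same approach as before, but grouping on a real column instead of a regex
db[.N:1, v := cummax(value), by = letter]
db
```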
