R data frame + logical operation
I have the following question. Suppose I have two columns, one is id and one is value. I need to create an additional column (value_ok) with the following logic: among IDs that share the same letter, the value of a higher-numbered ID should not be greater than the value of a lower-numbered ID. If that happens, the lower-numbered ID's value should be replaced by that higher value. Example data, with the expected result in value_ok:
db<-data.frame(id=c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1","C_2","C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
value=c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40),
value_ok=c(10,9,8,7,9,9,9,5,30,30,20,20,20,20,20,20,40,40))
Can anyone help me with this task?
Thanks.

You can use data.table:
library(data.table)
setDT(db)
db[.N:1, v := cummax(value), by=sub("^(.+)_(.+)$", "\\1", id)]
     id value value_ok  v
 1: A_1    10       10 10
 2: A_2     9        9  9
 3: A_3     8        8  8
 4: A_4     7        7  7
 5: B_1     7        9  9
 6: B_2     8        9  9
 7: B_3     9        9  9
 8: B_4     5        5  5
 9: C_1    15       30 30
10: C_2    30       30 30
11: C_3    14       20 20
12: C_4    20       20 20
13: D_1    10       20 20
14: D_2    10       20 20
15: D_3    10       20 20
16: D_4    20       20 20
17: E_1    30       40 40
18: E_4    40       40 40
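To see why this produces the expected column, it may help to look at a single group first. The sketch below uses only base R and assumes the rows are already ordered by the number after the underscore within each letter, as they are in the question; the names b, expected and grp are illustrative, not part of the original answer.

```r
# Reverse cumulative maximum on one group (the B rows of the question):
# reverse, take the running maximum, then reverse back.
b <- c(7, 8, 9, 5)
rev(cummax(rev(b)))
# [1] 9 9 9 5

# The same idea applied to the whole table with ave(), grouped by letter.
db <- data.frame(
  id = c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1",
         "C_2","C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
  value = c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40)
)
expected <- c(10,9,8,7,9,9,9,5,30,30,20,20,20,20,20,20,40,40)

grp <- sub("_.*$", "", db$id)  # letter before the underscore
db$v <- ave(db$value, grp, FUN = function(x) rev(cummax(rev(x))))

all.equal(db$v, expected)
```

The comparison at the end checks the computed column against the value_ok values given in the question.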
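.N:1 temporarily orders the table from the last row to the first.
by= groups the rows by letter.
v := cummax(value) creates a new column containing the cumulative maximum within each group.
The rather ugly expression after by= is needed because important information (the letter) is embedded in a string. I would recommend never doing that; keeping the letter and the number in separate columns makes the grouping straightforward.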
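One possible way to move to a cleaner layout is sketched below. It is not taken from the original answer: the column names grp and pos are hypothetical, and the sketch assumes every id has the form letter, underscore, integer. It splits id with data.table's tstrsplit(), sorts explicitly, and then groups by the plain letter column.

```r
library(data.table)

db <- data.table(
  id = c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1",
         "C_2","C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
  value = c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40)
)

# Split "A_1" into a letter column and a position column
# (grp and pos are illustrative names).
db[, c("grp", "pos") := tstrsplit(id, "_", fixed = TRUE)]
db[, pos := as.integer(pos)]

# Sort explicitly so the result does not depend on the input row order.
setorder(db, grp, pos)

# Running maximum taken from the highest position down to the lowest.
db[, v := rev(cummax(rev(value))), by = grp]
```

With the letter in its own column, by = grp replaces the regular expression, and the explicit setorder() call removes the reliance on the rows arriving pre-sorted.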