R: creating a sequence with nested ifelse based on different value ranges and NAs gives wrong results
I have a data frame like this:
time Value Seq.Count
1 0 0
2 0 0
3 3 0
4 4 0
5 10 0
6 10 0
7 10 0
8 7 0
9 6 0
10 NA 0
11 NA 0
12 NA 0
13 0 0
14 0 0
15 0 0
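Now I want the "Seq.Count" column to count up by one each time the number X in the "Value" column moves between any of the following ranges:
0 == X, 0 < X < 10, X == 10, X is NA
so that the result looks like this:
time Value Seq.Count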
1 0 0
2 0 0
3 3 1
4 4 1
5 10 2
6 10 2
7 10 2
8 7 3
9 6 3
10 NA 4
11 NA 4
12 NA 4
13 0 5
14 0 5
15 0 5
I wrote this code:
for (i in 2:nrow(df)) {
df$Seq.Count[i] <- ifelse(df$Value[i] == 10,
ifelse(df$Value[(i-1)] != 10, df$Seq.Count[i-1]+1, df$Seq.Count[i-1]),
ifelse(df$Value[i] == 0,
ifelse(df$Value[(i-1)] != 0, df$Seq.Count[i-1]+1, df$Seq.Count[i-1]),
ifelse(between(df$Value[i], 0.01, 9.99),
ifelse(df$Value[i-1] == 0 | df$Value[i-1] == 10 | is.na(df$Value[i-1]),
df$Seq.Count[i-1]+1,df$Seq.Count[i-1]),
ifelse(is.na(df$Value[i]),
ifelse(!is.na(df$Value[i-1]), df$Seq.Count[i-1]+1, df$Seq.Count[i-1]),
df$Seq.Count[i-1]
)
)
)
)
}
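After the first NA appears in the "Value" column, all subsequent values of the "Seq.Count" column become NA.
Why is that?
According to this part of the code:
ifelse(is.na(df$Value[i]),
ifelse(!is.na(df$Value[i-1]), df$Seq.Count[i-1]+1, df$Seq.Count[i-1]), ...
it should simply count up from df$Seq.Count[i-1] if is.na(df$Value[i]) and !is.na(df$Value[i-1]).
Why doesn't this work?
Thanks for your help.
I think you need something like this: use ifelse and create an extra column holding the previous value for comparison.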
require(data.table)
test <- data.table(time = 1:15,
Value = c(0,0,3,4,10,10,10,7,6,NA,NA,NA,0,0,0))
# Add a column with the previous value
test[,previous_value := c(NA, test$Value[1: (nrow(test)-1)])]
# Check which group the previous value belongs
test[,group_1 := ifelse(is.na(previous_value),4,
ifelse(previous_value == 0,1,
ifelse(previous_value > 0 & previous_value < 10,2,
ifelse(previous_value == 10, 3, NA))))]
# Check which group current value belongs
test[,group_2 := ifelse(is.na(Value),4,
ifelse(Value == 0,1,
ifelse(Value > 0 & Value < 10,2,
ifelse(Value == 10, 3, NA))))]
# Compare them if they are not equal add 1
test[, Seq.count := cumsum(group_1 != group_2) - 1]
test
time Value previous_value group_1 group_2 Seq.count
1: 1 0 NA 4 1 0
2: 2 0 0 1 1 0
3: 3 3 0 1 2 1
4: 4 4 3 2 2 1
5: 5 10 4 2 3 2
6: 6 10 10 3 3 2
7: 7 10 10 3 3 2
8: 8 7 10 3 2 3
9: 9 6 7 2 2 3
10: 10 NA 6 2 4 4
11: 11 NA NA 4 4 4
12: 12 NA NA 4 4 4
13: 13 0 NA 4 1 5
14: 14 0 0 1 1 5
15: 15 0 0 1 1 5
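As for why the loop in the question turns everything into NA after the first missing value: a comparison with NA returns NA rather than FALSE, and ifelse() returns NA wherever its test is NA. The outermost test in the loop, df$Value[i] == 10, is therefore NA on the first NA row, so the inner is.na() branch is never reached, and every later row then adds to an NA counter. A minimal sketch of that behaviour (the variable names are only illustrative):

```r
x <- NA_real_

# Comparing NA with a number yields NA, not FALSE.
cmp <- x == 10

# ifelse() propagates an NA test straight into its result.
res <- ifelse(cmp, "ten", "not ten")

# Testing is.na() first keeps NA out of the later comparisons.
safe <- ifelse(is.na(x), "missing",
               ifelse(x == 10, "ten", "other"))

# Once a counter is NA, incrementing it keeps it NA.
counter <- NA_real_ + 1

print(list(cmp = cmp, res = res, safe = safe, counter = counter))
```

This suggests that moving the is.na() check to the outermost position, as the group columns above do, is what keeps the counter from becoming NA.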
How about this solution?
tmp <- as.numeric(addNA(cut(df$Value,breaks=c(0,1,9,10),include.lowest=T)))-1
Seq.Count <- cumsum(abs(c(0,diff(tmp)))>0)
cbind(df[,-3],Seq.Count)
time Value Seq.Count
1 1 0 0
2 2 0 0
3 3 3 1
4 4 4 1
5 5 10 2
6 6 10 2
7 7 10 2
8 8 7 3
9 9 6 3
10 10 NA 4
11 11 NA 4
12 12 NA 4
13 13 0 5
14 14 0 5
15 15 0 5
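To see what the intermediate tmp vector holds, the following sketch rebuilds it step by step from the Value column of the question (the names bins and changed are introduced here for illustration only). It also shows a caveat that follows from the chosen breaks: a non-integer value such as 0.5 would fall into the same bin as 0, so this approach assumes whole-number values.

```r
Value <- c(0, 0, 3, 4, 10, 10, 10, 7, 6, NA, NA, NA, 0, 0, 0)

# Bin the values: [0,1], (1,9], (9,10]; NA stays NA at this point.
bins <- cut(Value, breaks = c(0, 1, 9, 10), include.lowest = TRUE)

# addNA() makes NA a fourth level, so every row gets a numeric code.
tmp <- as.numeric(addNA(bins)) - 1

# A change of bin shows up as a non-zero difference between neighbours.
changed <- abs(c(0, diff(tmp))) > 0
Seq.Count <- cumsum(changed)

# Caveat: 0.5 is binned together with 0 under these breaks.
half_bin <- as.character(cut(0.5, breaks = c(0, 1, 9, 10), include.lowest = TRUE))

print(tmp)
print(Seq.Count)
```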
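ifelse is vectorized, so the whole column can be classified in one expression and the runs counted with rle (here the counter starts at 1 rather than 0):
grp <- ifelse(is.na(df$Value), 1, ifelse(df$Value == 0, 2, ifelse(df$Value == 10, 3, 4)))
rep(seq_along(rle(grp)$lengths), rle(grp)$lengths)
Elaborating on the cut-based answer:
As far as I understand, you have several kinds of values, such as:
- x == 0, covered by the interval [0,0.9]
- 1 <= x <= 9, covered by the interval (0.9,9]
- x == 10, covered by the interval (9,10]
- NA, which addNA turns into a level of its own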
library(dplyr)  # assumption: lag() below is dplyr::lag; stats::lag would not shift the values
w <- cut(df$Value,breaks=c(0,0.9,9,10),include.lowest=T)
w1 <- addNA(w)
r <- w1 != lag(w1)
r[1] <- F
df$Seq.Count <- Reduce('+', r, accumulate = T)
(w <- cut(df$Value,breaks=c(0,0.9,9,10),include.lowest=T))
[1] [0,0.9] [0,0.9] (0.9,9] (0.9,9] (9,10] (9,10] (9,10] (0.9,9] (0.9,9] <NA> <NA> <NA> [0,0.9] [0,0.9] [0,0.9]
Levels: [0,0.9] (0.9,9] (9,10]
(w1 <- addNA(w))
[1] [0,0.9] [0,0.9] (0.9,9] (0.9,9] (9,10] (9,10] (9,10] (0.9,9] (0.9,9] <NA> <NA> <NA> [0,0.9] [0,0.9] [0,0.9]
Levels: [0,0.9] (0.9,9] (9,10] <NA>
(r <- w1 != lag(w1))
[1] NA FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
# Change the first element to FALSE
r[1] <- F
r
[1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
(df$Seq.Count <- Reduce('+', r, accumulate = T))
[1] 0 0 1 1 2 2 2 3 3 4 4 4 5 5 5
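The comparison with the previous element can also be written in base R by comparing the vector with a shifted copy of itself. The sketch below does that and classifies the values with explicit conditions instead of cut(), so that non-integer values such as 0.5 or 9.5 would land in the 0 < x < 10 group. The helper names classify_value and seq_count are hypothetical, not part of any package.

```r
# Hypothetical helper: map each value to one of four group codes.
classify_value <- function(x) {
  grp <- rep(NA_integer_, length(x))   # values outside [0, 10] stay NA
  ok <- !is.na(x)
  grp[!ok] <- 4L                       # missing value
  grp[ok & x == 0] <- 1L               # exactly 0
  grp[ok & x > 0 & x < 10] <- 2L       # strictly between 0 and 10
  grp[ok & x == 10] <- 3L              # exactly 10
  grp
}

# Hypothetical helper: count up whenever the group differs from the previous row.
# Assumes x has at least one element.
seq_count <- function(x) {
  grp <- classify_value(x)
  cumsum(c(FALSE, grp[-1] != grp[-length(grp)]))
}

Value <- c(0, 0, 3, 4, 10, 10, 10, 7, 6, NA, NA, NA, 0, 0, 0)
Seq.Count <- seq_count(Value)
print(Seq.Count)
```

Because the NA check is encoded as a group code before any comparison happens, the shifted comparison never sees a missing value for inputs within the 0 to 10 range.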