R: How to recode a range of rows between two specific values


I have the following data frame:

a <- seq(1:14)
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
df <- data.frame(a, b)

 df
a      b
1      0
2      0
3   start
4      0
5      0
6      0
7    end
8      0
9      0
10  start
11     0
12   end
13     0
14     0

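So far I don't have any working code. I tried a few things with which(), and with between() and inrange() from the data.table package, but I couldn't really get my head around it. Is there a way to solve this?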

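Given the data frame built with stringsAsFactors = FALSE, so that b stays a character vector rather than a factor: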

df <- data.frame(a, b, stringsAsFactors = FALSE)
#                      ^^^^^^^^^^^^^^^^^^^^^^^^

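The difference cumsum(b == "start") - cumsum(b == "end") is 1 from each "start" up to, but not including, the matching "end". We only need to set the positions where b == "start" to zero as well, i.e.

cumsum(b == "start") - cumsum(b == "end") - (b == "start")
# [1] 0 0 0 1 1 1 0 0 0 0 1 0 0 0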

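Testing this vector for equality with 1 turns it into a logical vector:

idx <- (cumsum(b == "start") - cumsum(b == "end") - (b == "start")) == 1
idx
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE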

We use this logical vector to replace the corresponding elements of b with "1".
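Putting the steps above together, a minimal end-to-end sketch might look like the following. The helper name mark_between and its arguments are hypothetical, introduced only for illustration. It assumes every "start" has a matching later "end" and that the markers are not nested.

```r
# Hypothetical helper (not from the original answer): flags the positions
# strictly between each "start" and its matching "end".
mark_between <- function(x, start = "start", end = "end") {
  is_start <- x == start
  is_end   <- x == end
  # Running count of open intervals; subtracting is_start excludes the
  # "start" position itself, and cumsum(is_end) already excludes "end".
  (cumsum(is_start) - cumsum(is_end) - is_start) == 1
}

a <- 1:14
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
df <- data.frame(a, b, stringsAsFactors = FALSE)

idx <- mark_between(df$b)
df$b[idx] <- "1"
df$b
```

Because the index is a logical vector of the same length as b, the assignment leaves the "start" and "end" markers and all rows outside the intervals untouched.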

A more concise answer, from @RonakShah's comment, is:

df$b[unlist(mapply(`:`, which(df$b == "start") + 1, which(df$b == "end") - 1))] <- 1
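To see what the one-liner does, it may help to look at the intermediate values. The sketch below uses Map() rather than mapply(); that is a substitution made here so the result is always a list, whereas mapply() may simplify equal-length ranges into a matrix. The variable names first_inside, last_inside, ranges and inside are illustrative, and the sketch assumes at least one row lies between each "start" and its "end".

```r
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
df <- data.frame(a = 1:14, b = b, stringsAsFactors = FALSE)

# First and last row strictly inside each start/end pair
first_inside <- which(df$b == "start") + 1
last_inside  <- which(df$b == "end") - 1

# One integer range per start/end pair, kept as a list
ranges <- Map(`:`, first_inside, last_inside)

# Flatten the list into a single index vector and recode those rows
inside <- unlist(ranges)
df$b[inside] <- "1"
df
```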

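Old loop answer

You can use the following approach: first locate all the start and end points, then loop over them and set the values in between to 1.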

a <- seq(1:14)
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)

# Positions of the "start" and "end" markers
starting <- which(b == "start")
ending <- which(b == "end")

# Recode everything strictly between each start/end pair
for (i in seq_along(starting)) {
  index <- (starting[i] + 1):(ending[i] - 1)
  b[index] <- 1
}
df <- data.frame(a, b)
df
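One edge case worth noting: if an "end" immediately follows a "start", the range (starting[i] + 1):(ending[i] - 1) counts downwards and would overwrite both markers. A hedged variant that skips such empty ranges is sketched below. The input vector is illustrative and not from the original question, and the code assumes the markers come in matched, non-nested pairs.

```r
# Illustrative input (not from the question): the second "start" is
# immediately followed by "end", so there is nothing to recode there.
b <- c(0, "start", 0, 0, "end", 0, "start", "end", 0)

starting <- which(b == "start")
ending   <- which(b == "end")
stopifnot(length(starting) == length(ending))  # assumes paired markers

for (i in seq_along(starting)) {
  n_inside <- ending[i] - starting[i] - 1
  # seq_len(0) is empty, so adjacent markers leave b untouched
  index <- starting[i] + seq_len(n_inside)
  b[index] <- "1"
}
b
```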

If you are interested, this is a nice way to get the output.

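starting <- which(b == "start")
ending <- which(b == "end")
# Build one index range per start/end pair, excluding the markers themselves
my.ls <- Map(function(s, e) (s + 1):(e - 1), starting, ending)

# Flatten the ranges and recode those positions
index <- unlist(my.ls)
b[index] <- 1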

df <- data.frame(a, b)
df
    a     b
1   1     0
2   2     0
3   3 start
4   4     1
5   5     1
6   6     1
7   7   end
8   8     0
9   9     0
10 10 start
11 11     1
12 12   end
13 13     0
14 14     0
