Speeding up a for loop that uses two if statements


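I have a data table DT with more than 15,000 rows. I have a for loop that works, but it takes over 30 seconds and is the slowest part of my whole script. Here is the loop: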

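for (i in 2:nrow(DT)) {
  # Flag row i if the NEXT row has the same C1, a different C2, is at most 4 days later,
  # row i is "Short" and the next row is not "Long" (at i = nrow(DT), DT$C1[i+1] is NA)
  if (DT$C1[i] == DT$C1[i+1] & DT$C2[i] != DT$C2[i+1] & DT$C3[i+1] - DT$C3[i] <= 4 & DT$C2[i] == "Short" & DT$C2[i+1] != "Long") DT$C4[i] = 1 else
    # ...otherwise apply the same test against the PREVIOUS row
    if (DT$C1[i] == DT$C1[i-1] & DT$C2[i] != DT$C2[i-1] & DT$C3[i] - DT$C3[i-1] <= 4 & DT$C2[i] == "Short" & DT$C2[i-1] != "Long") DT$C4[i] = 1 else
      0  # this branch assigns nothing, so C4 keeps whatever value it was initialized with
}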
And the desired output:

C1  C2      C3          C4
1   Short   2010-06-01  0
1   Short   2010-06-05  0
1   Short   2010-06-09  1
1   None    2010-06-13  0
1   None    2010-06-17  0
2   Short   2010-06-02  0
2   Short   2010-06-21  0
2   Other   2010-07-09  0
3   Long    2010-07-13  0
3   Long    2010-07-17  0
3   Long    2010-07-21  0
3   Long    2010-08-01  0
3   Long    2010-08-05  0
3   Long    2010-08-09  0
3   Long    2010-09-03  0
3   Long    2010-09-07  0
4   Short   2010-06-03  0
4   Short   2010-06-07  1
4   Other   2010-06-11  0
4   Short   2010-06-14  1
4   Short   2010-06-17  1
4   None    2010-06-21  0
4   Short   2010-06-24  1
4   None    2010-06-27  0
4   Other   2010-07-01  0
4   Short   2010-07-05  1
4   Short   2010-07-09  0
4   Short   2010-07-13  0
4   Short   2010-07-17  0
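To make the question reproducible, as the comments request, the sample table can be typed in as a data frame. This is a sketch that assumes C3 holds calendar dates (stored as Date, so that subtracting two rows gives a difference in days) and uses a base R data.frame instead of a data.table. The vector expected is a hypothetical helper holding the desired C4 column for checking results.

```r
# Sample rows from the table; C3 parsed as Date so C3[i+1] - C3[i] is in days
DT <- data.frame(
  C1 = c(rep(1, 5), rep(2, 3), rep(3, 8), rep(4, 13)),
  C2 = c("Short", "Short", "Short", "None", "None",
         "Short", "Short", "Other",
         rep("Long", 8),
         "Short", "Short", "Other", "Short", "Short", "None", "Short",
         "None", "Other", "Short", "Short", "Short", "Short"),
  C3 = as.Date(c("2010-06-01", "2010-06-05", "2010-06-09", "2010-06-13", "2010-06-17",
                 "2010-06-02", "2010-06-21", "2010-07-09",
                 "2010-07-13", "2010-07-17", "2010-07-21", "2010-08-01",
                 "2010-08-05", "2010-08-09", "2010-09-03", "2010-09-07",
                 "2010-06-03", "2010-06-07", "2010-06-11", "2010-06-14",
                 "2010-06-17", "2010-06-21", "2010-06-24", "2010-06-27",
                 "2010-07-01", "2010-07-05", "2010-07-09", "2010-07-13",
                 "2010-07-17")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the desired C4 column from the table, for checking results
expected <- c(0, 0, 1, 0, 0,
              0, 0, 0,
              rep(0, 8),
              0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
```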

Thanks for your help.

You can vectorize it with the following:

n   <- nrow(DT)
cur <- 2:(n-1)  # rows being checked
nxt <- 3:n      # the row after each checked row
prv <- 1:(n-2)  # the row before each checked row
DT$C4 <- NA  # Initialize however you want
# Warning -- untested due to no reproducible example...
DT$C4[cur] <- as.numeric((DT$C1[cur] == DT$C1[nxt] & DT$C2[cur] != DT$C2[nxt] & DT$C3[nxt] - DT$C3[cur] <= 4 & DT$C2[cur] == "Short" & DT$C2[nxt] != "Long") |
                         (DT$C1[cur] == DT$C1[prv] & DT$C2[cur] != DT$C2[prv] & DT$C3[cur] - DT$C3[prv] <= 4 & DT$C2[cur] == "Short" & DT$C2[prv] != "Long"))
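The shifted-index idea can be checked on part of the sample data. This minimal sketch assumes a base R data.frame with C3 stored as Date, and runs the same two comparisons on the 13 rows where C1 is 4, with the shifted index ranges given the hypothetical names cur, nxt and prv. Unlike the answer, it initializes C4 to 0 so that the first and last rows match the desired output.

```r
# The rows with C1 == 4 from the question's sample data
DT <- data.frame(
  C1 = rep(4, 13),
  C2 = c("Short", "Short", "Other", "Short", "Short", "None", "Short",
         "None", "Other", "Short", "Short", "Short", "Short"),
  C3 = as.Date(c("2010-06-03", "2010-06-07", "2010-06-11", "2010-06-14",
                 "2010-06-17", "2010-06-21", "2010-06-24", "2010-06-27",
                 "2010-07-01", "2010-07-05", "2010-07-09", "2010-07-13",
                 "2010-07-17")),
  stringsAsFactors = FALSE
)

n   <- nrow(DT)
cur <- 2:(n - 1)  # rows being checked (same range as the answer)
nxt <- cur + 1    # the row after each checked row
prv <- cur - 1    # the row before each checked row

DT$C4 <- 0  # assumption: first and last rows default to 0, as in the desired output
# One vectorized pass replaces both if statements; Date differences are in days
DT$C4[cur] <- as.numeric(
  (DT$C1[cur] == DT$C1[nxt] & DT$C2[cur] != DT$C2[nxt] &
     DT$C3[nxt] - DT$C3[cur] <= 4 & DT$C2[cur] == "Short" & DT$C2[nxt] != "Long") |
  (DT$C1[cur] == DT$C1[prv] & DT$C2[cur] != DT$C2[prv] &
     DT$C3[cur] - DT$C3[prv] <= 4 & DT$C2[cur] == "Short" & DT$C2[prv] != "Long")
)
print(DT$C4)
# [1] 0 1 0 1 1 0 1 0 0 1 0 0 0
```

The printed vector matches the C4 values for these rows in the desired output. Since all rows are compared at once, the cost is a handful of vector operations rather than one column copy per loop iteration.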

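Please provide a sample dataset for which this loop produces the desired output. I think you probably just need to create new columns or indices for the lag and lead of the columns in question, then drop the loop in favor of vectorized logical operations. For an example of how to do lags/leads with data.frame objects, see …

Did you add the second if statement to handle the beginning and end of the vector? You could replace one of them.

Added sample data and desired output. Thanks, everyone, for looking at this!

Thanks! This is much faster: 0.02 seconds instead of more than 30.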
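The commenter's suggestion of explicit lag and lead vectors can be sketched in base R. The helpers lag1() and lead1() are hypothetical names, not functions from any package (in a data.table, data.table::shift(x, type = "lead") plays a similar role). The sketch uses the first eight sample rows and converts C3 to a day count. A missing neighbor becomes NA and is then treated as "condition not met", so the first and last rows are tested against the one neighbor they have instead of being skipped; whether that matches the intended semantics is an assumption.

```r
# Hypothetical helpers: previous / next element, with NA at the edges
lag1  <- function(x) c(NA, x[-length(x)])
lead1 <- function(x) c(x[-1], NA)

# First eight rows of the question's sample data (C1 groups 1 and 2)
DT <- data.frame(
  C1 = c(1, 1, 1, 1, 1, 2, 2, 2),
  C2 = c("Short", "Short", "Short", "None", "None", "Short", "Short", "Other"),
  C3 = as.Date(c("2010-06-01", "2010-06-05", "2010-06-09", "2010-06-13",
                 "2010-06-17", "2010-06-02", "2010-06-21", "2010-07-09")),
  stringsAsFactors = FALSE
)

day <- as.numeric(DT$C3)  # days since 1970-01-01, so differences are in days

# The two if conditions, rewritten against lead and lag vectors
fwd <- DT$C1 == lead1(DT$C1) & DT$C2 != lead1(DT$C2) &
  lead1(day) - day <= 4 & DT$C2 == "Short" & lead1(DT$C2) != "Long"
bwd <- DT$C1 == lag1(DT$C1) & DT$C2 != lag1(DT$C2) &
  day - lag1(day) <= 4 & DT$C2 == "Short" & lag1(DT$C2) != "Long"

# %in% TRUE maps NA (no neighbor) to FALSE before combining the two tests
DT$C4 <- as.numeric((fwd %in% TRUE) | (bwd %in% TRUE))
print(DT$C4)
# [1] 0 0 1 0 0 0 0 0
```

The row where C1 changes from 1 to 2 is not flagged, because the C1 equality test fails across the group boundary.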