CPU near 100% in an R while loop_R - Fatal编程技术网



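I am trying to build a query dynamically in Oracle using an IN clause. The problem is that Oracle does not allow more than 1000 items in an IN clause, so I am using several IN clauses separated by OR.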

# expectedOutput has the values in column 1 which I have interest in 

uniqueCol <- df[4, 2]

teststring <- ""
teststring <- paste(uniqueCol, " in (", sep = "")
i <- 1
while (i < nrow(expectedOutput)) {
   if (i %% 1000 == 0) {
      teststring <- substr(teststring, 1, nchar(teststring) - 1)
      teststring <- paste(teststring, ") OR ", uniqueCol, " in (", sep="")
   }
   teststring <- paste(teststring, "'", expectedOutput[i, 1], "',", sep="")
   print(i)
   i <- i + 1
}
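The target shape described in the question (several IN lists of at most 1000 values, joined by OR) can also be produced without any explicit loop. The following is a minimal sketch, not the asker's code: `build_in_clause` and its `chunk_size` argument are hypothetical names, and it assumes the values contain no single quotes (Oracle would need any embedded `'` doubled).

```r
# Hypothetical helper: build "<col> in ('v1','v2',...) OR <col> in (...)"
# with at most `chunk_size` values per IN list (Oracle allows at most 1000).
build_in_clause <- function(col, values, chunk_size = 1000) {
  # quote every value (assumes no embedded single quotes)
  quoted <- paste0("'", values, "'")
  # chunk number of each value: first chunk_size values -> 1, next -> 2, ...
  chunk_id <- ceiling(seq_along(quoted) / chunk_size)
  chunks <- split(quoted, chunk_id)
  # one "<col> in (...)" string per chunk
  in_lists <- vapply(chunks, function(v) {
    paste0(col, " in (", paste(v, collapse = ","), ")")
  }, character(1))
  # join the IN lists with OR
  paste(in_lists, collapse = " OR ")
}

build_in_clause("COLUMN_NAME", c("A-0001", "A-0002", "A-0003"), chunk_size = 2)
# [1] "COLUMN_NAME in ('A-0001','A-0002') OR COLUMN_NAME in ('A-0003')"
```

Every step here is a vectorised call, so the work grows linearly with the number of values.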

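The main cause of the problem is that in every iteration the ever-growing string teststring is concatenated with a new substring. Each step therefore copies everything built so far, and the total running time grows quadratically with the number of rows. The effect is made worse by indexing the whole data.frame at every step (expectedOutput[i, 1]) and by the print(i) line. Fixing these speeds the code up considerably.
On my machine, the original code took 6-7 seconds for 10000 rows of expectedOutput, 2 minutes for 50000 rows, 6-7 minutes for 100000 rows and 24-25 minutes for 200000 rows. With the following steps this drops to 3-4 seconds for 50000 rows and 11-12 seconds for 200000 rows.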
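A small sketch of the effect described above; `grow_string` and `collapse_once` are hypothetical names used only for illustration. `grow_string` rebuilds the accumulated string on every iteration, so the total copying grows roughly with the square of the number of values, while `collapse_once` builds all pieces in one vectorised `paste0()` call and joins them once. Both return the same string; the timings will differ from machine to machine.

```r
# Grows one string: each paste0() copies everything accumulated so far,
# so the total work is roughly quadratic in length(values).
grow_string <- function(values) {
  s <- ""
  for (v in values) {
    s <- paste0(s, "'", v, "',")
  }
  s
}

# Builds all pieces in one vectorised call and joins them once: roughly linear.
collapse_once <- function(values) {
  paste0("'", values, "',", collapse = "")
}

vals <- sprintf("A-%04d", 0:4999)
stopifnot(identical(grow_string(vals), collapse_once(vals)))

# Compare on your own machine; increase the size of vals to see the gap widen
system.time(grow_string(vals))
system.time(collapse_once(vals))
```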
First, I simulated some test data:
expectedOutput <- data.frame(
  # 50000 IDs: "A-0000" ... "A-9999", "B-0000" ... "E-9999"
  db_col_name = paste0(rep(c("A", "B", "C", "D", "E"), each = 10000), "-",
                       sprintf("%04d", 0:9999)),
  # plus 100 columns of random integers
  replicate(100, sample(0:1000, 50000, replace = TRUE)))
uniqueCol <- "COLUMN_NAME"

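Next, drop the print(i) line and, instead of concatenating onto one ever-growing string, store each completed chunk of up to 1000 values in a character vector and join all chunks with a single paste() call at the end:
teststring <- character(0)
teststr_tmp <- paste(uniqueCol, " in (", sep = "")
i <- 1
while (i < nrow(expectedOutput)) {
  if (i %% 1000 == 0) {
    # close the current IN list (drop its trailing comma) and open the next one
    teststr_tmp <- substr(teststr_tmp, 1, nchar(teststr_tmp) - 1)
    teststr_tmp <- paste(teststr_tmp, ") OR ", uniqueCol, " in (", sep = "")
    # store the finished chunk and start a new, short one
    teststring <- c(teststring, teststr_tmp)
    teststr_tmp <- ""  # the "<col> in (" prefix already ends the stored chunk
  }
  teststr_tmp <- paste(teststr_tmp, "'", expectedOutput[i, 1], "',", sep = "")
  i <- i + 1
}
# note: the values still in teststr_tmp (the last, partial chunk) are not added
# here; the end of the string still has to be fixed before it goes to Oracle
teststring <- paste(teststring, collapse = "")

Finally, putting the first column of expectedOutput into a separate vector further reduces the running time to 3-4 seconds for 50000 rows and 11-12 seconds for 200000 rows: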
teststring <- character(0)
teststr_tmp <- paste(uniqueCol, " in (", sep = "")
i <- 1
# extract the column once instead of indexing the data.frame in every iteration
expectedOutputValues <- expectedOutput[[1]]
while (i < length(expectedOutputValues)) {
  if (i %% 1000 == 0) {
    # close the current IN list and open the next one
    teststr_tmp <- substr(teststr_tmp, 1, nchar(teststr_tmp) - 1)
    teststr_tmp <- paste(teststr_tmp, ") OR ", uniqueCol, " in (", sep = "")
    teststring <- c(teststring, teststr_tmp)
    teststr_tmp <- ""  # the "<col> in (" prefix already ends the stored chunk
  }
  teststr_tmp <- paste(teststr_tmp, "'", expectedOutputValues[i], "',", sep = "")
  i <- i + 1
}
# note: the last, partial chunk in teststr_tmp is not added here either
teststring <- paste(teststring, collapse = "")
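Why extracting the column helps: each `expectedOutput[i, 1]` goes through the R-level `[.data.frame` method, which does considerably more work per call than indexing a plain vector. Below is a small sketch to observe the difference on your own machine; the data frame and the function names `via_df` and `via_vec` are made up for illustration, and the timings will vary.

```r
# Illustrative data: 10000 IDs of 6 characters each plus one numeric column
df <- data.frame(id = sprintf("A-%04d", 0:9999), x = runif(10000),
                 stringsAsFactors = FALSE)
vals <- df[[1]]  # the column extracted once, as a plain character vector

# Same loop body; only the indexing differs
via_df <- function(df) {
  n <- 0
  for (i in seq_len(nrow(df))) n <- n + nchar(df[i, 1])  # [.data.frame each time
  n
}
via_vec <- function(v) {
  n <- 0
  for (i in seq_along(v)) n <- n + nchar(v[i])  # primitive vector indexing
  n
}

stopifnot(via_df(df) == via_vec(vals))
system.time(via_df(df))
system.time(via_vec(vals))
```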

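What is uniqueCol? Could you paste the first few rows of expectedOutput as an example (preferably as code)? What is nrow(expectedOutput), at least approximately?

First step: remove print(i) from the code; printing is slow. If the code is still too slow, rewrite it without a loop.

I can remove print(i), it is only there to check progress. nrow(expectedOutput) is about 200k, with values like 'A-0001', 'A-0002', ….

I would store nrow(expectedOutput) in a variable before the comparison; otherwise it is computed on every iteration.

The main problem is the use of the loop. Can it be avoided, or is there a faster way to iterate?

Wouldn't it be more efficient to create a temporary table in Oracle and join against it instead of using IN and OR?

@zx8754 Yes, I know, but I don't have the permissions for that.

You have to fix the end of teststring before passing it to SQL, but I don't think that belongs here. This could also be written better, but I wanted to stay more or less with your solution. Just a few hints: if I wanted to use a loop, a for loop would be more readable to me here. I would not use a loop at all, but rather the *apply functions and the like.

Thanks a lot, oszkar, I'll try it. What happens if the total number of iterations is < 500? I'll look it up.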
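Following the hint that a for loop reads better than the while loop, here is a minimal sketch that iterates over chunk start positions instead of over individual rows, preallocates the result vector, and joins everything once at the end. The name `build_in_clause_loop` is hypothetical, and like the question's code it assumes the values need no quote escaping.

```r
# Loop over chunks of at most chunk_size values rather than over single rows.
build_in_clause_loop <- function(col, values, chunk_size = 1000) {
  stopifnot(length(values) > 0)  # an empty IN list is not valid SQL anyway
  starts <- seq(1, length(values), by = chunk_size)
  in_lists <- character(length(starts))  # preallocated, one entry per chunk
  for (k in seq_along(starts)) {
    idx <- starts[k]:min(starts[k] + chunk_size - 1, length(values))
    in_lists[k] <- paste0(col, " in (",
                          paste0("'", values[idx], "'", collapse = ","), ")")
  }
  paste(in_lists, collapse = " OR ")
}

vals <- sprintf("A-%04d", 1:2500)
clause <- build_in_clause_loop("COLUMN_NAME", vals)
length(strsplit(clause, " OR ", fixed = TRUE)[[1]])  # 3 IN lists: 1000, 1000, 500
```

The loop runs once per chunk (200 iterations for 200k rows), and because each iteration writes into a preallocated slot, no string is ever rebuilt from scratch.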