R: Is there a more efficient way to check the number of cases in each branch of a variable?

For 100000 rows it takes 5 seconds. That is too much. Do you know how to improve this code to make the check faster?

# a variable to check
xVar <- as.factor(sample(x=c("transp","bud","wolny","pref",
                             "inny"), size=100000000, replace=TRUE))

# threshold: if any branch of the variable has fewer than sen cases,
# warn is filled with a message
sen <- 10000

# function to improve
check <- function(xVar, sen){
  if (min(table(xVar)) < sen){
    warn <- "Variable has very low number in some branches - IV can be spoiled"
  }else{
    warn <- ""
  }
  warn  # return the message instead of leaving it as an invisible assignment
}

# go
start <- Sys.time()
check(xVar, sen)
stop <- Sys.time()
stop - start
We can use tabulate to improve the speed:

check <- function(xVar, sen){
  if (min(tabulate(xVar)) < sen){
    warn <- "Variable has very low number in some branches - IV can be spoiled"
  }else{
    warn <- ""
  }
  warn  # return the message instead of leaving it as an invisible assignment
}

start <- Sys.time()
check(xVar, sen)
stop <- Sys.time()
stop - start
#Time difference of 0.272254 secs
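
One detail worth checking when swapping table() for tabulate(): with its default nbins, tabulate() only counts up to the highest level code that actually occurs, so an unused trailing factor level is dropped instead of being reported as 0, whereas table() keeps it. The sketch below passes nbins = nlevels(x) to keep a slot for every level. The helper name check_levels and the small factor f are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper (name not from the original post): same check as above,
# but every factor level gets a count, including levels with zero cases.
check_levels <- function(x, sen) {
  stopifnot(is.factor(x))
  # nbins = nlevels(x) reserves one bin per level, used or not
  counts <- tabulate(x, nbins = nlevels(x))
  if (min(counts) < sen) {
    "Variable has very low number in some branches - IV can be spoiled"
  } else {
    ""
  }
}

# Tiny illustrative factor with an unused level "c"
f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
tabulate(f)                      # 2 1    (level "c" is missing)
tabulate(f, nbins = nlevels(f))  # 2 1 0  (level "c" counted as 0)
check_levels(f, sen = 1)         # returns the warning text, since "c" has 0 cases
```

For the data in this question all five levels are present, so both variants give the same result; the difference only matters when some level has no cases at all.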
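For comparison, the timing that appears to correspond to the original table()-based check from the question:

stop - start
#Time difference of 5.077512 secs

Data

set.seed(24)
xVar <- as.factor(sample(x=c("transp","bud","wolny","pref",
                             "inny"), size=100000000, replace=TRUE))
sen <- 10000

Comments

Some sample data would be very nice.

You can use table(xVar) to improve the speed. I'll make a reproducible example right now.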
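
To compare table() and tabulate() without waiting on 100000000 elements, a smaller sample can be used. The following is only a sketch: the sample size of 1e6 and the names x_small, counts_table, counts_tabulate, time_table and time_tabulate are assumptions made for illustration, and the timings will differ by machine.

```r
# Smaller sample than in the question so the comparison finishes quickly
set.seed(24)
x_small <- factor(sample(c("transp", "bud", "wolny", "pref", "inny"),
                         size = 1e6, replace = TRUE))

# Confirm both approaches give the same counts before timing them
counts_table    <- as.vector(table(x_small))
counts_tabulate <- tabulate(x_small, nbins = nlevels(x_small))
stopifnot(identical(counts_table, counts_tabulate))

# system.time() measures a single expression; keep only the elapsed seconds
time_table    <- system.time(min(table(x_small)))
time_tabulate <- system.time(min(tabulate(x_small, nbins = nlevels(x_small))))
print(c(table    = time_table[["elapsed"]],
        tabulate = time_tabulate[["elapsed"]]))
```

system.time() is used here instead of subtracting two Sys.time() values because it wraps one expression and avoids creating a variable named stop, which masks the base function of the same name.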