R: Is there a more efficient way to check the number of cases in each branch (factor level) of a variable?
For 100000000 rows the check takes about 5 seconds, which is too slow. How can I improve this code to make the check faster?
#a variable to check
xVar <- as.factor(sample(x=c("transp","bud","wolny","pref",
"inny"), size=100000000, replace=TRUE))
#threshold: if any branch (level) of the variable has fewer than sen cases, warn is filled with a message
sen <- 10000
#function to improve
check <- function(xVar, sen){
  if (min(table(xVar)) < sen){
    warn <- "Variable has very low number in some branches - IV can be spoiled"
  }else{
    warn <- ""
  }
  warn  # return the message visibly
}
#go
start <- Sys.time()
check(xVar, sen)
stop <- Sys.time()
stop - start
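As a small illustration of what the check above computes, here is a sketch on a tiny hypothetical factor f (not part of the question). table() returns one count per level, including levels that never occur, so min() over those counts can be 0.

```r
# Tiny hypothetical factor: level "pref" is declared but never occurs
f <- factor(c("bud", "bud", "inny", "transp"),
            levels = c("bud", "inny", "pref", "transp"))

counts <- table(f)
print(counts)   # one count per level: bud 2, inny 1, pref 0, transp 1

# min() over the per-level counts is the value check() compares with sen
low <- min(counts) < 1
print(low)      # TRUE, because "pref" has zero cases
```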
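We can use tabulate() to improve the speed. It counts the factor's underlying integer codes directly instead of building a general contingency table the way table() does: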
check <- function(xVar, sen){
  if (min(tabulate(xVar)) < sen){
    warn <- "Variable has very low number in some branches - IV can be spoiled"
  }else{
    warn <- ""
  }
  warn  # return the message visibly
}
start <- Sys.time()
check(xVar, sen)
stop <- Sys.time()
stop - start
#Time difference of 0.272254 secs
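One caveat, offered as a hedged note: by default tabulate() sizes its result from the largest integer code that actually occurs (its default is nbins = max(1L, bin, na.rm = TRUE)), not from nlevels(). If the last level of the factor has no cases, that level is left out of the result and min() never sees the 0 that table() would report. Passing nbins = nlevels(xVar) should avoid this. In the sketch below, the helper name check_levels and the toy factor g are hypothetical; the message string is the one used above.

```r
# Hypothetical helper: same check as above, but nbins is fixed to the number
# of levels so that empty trailing levels are counted as 0
check_levels <- function(xVar, sen) {
  counts <- tabulate(xVar, nbins = nlevels(xVar))
  if (min(counts) < sen) {
    warn <- "Variable has very low number in some branches - IV can be spoiled"
  } else {
    warn <- ""
  }
  warn  # return the message visibly
}

# Toy factor (hypothetical): the last level "transp" never occurs
g <- factor(c("bud", "inny", "inny"), levels = c("bud", "inny", "transp"))

tabulate(g, nbins = nlevels(g))   # counts for all three levels: 1 2 0
check_levels(g, sen = 1)          # returns the warning, since "transp" has 0 cases
```

The extra argument costs nothing at run time; it only makes the length of the count vector match the number of levels.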
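For comparison, the original table()-based check on the same data:
stop - start
#Time difference of 5.077512 secs

Data
set.seed(24)
xVar <- as.factor(sample(x=c("transp","bud","wolny","pref",
"inny"), size=100000000, replace=TRUE))
sen <- 10000

Comments:
Some example data would be very nice. You could use tabulate(xVar) to improve the speed.
I'll make a reproducible example now.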