
R: Custom function for VIF


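I am trying to write a loop to calculate variance inflation factors (VIFs). I know there are functions and packages that can do this for me, but I need some customization.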
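As a point of reference, here is a minimal sketch of such a loop in base R, assuming all columns are numeric predictors. It uses the textbook definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The helper name vif_loop and the toy data are hypothetical and only for illustration; the result has the same layout described below (variable name, then VIF).

```r
# Hypothetical helper: VIF of every column of a numeric data frame,
# computed from the definition VIF_j = 1 / (1 - R_j^2)
vif_loop <- function(X) {
  X <- as.data.frame(X)
  vars <- names(X)
  vifs <- numeric(length(vars))
  for (j in seq_along(vars)) {
    # regress predictor j on all the other predictors (with intercept)
    fit <- lm(reformulate(vars[-j], response = vars[j]), data = X)
    vifs[j] <- 1 / (1 - summary(fit)$r.squared)
  }
  # same layout as in the question: variable name, then VIF
  data.frame(variable = vars, vif = vifs, stringsAsFactors = FALSE)
}

# Toy data (made up): x3 is almost x1 + x2, so it should get a large VIF
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100, sd = 0.1)
res <- vif_loop(data.frame(x1, x2, x3))
print(res[order(res$vif, decreasing = TRUE), ])
```

Sorting with order(..., decreasing = TRUE) puts the worst offender in the first row, which is what the stopping rule in the question inspects.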

Sample data

  library(MASS)
  library(clusterGeneration)

  set.seed(2)
  num.vars <- 30
  num.obs<-200
  cov.mat<- genPositiveDefMat(num.vars,covMethod="unifcorrmat")$Sigma
  rand.vars<- mvrnorm(num.obs,rep(0,num.vars),Sigma=cov.mat)

  cov.mat <- as.data.frame(cov.mat)
  names(cov.mat) <- paste0("X", 1:num.vars)   # X1, ..., X30
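Assuming the loop above works correctly, I have a matrix whose first column is the variable name and whose second column is the VIF value: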

     df <- data.frame(mat)
     names(df) <- c("variable", "vif")
     df <- df[order(df$vif, decreasing = TRUE), ]   # largest VIF first

     ifelse(df[1, 2] <= 10, stop, ifelse(df[1, 2] > 10 & !(df[1, 1] %in% c("X4", "X6", "X10")), ....
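I want to check whether the highest VIF in df is greater than 10, and whether that variable is one of X4, X6, or X10, which must always be kept. If it meets the conditions (VIF above 10 and not one of those three), remove it from cov.mat and start the iteration again.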

Edit

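My original data frame has 51 columns and 1458 rows. When I run the function below on it, I get the error "there are aliased coefficients in the model".

Why does this happen?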

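In your sample data, the VIF scores cannot be computed for the whole dataset, most likely because of perfect collinearity: cov.mat has only 30 rows, so a regression on all 30 columns plus an intercept has more coefficients than observations. However, the function here should work for data where this is not the case (e.g., columns 1:15 of your dataset). You can ignore or remove all the cat code; it is only there to illustrate what is happening.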
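To illustrate the point about the sample data, here is a small sketch with made-up random data shaped like cov.mat (30 rows, 30 columns). With an intercept plus 30 slopes but only 30 observations, lm() cannot estimate every coefficient and reports at least one as NA; this is the aliasing that makes car's vif() fail. Using only 15 columns leaves enough rows and no coefficient is aliased.

```r
set.seed(2)
# Made-up data with the same shape as cov.mat: 30 rows, 30 columns
X <- as.data.frame(matrix(rnorm(30 * 30), nrow = 30))

# 31 coefficients (intercept + 30 slopes) but only 30 observations:
# the design matrix has rank 30, so at least one coefficient is NA (aliased)
fit_all <- lm(rnorm(30) ~ ., data = X)
aliased <- names(coef(fit_all))[is.na(coef(fit_all))]
print(aliased)

# With 15 columns there are 16 coefficients for 30 rows: nothing is aliased
fit_15 <- lm(rnorm(30) ~ ., data = X[, 1:15])
print(anyNA(coef(fit_15)))
```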

In addition, I use the vif function from the car package.
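For a model whose terms are all numeric (one coefficient each), the VIF is defined as 1 / (1 - R_j^2), and the values from car's vif() should agree with it. The sketch below does not use car; with made-up data, it only checks that the definition matches the shortcut diag(solve(cor(X))), which is handy if you want a dependency-free version of the loop.

```r
set.seed(3)
n <- 100
# Made-up predictors: c is correlated with a and b
X <- data.frame(a = rnorm(n), b = rnorm(n))
X$c <- X$a + 0.5 * X$b + rnorm(n, sd = 0.3)

# Definition: regress each column on the others and use its R^2
vif_def <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

# Shortcut: diagonal of the inverse correlation matrix
vif_inv <- diag(solve(cor(X)))

print(vif_def)
print(all.equal(unname(vif_def), unname(vif_inv)))
```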

library(car)   # provides vif()

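vif_fun <- function(df, keep_in) {
  # df: the dataset of interest (predictors only)
  # keep_in: the variables that should always be kept in
  highest <- c()
  while (TRUE) {
    # the rnorm() response below is arbitrary, as the VIF depends
    # only on the predictors, not on the response
    vifs <- vif(lm(rnorm(nrow(df)) ~ ., data = df))
    # use logical indexing: vifs[-which(...)] would return an empty
    # vector if none of keep_in were present
    adj_vifs <- vifs[!names(vifs) %in% keep_in]
    if (length(adj_vifs) == 0 || max(adj_vifs) < 10) {
      break
    }
    cat("\n")
    print(vifs)
    # which.max() picks a single variable even when VIFs are tied
    highest <- c(highest, names(which.max(adj_vifs)))
    cat("\n")
    cat("removed:", highest)
    cat("\n")
    # drop = FALSE keeps df a data frame even if one column remains
    df <- df[, !names(df) %in% highest, drop = FALSE]
  }
  cat("\n")
  cat("final variables: \n")
  return(names(vifs))
}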

# example with mtcars dataset
vif_fun(mtcars,keep_in = c("cyl"))


# example using part of your data
vif_fun(cov.mat[,1:15], keep_in = c("X15", "X12"))

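There is something odd in your example. You probably want to compute the VIFs of rand.vars, whereas in your example the computation is done on cov.mat.

Thanks. This works on the sample data. However, when I run it on my own data, it produces the error "there are aliased coefficients in the model".

This usually means that some variables are perfectly collinear, i.e., at least one variable can be expressed as a linear combination of the others. If that is the case, the VIF cannot be computed. Are you planning to use rand.vars instead of cov.mat for your analysis?

rand.vars is just made-up data. My actual data is different, but I understand your point now. Thank you very much.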
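To find which variables cause the "aliased coefficients" error in a real dataset, one option is to fit the full lm() first and look for NA coefficients, which lm() reports for terms that are exact linear combinations of earlier ones; stats::alias() then shows the dependency. The sketch below uses made-up data with the same number of rows as in the edit (1458) and a deliberately collinear column x4; dropping the flagged columns before calling vif_fun is an assumption about how you might proceed, not part of the answer.

```r
set.seed(4)
n <- 1458   # same row count as the question's data; values are made up
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$x4 <- 2 * d$x1 - d$x3   # exact linear combination of x1 and x3

fit <- lm(rnorm(n) ~ ., data = d)
# Coefficients lm() could not estimate are NA: these are the aliased terms
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# alias() shows how the aliased term depends on the others
print(alias(fit)$Complete)

# Drop the aliased columns and refit: now every coefficient is estimable
d_clean <- d[, !names(d) %in% aliased, drop = FALSE]
fit_clean <- lm(rnorm(n) ~ ., data = d_clean)
print(anyNA(coef(fit_clean)))
```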