Why does backward selection in regsubsets (R, leaps package) give nonsensical results after reordering the variables in the data frame?

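I am trying to use the Boston data from the MASS package and the regsubsets() function from the leaps package in R to run forward and backward selection and compare the models of each size. I noticed that after rearranging the data columns and re-running the algorithm, backward selection gives different results for which variables are included in the model of each size.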
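For reference, here is a minimal sketch of textbook backward elimination in base R: start from the full model and repeatedly drop the predictor whose removal lowers R² the least. Every model on this path is nested in the previous one, and (barring exact ties) the path does not depend on the order of the data-frame columns. This is not how leaps implements `method = "backward"`; the function name `backward_path` is hypothetical, and the built-in `mtcars` data stands in for Boston so the sketch needs no extra packages.

```r
# Textbook greedy backward elimination scored by R^2 (a sketch, not leaps' code).
# Returns a list whose k-th element holds the sorted predictor names of the
# size-k model on the elimination path.
backward_path <- function(data, response) {
  current <- setdiff(names(data), response)
  path <- vector("list", length(current))
  path[[length(current)]] <- sort(current)

  # R^2 of the linear model of `response` on the predictors in `vars`
  r2 <- function(vars) {
    summary(lm(reformulate(vars, response = response), data = data))$r.squared
  }

  while (length(current) > 1) {
    # R^2 after dropping each remaining predictor in turn
    scores <- vapply(current, function(v) r2(setdiff(current, v)), numeric(1))
    # drop the predictor whose removal hurts R^2 the least
    current <- setdiff(current, names(which.max(scores)))
    path[[length(current)]] <- sort(current)
  }
  path
}

# Same data, original vs. reversed column order
p1 <- backward_path(mtcars, "mpg")
p2 <- backward_path(mtcars[, rev(names(mtcars))], "mpg")
identical(p1, p2)
```

Because each step is decided only by R² values, both calls should trace the same nested path. Running `backward_path(bos_train, "crim")` and `backward_path(bos_train2, "crim")` would give an order-independent baseline to compare the regsubsets output against.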

First, this works fine:

library(leaps)
library(MASS)
# Boston data: split into training and test
set.seed(12345)
test_inds <- sample(1:nrow(Boston), size=nrow(Boston)/2, replace=FALSE)
bos_train <- Boston[-test_inds,]

# outcome is per capita crime rate = crim

# forward stepwise regression
bos_fwd <- regsubsets(crim ~., data=bos_train,
                      method="forward", nvmax=ncol(bos_train))

# backward stepwise regression
bos_back <- regsubsets(crim ~., data=bos_train,
                       method="backward", nvmax=ncol(bos_train))

# plot using the method from regsubsets
par(mfrow=c(1,2))
plot(bos_fwd, main="Forward selection", scale="r2")
plot(bos_back, main="Backward selection", scale="r2")
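Next, I reorder the columns into the forward-selection order and re-fit both searches (code below). The backward-selection model sequences from the two runs should be identical, but they differ once 3 variables are included. What is going on? Thanks in advance for any insight.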
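One way to see which of the two size-3 backward models (dis + rad + medv from the first run, rad + medv + black from the reordered run, per the output below) fits better is to fit both directly with lm() on the same training split and compare R². A sketch, assuming the split from the question; note that R 3.6.0 changed the default algorithm behind sample(), so if the original output came from an older R version, the split (and the numbers) may only match after `RNGkind(sample.kind = "Rounding")`. The helper name `r2_of` is hypothetical.

```r
library(MASS)  # Boston data

# Same training split as in the question
set.seed(12345)
test_inds <- sample(1:nrow(Boston), size = nrow(Boston)/2, replace = FALSE)
bos_train <- Boston[-test_inds, ]

# R^2 of an lm() fit of crim on the given predictors;
# the order in which the predictors are listed does not change the fit
r2_of <- function(vars) {
  summary(lm(reformulate(vars, response = "crim"), data = bos_train))$r.squared
}

# The two size-3 models reported by backward selection in the two runs
r2_of(c("dis", "rad", "medv"))    # size-3 model from bos_back
r2_of(c("rad", "medv", "black"))  # size-3 model from bos_back2
```

Since R² of a fixed subset is invariant to column order, any difference between the two runs has to come from which subsets the search evaluated and recorded, not from the fits themselves.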

# to make the plots easier to read, reorder the variables into the order
# in which forward selection added them, and re-fit the models
bos_order_fwd <- names(bos_train)[bos_fwd$vorder][-1]
bos_train2 <- bos_train[,c("crim",bos_order_fwd)]
bos_fwd2 <- regsubsets(crim ~., data=bos_train2,
                      method="forward", nvmax=ncol(bos_train2))
bos_back2 <- regsubsets(crim ~., data=bos_train2,
                       method="backward", nvmax=ncol(bos_train2))

# plot using the method from regsubsets
par(mfrow=c(1,2))
plot(bos_fwd2, main="Forward selection", scale="r2")
plot(bos_back2, main="Backward selection", scale="r2")
# observe: the selection order in backward selection is inconsistent with the algorithm
> # compare the backwards stepwise models of each size
> coef(bos_back, 2)
(Intercept)         rad        medv 
  2.5800690   0.6055753  -0.1947727 
> coef(bos_back2, 2)
(Intercept)         rad        medv 
  2.5800690   0.6055753  -0.1947727 
> coef(bos_back, 3)
(Intercept)         dis         rad        medv 
  4.5777240  -0.4322162   0.5569806  -0.1925573 
> coef(bos_back2, 3)
(Intercept)         rad        medv       black 
 8.12946740  0.54926726 -0.17165228 -0.01534486 
> coef(bos_back, 4)
(Intercept)          zn         dis         rad        medv 
 7.01168924  0.07265221 -1.02796588  0.53498337 -0.22889528 
> coef(bos_back2, 4)
(Intercept)         rad        medv         dis          zn 
 7.01168924  0.53498337 -0.22889528 -1.02796588  0.07265221 
> coef(bos_back, 5)
(Intercept)          zn         dis         rad       black        medv 
11.50765180  0.06724239 -0.93907507  0.49208150 -0.01350222 -0.20607369 
> coef(bos_back2, 5)
(Intercept)         rad        medv       black         dis          zn 
11.50765180  0.49208150 -0.20607369 -0.01350222 -0.93907507  0.06724239
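The inconsistency can also be checked programmatically. A backward path should be nested: each smaller model is a subset of the next larger one. By the coefficients above, bos_back2 breaks this between sizes 3 and 4 (black is in the size-3 model but not in the size-4 model), while the bos_back models of sizes 2 to 5 are nested. The helper below, `is_nested_path` (a hypothetical name), tests this on a logical membership matrix laid out like the `which` component that `summary()` returns for a regsubsets object (one row per model size, smallest first, one column per term).

```r
# Hypothetical helper: TRUE if every model in the sequence contains the
# previous (smaller) one. `which_mat` is a logical matrix with one row per
# model size, smallest first, and one column per term.
is_nested_path <- function(which_mat) {
  if (nrow(which_mat) < 2) return(TRUE)
  steps <- vapply(seq_len(nrow(which_mat) - 1), function(k) {
    # every term in the size-k model must also appear in the size-(k+1) model
    all(which_mat[k + 1, which_mat[k, ]])
  }, logical(1))
  all(steps)
}

# Toy membership matrix: the third model drops "a", so the path is not nested
toy <- rbind(c(TRUE, TRUE,  FALSE, FALSE),
             c(TRUE, TRUE,  TRUE,  FALSE),
             c(TRUE, FALSE, TRUE,  TRUE))
colnames(toy) <- c("(Intercept)", "a", "b", "c")
is_nested_path(toy[1:2, ])
is_nested_path(toy)
```

On the question's objects this would be called as `is_nested_path(summary(bos_back2)$which)`, and likewise for `bos_back`.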