Why does backward selection in regsubsets (R, leaps package) produce nonsensical results after re-ordering the variables in the data frame?
I am using the Boston data from the MASS package and the regsubsets() function from the leaps package in R to run forward and backward selection and compare the models of each size. After re-ordering the columns of the data and re-running the algorithm, backward selection gives different results for which variables are included in the model of each size. First, this works fine:
library(leaps)
library(MASS)
# Boston data: split into training and test
set.seed(12345)
test_inds <- sample(1:nrow(Boston), size=nrow(Boston)/2, replace=FALSE)
bos_train <- Boston[-test_inds,]
# outcome is per capita crime rate = crim
# forward stepwise regression
bos_fwd <- regsubsets(crim ~., data=bos_train,
method="forward", nvmax=ncol(bos_train))
# backward stepwise regression
bos_back <- regsubsets(crim ~., data=bos_train,
method="backward", nvmax=ncol(bos_train))
# plot using the method from regsubsets
par(mfrow=c(1,2))
plot(bos_fwd, main="Forward selection", scale="r2")
plot(bos_back, main="Backward selection", scale="r2")
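The problem appears after re-ordering the columns and re-fitting, as in the code below: the backward-selection sequences from the original and the re-ordered data should be identical, yet they differ at the model with 3 variables. What is going on? Thanks in advance for any insight.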
# for plotting to go nicer, re-order variables in forward-selected order
# and re-fit the models
bos_order_fwd <- names(bos_train)[bos_fwd$vorder][-1]
bos_train2 <- bos_train[,c("crim",bos_order_fwd)]
bos_fwd2 <- regsubsets(crim ~., data=bos_train2,
method="forward", nvmax=ncol(bos_train2))
bos_back2 <- regsubsets(crim ~., data=bos_train2,
method="backward", nvmax=ncol(bos_train2))
# plot using the method from regsubsets
par(mfrow=c(1,2))
plot(bos_fwd2, main="Forward selection", scale="r2")
plot(bos_back2, main="Backward selection", scale="r2")
# observe selection orders in backward selection inconsistent with algo
> # compare the backwards stepwise models of each size
> coef(bos_back, 2)
(Intercept)         rad        medv
  2.5800690   0.6055753  -0.1947727
> coef(bos_back2, 2)
(Intercept)         rad        medv
  2.5800690   0.6055753  -0.1947727
> coef(bos_back, 3)
(Intercept)         dis         rad        medv
  4.5777240  -0.4322162   0.5569806  -0.1925573
> coef(bos_back2, 3)
(Intercept)         rad        medv       black
 8.12946740  0.54926726 -0.17165228 -0.01534486
> coef(bos_back, 4)
(Intercept)          zn         dis         rad        medv
 7.01168924  0.07265221 -1.02796588  0.53498337 -0.22889528
> coef(bos_back2, 4)
(Intercept)         rad        medv         dis          zn
 7.01168924  0.53498337 -0.22889528 -1.02796588  0.07265221
> coef(bos_back, 5)
(Intercept)          zn         dis         rad       black        medv
11.50765180  0.06724239 -0.93907507  0.49208150 -0.01350222 -0.20607369
> coef(bos_back2, 5)
(Intercept)         rad        medv       black         dis          zn
11.50765180  0.49208150 -0.20607369 -0.01350222 -0.93907507  0.06724239
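One way to see which of the two size-3 models textbook backward elimination would produce is to run the elimination by hand with lm(), independently of regsubsets(). The sketch below is only an illustration: manual_backward() is a hypothetical helper, not part of leaps, and it assumes the rule "at each step, drop the variable whose removal increases the residual sum of squares the least". Since it works on variable names rather than column positions, its path should not depend on column order (exact RSS ties aside).

```r
library(MASS)  # provides the Boston data

# Hypothetical helper (not part of leaps): plain backward elimination by RSS.
# Returns a list whose k-th element holds the predictors of the size-k model.
manual_backward <- function(data, response) {
  remaining <- setdiff(names(data), response)
  path <- list()
  while (length(remaining) > 1) {
    # RSS of each candidate model obtained by dropping one variable
    rss <- sapply(remaining, function(v) {
      keep <- setdiff(remaining, v)
      fit <- lm(reformulate(keep, response = response), data = data)
      sum(resid(fit)^2)
    })
    # drop the variable whose removal leaves the smallest RSS
    remaining <- setdiff(remaining, names(which.min(rss)))
    path[[length(remaining)]] <- remaining
  }
  path
}

# same train/test split as in the question
set.seed(12345)
test_inds <- sample(1:nrow(Boston), size = nrow(Boston) / 2, replace = FALSE)
bos_train <- Boston[-test_inds, ]

path <- manual_backward(bos_train, "crim")
path[[3]]  # predictors of the size-3 model under plain backward elimination
```

Comparing path[[3]] with names(coef(bos_back, 3)) and names(coef(bos_back2, 3)) shows which of the two regsubsets() fits agrees with this plain elimination rule. Each model in path is nested in the next larger one by construction, which is the property the regsubsets() output above appears to violate.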