R: Lasso feature selection


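I have a small dataset (37 observations x 23 features), and I want to use lasso regression for feature selection to reduce its dimensionality. To do this, I put together the following code based on an online tutorial: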

#Load the libraries
library(mlbench)
library(elasticnet)
library(caret)
library(dplyr)  # provides %>% and select(), which are used below

#Initialize cross validation and train LASSO
cv_5 <- trainControl(method="cv", number=5)
lasso <- train( ColumnY ~., data=My_Data_Frame, method='lasso',  trControl=cv_5)

# Filter out the variables whose coefficients have been shrunk to 0
coefs <- predict.enet(lasso$finalModel, type = "coefficients",
                      s = lasso$bestTune$fraction, mode = "fraction")$coefficients
drop <- names(coefs[coefs == 0])
My_Data_Frame <- My_Data_Frame %>% select(-all_of(drop))

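You have very few observations, so in some of the training folds there is a good chance that some of your columns are all zeros or have very low variance. For example: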

library(caret)
set.seed(222)
df = data.frame(ColumnY = rnorm(37),matrix(rbinom(37*23,1,p=0.15),ncol=23))

cv_5 <- trainControl(method="cv", number=5)
lasso <- train( ColumnY ~., data=df, method='lasso',  trControl=cv_5)

Warning messages:
1: model fit failed for Fold4: fraction=0.9 Error in elasticnet::enet(as.matrix(x), y, lambda = 0, ...) : 
  Some of the columns of x have zero variance
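
To see where the warning comes from, it can help to check each fold's training rows for constant columns. The following is a minimal sketch in base R, assuming the same simulated `df` as above; the `fold_id` split is a hypothetical stand-in for caret's internal fold assignment, so the folds will not match caret's exactly and the list may well be empty for some seeds.

```r
set.seed(222)
df <- data.frame(ColumnY = rnorm(37),
                 matrix(rbinom(37 * 23, 1, p = 0.15), ncol = 23))

# Assign every row to one of 5 folds (illustrative split, not caret's own)
fold_id <- sample(rep(1:5, length.out = nrow(df)))

# For each fold, name the predictors that take a single value
# in the rows that would be used for training
constant_cols <- lapply(1:5, function(k) {
  train_x <- df[fold_id != k, -1, drop = FALSE]
  is_constant <- vapply(train_x, function(col) length(unique(col)) < 2, logical(1))
  names(train_x)[is_constant]
})
names(constant_cols) <- paste0("Fold", 1:5)

constant_cols
```

Any column listed for a fold has zero variance in that fold's training data, which is the condition `enet` complains about. One option worth trying, offered as an assumption rather than a tested fix, is passing `preProcess = "zv"` to caret's `train`, which is meant to drop zero-variance predictors within each resample.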

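You can perform feature selection with the FSinR package. It is written in R and is available from CRAN. It provides a wide variety of filter and wrapper methods that can be combined with search methods. The interface for generating wrapper evaluators follows the caret interface. For example: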

# Load the library
library(FSinR)

# Choose one of the search methods
searcher <- searchAlgorithm('sequentialForwardSelection')

# Choose one of the filter/wrapper evaluators. The resampling and fitting parameters
# are those of caret's trainControl and train; you can drop them to keep things simpler.
resamplingParams <- list(method = "cv", number = 5)
# Note: metric = "Accuracy" applies when ColumnY is a factor (classification);
# for a numeric response, caret's regression metrics such as "RMSE" apply instead.
fittingParams <- list(preProc = c("center", "scale"), metric = "Accuracy", tuneGrid = expand.grid(k = c(1:20)))
evaluator <- wrapperEvaluator('knn', resamplingParams, fittingParams)

# Run the feature selection (returns the best features)
results <- featureSelection(My_Data_Frame, 'ColumnY', searcher, evaluator)
