Can I use predict.glmnet on test data with a different number of predictors?



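I used glmnet to build a predictive model on a training set with about 200 predictors and 100 samples, for a binomial regression/classification problem.

I selected the best model (16 predictors), the one that gave me the maximum AUC. I have an independent test set that contains only those variables (the 16 predictors that made it into the final model from the training set).

Is there any way to use predict.glmnet, based on the best model from the training set, with a new test set that has data only for the variables that made it into the final model?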

glmnet requires the validation/test set to contain exactly the same variables (number and names) as the training dataset. For example:

library(caret)
library(glmnet)
df <- ... # a dataframe with 200 variables, some of which you want to predict on 
      #  & some of which you don't care about.
      # Variable 13 ('Response.Variable') is the dependent variable.
      # Variables 1-12 & 14-113 are the predictor variables
      # All training/testing & validation datasets are derived from this single df.

# Split dataframe into training & testing sets
inTrain <- createDataPartition(df$Response.Variable, p = .75, list = FALSE)
Train <- df[ inTrain, ] # Training dataset for all model development
Test <- df[ -inTrain, ] # Final sample for model validation

# Run logistic regression, using only the specified predictor variables
logCV <- cv.glmnet(x = data.matrix(Train[, c(1:12, 14:113)]), y = Train[, 13],
                   family = 'binomial', type.measure = 'auc')

# Test model over final test set, using the same predictor variables
# Create field in dataset that contains predicted values
Test$prob <- predict(logCV, type = "response",
                     newx = data.matrix(Test[, c(1:12, 14:113)]), s = 'lambda.min')
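
To see which predictors actually enter the model at lambda.min (the "16 predictors" in the question), one option is to inspect the coefficient vector. The following is a minimal sketch on simulated data; the names x, y, fit, and selected are placeholders for illustration and do not come from the original post.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the training data: 100 samples, 200 predictors
x <- matrix(rnorm(100 * 200), nrow = 100,
            dimnames = list(NULL, paste0("V", 1:200)))
y <- rbinom(100, 1, plogis(2 * x[, 1] - 2 * x[, 2]))

# nfolds = 5 keeps enough observations per fold for the AUC measure
fit <- cv.glmnet(x, y, family = "binomial", type.measure = "auc", nfolds = 5)

# Coefficients at lambda.min as a named numeric vector (intercept first)
b <- as.matrix(coef(fit, s = "lambda.min"))[, 1]

# Names of the predictors with nonzero coefficients, excluding the intercept
selected <- names(b)[b != 0 & names(b) != "(Intercept)"]
print(selected)
```

Every other column of the training matrix has a coefficient of exactly zero at this value of lambda, so it contributes nothing to the prediction.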

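You should not use split-sample testing with penalized procedures, especially with small samples. glmnet should be given all of the data. Future cases can then be supplied as newx; the fitted model will work as long as newx has the same structure as the original data.

Does that mean newx must have exactly the same number of predictors as the training data, or can it contain only the variables that made it into the final model? I was hoping I could supply only the X variables that have coefficients in the final model. When I try that, I get an error when computing Test_pred.

Just to clarify further: my newx contains data only for the variables with nonzero coefficients in the final model. In that case, how do I call the predict function in glmnet on the new data? Thanks for all your help.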
new.df <- ... # new df w/ 1,000 variables, which include all predictor variables used 
              # in developing the model

# Create object with requisite predictor variable names that we specified in the model
predictvars <- c('PredictorVar1', 'PredictorVar2', 'PredictorVar3', 
                  ... 'PredictorVarK')
# Select the columns by name so they arrive in the order given in predictvars,
# which should match the column order of the training matrix
new.df$prob <- predict(logCV, type = "response",
                       newx = data.matrix(new.df[, predictvars]), s = 'lambda.min')
                       # the above limits the new df of 1,000 variables to
                       # the requisite variables that go into the model.
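
For the case raised in the question, where the test set holds only the variables with nonzero coefficients, one possible workaround is to pad the missing columns with zeros so that newx regains the training matrix's width and column order. Because the dropped predictors have coefficients of exactly zero, the filler values do not affect the prediction. This is a sketch on simulated data, not the original poster's code; x_train, x_small, and newx are hypothetical names.

```r
library(glmnet)

set.seed(1)
# Simulated training data: 100 samples, 200 named predictors
x_train <- matrix(rnorm(100 * 200), nrow = 100,
                  dimnames = list(NULL, paste0("V", 1:200)))
y_train <- rbinom(100, 1, plogis(2 * x_train[, 1] - 2 * x_train[, 2]))
fit <- cv.glmnet(x_train, y_train, family = "binomial",
                 type.measure = "auc", nfolds = 5)

# Predictors with nonzero coefficients at lambda.min
b <- as.matrix(coef(fit, s = "lambda.min"))[, 1]
selected <- names(b)[b != 0 & names(b) != "(Intercept)"]

# Simulated test data; x_small mimics a test set with only the selected columns
x_full_test <- matrix(rnorm(20 * 200), nrow = 20,
                      dimnames = list(NULL, colnames(x_train)))
x_small <- x_full_test[, selected, drop = FALSE]

# Rebuild a matrix with the training layout, zeros in the missing columns
newx <- matrix(0, nrow = nrow(x_small), ncol = ncol(x_train),
               dimnames = list(NULL, colnames(x_train)))
newx[, colnames(x_small)] <- x_small

p_padded <- predict(fit, newx = newx, s = "lambda.min", type = "response")
p_full   <- predict(fit, newx = x_full_test, s = "lambda.min", type = "response")

# The padded predictions should agree with those from the full test matrix
print(all.equal(as.numeric(p_padded), as.numeric(p_full)))
```

The padding is only valid for the same value of s that was used to pick the selected variables; at a different lambda other coefficients may be nonzero.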