Controlling sampling for cross-validation in the caret R package


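I have the following problem. In a data set of N subjects, I have several samples per subject. I want to train a model on the data set, but I want to make sure that, in each resample, the training set contains no replicates of the held-out subjects.

In other words, I want to block the cross-validation by subject. Is that possible?

Without the caret package, I would do something like the `looSubjCV` mock code shown further down.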


Not directly, but you can do it with the `index` and `indexOut` arguments to `trainControl`. Here is an example using 10-fold CV:

library(caret)
library(nlme)

data(Orthodont)
head(Orthodont)
subjects <- as.character(unique(Orthodont$Subject))

## figure out folds at the subject level

set.seed(134)
sub_folds <- createFolds(y = subjects, list = TRUE, returnTrain = TRUE)

## now create the mappings to which *rows* are in the training set
## based on which subjects are left in or out

in_train <- holdout <- vector(mode = "list", length = length(sub_folds))

row_index <- seq_len(nrow(Orthodont))

for (i in seq_along(sub_folds)) {
  ## Which subjects are in the training portion of fold i
  sub_in <- subjects[sub_folds[[i]]]
  ## which rows of the data correspond to those subjects
  in_train[[i]] <- row_index[Orthodont$Subject %in% sub_in]
  ## all remaining rows (the other subjects) are held out
  holdout[[i]]  <- row_index[!(Orthodont$Subject %in% sub_in)]
}

names(in_train) <- names(holdout) <- names(sub_folds)

ctrl <- trainControl(method = "cv",
                     savePredictions = TRUE,
                     index = in_train,
                     indexOut = holdout)

mod <- train(distance ~ (age+Sex)^2, data = Orthodont,
             method = "lm", 
             trControl = ctrl)

first_fold <- subset(mod$pred, Resample == "Fold01")

## These were used to fit the model
table(Orthodont$Subject[-first_fold$rowIndex])
## These were heldout:
table(Orthodont$Subject[first_fold$rowIndex])
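
A quick way to confirm that folds built this way really are blocked by subject is to check that no subject occurs on both sides of any split. The sketch below uses only base R and a small made-up data frame; `toy`, `fold_id`, and `check_blocked` are hypothetical names for illustration and are not part of caret. The same check could be applied to the `in_train` and `holdout` lists from the example.

```r
## Toy data: 6 subjects with 3 rows each (hypothetical, for illustration only)
toy <- data.frame(Subject = rep(paste0("S", 1:6), each = 3),
                  value   = seq_len(18))

## Assign each subject (not each row) to one of 3 folds
subjects <- unique(as.character(toy$Subject))
fold_id  <- rep(1:3, length.out = length(subjects))

## Row indices for the held-out and training sets, one list element per fold
row_index <- seq_len(nrow(toy))
holdout   <- lapply(1:3, function(k) {
  row_index[toy$Subject %in% subjects[fold_id == k]]
})
in_train  <- lapply(holdout, function(h) setdiff(row_index, h))

## TRUE when no subject appears in both the training and the held-out rows
## of any fold
check_blocked <- function(groups, index, indexOut) {
  groups <- as.character(groups)
  all(mapply(function(tr, te) length(intersect(groups[tr], groups[te])) == 0,
             index, indexOut))
}

blocked <- check_blocked(toy$Subject, in_train, holdout)
print(blocked)
```

A split made at the row level, for example training on rows 1 and 2 and holding out row 3 of the same subject, would make `check_blocked` return `FALSE`.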
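
Mock code from the question (leave-one-subject-out cross-validation without caret):

looSubjCV <- function(x, samples, subjects) {
  for (i in seq_along(subjects)) {
    ## rows belonging to subject i form the test set
    test  <- x[samples == subjects[i], ]
    ## rows from all other subjects form the training set
    train <- x[samples != subjects[i], ]
    ## create the model from train and predict for test
  }
}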
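
The mock function can be fleshed out into a runnable sketch with base R alone. The version below assumes a linear model fitted with `lm` and RMSE as the error measure, neither of which is specified in the question; `loo_subject_cv`, `toy`, and the simulated data are hypothetical.

```r
## Leave-one-subject-out CV: all rows of one subject are held out in turn.
## Assumptions (not from the question): lm() as the model, RMSE as the metric.
loo_subject_cv <- function(formula, data, groups) {
  groups   <- as.character(groups)
  subjects <- unique(groups)
  response <- all.vars(formula)[1]
  vapply(subjects, function(s) {
    test  <- data[groups == s, , drop = FALSE]
    train <- data[groups != s, , drop = FALSE]
    fit   <- lm(formula, data = train)
    ## RMSE on the rows of the held-out subject
    sqrt(mean((test[[response]] - predict(fit, newdata = test))^2))
  }, numeric(1))
}

## Simulated example data (hypothetical): 5 subjects, 4 measurements each
set.seed(1)
toy <- data.frame(Subject = rep(paste0("S", 1:5), each = 4),
                  age     = rep(c(8, 10, 12, 14), times = 5))
toy$distance <- 16 + 0.7 * toy$age + rnorm(nrow(toy))

res <- loo_subject_cv(distance ~ age, data = toy, groups = toy$Subject)
print(res)
```

The result is a named vector with one RMSE per held-out subject; averaging it gives a single subject-blocked error estimate.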