How do I choose the best ntree value from a caret grid search?


I tuned the parameters manually to find the best ntree:

library(caret)

# trainingSet is the asker's own data frame with a `survived` outcome (not shown).
bestMtry <- 3
control <- trainControl(method = 'repeatedcv',
                        number = 10,
                        repeats = 3,
                        search = 'grid')

storeMaxtrees <- list()
tuneGrid <- expand.grid(.mtry = bestMtry)
for (ntree in c(1000, 1500, 2000)) {
  set.seed(291)
  rf.maxtrees <- train(survived ~ .,
                       data = trainingSet,
                       method = "rf",
                       metric = "Accuracy",
                       tuneGrid = tuneGrid,
                       trControl = control,
                       importance = TRUE,
                       nodesize = 14,
                       maxnodes = 24,
                       ntree = ntree)
  key <- toString(ntree)
  storeMaxtrees[[key]] <- rf.maxtrees
}
resultsTree <- resamples(storeMaxtrees)
summary(resultsTree)

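From the output I can see that, based on Accuracy and Kappa, 2000 is the best value for ntree. I want to store that best ntree value (2000) dynamically. Is there a way to get something like a best_ntree variable out of the result of the summary() call?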

You can convert the 1500 in my example to numeric…

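My caret knowledge is outdated (I haven't used it in a long time), but isn't it
rf.maxtrees$finalModel$ntree
?

That could be the best one-line solution. But after running that line I got the wrong value, [1] 2000. Shouldn't it be 1000? And for
res$models[which.max(res$statistics$Accuracy[,"Mean"])]
I got NULL. Any idea why?

Hi @user1896653, sorry, I had tried something else; it should be res = summary(resultsTree). See the updated answer.

Yes, it's working! Just one point of confusion: is it possible for one parameter value to have the best Accuracy but not the best Kappa? I mean, we are only considering Accuracy. Is that completely safe?

Usually, for a balanced dataset, Accuracy and Kappa are very similar. When the data is imbalanced, e.g. 80:20, you cannot rely on Accuracy, because you can predict everything as the majority class and still get 80% accuracy. Kappa may be more appropriate there. I am trying to think of when Accuracy might be the better choice... usually it depends on what you need. So it comes down to class balance again: if the split is, say, 60:40 and your overall goal is simply to predict correctly, Accuracy will serve your needs.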
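The 80:20 point in the last comment can be checked by hand. The sketch below uses made-up labels (80 negatives and 20 positives, an assumption purely for illustration) and a "classifier" that always predicts the majority class, then computes Accuracy and Cohen's Kappa from the confusion table in base R.

```r
# Hypothetical 80:20 data set; the labels are made up for illustration.
truth <- factor(c(rep("neg", 80), rep("pos", 20)), levels = c("neg", "pos"))
# A "classifier" that always predicts the majority class.
pred <- factor(rep("neg", length(truth)), levels = levels(truth))

tab <- table(pred, truth)  # 2 x 2 confusion table (the "pos" row is all zeros)
n <- sum(tab)

accuracy <- sum(diag(tab)) / n                      # observed agreement
expected <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(c(Accuracy = accuracy, Kappa = cohen_kappa))
```

Accuracy comes out at 0.8 while Kappa is 0, because the observed agreement is no better than what the class frequencies alone would produce.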

# Updated answer: fit one model per ntree, then pick the best by mean Accuracy.
library(caret)

bestMtry <- 3
control <- trainControl(method = 'repeatedcv', number = 5)
data <- MASS::Pima.tr

storeMaxtrees <- list()
tuneGrid <- expand.grid(.mtry = bestMtry)
for (ntree in c(1000, 1500, 2000)) {
  set.seed(291)
  rf.maxtrees <- train(type ~ .,
                       data = data,
                       method = "rf",
                       metric = "Accuracy",
                       tuneGrid = tuneGrid,
                       trControl = control,
                       importance = TRUE,
                       nodesize = 14,
                       maxnodes = 24,
                       ntree = ntree)
  key <- toString(ntree)
  storeMaxtrees[[key]] <- rf.maxtrees
}
resultsTree <- resamples(storeMaxtrees)
res = summary(resultsTree)
res$models[which.max(res$statistics$Accuracy[,"Mean"])]
[1] "1500"
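
To keep the winner in a variable, as the question asks, the lookup above can be wrapped in a small helper. This is only a sketch: best_model_name() is a hypothetical name, not a caret function, and mock_res is a hand-built list with made-up numbers that mimics just the two fields used above (models and statistics), so the snippet runs without refitting anything. With real results you would pass summary(resultsTree) instead. The made-up numbers are chosen so that Accuracy and Kappa disagree, purely to show the metric argument at work.

```r
# Hypothetical helper (not part of caret): returns the name of the model with
# the highest `stat` for `metric` in a summary(resamples(...))-like list.
best_model_name <- function(res, metric = "Accuracy", stat = "Mean") {
  stats <- res$statistics[[metric]]
  if (is.null(stats)) stop("metric not found: ", metric)
  res$models[which.max(stats[, stat])]  # which.max() keeps the first model on ties
}

# Mock stand-in for summary(resultsTree); all numbers are made up.
mock_res <- list(
  models = c("1000", "1500", "2000"),
  statistics = list(
    Accuracy = cbind(Mean = c("1000" = 0.75, "1500" = 0.77, "2000" = 0.76)),
    Kappa    = cbind(Mean = c("1000" = 0.41, "1500" = 0.44, "2000" = 0.45))
  )
)

# Model names are strings, so convert before using the value as ntree.
best_ntree <- as.numeric(best_model_name(mock_res))                 # by mean Accuracy
best_ntree_kappa <- as.numeric(best_model_name(mock_res, "Kappa"))  # by mean Kappa

print(c(Accuracy = best_ntree, Kappa = best_ntree_kappa))
```

After the loop, rf.maxtrees holds only the last fit (ntree = 2000), because it is overwritten on every iteration; that is why rf.maxtrees$finalModel$ntree printed 2000 in the comment thread. The fit for the chosen value is still available by name, as storeMaxtrees[[toString(best_ntree)]].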