R: Understanding xgboost cross-validation and AUC output
I have the following XGBoost CV model:
xgboostModelCV <- xgb.cv(data = dtrain,
                         nrounds = 20,
                         nfold = 3,
                         metrics = "auc",
                         verbose = TRUE,
                         eval_metric = "auc",
                         objective = "binary:logistic",
                         max.depth = 6,
                         eta = 0.01,
                         subsample = 0.5,
                         colsample_bytree = 1,
                         print_every_n = 1,
                         min_child_weight = 1,
                         booster = "gbtree",
                         early_stopping_rounds = 10,
                         watchlist = watchlist,
                         seed = 1234)
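However, I set nrounds = 20 and the cross-validation uses nfold = 3, so shouldn't I get 60 results rather than 20?
Or does the output, as the column names suggest, give the mean AUC score for each round?
So for the training set, at round 1, the train_auc_mean value of 0.8852290 would be the average over the 3 cross-validation folds?
And if I plot these AUC scores, I would be plotting the mean AUC across the 3 folds?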
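I just want to make sure I have understood this correctly.

That is correct: the output is the mean of the fold AUCs. However, if you want to extract the individual fold AUCs for the best/last iteration, you can proceed as follows: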
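Using the Sonar data set from mlbench:
library(xgboost)
library(tidyverse)
library(mlbench)
library(pROC)  # provides roc() and auc(), used further down
data(Sonar)
# 60 numeric predictors; the two-level factor Class is recoded to 0/1
xgb.train.data <- xgb.DMatrix(as.matrix(Sonar[, 1:60]), label = as.numeric(Sonar$Class) - 1)
param <- list(objective = "binary:logistic")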
Fit the cross-validated model with prediction = TRUE so that the out-of-fold predictions are kept:
model.cv <- xgb.cv(params = param,
                   data = xgb.train.data,
                   nrounds = 50,
                   early_stopping_rounds = 10,
                   nfold = 3,
                   prediction = TRUE,
                   eval_metric = "auc")
Now go over the folds and join the predictions with the true labels and the corresponding indices:
# model.cv$folds holds the held-out row indices of each fold
z <- lapply(model.cv$folds, function(x) {
  pred <- model.cv$pred[x]
  true <- (as.numeric(Sonar$Class) - 1)[x]
  index <- x
  out <- data.frame(pred, true, index)
  out
})
Once the predictions are grouped by fold like this, you can of course do more, such as plotting a ROC curve for each fold. Here the folds are named and the AUC is computed for each one:
names(z) <- paste("folds", 1:3, sep = "_")
# AUC for each fold; roc() and auc() come from the pROC package
z %>%
  bind_rows(.id = "id") %>%
  group_by(id) %>%
  summarise(auroc = roc(true, pred) %>%
              auc())
#output
# A tibble: 3 x 2
#   id      auroc
#   <chr>   <dbl>
# 1 folds_1 0.944
# 2 folds_2 0.900
# 3 folds_3 0.899
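For readers without pROC, the per-fold AUC can also be computed in base R with the rank-based (Mann-Whitney) formula. This is a minimal sketch rather than the method used in the answer: `rank_auc` is a hypothetical helper name and the toy fold data are made up for illustration. With real data, the `pred` and `true` columns of each element of `z` would be passed instead.

```r
# Hypothetical helper: AUC via the Mann-Whitney U statistic.
# pred: numeric scores; true: 0/1 labels (1 = positive class).
rank_auc <- function(pred, true) {
  r <- rank(pred)                 # average ranks handle tied scores
  n_pos <- sum(true == 1)
  n_neg <- sum(true == 0)
  (sum(r[true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy stand-in for the list z (made-up numbers, two folds)
toy_folds <- list(
  folds_1 = data.frame(pred = c(0.9, 0.8, 0.3, 0.1), true = c(1, 1, 0, 0)),
  folds_2 = data.frame(pred = c(0.7, 0.6, 0.4, 0.2), true = c(1, 0, 1, 0))
)

# One AUC per fold, then their mean
fold_auc <- sapply(toy_folds, function(f) rank_auc(f$pred, f$true))
print(fold_auc)
print(mean(fold_auc))
```

Note that pROC's roc() chooses the comparison direction automatically by default, so its result can differ from this formula when a fold's AUC falls below 0.5.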
# mean of the per-fold AUCs
z %>%
  bind_rows(.id = "id") %>%
  group_by(id) %>%
  summarise(auroc = roc(true, pred) %>%
              auc()) %>%
  pull(auroc) %>%
  mean
#output
# [1] 0.9143798
# the same value appears as test_auc_mean at the best iteration
model.cv$evaluation_log[model.cv$best_iteration, ]
#output
#    iter train_auc_mean train_auc_std test_auc_mean test_auc_std
# 1:   48              1             0       0.91438   0.02092817
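To make the row count from the question concrete: each fold is evaluated at every round and the log then holds one summary row per round, which is why nrounds = 20 with nfold = 3 gives 20 rows rather than 60. Below is a minimal base-R sketch of that aggregation. The per-fold matrix is random, made-up data, `fold_auc_by_round` and `pop_sd` are hypothetical names, and the use of a population (divide-by-n) standard deviation is an assumption inferred from the numbers printed above, not something the source states.

```r
set.seed(1234)
nrounds <- 20
nfold <- 3

# Made-up per-fold test AUCs: one row per round, one column per fold
fold_auc_by_round <- matrix(runif(nrounds * nfold, 0.8, 0.95),
                            nrow = nrounds, ncol = nfold)

# Hypothetical helper: population standard deviation (divides by n)
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# One summary row per round, mirroring the layout of an evaluation log
summary_log <- data.frame(
  iter = seq_len(nrounds),
  test_auc_mean = rowMeans(fold_auc_by_round),
  test_auc_std = apply(fold_auc_by_round, 1, pop_sd)
)
print(nrow(summary_log))  # 20 rows, not 60

# The rounded per-fold AUCs printed above give similar summary values
reported_folds <- c(0.944, 0.900, 0.899)
print(mean(reported_folds))
print(pop_sd(reported_folds))
```

The last two values can be compared with test_auc_mean and test_auc_std in the evaluation log; small differences are expected because the fold AUCs are rounded to three decimals.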