Random forest model in R
Is it possible to create multiple random forest models by tuning hyperparameters on the training data, evaluate the performance of every model on the test data, and store the results in a CSV file? For example, suppose I have one model with mtry of 6 and nodesize of 3, and another with mtry of 10 and nodesize of 4. I need to test the performance of both models on the test data and store key model metrics such as the confusion matrix, sensitivity, and specificity.

I tried the following code:
train_performance <- data.frame('TN'=0,'FP'=0,'FN'=0,'TP'=0,'accuracy'=0,'kappa'=0,'sensitivity'=0,'specificity'=0)
modellist <- list()
for (mtry in c(6,11)){
  for (nodesize in c(2,3)){
    fit_model <- randomForest(DV~., train_final, mtry=mtry, importance=TRUE, nodesize=nodesize,
                              sampsize=ceiling(.8*nrow(train_final)), proximity=TRUE, na.action=na.omit,
                              ntree=500)
    Key_col <- paste0(mtry,"-",nodesize)
    modellist[[Key_col]] <- fit_model
    pred_train <- predict(fit_model, train_final)
    cf <- confusionMatrix(pred_train, train_final$DV, mode='everything', positive='1')
    train_performance$TN <- cf$table[1]
    train_performance$FP <- cf$table[2]
    train_performance$FN <- cf$table[3]
    train_performance$TP <- cf$table[4]
    train_performance$accuracy <- cf$overall[1]
    train_performance$kappa <- cf$overall[2]
    train_performance$sensitivity <- cf$byClass[1]
    train_performance$specificity <- cf$byClass[2]
    train_performance$key <- Key_col
  }
}
train_performance

Here is an example approach using the caret package that shows how to tune and train random forest models and output accuracy metrics for all of the models:
library(randomForest)
library(mlbench)
library(caret)
# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
Tuning using caret:

Random search:

One search strategy we can use is to try random values within a range.
# Random Search
metric <- "Accuracy"  # metric to optimize
seed <- 7             # any fixed seed, for reproducibility
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)
Grid search:

Another search strategy is to define a grid of algorithm parameters to try.
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)
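Since the question also asks about storing results in a CSV file: a caret `train` object keeps its resampling results as a data frame in `$results` (one row per tuning-parameter combination), so they can be written out directly. A minimal sketch, assuming `rf_gridsearch` from the code above and a hypothetical output filename:

```r
# The per-mtry accuracy/kappa metrics live in rf_gridsearch$results;
# write them to CSV for later comparison across models.
write.csv(rf_gridsearch$results, "rf_gridsearch_results.csv", row.names = FALSE)
```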
There are many other ways to tune random forest models and store the results of those models; the two above are the most widely used. You can also set these parameters manually, then train and tune the model.

I could only get the final result for the model with mtry of 11 and nodesize of 3, but I could not store the results for all of the models. Please help me. I was able to run the code by adding an empty list, trainmetrics:
# Random Search
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)
Resampling results across tuning parameters:
mtry Accuracy Kappa Accuracy SD Kappa SD
11 0.8218470 0.6365181 0.09124610 0.1906693
14 0.8140620 0.6215867 0.08475785 0.1750848
17 0.8030231 0.5990734 0.09595988 0.1986971
24 0.8042929 0.6002362 0.09847815 0.2053314
30 0.7933333 0.5798250 0.09110171 0.1879681
34 0.8015873 0.5970248 0.07931664 0.1621170
45 0.7932612 0.5796828 0.09195386 0.1887363
47 0.7903896 0.5738230 0.10325010 0.2123314
49 0.7867532 0.5673879 0.09256912 0.1899197
50 0.7775397 0.5483207 0.10118502 0.2063198
60 0.7790476 0.5513705 0.09810647 0.2005012
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)
Resampling results across tuning parameters:
mtry Accuracy Kappa Accuracy SD Kappa SD
1 0.8377273 0.6688712 0.07154794 0.1507990
2 0.8378932 0.6693593 0.07185686 0.1513988
3 0.8314502 0.6564856 0.08191277 0.1700197
4 0.8249567 0.6435956 0.07653933 0.1590840
5 0.8268470 0.6472114 0.06787878 0.1418983
6 0.8298701 0.6537667 0.07968069 0.1654484
7 0.8282035 0.6493708 0.07492042 0.1584772
8 0.8232828 0.6396484 0.07468091 0.1571185
9 0.8268398 0.6476575 0.07355522 0.1529670
10 0.8204906 0.6346991 0.08499469 0.1756645
11 0.8073304 0.6071477 0.09882638 0.2055589
12 0.8184488 0.6299098 0.09038264 0.1884499
13 0.8093795 0.6119327 0.08788302 0.1821910
14 0.8186797 0.6304113 0.08178957 0.1715189
15 0.8168615 0.6265481 0.10074984 0.2091663
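As for the original loop: it overwrites the single row of `train_performance` on every iteration, which is why only the last model's metrics survive. A sketch of the manual approach that accumulates one row per model and writes everything to CSV, assuming `train_final` and `test_final` data frames with a factor outcome `DV` as in the question:

```r
library(randomForest)
library(caret)

performance <- data.frame()  # start empty; one row is appended per model
modellist <- list()

for (mtry in c(6, 10)) {
  for (nodesize in c(3, 4)) {
    fit_model <- randomForest(DV ~ ., train_final, mtry = mtry,
                              nodesize = nodesize, ntree = 500)
    key <- paste0(mtry, "-", nodesize)
    modellist[[key]] <- fit_model

    # Evaluate on the *test* data, as the question asks
    pred_test <- predict(fit_model, test_final)
    cf <- confusionMatrix(pred_test, test_final$DV, positive = '1')

    # rbind a fresh row instead of overwriting the same row each time
    performance <- rbind(performance, data.frame(
      key = key,
      TN = cf$table[1], FP = cf$table[2],
      FN = cf$table[3], TP = cf$table[4],
      accuracy = cf$overall['Accuracy'],
      kappa = cf$overall['Kappa'],
      sensitivity = cf$byClass['Sensitivity'],
      specificity = cf$byClass['Specificity']
    ))
  }
}

# One row per mtry/nodesize combination, ready for comparison
write.csv(performance, "model_performance.csv", row.names = FALSE)
```

The same `rbind` pattern works unchanged if more hyperparameters are added to the loops.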