Confusion matrix for a random forest in R caret

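I have data with a binary YES/NO class response. I fit a random forest model with the code below, but I am having trouble getting the confusion matrix.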

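library(readxl)  # read_excel()
library(caret)   # createDataPartition(), train(), confusionMatrix()

dataR <- read_excel("*:/*.xlsx")

# Stratified 70/30 split on the response
Train    <- createDataPartition(dataR$Class, p = 0.7, list = FALSE)
training <- dataR[ Train, ]
testing  <- dataR[-Train, ]

# Random forest tuned over 3 values of mtry with 5-fold cross-validation
model_rf <- train(Class ~ ., tuneLength = 3, data = training, method = "rf",
                  importance = TRUE,
                  trControl = trainControl(method = "cv", number = 5))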

So far so good, but it fails when I run this code:

# Apply threshold of 0.50: p_class
class_log <- ifelse(model_rf[,1] > 0.50, "YES", "NO")

# Create confusion matrix
p <-confusionMatrix(class_log, testing[["Class"]])

##gives the accuracy
p$overall[1]

I would appreciate any help in getting the confusion matrix results.

You can try building the confusion table yourself and checking the accuracy:

dataR <- read_excel("*:/*.xlsx")
Train    <- createDataPartition(dataR$Class, p = 0.7, list = FALSE)
training <- dataR[ Train, ]
testing  <- dataR[-Train, ]

model_rf <- train(Class ~ ., tuneLength = 3, data = training, method = "rf",
                  importance = TRUE,
                  trControl = trainControl(method = "cv", number = 5))

m <- table(class_log, testing[["Class"]])
m   #confusion table

#Accuracy
(sum(diag(m)))/nrow(testing)
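
As a small illustration of the table-based accuracy above, the sketch below uses made-up YES/NO vectors (not the asker's data). It assumes both vectors are factors with the same level order, so that the diagonal of the table really holds the correct predictions.

```r
# Hypothetical predicted and true labels, for illustration only
lv    <- c("NO", "YES")
pred  <- factor(c("YES", "NO", "YES", "YES", "NO"), levels = lv)
truth <- factor(c("YES", "NO", "NO",  "YES", "NO"), levels = lv)

# Rows are predictions, columns are the reference classes
m <- table(Prediction = pred, Reference = truth)
m

# Accuracy = correctly classified cases (diagonal) / all tabulated cases
accuracy <- sum(diag(m)) / sum(m)
accuracy
```

Dividing by sum(m) instead of nrow(testing) keeps the ratio consistent with the table even if some rows were excluded from the tabulation, for example because of missing values.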

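The snippet class_log <- ifelse(model_rf[,1] > 0.50, "YES", "NO") is an if-else statement that performs the following test: for each number in the first column of model_rf, it returns "YES" if the number is greater than 0.50 and "NO" otherwise, and stores the result in the object class_log.
So the code essentially builds a character vector of the class labels "YES" and "NO" from a numeric vector.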
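
The thresholding step described above can be sketched on its own. The probabilities below are invented for illustration; in practice they would come from a prediction on the test set rather than from the model object itself. The conversion to a factor is an addition here, since confusionMatrix() in current caret versions expects factors rather than character vectors.

```r
# Hypothetical predicted probabilities of the "YES" class
p_yes <- c(0.91, 0.12, 0.50, 0.67, 0.49)

# ifelse() is vectorised: it returns one label per probability
class_log <- ifelse(p_yes > 0.50, "YES", "NO")
class_log  # a character vector

# Factor with an explicit level set, so a class that is never
# predicted still appears as a level
class_fac <- factor(class_log, levels = c("NO", "YES"))
table(class_fac)
```

Note that a probability of exactly 0.50 is labelled "NO" here because the comparison is strict.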
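You need to apply the model to the test set:
prediction.rf <- predict(model_rf, testing, type = "prob")

As far as I understand, you want the confusion matrix for the cross-validation in caret. For that you need to specify savePredictions in trainControl. If it is set to "final", the predictions of the best model are saved. Specifying classProbs = TRUE also saves the probabilities for each class.
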
data(iris)
iris_2 <- iris[iris$Species != "setosa", ]  # make a two-class problem
iris_2$Species <- factor(iris_2$Species)    # drop the unused level

library(caret)
model_rf <- train(Species ~ ., tuneLength = 3, data = iris_2, method = "rf",
                  importance = TRUE,
                  trControl = trainControl(method = "cv",
                                           number = 5,
                                           savePredictions = "final",
                                           classProbs = TRUE))

The saved predictions are sorted by CV fold; to sort them as in the original data frame:
model_rf$pred[order(model_rf$pred$rowIndex),2]

To get the confusion matrix:
confusionMatrix(model_rf$pred[order(model_rf$pred$rowIndex),2], iris_2$Species)
#output
Confusion Matrix and Statistics

            Reference
Prediction   versicolor virginica
  versicolor         46         6
  virginica           4        44

               Accuracy : 0.9            
                 95% CI : (0.8238, 0.951)
    No Information Rate : 0.5            
    P-Value [Acc > NIR] : <2e-16         

                  Kappa : 0.8            
 Mcnemar's Test P-Value : 0.7518         

            Sensitivity : 0.9200         
            Specificity : 0.8800         
         Pos Pred Value : 0.8846         
         Neg Pred Value : 0.9167         
             Prevalence : 0.5000         
         Detection Rate : 0.4600         
   Detection Prevalence : 0.5200         
      Balanced Accuracy : 0.9000         

       'Positive' Class : versicolor 
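
Selecting the columns of model_rf$pred by name rather than by position avoids depending on the column order. The following is a sketch along those lines, not the answerer's code; the seed and the object names cv_pred and cm_cv are illustrative, and the counts will vary with the random folds.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

data(iris)
iris_2 <- droplevels(iris[iris$Species != "setosa", ])  # two-class problem

set.seed(1)  # arbitrary seed, only to make the folds reproducible
model_rf <- train(Species ~ ., data = iris_2, method = "rf", tuneLength = 3,
                  trControl = trainControl(method = "cv", number = 5,
                                           savePredictions = "final",
                                           classProbs = TRUE))

# Hold-out predictions of the best model, restored to the row order of iris_2
cv_pred <- model_rf$pred[order(model_rf$pred$rowIndex), ]

# pred = predicted class, obs = observed class (both are factors)
cm_cv <- confusionMatrix(data = cv_pred$pred, reference = cv_pred$obs)
cm_cv$table
cm_cv$overall["Accuracy"]
```

Because every row is held out exactly once in 5-fold cross-validation, the table covers all rows of iris_2.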

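Thanks, but I get an error when running the class_log part. I have edited my question.

Print model_rf[,1] to the console and take a look. It would be easier to help you if your question included a reproducible example.

Thanks. Is the class_log code meant for a binary Y/N response class?

prediction.rf will hold real-valued probabilities (note type = "prob"). You can also use type = "raw" to get the class labels directly, but that does not let you control the threshold. See ?predict.train.

Thanks. Just one question: in this code, the confusion matrix built from the pred values only works when I use the training set as the data. I think I need to use the testing data set for pred. When I write test$Class, I get the following error: Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length

That code gives the confusion matrix for the cross-validation folds in caret. Since cross-validation is performed on the training set, it only applies to the training set. To get a confusion matrix on the test set, you must first predict the classes of the test-set samples and then compare them with the true classes using confusionMatrix().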
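
The last comment can be sketched end to end. Since the asker's Excel file is not available, this uses the two-class iris subset as a stand-in; the 0.5 threshold, the seed, and the names idx, p_test, pred_class and cm_test are illustrative assumptions.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

data(iris)
iris_2 <- droplevels(iris[iris$Species != "setosa", ])

set.seed(1)  # arbitrary seed
idx      <- createDataPartition(iris_2$Species, p = 0.7, list = FALSE)
training <- iris_2[idx, ]
testing  <- iris_2[-idx, ]

model_rf <- train(Species ~ ., data = training, method = "rf", tuneLength = 3,
                  trControl = trainControl(method = "cv", number = 5,
                                           classProbs = TRUE))

# Class probabilities for the held-out rows: one column per class level
p_test <- predict(model_rf, newdata = testing, type = "prob")

# Apply the threshold, then convert to a factor with the reference levels
pred_class <- ifelse(p_test[, "versicolor"] > 0.5, "versicolor", "virginica")
pred_class <- factor(pred_class, levels = levels(testing$Species))

# Prediction and reference now have the same length and the same levels
cm_test <- confusionMatrix(data = pred_class, reference = testing$Species)
cm_test$table
cm_test$overall["Accuracy"]
```

The "all arguments must have the same length" error arises when predictions for one data set are compared with the labels of another; here both vectors come from testing. With type = "raw", predict() returns the class factor directly and the thresholding step can be skipped.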
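# For each probability threshold x in 1/40, 2/40, ..., 1, classify the
# cross-validated predictions and record the kappa of the confusion matrix
sapply(1:40/40, function(x){
  # versicolor probability, selected by name and sorted as in iris_2
  versicolor <- model_rf$pred$versicolor[order(model_rf$pred$rowIndex)]
  class <- ifelse(versicolor >= x, "versicolor", "virginica")
  # confusionMatrix() expects factors with the same levels as the reference
  class <- factor(class, levels = levels(iris_2$Species))
  mat <- confusionMatrix(class, iris_2$Species)
  kappa <- mat$overall[2]
  res <- data.frame(prob = x, kappa = kappa)
  return(res)
})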
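
When the function passed to sapply() returns a data frame, the result is a list-like matrix that is awkward to read. One alternative, sketched here with invented probabilities and a hand-written Cohen's kappa (both are assumptions for illustration, not caret's implementation), is to return one number per threshold and assemble the data frame afterwards.

```r
# Hypothetical true labels and predicted probabilities of class "a"
truth <- factor(c("a", "a", "a", "b", "b", "b"), levels = c("a", "b"))
p_a   <- c(0.9, 0.8, 0.4, 0.6, 0.2, 0.1)

# Cohen's kappa from a square confusion table
cohen_kappa <- function(pred, ref) {
  m  <- table(pred, ref)
  n  <- sum(m)
  po <- sum(diag(m)) / n                    # observed agreement
  pe <- sum(rowSums(m) * colSums(m)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

thresholds <- c(0.25, 0.5, 0.75)

# vapply() guarantees exactly one numeric value per threshold
kappas <- vapply(thresholds, function(x) {
  pred <- factor(ifelse(p_a >= x, "a", "b"), levels = levels(truth))
  cohen_kappa(pred, truth)
}, numeric(1))

res <- data.frame(prob = thresholds, kappa = kappas)
res

# Threshold with the highest kappa (the first one in case of ties)
res$prob[which.max(res$kappa)]
```

The same pattern should carry over to the cross-validated probabilities by replacing p_a and truth with the sorted versicolor column and iris_2$Species.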