One-class classification in R. What am I doing wrong when generating the confusion matrix?


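I am trying to understand and implement a one-class classifier in R based on several UCI datasets, one of which is ().

When I try to print the confusion matrix, I get the error "all arguments must have the same length".

What am I doing wrong?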

library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)
ds = read.csv('rend_disease.csv',
              header = TRUE)
# Clean out unusable columns

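I see a few problems. First, much of your data appears to be of class character rather than the numeric type the classifier needs. Let's pick a few columns and convert them to numeric. I am going to use data.table, because fread is very convenient.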

library(caret)
library(e1071)
library(data.table)
setDT(ds)
#Choose columns
mycols <- c("id","bp","sg","al","su")
#Convert to numeric
ds[,(mycols) := lapply(.SD, as.numeric),.SDcols = mycols]

#Convert classification to logical
data <- ds[,.(bp,sg,al,su,classification = ds$classification == "ckd")]
data
     bp    sg al su classification
  1: 80 1.020  1  0           TRUE
  2: 50 1.020  4  0           TRUE
  3: 80 1.010  2  3           TRUE
  4: 70 1.005  4  0           TRUE
  5: 80 1.010  2  0           TRUE
 ---                              
396: 80 1.020  0  0          FALSE
397: 70 1.025  0  0          FALSE
398: 80 1.020  0  0          FALSE
399: 60 1.025  0  0          FALSE
400: 80 1.025  0  0          FALSE
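To see why the conversion step matters, here is a minimal base R sketch on a hypothetical toy data frame (the values and the name toy are made up for illustration, not taken from the real file). It assumes missing values are encoded as "?", as in the data section of this answer.

```r
# Hypothetical toy data: every column arrives as character, with "?" for missing
toy <- data.frame(
  bp = c("80", "50", "?", "70"),
  sg = c("1.020", "1.010", "1.005", "?"),
  classification = c("ckd", "ckd", "notckd", "notckd"),
  stringsAsFactors = FALSE
)

# Inspect the class of each column before modelling
col_classes <- sapply(toy, class)
print(col_classes)

# Convert the measurement columns; "?" cannot be parsed and becomes NA
num_cols <- c("bp", "sg")
toy[num_cols] <- lapply(toy[num_cols], function(x) suppressWarnings(as.numeric(x)))

# Logical outcome, as in the answer: TRUE for "ckd"
toy$classification <- toy$classification == "ckd"

str(toy)
```

Rows that end up with NA in a predictor are worth counting at this point, since they can later be dropped at prediction time.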
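Then we can create the model and make predictions (train and test come from the data split shown in the complete listing further down):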

svm.model<-svm(classification ~ bp + sg + al + su, data = train,
               type='one-classification',
               nu=0.10,
               scale=TRUE,
               kernel="radial")

#Perform predictions 
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)
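The model code above expects train and test to exist already. As an illustration only, here is a base R sketch of a 60/40 split that samples within each class. It is not the method used in this answer, and the toy data and the names idx_by_class and in_train are hypothetical.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical stand-in for the cleaned data: 10 positives, 10 negatives
toy <- data.frame(
  bp = rnorm(20, mean = 75, sd = 10),
  classification = rep(c(TRUE, FALSE), each = 10)
)

# Sample 60% of the row indices within each class
idx_by_class <- split(seq_len(nrow(toy)), toy$classification)
in_train <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.6 * length(idx)))]
}))

train <- toy[in_train, ]
test  <- toy[-in_train, ]

# Class counts in each part
table(train$classification)
table(test$classification)
```

Sampling within each class keeps the class proportions roughly equal in both parts, which makes the confusion matrices easier to compare.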
Data

library(archive)
library(data.table)
tf1 <- tempfile(fileext = ".rar")
#Download data file
download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00336/Chronic_Kidney_Disease.rar", tf1)
tf2 <- tempfile()
#Un-rar file
archive_extract(tf1, tf2)
#Read in data
ds <- fread(paste0(tf2,"/Chronic_Kidney_Disease/chronic_kidney_disease.arff"), fill = TRUE, skip = "48")
#Remove erroneous last column
ds[,V26:= NULL]
#Set column names (from header)
setnames(ds,c("id","bp","sg","al","su","rbc","pc","pcc","ba","bgr","bu","sc","sod","pot","hemo","pcv","wc","rc","htn","dm","cad","appet","pe","ane","classification"))
#Replace "?" with NA
ds[ds == "?"] <- NA
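As an alternative to replacing "?" after import, the missing-value marker can be declared while reading. The sketch below uses base R's read.csv on a hypothetical inline CSV (csv_text is made up for illustration). fread has an na.strings argument as well, but whether it suits this particular .arff file is an assumption I have not checked.

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "id,bp,sg,classification
1,80,1.020,ckd
2,?,1.010,ckd
3,70,?,notckd"

# na.strings = "?" makes read.csv treat "?" as NA while parsing,
# so bp and sg come back numeric instead of character
toy <- read.csv(text = csv_text, na.strings = "?", stringsAsFactors = FALSE)

# Column classes and number of missing values per column
sapply(toy, class)
colSums(is.na(toy))
```

With this approach the later as.numeric conversion is no longer needed for the columns that only contained numbers and "?".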

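Comments

Hi StaLLoNe_CoBRa, welcome to Stack Overflow. I am unable to reproduce your code because 1) the link does not provide a .csv file but an .arff file; 2) you give no details on how you converted the .arff to a .csv. It will be much easier to help if you provide a reproducible data sample. For more information, see [link].

Also, please provide working code. The above starts, at best, with a syntax error in the call to read.csv, and it is not always clear which problems in the code come from copy/paste mistakes and which are further problems you have not yet found or asked about. One benefit of displaying the R code above is that all the red code is treated as part of a string. (The error occurs before the string starts, but it still makes clear that something is wrong.)

Side note: that error usually means you are passing in objects with different numbers of elements. Check the number of observations in both; if they are not the same, the matrix cannot compare them 1:1. My guess is that somewhere while building the test set you doubled something.

Hi @StaLLoNe_CoBRa, do you have GitHub? Could you create a repo and save the dataset there? I can help you, but it is hard to access the dataset you mention.

Hi @IanCampbell, thanks for the feedback! I posted the head of the dataset I am using!

Hey @Ian Campbell, first of all thanks for your help! I tried to run your suggestion, but it produced an error: in [.data.frame(ds, :=((mycols), lapply(.SD, as.numeric)), : unused argument (.SDcols = mycols)

Try setDT(ds) to convert to a data.table if it was not already converted on import.

Yes! I got it! Thanks! But I don't understand why I can't run createDataPartition now? #Convert to numeric setDT(ds)[,(mycols) := lapply(.SD, as.numeric), .SDcols = mycols] #Convert classification to logical data

Could empty variables affect this?

Did you forget library(caret)?
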
#Sample data for training and test set
inTrain<-createDataPartition(1:nrow(data),p=0.6,list=FALSE)
train<- data[inTrain,]
test <- data[-inTrain,]
svm.model<-svm(classification ~ bp + sg + al + su, data = train,
               type='one-classification',
               nu=0.10,
               scale=TRUE,
               kernel="radial")

#Perform predictions 
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)
confTrain <- table(Predicted=svm.predtrain,
                   Reference=train$classification[as.integer(names(svm.predtrain))])
confTest <- table(Predicted=svm.predtest,
                  Reference=test$classification[as.integer(names(svm.predtest))])

confusionMatrix(confTest,positive='TRUE')

Confusion Matrix and Statistics

         Reference
Predicted FALSE TRUE
    FALSE     0   17
    TRUE     55   64

               Accuracy : 0.4706         
                 95% CI : (0.3845, 0.558)
    No Information Rate : 0.5956         
    P-Value [Acc > NIR] : 0.9988         

                  Kappa : -0.2361        

 Mcnemar's Test P-Value : 1.298e-05      

            Sensitivity : 0.7901         
            Specificity : 0.0000         
         Pos Pred Value : 0.5378         
         Neg Pred Value : 0.0000         
             Prevalence : 0.5956         
         Detection Rate : 0.4706         
   Detection Prevalence : 0.8750         
      Balanced Accuracy : 0.3951         

       'Positive' Class : TRUE           
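
The indexing by names() in the table() calls above is what avoids the "all arguments must have the same length" error. Here is a minimal base R sketch of the idea, using simulated predictions rather than a fitted model. It assumes, as the code above suggests, that predictions can be fewer than the input rows (for example when rows with NA are dropped) and that their names hold the surviving row numbers. The vectors reference and pred are hypothetical.

```r
# Hypothetical reference labels for six rows
reference <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Simulated predictions: rows 2 and 5 were dropped (e.g. because of NAs),
# and the names record which original rows survived
pred <- c("1" = TRUE, "3" = TRUE, "4" = TRUE, "6" = FALSE)

# Unequal lengths reproduce the error from the question
err <- tryCatch(table(Predicted = pred, Reference = reference),
                error = function(e) conditionMessage(e))
print(err)

# Aligning the reference by the surviving row names fixes the lengths
aligned <- reference[as.integer(names(pred))]
conf <- table(Predicted = pred, Reference = aligned)
print(conf)
```

Comparing length(pred) with length(reference) before calling table() is a quick way to spot this kind of mismatch.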