One-class classification in R: what am I doing wrong when generating the confusion matrix?
I am trying to understand and implement a one-class classifier in R, based on several UCI datasets, one of which is (). When I try to print the confusion matrix, I get the error "all arguments must have the same length". What am I doing wrong?
Tags: r, machine-learning, svm, supervised-learning, one-class-classification
library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)
ds = read.csv('rend_disease.csv',
              header = TRUE)
#Clean up unusable columns
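ds
I see a few problems. First, much of your data appears to be of class character rather than numeric, which the classifier requires. Let's pick a few columns and convert them to numeric. I'll use data.table, because fread is very convenient.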
library(caret)
library(e1071)
library(data.table)
setDT(ds)
#Choose columns
mycols <- c("id","bp","sg","al","su")
#Convert to numeric
ds[,(mycols) := lapply(.SD, as.numeric),.SDcols = mycols]
#Convert classification to logical
data <- ds[,.(bp,sg,al,su,classification = ds$classification == "ckd")]
data
      bp    sg al su classification
  1:  80 1.020  1  0           TRUE
  2:  50 1.020  4  0           TRUE
  3:  80 1.010  2  3           TRUE
  4:  70 1.005  4  0           TRUE
  5:  80 1.010  2  0           TRUE
 ---
396:  80 1.020  0  0          FALSE
397:  70 1.025  0  0          FALSE
398:  80 1.020  0  0          FALSE
399:  60 1.025  0  0          FALSE
400:  80 1.025  0  0          FALSE
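To illustrate why the conversion matters, here is a minimal base R sketch on a made-up data frame. The `toy` object and its values are hypothetical and are not taken from the dataset. It assumes missing values are marked with "?", which forces the columns to be read as character, and shows that `as.numeric` turns those placeholders into NA.

```r
# Hypothetical toy data: "?" marks a missing value, so the columns are character
toy <- data.frame(bp = c("80", "50", "?"),
                  sg = c("1.020", "?", "1.010"),
                  stringsAsFactors = FALSE)

# Inspect the class of every column before converting
classes_before <- sapply(toy, class)
print(classes_before)

# Convert every column to numeric; "?" cannot be parsed and becomes NA
# (as.numeric warns about this, the warning is suppressed here)
toy[] <- lapply(toy, function(col) suppressWarnings(as.numeric(col)))

# Inspect the classes again and count the NAs introduced per column
classes_after <- sapply(toy, class)
print(classes_after)
print(colSums(is.na(toy)))
```

Checking `sapply(ds, class)` on the real data before fitting is a quick way to spot columns that still need converting.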
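Then we can create the model and make predictions:
#train and test come from the createDataPartition() split
#Fit a one-class SVM on the four numeric predictors
svm.model <- svm(classification ~ bp + sg + al + su, data = train,
                 type = 'one-classification',
                 nu = 0.10,
                 scale = TRUE,
                 kernel = "radial")
#Perform predictions
svm.predtrain <- predict(svm.model, train)
svm.predtest <- predict(svm.model, test)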
Data
library(archive)
library(data.table)
tf1 <- tempfile(fileext = ".rar")
#Download data file
download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00336/Chronic_Kidney_Disease.rar", tf1)
tf2 <- tempfile()
#Un-rar file
archive_extract(tf1, tf2)
#Read in data
ds <- fread(paste0(tf2,"/Chronic_Kidney_Disease/chronic_kidney_disease.arff"), fill = TRUE, skip = "48")
#Remove erroneous last column
ds[,V26:= NULL]
#Set column names (from header)
setnames(ds,c("id","bp","sg","al","su","rbc","pc","pcc","ba","bgr","bu","sc","sod","pot","hemo","pcv","wc","rc","htn","dm","cad","appet","pe","ane","classification"))
#Replace "?" with NA
ds[ds == "?"] <- NA
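As a possible alternative to replacing "?" after the import, the placeholder can be mapped to NA while reading. The sketch below uses base R's `read.csv` with its `na.strings` argument on a few inline, made-up rows (the `raw_text` string is hypothetical). `fread` has an argument of the same name, though this sketch does not use it.

```r
# Hypothetical rows in the same shape as the columns used above
raw_text <- "bp,sg,al,su,classification
80,1.020,1,0,ckd
50,?,4,0,ckd
80,1.025,0,0,notckd"

# na.strings = "?" makes read.csv treat "?" as missing, so sg stays numeric
toy <- read.csv(text = raw_text, na.strings = "?", strip.white = TRUE)

str(toy)
```

With this approach the numeric columns do not need a separate `as.numeric` pass, because they never become character in the first place.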
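Hello Stallone, welcome to Stack Overflow. I can't reproduce your code because 1) the link doesn't provide a .csv file but an .arff file, and 2) you haven't given any details on how you converted the .arff to .csv. It will be easier to help if you provide a reproducible sample of your data.
Also, please provide working code. The code above starts, at best, with a syntax error in the call to read.csv, and it isn't always clear which problems come from copy/paste mistakes and which are other problems in your code that you haven't found or asked about yet. One benefit of how the R code is displayed above is that all the red code is being treated as part of a string. (The error occurs before the string starts, but it still makes clear that there is a problem.)
Side note: that error usually means you are passing in vectors with different numbers of members. Look at the number of observations in the two sets; if they are not the same, the matrix cannot compare them 1:1. My guess is that somewhere, while building the test set, you doubled something up.
Hi @StaLLoNe_CoBRa, do you have GitHub? Could you create a repo and save the dataset there? I can help you, but it is hard to access the dataset you mentioned.
Hi @IanCampbell, thanks for the feedback! I posted the header of the dataset I'm using!
Hey @Ian Campbell, first of all, thanks for your help! I tried to run your suggestion, but it produced an error: Error in `[.data.frame`(ds, , `:=`((mycols), lapply(.SD, as.numeric)), : unused argument (.SDcols = mycols)
Try setDT(ds) to convert to a data.table if it wasn't already converted on import.
Yes! I got it! Thanks! But I don't understand why I can't run createDataPartition now? #Convert to numeric setDT(ds)[, (mycols) := lapply(.SD, as.numeric), .SDcols = mycols] #Convert classification to logical data. Could empty variables be affecting this?
Did you forget library(caret)?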
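The length mismatch discussed in the comments can be reproduced in base R. This is only a sketch under stated assumptions: the `toy` data frame, the threshold rule, and the `kept` helper are hypothetical stand-ins for a `predict()` method that omits rows with missing predictors and names its results by row number. It is not the e1071 implementation.

```r
# Hypothetical toy data with one missing predictor value
toy <- data.frame(x = c(1.0, NA, 3.0, 4.0),
                  label = c(TRUE, TRUE, FALSE, FALSE))

# Stand-in for a predict() that omits incomplete rows:
# keep the complete rows and name each result by its row number
kept <- which(complete.cases(toy["x"]))
pred <- setNames(toy$x[kept] > 2, kept)   # made-up "prediction" rule

# Three predictions against four labels: table() refuses to cross-tabulate
msg <- tryCatch(
  {
    table(Predicted = pred, Reference = toy$label)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Align the labels with the rows that were actually predicted
aligned <- toy$label[as.integer(names(pred))]
conf <- table(Predicted = pred, Reference = aligned)
print(conf)
```

Comparing `length(pred)` with the number of rows in the data is a quick check for this situation before building any confusion matrix.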
#Sample data for training and test set
inTrain<-createDataPartition(1:nrow(data),p=0.6,list=FALSE)
train<- data[inTrain,]
test <- data[-inTrain,]
svm.model<-svm(classification ~ bp + sg + al + su, data = train,
type='one-classification',
nu=0.10,
scale=TRUE,
kernel="radial")
#Perform predictions
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)
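#predict() leaves out rows with missing predictors, so there can be fewer
#predictions than rows; the names of the predictions are the row numbers that
#were kept, and they are used here to pick the matching reference labels
confTrain <- table(Predicted = svm.predtrain,
                   Reference = train$classification[as.integer(names(svm.predtrain))])
confTest <- table(Predicted = svm.predtest,
                  Reference = test$classification[as.integer(names(svm.predtest))])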
confusionMatrix(confTest,positive='TRUE')
Confusion Matrix and Statistics

         Reference
Predicted FALSE TRUE
    FALSE     0   17
    TRUE     55   64

               Accuracy : 0.4706
                 95% CI : (0.3845, 0.558)
    No Information Rate : 0.5956
    P-Value [Acc > NIR] : 0.9988

                  Kappa : -0.2361

 Mcnemar's Test P-Value : 1.298e-05

            Sensitivity : 0.7901
            Specificity : 0.0000
         Pos Pred Value : 0.5378
         Neg Pred Value : 0.0000
             Prevalence : 0.5956
         Detection Rate : 0.4706
   Detection Prevalence : 0.8750
      Balanced Accuracy : 0.3951

       'Positive' Class : TRUE
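For readers who want to check how the headline statistics follow from the four counts, here is a small base R sketch. The counts are copied from the confusion matrix above; the variable names (`tp`, `tn`, `fp`, `fn`) are only illustrative, and this is the usual textbook arithmetic rather than a copy of caret's internals.

```r
# Counts from the confusion matrix above (rows = predicted, columns = reference)
conf <- matrix(c(0, 55, 17, 64), nrow = 2,
               dimnames = list(Predicted = c("FALSE", "TRUE"),
                               Reference = c("FALSE", "TRUE")))

# TRUE is the positive class
tp <- conf["TRUE", "TRUE"]     # predicted TRUE, reference TRUE
tn <- conf["FALSE", "FALSE"]   # predicted FALSE, reference FALSE
fp <- conf["TRUE", "FALSE"]    # predicted TRUE, reference FALSE
fn <- conf["FALSE", "TRUE"]    # predicted FALSE, reference TRUE

accuracy    <- (tp + tn) / sum(conf)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
balanced    <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced), 4))
```

The specificity of 0 reflects that none of the 55 FALSE reference rows in the test set was predicted as FALSE.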