Matlab: selecting SVM parameters using cross-validation and the F1 score


When tuning C and sigma for an SVM, I need to track the F1 score. For example, the following code tracks accuracy; I need to change it to track the F1 score instead, but I haven't been able to do it.

%# read some training data
[labels,data] = libsvmread('./heart_scale');

%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);

%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i = 1:numel(C)
    cv_acc(i) = svmtrain(labels, data, ...
        sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);

%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
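As an aside, libsvm's built-in -v option only reports accuracy, so tracking the F1 score instead means running the folds manually with svmtrain/svmpredict. A minimal sketch under that assumption (binary +1/-1 labels with +1 treated as positive; crossvalind comes from the Bioinformatics Toolbox; -q suppresses libsvm's console output in recent versions):

```matlab
%# manual k-fold CV per (C,gamma) pair, scoring by F1 instead of accuracy
cv_f1 = zeros(numel(C),1);
idx = crossvalind('Kfold', labels, folds);   %# fold assignment per instance
for i = 1:numel(C)
    f1 = zeros(folds,1);
    for k = 1:folds
        te = (idx == k);  tr = ~te;
        model = svmtrain(labels(tr), data(tr,:), ...
            sprintf('-c %f -g %f -q', 2^C(i), 2^gamma(i)));
        pred = svmpredict(labels(te), data(te,:), model, '-q');
        tp = sum(pred == 1  & labels(te) == 1);
        fp = sum(pred == 1  & labels(te) == -1);
        fn = sum(pred == -1 & labels(te) == 1);
        f1(k) = 2*tp / (2*tp + fp + fn);     %# F1 for the positive class
    end
    cv_f1(i) = mean(f1);
end
[~,idx_best] = max(cv_f1);
```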
I have seen the following two links.

I understand that I must first find the best C and gamma/sigma parameters on the training data, and then use those two values to run a leave-one-out cross-validation classification experiment. So what I want to do now is first run a grid search to tune C & sigma. Also, I would rather use MATLAB's own SVM than LIBSVM. Below is my leave-one-out cross-validation classification code:

clc
clear all
close all
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall = [X,Y];
A = datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good = B;
input = good(:,1:12);
target = good(:,13);
CVO = cvpartition(target,'leaveout',1);
cp = classperf(target);                      %# init performance tracker
svmModel=[];
for i = 1:CVO.NumTestSets                    %# for each fold
    trIdx = CVO.training(i);
    teIdx = CVO.test(i);

    %# train an SVM model over training instances
    svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
        'Autoscale',true, 'Showplot',false, 'Method','ls', ...
        'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);

    %# test using test instances
    pred = svmclassify(svmModel, input(teIdx,:), 'Showplot',false);

    %# evaluate and update performance object
    cp = classperf(cp, pred, teIdx);
end
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
aF1 = ((f1P+f1N)/2);
I changed the code, but I am getting some errors.

a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);  
good=B; 
inpt=good(:,1:12);
target=good(:,13);
k=10;
cvFolds = crossvalind('Kfold', target, k);   %# get indices of 10-fold CV
cp = classperf(target);                      %# init performance tracker
svmModel=[];
for i = 1:k
    testIdx = (cvFolds == i);                %# get indices of test instances
    trainIdx = ~testIdx;
    C = 0.1:0.1:1;
    S = 0.1:0.1:1;
    fscores = zeros(numel(C), numel(S));     %// Pre-allocation
    for c = 1:numel(C)
        for s = 1:numel(S)
            vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL) ...
                fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(s)), ...
                inpt(trainIdx,:), target(trainIdx));
            fscores(c,s) = mean(vals);
        end
    end
end

[cbest, sbest] = find(fscores == max(fscores(:)));
C_final = C(cbest);
S_final = S(sbest);

And the function:

function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)
svmModel = svmtrain(XTRAIN, YTRAIN, ...
    'Autoscale',true, 'Showplot',false, 'Method','ls', ...
    'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);

pred = svmclassify(svmModel, XVAL, 'Showplot',false);

cp = classperf(YVAL, pred);
%# get accuracy and per-class metrics (as percentages)
accuracy = cp.CorrectRate*100;
sensitivity = cp.Sensitivity*100;
specificity = cp.Specificity*100;
PPV = cp.PositivePredictiveValue*100;
NPV = cp.NegativePredictiveValue*100;
%# get confusion matrix
%# columns: actual, rows: predicted, last row: unclassified instances
confMat = cp.CountingMatrix;
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
fscore = (f1P + f1N)/2;
end

So, basically, you want to take this line of yours:

svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
    'Autoscale',true, 'Showplot',false, 'Method','ls', ...
    'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);
and put it inside a loop that varies the
'BoxConstraint'
and
'RBF_Sigma'
parameters, having the loop output the F1 score for each combination of parameter values.

You could use a single for loop exactly as in your libsvm code example (i.e. using
meshgrid
and
1:numel()
, which is probably faster), or nested for loops. I'll use nested loops so that you have both approaches:

C = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300] %// you must choose your own set of values for the parameters you want to test. You can either do it this way, by explicitly typing out a list
S = 0:0.1:1 %// or this way, using the : operator
fscores = zeros(numel(C), numel(S)); %// Pre-allocation
for c = 1:numel(C)
    for s = 1:numel(S)
        vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL) ...
            fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(s)), ...
            input(trIdx,:), target(trIdx));
        fscores(c,s) = mean(vals);
    end
end

%// Then establish the C and S that gave you the best f-score. Don't forget that cbest and sbest are just indexes though!
[cbest, sbest] = find(fscores == max(fscores(:)));
C_final = C(cbest);
S_final = S(sbest);
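For completeness, the single-loop meshgrid variant mentioned above could look like this (a sketch; fun, input, target and trIdx are as defined elsewhere in this post):

```matlab
%// expand the two parameter vectors into all (C,S) combinations
[Cgrid, Sgrid] = meshgrid(C, S);
fscores = zeros(numel(Cgrid), 1);
for i = 1:numel(Cgrid)
    vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL) ...
        fun(XTRAIN, YTRAIN, XVAL, YVAL, Cgrid(i), Sgrid(i)), ...
        input(trIdx,:), target(trIdx));
    fscores(i) = mean(vals);
end
%// linear indexing works because Cgrid/Sgrid share fscores' shape
[~, idx] = max(fscores);
C_final = Cgrid(idx);
S_final = Sgrid(idx);
```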
Now we just need to define the function
fun
. The documentation has this to say about
fun:

fun is a function handle to a function with two inputs, the training subset of X, XTRAIN, and the test subset of X, XTEST, as follows:

testval = fun(XTRAIN, XTEST). Each time it is called, fun should use XTRAIN to fit a model, then return some criterion testval computed on XTEST using that fitted model.

So
fun
needs to:

  • output an f-score
  • take training and test subsets of X and Y as input. Note that these are both subsets of your actual training set! Think of them as training and validation subsets of the training set. Note also that crossval splits these sets up for you
  • train a classifier on the training subset (using the current
    C
    and
    S
    parameters from the loop)
  • run the new classifier on the test (or validation) subset
  • compute and output a performance metric (in your case, the f1 score)
You'll notice that
fun
can't take any extra parameters, which is why I wrapped it in an anonymous function so that we can pass the current
C
and
S
values in (i.e. all of that
@(...)(fun(...))
stuff above). That's just a trick to turn our six-parameter
fun
into the four-parameter function that
crossval
requires.

function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)

   svmModel = svmtrain(XTRAIN, YTRAIN, ...
       'Autoscale',true, 'Showplot',false, 'Method','ls', ...
      'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);

   pred = svmclassify(svmModel, XVAL, 'Showplot',false);

   CP = classperf(YVAL, pred)

   fscore = ... %// You can do this bit the same way you did earlier
end

The only problem I found was with
target(trainIdx)
. It is a row vector, so I replaced
target(trainIdx)
with
target(trainIdx)'
, which is a column vector.

Do you want to use Matlab's Statistics Toolbox or its machine learning functions? Either way, the code will be completely different from your libsvm example. Please try coding it in Matlab first, or at least post your Matlab SVM code with the parameters arbitrarily set to single values (i.e. not tuned via cross-validation).

Here is the link to the code used with libsvm. You need to show us what the non-libsvm code looks like first. Just post simple code without cross-validation (i.e. without the grid search). As a clue, though, you probably want to use this function:

I have edited the question and added the MATLAB code. Thank you.

No problem - let me know if you spot any mistakes!

First, why did you change it to use
...,inpt,target
? It's an anonymous function; those aren't variables in your workspace... Second, the error sounds like it is complaining about your
target(trIdx)
.
XTRAIN, YTRAIN, XVAL, YVAL
get generated by
crossval
;
crossval
needs you to give it the training data (it will then split that into smaller training and validation sets itself)...

I changed it to ... vals = crossval(@(XTRAIN,YTRAIN,XVAL,YVAL)(fun(XTRAIN,YTRAIN,XVAL,YVAL,C(c),S(c))), inpt, target); ... inpt is a 40x12 dataset and target is a 40x1 vector of targets with two classes, 0s and 1s... But MATLAB still says "Ground truth must have at least two classes."

@Wasnuga So first, undo that change, because you still need the test set later. I'm sure the problem is in
classperf
(in future, please post the whole error). You just need to compute the f1 score without it (it really isn't hard). If you can't manage that (after seriously trying), ask it as a separate question.