Matlab: bag of words is not labeling responses correctly

matlab opencv image-processing machine-learning computer-vision

I am trying to implement bag of words in OpenCV and have come up with the implementation below. Since this is my first time and I am not familiar with it, I plan to use two image sets from the database I am working with: a chair image set and a soccer ball image set. I have also coded the SVM part.

Everything goes well, except that when I call classifier.predict(descriptor) I do not get the expected label value. Whatever my test image is, I always get 0 instead of 1. The chair dataset has 10 images and the soccer ball dataset has 10 images. I labeled the chairs as 0 and the soccer balls as 1. The links represent the samples of each category: the first 10 are chairs, the next 10 are soccer balls.

function hello

    clear all; close all; clc;

    detector = cv.FeatureDetector('SURF');
    extractor = cv.DescriptorExtractor('SURF');


    links = {
    'http://i.imgur.com/48nMezh.jpg'
    'http://i.imgur.com/RrZ1i52.jpg'
    'http://i.imgur.com/ZI0N3vr.jpg'
    'http://i.imgur.com/b6lY0bJ.jpg'
    'http://i.imgur.com/Vs4TYPm.jpg'
    'http://i.imgur.com/GtcwRWY.jpg'
    'http://i.imgur.com/BGW1rqS.jpg'
    'http://i.imgur.com/jI9UFn8.jpg'
    'http://i.imgur.com/W1afQ2O.jpg'
    'http://i.imgur.com/PyX3adM.jpg'


    'http://i.imgur.com/U2g4kW5.jpg'
    'http://i.imgur.com/M8ZMBJ4.jpg'
    'http://i.imgur.com/CinqIWI.jpg'
    'http://i.imgur.com/QtgsblB.jpg'
    'http://i.imgur.com/SZX13Im.jpg'
    'http://i.imgur.com/7zVErXU.jpg'
    'http://i.imgur.com/uUMGw9i.jpg'
    'http://i.imgur.com/qYSkqEg.jpg'
    'http://i.imgur.com/sAj3pib.jpg'
    'http://i.imgur.com/DMPsKfo.jpg'
    };


    N = numel(links);

    trainer = cv.BOWKMeansTrainer(100);


    train = struct('val',repmat({' '},N,1),'img',cell(N,1), 'pts',cell(N,1), 'feat',cell(N,1));


    for i=1:N

      train(i).val = links{i};
      train(i).img = imread(links{i});

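       % convert to grayscale only when the image has colour channels
       % (the comparison belongs outside the ndims call)
       if ndims(train(i).img) > 2
         train(i).img = rgb2gray(train(i).img);
       end;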

       train(i).pts = detector.detect(train(i).img);
       train(i).feat = extractor.compute(train(i).img,train(i).pts);

     end;

     for i=1:N
          trainer.add(train(i).feat);
     end;

     dictionary = trainer.cluster();
     extractor = cv.BOWImgDescriptorExtractor('SURF','BruteForce');
     extractor.setVocabulary(dictionary);

     for i=1:N
          desc(i,:) = extractor.compute(train(i).img,train(i).pts);
     end;

     a = zeros(1,10)';
     b = ones(1,10)';
     labels = [a;b];


     classifier  = cv.SVM;
     classifier.train(desc,labels);

     test_im =rgb2gray(imread('D:\ball1.jpg'));

     test_pts = detector.detect(test_im);
     test_feat = extractor.compute(test_im,test_pts);

     val = classifier.predict(test_feat);
     disp('Value is: ')
     disp(val)

     end

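These are my test samples:

Soccer Ball

Searching this site, I think my algorithm is okay, although I am not very confident about it. If anyone can help me find the bug, I would appreciate it.

Following Amro's code, this is my result: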

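How many images are you using to build your dictionary, i.e. what is N? From your code, it looks like you are only using 10 images (the ones listed in links). I hope that list was truncated for this post, because otherwise that is far too few. Typically you need 1000 or more images to build the dictionary, and the images do not have to be restricted to the two classes you are classifying. Otherwise, with only 10 images and 100 clusters, your dictionary is likely to be a mess.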
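To make the dictionary step concrete, here is a minimal sketch of what "building a vocabulary" and "describing an image as a histogram of visual words" mean. It is written in plain Python (standard library only) rather than MATLAB, it does not use OpenCV, and the 2-D "descriptors", the function names, and the farthest-point initialization are all illustrative assumptions, not what mexopencv does internally.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def init_centers(points, k):
    """Deterministic farthest-point initialization (assumed here for reproducibility)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the returned centers play the role of visual words."""
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of nearest visual words for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy 2-D descriptors for two images (real SURF descriptors have 64 dimensions).
img_a = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1)]
img_b = [(5.1, 5.0), (4.9, 5.2), (5.0, 4.8), (0.1, 0.2)]

vocab = kmeans(img_a + img_b, k=2)
hist_a = bow_histogram(img_a, vocab)
hist_b = bow_histogram(img_b, vocab)
print(hist_a, hist_b)
```

The point of the toy is the ratio: a vocabulary only carries information when there are many more descriptors than clusters, which is why a handful of images and 100 clusters is a concern.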

Also, you may want to use SIFT as your first choice, since it tends to perform better than the other descriptors.

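Finally, you can also debug by inspecting the detected keypoints. You can have OpenCV draw the keypoints. Sometimes the keypoint detector parameters are not set properly, so too few keypoints are detected, which in turn produces poor feature vectors.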
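Besides drawing the keypoints, a quick numeric check can help. The sketch below (plain Python, independent of OpenCV) flags images whose keypoint count falls under a threshold. The counts, the file names, and the threshold of 50 are made-up values for illustration; in practice the counts would come from the detector output for each image.

```python
def flag_sparse_images(keypoint_counts, min_keypoints):
    """Return the names of images with fewer than min_keypoints keypoints, sorted."""
    return sorted(name for name, n in keypoint_counts.items() if n < min_keypoints)

# Hypothetical keypoint counts per image (not measured data).
counts = {"chair1.jpg": 480, "chair2.jpg": 12, "ball1.jpg": 350, "ball2.jpg": 0}

sparse = flag_sparse_images(counts, min_keypoints=50)
print("Images with too few keypoints:", sparse)
```

Any image that shows up in this list is a candidate for loosening the detector parameters before blaming the classifier.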

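To learn more about the BOW algorithm, you can look at two earlier posts on the topic. The second post links to a free PDF of an O'Reilly book on computer vision with Python. The BOW model (and other useful material) is described in more detail in that book.

Hope this helps.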

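Your logic looks fine to me.

Now, if you want to improve the classification accuracy, I think you will have to tune the various parameters. This includes the clustering parameters (vocabulary size, cluster initialization, termination criteria, and so on), the SVM parameters (kernel type, the C coefficient, and so on), and the local features algorithm used (SIFT, SURF, and so on).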

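Ideally, whenever you want to perform parameter selection, you should use cross-validation. Some methods already have such a mechanism built in (train_auto, for example), but in most cases you have to do this manually.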
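One common way to do this kind of parameter selection by hand is k-fold cross-validation: hold out each fold in turn, train on the rest, and keep the parameter value with the best average held-out accuracy. The sketch below is plain Python and uses a tiny 1-D k-nearest-neighbour classifier as a stand-in for the SVM; the data, the candidate values, and the function names are assumptions chosen for illustration. One label is deliberately noisy so that the choice of parameter matters.

```python
def k_fold_indices(n, k):
    """Split range(n) into k interleaved folds (deterministic, no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = [train_y[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)

def cv_accuracy(xs, ys, n_neighbors, k=5):
    """Mean held-out accuracy over k folds for one parameter value."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        correct = sum(
            knn_predict(tr_x, tr_y, xs[i], n_neighbors) == ys[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Toy data: class 0 on the left, class 1 on the right, one noisy label at x=4.
xs = [1, 2, 4, 5, 7, 20, 21, 23, 24, 26]
ys = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

results = {n: cv_accuracy(xs, ys, n) for n in (1, 3, 5)}
best = max(results, key=results.get)
print(results, "best n_neighbors:", best)
```

With this toy data a single neighbour is misled by the noisy point, while three or five neighbours out-vote it. The same loop applies to SVM parameters such as C and gamma, only with a slower training step inside.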

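Finally, you should follow general machine learning guidelines. An online machine learning course discusses this topic in detail in week 6 and explains how to perform error analysis and use learning curves to decide what to try next (whether we need to add more instances, increase model complexity, and so on).

That said, I wrote my own version of the code. You may want to compare it with yours: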

% dataset of images
% I previously saved them as: chair1.jpg, ..., ball1.jpg, ball2.jpg, ...
d = [
    dir(fullfile('images','chair*.jpg')) ;
    dir(fullfile('images','ball*.jpg'))
];

% local-features algorithm used
detector = cv.FeatureDetector('SURF');
extractor = cv.DescriptorExtractor('SURF');

% extract local features from images
t = struct();
for i=1:numel(d)
    % load image as grayscale
    img = imread(fullfile('images', d(i).name));
    if ~ismatrix(img), img = rgb2gray(img); end

    % extract local features
    pts = detector.detect(img);
    feat = extractor.compute(img, pts);

    % store along with class label
    t(i).img = img;
    t(i).class = find(strncmp(d(i).name,{'chair','ball'},4));
    t(i).pts = pts;
    t(i).feat = feat;
end

% split into training/testing sets
% (a better way would be to use cvpartition from Statistics toolbox)
disp('Distribution of classes:')
tabulate([t.class])
tTrain = t([1:7 11:17]);
tTest = t([8:10 18:20]);
fprintf('Number of training instances = %d\n', numel(tTrain));
fprintf('Number of testing instances = %d\n', numel(tTest));

% build visual vocabulary (by clustering training descriptors)
K = 100;
bowTrainer = cv.BOWKMeansTrainer(K, 'Attempts',5, 'Initialization','PP');
clust = bowTrainer.cluster(vertcat(tTrain.feat));

fprintf('Number of keypoints detected = %d\n', numel([tTrain.pts]));
fprintf('Codebook size = %d\n', K);

% compute histograms of visual words for each training image
bowExtractor = cv.BOWImgDescriptorExtractor('SURF', 'BruteForce');
bowExtractor.setVocabulary(clust);
M = zeros(numel(tTrain), K);
for i=1:numel(tTrain)
    M(i,:) = bowExtractor.compute(tTrain(i).img, tTrain(i).pts);
end
labels = vertcat(tTrain.class);

% train an SVM model (perform parameter selection using cross-validation)
svm = cv.SVM();
svm.train_auto(M, labels, 'SvmType','C_SVC', 'KernelType','RBF');
disp('SVM model parameters:'); disp(svm.Params)

% evaluate classifier using testing images
actual = vertcat(tTest.class);
pred = zeros(size(actual));
for i=1:numel(tTest)
    descs = bowExtractor.compute(tTest(i).img, tTest(i).pts);
    pred(i) = svm.predict(descs);
end

% report performance
disp('Confusion matrix:')
confusionmat(actual, pred)
fprintf('Accuracy = %.2f %%\n', 100*nnz(pred==actual)./numel(pred));

Here is the output:

Distribution of classes:
  Value    Count   Percent
      1       10     50.00%
      2       10     50.00%
Number of training instances = 14
Number of testing instances = 6

Number of keypoints detected = 6300
Codebook size = 100

SVM model parameters:
         svm_type: 'C_SVC'
      kernel_type: 'RBF'
           degree: 0
            gamma: 0.5063
            coef0: 0
                C: 312.5000
               nu: 0
                p: 0
    class_weights: []
        term_crit: [1x1 struct]

Confusion matrix:
ans =
     3     0
     1     2
Accuracy = 83.33 %

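So the classifier correctly labels 5 out of the 6 images in the test set, which is not bad for a start :) Obviously, you will get different results each time you run the code because of the inherent randomness of the clustering step.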
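For readers unfamiliar with the confusion matrix, the sketch below (plain Python) shows how the 3/0/1/2 matrix and the 83.33 % figure relate. Rows are the true classes and columns the predicted ones. The exact prediction vector is an assumption: the output only tells us that one class-2 image was labeled as class 1, not which one.

```python
def confusion_matrix(actual, predicted, classes):
    """Count (true class, predicted class) pairs; rows are true classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Percentage of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

actual = [1, 1, 1, 2, 2, 2]       # 3 chairs (class 1), 3 balls (class 2)
predicted = [1, 1, 1, 1, 2, 2]    # assumed: one ball mistaken for a chair

cm = confusion_matrix(actual, predicted, classes=[1, 2])
print(cm)
print(f"Accuracy = {accuracy(cm):.2f} %")
```

The diagonal holds the 5 correct predictions and the single off-diagonal entry is the misclassified ball, so the accuracy is 5/6.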

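Is my algorithm okay? I am a bit fuzzy on the general bag-of-words algorithm. Also, how many clusters should I use? It takes a long time, so a rough guess would do. Should I include all the training samples?

@whowknows I added some links to answers I have given in the past on the BOW model. You may find the explanations there helpful for understanding how it works. One of the posts also contains a link to a useful book (with code in Python) that covers in detail how to code a BOW model. On the other hand, if what you posted is exactly what you are running, then it looks wrong. For a start, you need to extract descriptors from 1000 images to build the dictionary. The dictionary size is usually decided through cross-validation (i.e. trial and error plus plotting the results).

Thanks... I do not know how you know so much, it is really impressive. This is off-topic, but does all your expertise come from university, from work, or both? In general, what would you suggest to improve my CS skills?

@whowknows: You would be surprised how much you can learn by hanging around Stack Overflow :)

I am very surprised by the result.

@whowknows: 97% accuracy is a good result indeed! I think it is the SVM train_auto that helped you (i.e. performing cross-validation to select the SVM parameters).

@whowknows: I have not looked into this, but a quick search suggests there are potential problems with the BOW+ORB combination (not really the case with SIFT or SURF): ...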
