Python: SVM training for a binary classifier always predicts class 0

python machine-learning svm

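I am building a banana detector project with an SVM classifier. I have 358 image samples, which I split for training and testing with test_size=0.2 and random_state=42.

I labeled each image by adding 0 or 1 as a postfix to its file name. However, classification_report(...) always returns: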

Accuracy: 0.7352941176470589
UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
              precision    recall  f1-score   support

           0       0.74      1.00      0.85        50
           1       0.00      0.00      0.00        18

    accuracy                           0.74        68
   macro avg       0.37      0.50      0.42        68
weighted avg       0.54      0.74      0.62        68

Class 1 always has 0.00 in the summary table.

My full source code:

import os
import zipfile
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import cv2

# Extract the dataset archive; the context manager closes the file for us.
with zipfile.ZipFile("dataset.zip", "r") as zip_ref:
    zip_ref.extractall()

path = "bananas_dataset"
img_files = [(os.path.join(root, name))
    for root, dirs, files in os.walk(path)
    for name in files if name.endswith((".jpg"))]

winSize = (32, 32)
blockSize = (16, 16)
blockStride = (8, 8)
cellSize = (8, 8)
nbins = 9
derivAperture = 1
winSigma = -1.
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 1
nlevels = 64
useSignedGradients = True

hog = cv2.HOGDescriptor(winSize, blockSize, blockStride,
    cellSize, nbins, derivAperture, winSigma, histogramNormType,
    L2HysThreshold, gammaCorrection, nlevels, useSignedGradients)

features = np.zeros((1, 324), np.float32)
labels = np.zeros(1, np.int64)
for i in img_files:
    img = cv2.imread(i)
    resized_img = cv2.resize(img, winSize)
    descriptor = np.transpose(hog.compute(resized_img))
    features = np.vstack((features, descriptor))
    labels = np.vstack((labels, int(i[-5])))

features = np.delete(features, (0), axis=0)
labels = np.delete(labels, (0), axis=0).ravel()

X_train, X_test, y_train, y_test = train_test_split(features,
                                                    labels,
                                                    test_size=0.2,
                                                    random_state=42)
print("X_train: {}, y_train: {}".format(X_train.shape, y_train.shape))
print("X_test: {}, y_test: {}".format(X_test.shape, y_test.shape))

clf = svm.SVC()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy: {}".format(accuracy_score(y_test, y_pred)))

print("Classification report:")
print(classification_report(y_test, y_pred))
joblib.dump(clf, "banana_hog_svm_clf.pkl")

As a result, my prediction step always returns class 0. Why does this happen?

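This is probably caused by imbalanced labels. For example, if 10% of the labels belong to class 1 and 90% belong to class 2, the SVM can end up with a model that reaches 90% accuracy simply by predicting class 2 for everything.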
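To make the arithmetic concrete, here is a minimal sketch of that "always predict the majority class" baseline. The label counts (50 samples of class 0, 18 of class 1) are taken from the support column of the classification report in the question; the helper name `majority_baseline_accuracy` is made up for this illustration.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Test-set supports from the report in the question: 50 x class 0, 18 x class 1.
y_test = [0] * 50 + [1] * 18

label, acc = majority_baseline_accuracy(y_test)
print("always predicting class {} gives accuracy {:.4f}".format(label, acc))
```

The baseline accuracy is 50/68, roughly 0.735, which is the same value the question reports. That suggests the trained model is doing no better than a constant prediction.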

It would help to check the distribution of your class labels.
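One way to do that check, sketched here under the assumption that the label is the last character of the file name before the extension (as in the question's `int(i[-5])`). The file names below are hypothetical and only illustrate the naming scheme; in the real script `img_files` would come from the `os.walk` loop.

```python
import os
from collections import Counter


def label_from_filename(path):
    """Read the 0/1 label stored as the last character before the extension."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return int(stem[-1])


# Hypothetical file names, for illustration only.
img_files = [
    "bananas_dataset/img_001_1.jpg",
    "bananas_dataset/img_002_0.jpg",
    "bananas_dataset/img_003_0.jpg",
    "bananas_dataset/img_004_0.jpg",
]

# Count how many samples fall into each class.
counts = Counter(label_from_filename(p) for p in img_files)
total = sum(counts.values())
for label in sorted(counts):
    share = counts[label] / total
    print("class {}: {} samples ({:.0%})".format(label, counts[label], share))
```

If one class clearly dominates, a plain accuracy score will look good even when the minority class is never predicted.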

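I don't think an SVM is recommended for this kind of task. Computer vision problems usually call for convolutional neural networks (to extract features).

I took this approach because it is similar to my task, and it works with an SVM. However, that example has 3,600 samples, of which 2,900 are in class 2 and 700 in class 1. That is imbalanced too.

So how does it perform? I don't see any results reported on that page. Is there a reason you rely 100% on a GitHub page? Is it a very well-known code package?

The accuracy is quite high, around 94%. The classification report is also balanced between class 0 and class 1.

With an imbalanced dataset, accuracy always looks good, because there is usually a dominant class. That is why we should check the misclassification rate of the minority class or use other measures such as F1.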
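The last point can be illustrated with a small sketch that computes precision, recall, and F1 per class in plain Python. The labels reproduce the situation in the question (50 samples of class 0, 18 of class 1, every prediction equal to 0); `per_class_scores` is a helper written for this example. In practice, scikit-learn's `classification_report` and `f1_score` report the same quantities.

```python
def per_class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Undefined ratios are set to 0.0, mirroring the warning in the question.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Same situation as in the question: the model predicts class 0 for everything.
y_true = [0] * 50 + [1] * 18
y_pred = [0] * 68

scores = {cls: per_class_scores(y_true, y_pred, cls) for cls in (0, 1)}
for cls, (p, r, f) in scores.items():
    print("class {}: precision={:.2f} recall={:.2f} f1={:.2f}".format(cls, p, r, f))

macro_f1 = sum(f for _, _, f in scores.values()) / len(scores)
print("macro F1: {:.2f}".format(macro_f1))
```

Accuracy stays near 0.74 here, while the F1 score for class 1 is 0 and the macro-averaged F1 is about 0.42, which exposes the problem that accuracy hides.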