Python SKLearn multiclass classifier
I wrote the following code to import data vectors from a file and test the performance of an SVM classifier (using sklearn and Python).
However, the classifier performs worse than every other classifier I tried (for example, NNet reaches 98% accuracy on the test data, while this one reaches 92% at best). In my experience, an SVM should produce better results for this kind of data.
Am I doing something wrong?
import numpy as np

def buildData(featureCols, testRatio):
    with open("car-eval-data-1.csv") as f:
        data = np.loadtxt(fname = f, delimiter = ',')
    X = data[:, :featureCols]  # select columns 0:featureCols-1
    y = data[:, featureCols]   # select column featureCols
    n_points = y.size
    print("Imported " + str(n_points) + " lines.")
    ### split into train/test sets
    split = int((1-testRatio) * n_points)
    X_train = X[0:split,:]
    X_test = X[split:,:]
    y_train = y[0:split]
    y_test = y[split:]
    return X_train, y_train, X_test, y_test

def buildClassifier(features_train, labels_train):
    from sklearn import svm
    #clf = svm.SVC(kernel='linear',C=1.0, gamma=0.1)
    #clf = svm.SVC(kernel='poly', degree=3,C=1.0, gamma=0.1)
    clf = svm.SVC(kernel='rbf',C=1.0, gamma=0.1)
    clf.fit(features_train, labels_train)
    return clf

def checkAccuracy(clf, features, labels):
    from sklearn.metrics import accuracy_score
    pred = clf.predict(features)
    # accuracy_score takes the true labels first, then the predictions
    accuracy = accuracy_score(labels, pred)
    return accuracy

features_train, labels_train, features_test, labels_test = buildData(6, 0.3)
clf = buildClassifier(features_train, labels_train)
trainAccuracy = checkAccuracy(clf, features_train, labels_train)
testAccuracy = checkAccuracy(clf, features_test, labels_test)
print("Training Items: " + str(labels_train.size) + ", Test Items: " + str(labels_test.size))
print("Training Accuracy: " + str(trainAccuracy))
print("Test Accuracy: " + str(testAccuracy))

# print each test sample with its true label and its prediction
for i in range(labels_test.size):
    # predict expects a 2-D array, so pass a one-row slice
    pred = clf.predict(features_test[i:i+1])
    print("F(" + str(i) + ") : " + str(features_test[i]) + " label= " + str(labels_test[i]) + " pred= " + str(pred))
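I found the problem after a long time, and I am posting it in case someone needs it.
The problem is that the data import function does not shuffle the data. If the data is sorted in some way, the classifier can end up being trained on one part of the data and tested on a completely different part. In the NNet case I was using Matlab, which shuffles the input data automatically.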
import numpy as np

def buildData(filename, featureCols, testRatio):
    with open(filename) as f:
        data = np.loadtxt(fname = f, delimiter = ',')
    np.random.shuffle(data)    # randomize the row order in place
    X = data[:, :featureCols]  # select columns 0:featureCols-1
    y = data[:, featureCols]   # select column featureCols
    n_points = y.size
    print("Imported " + str(n_points) + " lines.")
    ### split into train/test sets
    split = int((1-testRatio) * n_points)
    X_train = X[0:split,:]
    X_test = X[split:,:]
    y_train = y[0:split]
    y_test = y[split:]
    return X_train, y_train, X_test, y_test
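The manual shuffle-and-slice above can also be written with scikit-learn's `train_test_split`, which shuffles by default and can additionally keep the class proportions equal in both splits via `stratify`. The following is only a sketch: the helper name `build_data_shuffled`, the `seed` parameter, and the synthetic array standing in for a label-sorted file are assumptions for illustration, not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def build_data_shuffled(data, feature_cols, test_ratio, seed=0):
    """Split a 2-D array into shuffled, stratified train/test sets.

    `data` holds the features in columns 0..feature_cols-1 and the label
    in column feature_cols, mirroring buildData above.
    """
    X = data[:, :feature_cols]
    y = data[:, feature_cols]
    return train_test_split(
        X, y,
        test_size=test_ratio,
        shuffle=True,       # the default, stated here for clarity
        stratify=y,         # keep class proportions equal in both splits
        random_state=seed,  # fixed seed so the split is reproducible
    )


# Synthetic stand-in for a file sorted by label (not the real car-eval data).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2, 3], 25)   # 100 rows, sorted by class
features = rng.normal(size=(100, 6))
demo = np.column_stack([features, labels])

X_train, X_test, y_train, y_test = build_data_shuffled(demo, 6, 0.3)
print("Classes in train:", sorted(set(y_train)))
print("Classes in test: ", sorted(set(y_test)))
```

Note that `train_test_split` returns `X_train, X_test, y_train, y_test`, which is a different order from the `X_train, y_train, X_test, y_test` returned by `buildData`.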
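As far as I know, sklearn's SVC handles multiclass classification internally with a set of one-vs-one classifiers (LinearSVC is the one that uses one-vs-rest). You could also try optimizing the SVM hyperparameters.
Be sure to use GridSearchCV to tune C and gamma, and also scale the data with MinMaxScaler or StandardScaler.
Thanks, I will test it tomorrow.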
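The suggestion in the comments (scale the features, then tune `C` and `gamma` with `GridSearchCV`) might look roughly like the sketch below. The data comes from `make_classification` as a stand-in for the real file, and the grid values are illustrative guesses rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class data with 6 features, standing in for the real file.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so that every cross-validation fold is
# scaled using statistics from its own training part only.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Illustrative grid; the ranges are assumptions, not tuned values.
# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```

Keeping the test set out of the grid search means the final score is measured on data that played no part in choosing `C` and `gamma`.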