SVM on a one-dimensional array in Python


I'm playing around with the Titanic dataset. I tried applying an SVM to a number of individual features using the following code:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

quanti_vars = ['Age','Pclass','Fare','Parch']

imp_med = SimpleImputer(missing_values=np.nan, strategy='median')
imp_med.fit(titanic[['Age']])

for i in (X_train, X_test):
    i[['Age']] = imp_med.transform(i[['Age']])

svm_clf = SVC()
svm_clf.fit(X_train[quanti_vars], y_train)
y_pred = svm_clf.predict(X_test[quanti_vars])
svm_accuracy = accuracy_score(y_pred, y_test)
svm_accuracy

for i in quanti_vars:
    svm_clf.fit(X_train[i], y_train)
    y_pred = svm_clf.predict(X_test[i])
    svm_accuracy = accuracy_score(y_pred, y_test)
    print(i,': ',svm_accuracy)

The final for loop throws ValueError: Expected 2D array, got 1D array instead, and I don't know why. Can't an SVM operate on a single feature?

I realized, quite simply, that I need to put i inside double brackets to subset correctly. Thus:

for i in quanti_vars:
    svm_clf.fit(X_train[[i]], y_train)
    y_pred = svm_clf.predict(X_test[[i]])
    svm_accuracy = accuracy_score(y_pred, y_test)
    print(i,': ',svm_accuracy)
which produces

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Age :  0.5874125874125874
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Pclass :  0.5874125874125874
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Fare :  0.42657342657342656
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Parch :  0.6153846153846154
(I won't pretend it's good, but at least it works.)
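Why the double brackets matter can be seen in a minimal, self-contained sketch (synthetic data standing in for the Titanic frame): a single-bracket lookup returns a 1-D pandas Series, which scikit-learn estimators reject, while double brackets return a one-column 2-D DataFrame. Reshaping with NumPy is an equivalent fix.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic frame (illustration only)
rng = np.random.default_rng(0)
X_train = pd.DataFrame({'Age': rng.uniform(1, 80, 100)})
y_train = (X_train['Age'] > 40).astype(int)

svm_clf = SVC()

# X_train['Age'] is a 1-D Series, shape (100,): scikit-learn rejects it
# X_train[['Age']] is a one-column DataFrame, shape (100, 1): accepted
svm_clf.fit(X_train[['Age']], y_train)

# Equivalent NumPy fix: reshape the 1-D values into a column vector
svm_clf.fit(X_train['Age'].to_numpy().reshape(-1, 1), y_train)
```

Either form gives the estimator the (n_samples, n_features) shape it validates on every fit and predict call.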

This is easy: just write the following:

y_pred = svm_clf.predict([X_test[i]])
Adding [] converts it into a 2D array. Also, it would be better to bin Fare into about 10 levels rather than using it directly, because there is a big difference between $30 and $50, but the difference fades as prices rise; for example, there is not much difference between $300 and $500.
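The binning suggestion can be sketched with pandas (the fare values below are made up for illustration; in practice this would be titanic['Fare'] with q=10). pd.qcut assigns quantile-based levels, so the stretched top of the fare range collapses into a single bin:

```python
import pandas as pd

# Made-up fare values for illustration; real code would use titanic['Fare']
fare = pd.Series([7.25, 30.0, 50.0, 120.0, 300.0, 512.33])

# qcut splits values into quantile bins; labels=False returns integer codes.
# With q=10 on the real column this yields the ten fare levels suggested above,
# and $300 and $500 would land in the same top bin.
fare_levels = pd.qcut(fare, q=3, labels=False)
```

pd.qcut puts roughly equal numbers of passengers in each level, which is usually what you want for a heavily right-skewed column like Fare; pd.cut would instead split the raw dollar range into equal-width intervals.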

I can check the best and worst single-feature classifiers to determine which ones I should continue with and which are least effective.
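That comparison can be sketched end to end on synthetic data (the column names mirror quanti_vars, but the values and label are made up, with the label tied to Pclass so the ranking has a clear winner):

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the Titanic features (illustration only)
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    'Age': rng.uniform(1, 80, n),
    'Pclass': rng.integers(1, 4, n),
    'Fare': rng.exponential(30.0, n),
    'Parch': rng.integers(0, 4, n),
})
# Tie the label to Pclass so one feature is clearly most predictive
y = (X['Pclass'] == 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit one single-feature SVC per column and record its test accuracy
scores = {}
for col in X.columns:
    clf = SVC()
    clf.fit(X_train[[col]], y_train)          # double brackets: 2-D input
    scores[col] = accuracy_score(y_test, clf.predict(X_test[[col]]))

# Rank features from most to least predictive on their own
ranking = sorted(scores, key=scores.get, reverse=True)
```

On real data a single train/test split can be noisy for this kind of ranking; cross_val_score over each single-column frame would give a steadier estimate.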