Python probabilistic support vector machine, regression

python machine-learning scikit-learn

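I have currently implemented a probabilistic (at least I think so) SVM for binary classes. Now I want to extend this approach to regression and try it on the Boston dataset. Unfortunately, my algorithm seems to get stuck. The code I am currently running looks like this: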

from sklearn import decomposition
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston

boston = load_boston()

X = boston.data
y = boston.target
inputs_train, inputs_test, targets_train, targets_test = train_test_split(X, y, test_size=0.33, random_state=42)

def plotting():
    param_C = [0.01, 0.1]
    param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
    clf = GridSearchCV(svm.SVR(), cv = 5, param_grid= param_grid)
    clf.fit(inputs_train, targets_train)
    clf = SVR(C=clf.best_params_['C'], cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=5, gamma=clf.best_params_['gamma'],
              kernel=clf.best_params_['kernel'],
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=False)
    clf.fit(inputs_train, targets_train)
    a = clf.predict(inputs_test[0])
    print(a)


plotting()

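Can anybody tell me what is wrong with this approach? The problem is not that I receive error messages (I know, I have suppressed them above), but that the code never stops running. Any suggestions are highly appreciated.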

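There are several issues with your code.

First, what is taking forever is the first `clf.fit` (i.e. the grid search), which is why you did not see any change when setting `max_iter` and `tol` in your second `clf.fit`.

Second, the `clf = SVR()` part will not work, because:
- You have to import it; as written, `SVR` is not recognized.
- It contains a bunch of illegal arguments (`decision_function_shape`, `probability`, `random_state`, etc.) that are not admissible `SVR` parameters.

Third, you do not need to fit explicitly again with the best parameters; you can simply ask for `refit=True` in your `GridSearchCV` definition and then use `clf.best_estimator_` for your predictions (edit after comment: simply `clf.predict` will also work).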

So, moving everything outside of any function definition, here is a working version of your code:

from sklearn.svm import SVR
# other imports as-is

# data loading & splitting as-is

param_C = [0.01, 0.1]
param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
clf = GridSearchCV(SVR(degree=5, max_iter=10000), cv = 5, param_grid= param_grid, refit=True,)
clf.fit(inputs_train, targets_train)
a = clf.best_estimator_.predict(inputs_test[0])
# a = clf.predict(inputs_test[0]) will also work 
print(a)
# [ 21.89849792]

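Apart from `degree`, all the other admissible argument values you were using are actually the respective defaults, so the only arguments you really need in the `SVR` definition are `degree` and `max_iter`.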
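One way to check the claim about defaults is to compare the values from the question against what a freshly constructed `SVR` reports through `get_params()`. This is a minimal sketch: the dictionary name `question_values` is hypothetical and only collects the admissible arguments from the question, and the defaults could differ in other scikit-learn versions.

```python
from sklearn.svm import SVR

# Admissible SVR arguments copied from the question (illegal ones left out).
question_values = {
    "cache_size": 200,
    "coef0": 0.0,
    "degree": 5,
    "max_iter": -1,
    "shrinking": True,
    "tol": 0.001,
    "verbose": False,
}

# Parameters of an SVR constructed without any arguments.
defaults = SVR().get_params()

# Keep only the arguments whose value differs from the library default.
non_default = {
    name: value
    for name, value in question_values.items()
    if defaults[name] != value
}
print(non_default)
```

If only `degree` shows up, the remaining arguments can be dropped; `max_iter` then needs to be passed only because the working version replaces `-1` with a finite cap.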

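You will get a couple of warnings (not errors), i.e. after fitting:

/databricks/python/lib/python3.5/site-packages/sklearn/svm/base.py:220: ConvergenceWarning: Solver terminated early (max_iter=10000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.

and after predicting:

/databricks/python/lib/python3.5/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

which already contain some advice on what to do next.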
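Following the advice in those two warnings could look roughly like the sketch below: scale the features inside a pipeline and pass a single sample as a 2-D array. It rests on several assumptions. Synthetic data from `make_regression` stands in for the Boston set, because `load_boston` is not available in recent scikit-learn releases. The names `pipeline` and `single_sample` are illustrative, and `degree` is left at its default for brevity.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 13 features, like the Boston set.
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=42)
inputs_train, inputs_test, targets_train, targets_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Scale the features before the SVR, as the ConvergenceWarning suggests.
pipeline = make_pipeline(StandardScaler(), SVR(max_iter=10000))

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {
    "svr__C": [0.01, 0.1],
    "svr__kernel": ["poly", "rbf"],
    "svr__gamma": [0.1, 0.01],
}
clf = GridSearchCV(pipeline, param_grid=param_grid, cv=5, refit=True)
clf.fit(inputs_train, targets_train)

# A single sample must be 2-D: shape (1, n_features), not (n_features,).
single_sample = inputs_test[0].reshape(1, -1)
a = clf.predict(single_sample)
print(a.shape)
```

Putting the scaler inside the pipeline means it is re-fitted on each cross-validation training fold, so the held-out fold does not leak into the scaling statistics.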

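Last but not least: a probabilistic classifier (i.e. one that produces probabilities instead of hard labels) is a valid thing, but a "probabilistic" regression model is not.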
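The difference can be seen directly in the API. The sketch below assumes synthetic binary data from `make_classification` as a stand-in for the asker's earlier classification problem, and the flag name `svr_accepts_probability` is illustrative. It shows `SVC` exposing class probabilities and `SVR` rejecting the `probability` argument.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, SVR

# Synthetic binary data (assumption, not the asker's original data).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Classification: probability=True enables predict_proba.
classifier = SVC(probability=True, random_state=0).fit(X, y)
proba = classifier.predict_proba(X[:3])
print(proba.shape)  # one row per sample, one column per class

# Regression: the SVR constructor has no `probability` argument.
try:
    SVR(probability=True)
    svr_accepts_probability = True
except TypeError:
    svr_accepts_probability = False
print(svr_accepts_probability)
```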

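Tested with Python 3.5 and scikit-learn 0.18.1.

You have set the `max_iter` parameter to `-1`, so training stops only when the tolerance `tol=0.001` is reached. Presumably this tolerance is never reached. Try setting `max_iter` to a positive integer.

I just tried changing to tol=1 and max_iter=15, but nothing changed, unfortunately. Do you see anything else that looks odd?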
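A cap on iterations only bounds the fits it is attached to, which may be why changing `max_iter` on the second `SVR` had no visible effect while the grid search was still running uncapped. As a rough sketch, assuming synthetic data from `make_regression` instead of the Boston set and the illustrative names `search` and `hit_cap`, the cap can be set on the estimator handed to `GridSearchCV`:

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 13 features, like the Boston set.
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)

param_grid = {"C": [0.01, 0.1], "kernel": ["poly", "rbf"], "gamma": [0.1, 0.01]}

# The iteration cap is set on the estimator handed to GridSearchCV,
# so every candidate fit inside the search is bounded.
search = GridSearchCV(SVR(degree=5, max_iter=15), param_grid=param_grid, cv=5)

# Record warnings instead of silencing them, to see which fits hit the cap.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    search.fit(X, y)

hit_cap = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print(search.best_estimator_.max_iter)
print(len(hit_cap) > 0)
```

A cap as low as 15 makes the search finish quickly but leaves the models far from converged, so it is better suited to diagnosing where the time goes than to producing a usable regressor.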