Variability/randomness of support vector machine model scores in Python's scikit-learn

python machine-learning scikit-learn

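I am testing several ML classification models, in this case support vector machines. I have a basic understanding of the SVM algorithm and how it works.

I am using scikit-learn's built-in breast cancer dataset: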

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

with the following code:

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, 
                                                    stratify=cancer.target, random_state=42)
clf2 = LinearSVC(C=0.01).fit(X_train, y_train)
clf3 = LinearSVC(C=0.1).fit(X_train, y_train)
clf4 = LinearSVC(C=1).fit(X_train, y_train)
clf5 = LinearSVC(C=10).fit(X_train, y_train)
clf6 = LinearSVC(C=100).fit(X_train, y_train)

When I print the scores, as in:

print("Model training score with C=0.01:\n{:.3f}".format(clf2.score(X_train, y_train)))
print("Model testing score with C=0.01:\n{:.3f}".format(clf2.score(X_test, y_test)))
print("------------------------------")
print("Model training score with C=0.1:\n{:.3f}".format(clf3.score(X_train, y_train)))
print("Model testing score with C=0.1:\n{:.3f}".format(clf3.score(X_test, y_test)))
print("------------------------------")
print("Model training score with C=1:\n{:.3f}".format(clf4.score(X_train, y_train)))
print("Model testing score with C=1:\n{:.3f}".format(clf4.score(X_test, y_test)))
print("------------------------------")
print("Model training score with C=10:\n{:.3f}".format(clf5.score(X_train, y_train)))
print("Model testing score with C=10:\n{:.3f}".format(clf5.score(X_test, y_test)))
print("------------------------------")
print("Model training score with C=100:\n{:.3f}".format(clf6.score(X_train, y_train)))
print("Model testing score with C=100:\n{:.3f}".format(clf6.score(X_test, y_test)))

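When I run this code, I get certain scores for each value of the regularization parameter C. When I run the .fit lines again (that is, retrain the models), the scores come out completely different. Sometimes the difference is large (for example, 72% versus 90% for the same value of C).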

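Where does this variability come from? I assumed that, because I use the same random_state parameter, it would always find the same support vectors and therefore give me the same results, but since the scores change the next time I train the models, that is not the case. In logistic regression, for example, the scores are always consistent no matter how many times I rerun the .fit code.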

An explanation of this variability in the accuracy scores would be a great help.

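Of course. You need to fix the random_state of each model to a specific seed so that the results can be reproduced. The random_state=42 passed to train_test_split only fixes the train/test split; it does not affect LinearSVC.

Otherwise, the model uses the default

random_state=None

so every call draws a fresh random seed, which is why you get this variability.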
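A quick way to check this is to fit the same model twice with a fixed seed and compare what it learned. The following is only a minimal sketch: the helper name fit_linear_svc is hypothetical, C=1 is an arbitrary choice, and warnings are silenced on the assumption that the unscaled data may not converge within the default iteration limit.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)


def fit_linear_svc(seed):
    """Hypothetical helper: fit LinearSVC(C=1) with the given solver seed."""
    with warnings.catch_warnings():
        # Assumption: the unscaled features may trigger convergence warnings;
        # they are silenced here to keep the output short.
        warnings.simplefilter("ignore")
        return LinearSVC(C=1, random_state=seed).fit(X_train, y_train)


# Two independent fits that share the same seed.
first = fit_linear_svc(seed=42)
second = fit_linear_svc(seed=42)

same_coefficients = np.array_equal(first.coef_, second.coef_)
same_test_score = first.score(X_test, y_test) == second.score(X_test, y_test)
print("identical coefficients:", same_coefficients)
print("identical test score:", same_test_score)
```

With the seed fixed, both fits should report identical coefficients and scores. Passing seed=None instead lets each fit draw its own seed, which is the situation described in the question.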

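Use:

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, 
                                                    stratify=cancer.target, random_state=42)
clf2 = LinearSVC(C=0.01, random_state=42).fit(X_train, y_train)
clf3 = LinearSVC(C=0.1,  random_state=42).fit(X_train, y_train)
clf4 = LinearSVC(C=1,    random_state=42).fit(X_train, y_train)
clf5 = LinearSVC(C=10,   random_state=42).fit(X_train, y_train)
clf6 = LinearSVC(C=100,  random_state=42).fit(X_train, y_train)

Makes sense, thanks! But when I create similar models with different C values for logistic regression, why don't I have to specify the random state for each model?

But if you get the same results, that means the results and the estimates are invariant to the C parameter. In other settings, if you do not fix the random seed, the results will never be the same.

So the random_state in the model parameters sets the pseudo-random numbers used to start the gradient/coordinate descent, so if the models are to be compared, it must be the same in all of them. The random_state in the train_test_split function governs how the train/test data are randomly drawn. Is this correct?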
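To illustrate the role of the random_state passed to train_test_split, as opposed to the one passed to the model, here is a small sketch. It splits row indices rather than the feature matrix so the resulting partitions are easy to compare; the helper name split_indices and the second seed (0) are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()


def split_indices(seed):
    """Hypothetical helper: return the test-set row indices for a given seed."""
    indices = np.arange(len(cancer.target))
    _, test_idx = train_test_split(
        indices, stratify=cancer.target, random_state=seed
    )
    return test_idx


# Same seed: the same rows end up in the test set on every call.
same_seed_same_split = np.array_equal(split_indices(42), split_indices(42))
# Different seed: a different set of rows is drawn.
other_seed_same_split = np.array_equal(split_indices(42), split_indices(0))

print("seed 42 vs seed 42, same split:", same_seed_same_split)
print("seed 42 vs seed 0, same split:", other_seed_same_split)
```

This seed only decides which rows are used for training and testing. Any randomness inside the estimator's solver is controlled separately by the estimator's own random_state.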