Python: printing the selected parameters in nested cross-validation
Below is an example that uses scikit-learn to obtain cross-validated predictions from k-nearest neighbours, where k itself is chosen by cross-validation. The code seems to work, but how can I print the k selected in each outer fold?
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis=1) + np.random.randn(n) > 0, "blue", "red")
preds = sklearn.model_selection.cross_val_predict(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        # shuffle=True is required for random_state to take effect
        cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=144))
You cannot get this information directly from that function, so you need to replace `cross_val_predict` with `cross_validate` and set the `return_estimator` flag to `True`. You can then retrieve the fitted estimators from the returned dictionary under the key `estimator`; each estimator's selected parameters are stored in its `best_params_` attribute. So:
import numpy as np
import sklearn
# sklearn 0.20.3 doesn't seem to import submodules in __init__
# So importing them directly is required.
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis=1) + np.random.randn(n) > 0, "blue", "red")
scores = sklearn.model_selection.cross_validate(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        # shuffle=True is required for random_state to take effect
        cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=144),
    return_estimator=True)

# Selected hyper-parameters for the estimator from the first fold
print(scores['estimator'][0].best_params_)
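Since `return_estimator=True` keeps one fitted `GridSearchCV` per outer fold, you can loop over `scores['estimator']` to get the chosen k for every fold, not just the first. A minimal self-contained sketch (smaller fold counts and the variable name `selected_ks` are my own choices for illustration):

```python
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

n = 100
rng = np.random.RandomState(0)
X = rng.randn(n, 2)
y = np.where(np.sum(X, axis=1) + rng.randn(n) > 0, "blue", "red")

scores = sklearn.model_selection.cross_validate(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        cv=sklearn.model_selection.KFold(5, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(5, shuffle=True, random_state=144),
    return_estimator=True)

# One fitted GridSearchCV per outer fold: collect each fold's selected k
selected_ks = [est.best_params_['n_neighbors'] for est in scores['estimator']]
print(selected_ks)
```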
Unfortunately, you cannot get both the actual predictions and the selected hyper-parameters from the same function call. If you need both, you have to perform the nested cross-validation manually:
cv = sklearn.model_selection.KFold(10, shuffle=True, random_state=144)
estimator = sklearn.model_selection.GridSearchCV(
    estimator=sklearn.neighbors.KNeighborsClassifier(),
    param_grid={'n_neighbors': range(1, 7)},
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
    scoring='accuracy')
for train, test in cv.split(X, y):
    X_train, y_train = X[train], y[train]
    X_test, y_test = X[test], y[test]
    m = estimator.fit(X_train, y_train)
    print(m.best_params_)
    y_pred = m.predict(X_test)
    print(y_pred)
So, do you know how I could get both the predictions and the selected parameters without fitting the models an extra time?
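One way to avoid a second pass: inside the manual loop, write each fold's test-set predictions into a single array indexed by the original row positions, so you end up with one prediction per sample (the same shape `cross_val_predict` returns) while also recording each fold's chosen k. A sketch under that assumption; the names `chosen_k` and `all_preds` are mine, not from any API:

```python
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

n = 100
rng = np.random.RandomState(1)
X = rng.randn(n, 2)
y = np.where(np.sum(X, axis=1) + rng.randn(n) > 0, "blue", "red")

outer_cv = sklearn.model_selection.KFold(5, shuffle=True, random_state=144)
search = sklearn.model_selection.GridSearchCV(
    estimator=sklearn.neighbors.KNeighborsClassifier(),
    param_grid={'n_neighbors': range(1, 7)},
    cv=sklearn.model_selection.KFold(5, shuffle=True, random_state=133),
    scoring='accuracy')

all_preds = np.empty(n, dtype=object)  # predictions aligned with X's row order
chosen_k = []                          # selected k per outer fold
for train, test in outer_cv.split(X, y):
    m = search.fit(X[train], y[train])
    chosen_k.append(m.best_params_['n_neighbors'])
    # each test index appears in exactly one outer fold, so this fills all_preds
    all_preds[test] = m.predict(X[test])

print(chosen_k)
print(all_preds[:5])
```

The inner search is still fit once per outer fold, exactly as in the manual loop above, so no extra model runs are needed.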