Python: printing the selected parameters in nested cross-validation
Below is an example that uses scikit-learn to obtain cross-validated predictions from k-nearest neighbours, where k itself is chosen by cross-validation. The code seems to work, but how can I print the k selected in each outer fold?
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis=1) + np.random.randn(n) > 0, "blue", "red")
preds = sklearn.model_selection.cross_val_predict(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        # shuffle=True is required for random_state to take effect
        cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=144))
You cannot get this information directly from that function, so you need to replace `cross_val_predict` with `cross_validate` and set the `return_estimator` flag to `True`. You can then retrieve the fitted estimators from the returned dictionary under the key `estimator`; each estimator's selected parameters are stored in its `best_params_` attribute. So:
import numpy as np
import sklearn
# sklearn 0.20.3 doesn't seem to import submodules in __init__
# So importing them directly is required.
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis=1) + np.random.randn(n) > 0, "blue", "red")
scores = sklearn.model_selection.cross_validate(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        # shuffle=True is required for random_state to take effect
        cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=144),
    return_estimator=True)

# Selected hyper-parameters for the estimator from the first fold
print(scores['estimator'][0].best_params_)
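Since `return_estimator=True` keeps one fitted `GridSearchCV` per outer fold, you can loop over `scores['estimator']` to get the chosen k for every fold, not just the first. A minimal self-contained sketch (smaller fold counts and the variable name `selected_ks` are my own choices for illustration):

```python
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

n = 100
rng = np.random.RandomState(0)
X = rng.randn(n, 2)
y = np.where(np.sum(X, axis=1) + rng.randn(n) > 0, "blue", "red")

scores = sklearn.model_selection.cross_validate(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        cv=sklearn.model_selection.KFold(5, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(5, shuffle=True, random_state=144),
    return_estimator=True)

# One fitted GridSearchCV per outer fold: collect each fold's selected k
selected_ks = [est.best_params_['n_neighbors'] for est in scores['estimator']]
print(selected_ks)
```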
Unfortunately, you cannot get both the actual predictions and the selected hyper-parameters from the same function call. If you need both, you have to perform the nested cross-validation manually:
cv = sklearn.model_selection.KFold(10, shuffle=True, random_state=144)
estimator = sklearn.model_selection.GridSearchCV(
    estimator=sklearn.neighbors.KNeighborsClassifier(),
    param_grid={'n_neighbors': range(1, 7)},
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
    scoring='accuracy')
for train, test in cv.split(X, y):
    X_train, y_train = X[train], y[train]
    X_test, y_test = X[test], y[test]
    m = estimator.fit(X_train, y_train)
    print(m.best_params_)
    y_pred = m.predict(X_test)
    print(y_pred)
So, do you know how I could get both the predictions and the selected parameters without fitting the models an extra time?
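One way to avoid a second pass: inside the manual loop, write each fold's test-set predictions into a single array indexed by the original row positions, so you end up with one prediction per sample (the same shape `cross_val_predict` returns) while also recording each fold's chosen k. A sketch under that assumption; the names `chosen_k` and `all_preds` are mine, not from any API:

```python
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

n = 100
rng = np.random.RandomState(1)
X = rng.randn(n, 2)
y = np.where(np.sum(X, axis=1) + rng.randn(n) > 0, "blue", "red")

outer_cv = sklearn.model_selection.KFold(5, shuffle=True, random_state=144)
search = sklearn.model_selection.GridSearchCV(
    estimator=sklearn.neighbors.KNeighborsClassifier(),
    param_grid={'n_neighbors': range(1, 7)},
    cv=sklearn.model_selection.KFold(5, shuffle=True, random_state=133),
    scoring='accuracy')

all_preds = np.empty(n, dtype=object)  # predictions aligned with X's row order
chosen_k = []                          # selected k per outer fold
for train, test in outer_cv.split(X, y):
    m = search.fit(X[train], y[train])
    chosen_k.append(m.best_params_['n_neighbors'])
    # each test index appears in exactly one outer fold, so this fills all_preds
    all_preds[test] = m.predict(X[test])

print(chosen_k)
print(all_preds[:5])
```

The inner search is still fit once per outer fold, exactly as in the manual loop above, so no extra model runs are needed.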