Python GridSearchCV - access to predicted values across tests?


python scikit-learn


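Is there a way to access the predicted values computed during the GridSearchCV process?

I would like to plot the predicted y values against the actual values (from the test/validation set).

Once the grid search is complete, I can use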

 ypred = grid.predict(xv)

but I would like to plot the values computed during the grid search itself. Perhaps there is a way to save the points as a DataFrame?

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import (GridSearchCV, KFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Scale the features, then fit an RBF-kernel support vector regressor
scaler = StandardScaler()
svr_rbf = SVR(kernel='rbf')
pipe = Pipeline(steps=[('scaler', scaler), ('svr_rbf', svr_rbf)])
# parameters, splits, msescorer, xt and yt are defined elsewhere (not shown)
grid = GridSearchCV(pipe, param_grid=parameters, cv=splits, refit=True, verbose=3, scoring=msescorer, n_jobs=4)
grid.fit(xt, yt)

One solution is to create a custom scorer and save the arguments it receives into a global variable:

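import numpy as np
# GridSearchCV lives in sklearn.model_selection; sklearn.grid_search has been removed
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, make_scorer

X, y = np.random.rand(2, 200)
clf = SVR()

ys = []  # predictions collected from every scorer call

def MSE(y_true, y_pred):
    # Record the test-fold predictions before returning the score
    global ys
    ys.append(y_pred)
    mse = mean_squared_error(y_true, y_pred)
    return mse

def scorer():
    # greater_is_better=False: the score is negated so that a lower MSE ranks higher
    return make_scorer(MSE, greater_is_better=False)

n_splits = 3
cv = GridSearchCV(clf, {'degree': [1, 2, 3]}, scoring=scorer(), cv=n_splits)
cv.fit(X.reshape(-1, 1), y)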

Then we need to gather the splits of each parameter setting into one complete array:

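# Assumes the scorer ran once per test fold, so each parameter setting
# contributes n_splits consecutive entries to ys
idxs = range(0, len(ys) + 1, n_splits)
# e.g. [0, 3, 6, 9]
# collect every n_splits elements into a single list
new = [ys[start:stop] for start, stop in zip(idxs, idxs[1:])]
# concatenate each such list into one array per parameter setting
ys = [np.concatenate(group, axis=0) for group in new]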
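To plot predicted against actual values, the scorer also has to record y_true, and the result can then go into a DataFrame as the question suggests. The following is only a minimal sketch of that idea. The names records and recording_mse, the random data and the C grid are hypothetical, and it assumes the default sequential execution, where the scorer is called once per parameter setting and fold, in order.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

records = []  # hypothetical global store: one (y_true, y_pred) pair per scorer call

def recording_mse(y_true, y_pred):
    # Keep copies of both arrays so they can be paired up after the search
    records.append((np.asarray(y_true).copy(), np.asarray(y_pred).copy()))
    return mean_squared_error(y_true, y_pred)

# Illustrative random data (assumption, not the asker's data)
rng = np.random.RandomState(0)
X = rng.rand(60, 1)
y = rng.rand(60)

n_splits = 3
param_grid = {'C': [0.1, 1.0, 10.0]}  # illustrative grid
search = GridSearchCV(SVR(), param_grid,
                      scoring=make_scorer(recording_mse, greater_is_better=False),
                      cv=n_splits)  # n_jobs left unset: worker processes would not share records
search.fit(X, y)

# Assumption: calls arrive grouped by parameter setting (in cv_results_['params']
# order), with the folds in order inside each group.
rows = []
for call, (y_true, y_pred) in enumerate(records):
    candidate, fold = divmod(call, n_splits)
    for t, p in zip(y_true, y_pred):
        rows.append({'candidate': candidate, 'fold': fold, 'y_true': t, 'y_pred': p})
df = pd.DataFrame(rows)
```

Selecting df[df['candidate'] == search.best_index_] then gives the predicted and actual values for the best parameter setting, ready for a scatter plot.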

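As far as I can tell, you can't do that (but I may be wrong). The solution I can think of is to predict the values separately for each parameter configuration. However, this will not replicate GridSearchCV, because your test/training samples will differ (especially with KFold validation). You could instead plot mean_test_score over the range of values of each parameter, holding all other parameters fixed. That is not ideal either, since different parameter settings influence one another.

Possible duplicate.

This does not work if I pass the n_jobs parameter when instantiating GridSearchCV. Also, I get y_pred, but I want y_pred_proba. Is there a workaround?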
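One possible workaround for both points, sketched under assumptions: instead of capturing predictions inside the scorer, re-run each parameter setting with cross_val_predict on a splitter that has a fixed random_state. This fits every setting a second time, so these are not literally the values computed during the search, but the folds are identical. Nothing is stored in a global, so n_jobs can be raised, and method='predict_proba' returns probabilities. The data, the grid and the LogisticRegression classifier (standing in for any estimator that exposes predict_proba) are illustrative and not taken from the thread.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, ParameterGrid,
                                     StratifiedKFold, cross_val_predict)

# Illustrative data and grid (assumptions, not from the thread)
X, y = make_classification(n_samples=90, n_features=5, random_state=0)
param_grid = {'C': [0.1, 1.0, 10.0]}

# With an integer random_state the splitter yields the same folds on every use
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
estimator = LogisticRegression()
n_jobs = 1  # may be raised; results are returned by the workers, not stored globally

search = GridSearchCV(estimator, param_grid, cv=cv, n_jobs=n_jobs)
search.fit(X, y)

# Re-run each parameter setting on the same folds and keep the
# out-of-fold class probabilities, keyed by candidate index
oof_proba = {}
for i, params in enumerate(ParameterGrid(param_grid)):
    model = clone(estimator).set_params(**params)
    oof_proba[i] = cross_val_predict(model, X, y, cv=cv,
                                     method='predict_proba', n_jobs=n_jobs)

# Probabilities for the setting that the grid search selected
best_proba = oof_proba[search.best_index_]
```

Each array in oof_proba has one row per sample, in the original sample order, so it can be compared directly with y.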