How do I compute mutual information within each cross-validation fold in scikit-learn?

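I am running feature selection and cross-validation in a scikit-learn pipeline to tune the hyperparameters of an SVR. Because mutual-information feature selection depends on the response variable, I want to keep the test set from leaking into the training set by computing the mutual-information statistics within each fold. As far as I understand, the pipeline runs the selector on all of the data before passing it to the model. What is the correct way to run mutual information so that only the response data from each fold is used?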

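from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Feature selection and the regressor live in a single pipeline.
cached_pipe = Pipeline(
    [
        ('selector', SelectKBest(mutual_info_regression)),
        ('model', SVR()),
    ]
)

# iterating through hyperparameters (loop not shown)
hyperparameters = dict(
    model__kernel=[k],
    model__C=C_range,
    model__gamma=gamma_range,
    model__epsilon=[epsilon],
    selector__k=[numFeat],
)

clf = GridSearchCV(
    estimator=cached_pipe,
    param_grid=hyperparameters,
    cv=LeaveOneOut(),
    scoring='neg_mean_squared_error',
    return_train_score=False,
    n_jobs=-1,
    verbose=0,
)

best_model = clf.fit(X, y.ravel())
# best_estimator_ is the pipeline refit on all of X and y (refit=True by default),
# so these scores describe the full data set rather than any single fold.
HyperModel['feature_scores'] = best_model.best_estimator_.named_steps['selector'].scores_
HyperModel['feature_support'] = best_model.best_estimator_.named_steps['selector'].get_support()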
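
For reference, scikit-learn's cross-validation utilities clone the whole pipeline and fit it on the training portion of each split, so a selector placed inside the pipeline only sees that fold's training responses. The sketch below is one way to inspect the per-fold mutual-information scores. It is an illustration under assumptions, not the asker's setup: the data come from `make_regression`, the names `X_demo`, `y_demo`, `fold_scores`, and `fold_support` are hypothetical, and a 5-fold `KFold` stands in for `LeaveOneOut` to keep it fast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Synthetic data, purely for illustration (not the X and y from the question).
X_demo, y_demo = make_regression(
    n_samples=60, n_features=8, n_informative=3, noise=0.1, random_state=0
)

pipe = Pipeline(
    [
        ("selector", SelectKBest(mutual_info_regression, k=3)),
        ("model", SVR()),
    ]
)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# return_estimator=True keeps the pipeline that was fitted on each training split.
results = cross_validate(
    pipe,
    X_demo,
    y_demo,
    cv=cv,
    scoring="neg_mean_squared_error",
    return_estimator=True,
)

# One row per fold, one column per feature.
fold_scores = np.array(
    [est.named_steps["selector"].scores_ for est in results["estimator"]]
)
fold_support = np.array(
    [est.named_steps["selector"].get_support() for est in results["estimator"]]
)

print(fold_scores.shape)          # (n_folds, n_features)
print(fold_support.sum(axis=1))   # number of features kept in each fold
```

Each row of `fold_scores` was computed only from the training samples of that split. The same per-fold refitting happens inside `GridSearchCV`, although it does not expose the per-fold fitted pipelines; only `best_estimator_`, refit on the full data, is kept.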