Python: Can GridSearchCV include random train/test splits?

With sklearn, GridSearchCV can test several values for each parameter of a classifier, for example:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Option names are kept as posted; newer scikit-learn releases rename some of them.
parameters = {
  'learning_rate': [0.001, 0.005, 0.003],
  'n_estimators': [300, 800, 1200],
  'criterion': ['friedman_mse', 'mse', 'mae'],
  'verbose': [1],
  'loss': ['deviance', 'exponential'],
  'random_state': [0]
  }

GBC = GradientBoostingClassifier()
grid = GridSearchCV(GBC, parameters)
grid.fit(X, y)   # X = data, y = result
best_est = grid.best_estimator_
print(best_est)

predictions = best_est.predict(T)  # T contains the data to apply it on
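
But what if one wants to do cross-validation? For example, in a way similar to train_test_split:

  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=41)

Here we have a random_state (which can have a large impact). Is it possible to give GridSearchCV an array of several random seeds, to make sure the chosen parameters work best across "most" random states of the train/test split of some data?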


For the record, I know this is not built into GridSearchCV (as far as I know); I am asking what such a method might look like. Maybe there is some clever way to do this?

You can specify ShuffleSplit as the cross-validation generator.

For example:

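from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

GBC = GradientBoostingClassifier()
# Each of the 5 splits holds out a fresh random 30% of the rows for testing;
# the remaining 70% is used for training.
grid = GridSearchCV(GBC,
                    param_grid=parameters,
                    cv=ShuffleSplit(n_splits=5,
                                    test_size=.3,
                                    random_state=41))
grid.fit(X, y)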
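
To try the ShuffleSplit approach end to end, here is a minimal sketch on synthetic data. The dataset from make_classification, the names X_demo, y_demo, small_grid and split_keys, and the deliberately tiny parameter grid are assumptions made for illustration, not part of the original question. It shows that every parameter combination is scored once per random split, so the ranking does not rest on a single train/test partition.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Deliberately tiny grid so the sketch runs quickly.
small_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [20, 40],
}

# Five independent random 70/30 splits, reproducible through random_state.
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=41)

search = GridSearchCV(GradientBoostingClassifier(random_state=0), small_grid, cv=cv)
search.fit(X_demo, y_demo)

# cv_results_ holds one test-score entry per random split.
split_keys = sorted(
    key for key in search.cv_results_
    if key.startswith("split") and key.endswith("_test_score")
)
print(split_keys)
print(search.best_params_)
print(search.cv_results_["mean_test_score"])
```

Raising n_splits averages over more random partitions at the cost of proportionally more model fits.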

I didn't know this was possible.