Very poor results from a scikit-learn decision-tree bagging ensemble with Bayesian optimization
Without Bayesian optimization:
model = BaggingClassifier(base_estimator=DecisionTreeClassifier(min_samples_split=15), n_estimators=100, random_state=7)
Results:
Training set - Matthews correlation coefficient: 0.93
Test set - Matthews correlation coefficient: 0.45584530253849204
Model parameters -
model.get_params():
I decided to run Bayesian optimization to reduce the overfitting:
param_hyperopt = {
    'ccp_alpha': hp.uniform('ccp_alpha', 0, 1),
    'max_depth': scope.int(hp.quniform('max_depth', 5, 20, 1)),
    'n_estimators': scope.int(hp.quniform('n_estimators', 20, 200, 1)),
    'max_features': scope.int(hp.quniform('max_features', 2, 10, 1)),
    'min_samples_leaf': scope.int(hp.quniform('min_samples_leaf', 1, 40, 1)),
    'splitter': hp.choice('splitter', ['best', 'random']),
    'criterion': hp.choice('criterion', ['gini', 'entropy']),
    'max_leaf_nodes': scope.int(hp.quniform('max_leaf_nodes', 2, 20, 1)),
    'min_impurity_decrease': hp.uniform('min_impurity_decrease', 0, 1),
    'min_samples_split': scope.int(hp.quniform('min_samples_split', 3, 40, 1)),
    'min_weight_fraction_leaf': hp.uniform('min_weight_fraction_leaf', 0, 0.5),
    'max_samples': scope.int(hp.quniform('max_samples', 1, 10, 1)),
}
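One thing worth noticing in this search space (this note is my addition, not from the original question): `hp.uniform('ccp_alpha', 0, 1)` and `hp.uniform('min_impurity_decrease', 0, 1)` put half of all probability mass above 0.5, while useful values for these pruning parameters are typically far below 0.1. A log-uniform prior (hyperopt's `hp.loguniform`, which takes log-space bounds) keeps most samples small. A minimal stdlib sketch of the difference, using a hypothetical `loguniform` helper:

```python
import math
import random

random.seed(0)

def loguniform(lo, hi):
    """Sample log-uniformly between lo and hi -- what
    hp.loguniform(label, math.log(lo), math.log(hi)) draws from."""
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

# Uniform over [0, 1]: half of the draws exceed 0.5 -- extreme pruning.
uniform_draws = [random.uniform(0, 1) for _ in range(10_000)]

# Log-uniform over [1e-5, 1]: most draws stay small, in the range where
# ccp_alpha and min_impurity_decrease are actually useful.
log_draws = [loguniform(1e-5, 1) for _ in range(10_000)]

frac_uniform_small = sum(d < 0.01 for d in uniform_draws) / len(uniform_draws)
frac_log_small = sum(d < 0.01 for d in log_draws) / len(log_draws)
print(frac_uniform_small, frac_log_small)  # roughly 0.01 vs. 0.6
```

So with the uniform prior, only about 1% of trials even try a `ccp_alpha` below 0.01, which biases the search toward heavily pruned trees.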
def objective_function(params):
    n_estimators = params["n_estimators"]
    max_samples = params["max_samples"]
    del params["n_estimators"]
    del params["max_samples"]
    clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(**params),
                            n_estimators=n_estimators,
                            max_samples=max_samples,
                            random_state=7)
    score = cross_val_score(clf, X_train, np.ravel(y_train), cv=5).mean()
    return {'loss': -score, 'status': STATUS_OK}
trials = Trials()
best_param = fmin(objective_function,
                  param_hyperopt,
                  algo=tpe.suggest,
                  max_evals=200,
                  trials=trials,
                  rstate=np.random.RandomState(1))
loss = [x['result']['loss'] for x in trials.trials]
best_param_values = [x for x in best_param.values()]
I got these results:
{'ccp_alpha': 0.5554600863908586,
'criterion': 1,
'max_depth': 15.0,
'max_features': 9,
'max_leaf_nodes': 3,
'min_impurity_decrease': 0.6896630931867213,
'min_samples_leaf': 38,
'min_samples_split': 4,
'min_weight_fraction_leaf': 0.48094992349222787,
'splitter': 1}
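Note (my addition) that `fmin` returns the search space's *raw* values: for `hp.choice` parameters it returns the index into the options list (so `'criterion': 1` means `'entropy'` and `'splitter': 1` means `'random'`), and `hp.quniform` values come back as floats (`'max_depth': 15.0`). hyperopt's `space_eval(param_hyperopt, best_param)` performs this decoding for you; a minimal manual sketch, assuming the choice lists from the search space above:

```python
# Decode fmin's raw output: hp.choice values are indices into the options
# list, hp.quniform values are floats that sklearn expects as ints.
choices = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
}
int_params = {'max_depth', 'max_features', 'max_leaf_nodes',
              'min_samples_leaf', 'min_samples_split',
              'n_estimators', 'max_samples'}

def decode(best_param):
    out = {}
    for name, value in best_param.items():
        if name in choices:
            out[name] = choices[name][int(value)]   # index -> label
        elif name in int_params:
            out[name] = int(value)                  # 15.0 -> 15
        else:
            out[name] = value
    return out

raw = {'criterion': 1, 'splitter': 1, 'max_depth': 15.0,
       'ccp_alpha': 0.5554600863908586}
print(decode(raw))
# {'criterion': 'entropy', 'splitter': 'random', 'max_depth': 15,
#  'ccp_alpha': 0.5554600863908586}
```

Passing the decoded dict to `DecisionTreeClassifier` avoids having to hard-code `criterion="entropy"` and `splitter="random"` by hand as done below.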
The model with the tuned parameters:
clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(
        ccp_alpha=best_param["ccp_alpha"],
        criterion="entropy",
        max_depth=best_param["max_depth"],
        max_features=best_param["max_features"],
        max_leaf_nodes=best_param["max_leaf_nodes"],
        min_impurity_decrease=best_param["min_impurity_decrease"],
        min_samples_leaf=best_param["min_samples_leaf"],
        min_samples_split=best_param["min_samples_split"],
        min_weight_fraction_leaf=best_param["min_weight_fraction_leaf"],
        splitter="random",
    ), n_estimators=int(n_estimators), max_samples=int(max_samples), random_state=702120)
clf.fit(X_train, np.ravel(y_train))
And this is the confusion matrix I get:
array([[ 0, 5897],
[ 0, 5974]])
It puts everything into the same class! Why does this happen?
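A likely mechanism (my reading, not from the original question): the tuned values are degenerate. `min_weight_fraction_leaf` ≈ 0.48 requires every leaf to hold about 48% of the samples, `min_impurity_decrease` ≈ 0.69 forbids any split whose impurity gain is below 0.69 (Gini impurity never exceeds 0.5, so no split qualifies), and `ccp_alpha` ≈ 0.55 would prune almost any split anyway. Each of these alone collapses the tree to its root, which predicts the majority class for everything. On top of that, `max_samples` drawn as an int between 1 and 10 means each tree in the bag sees at most 10 training samples. A small sketch on synthetic data (the `make_classification` dataset is an assumption, not the asker's data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The "tuned" regularization values: each one alone is enough to prune
# the tree back to a single root leaf.
stump = DecisionTreeClassifier(
    ccp_alpha=0.55,                 # prunes any split with gain < 0.55
    min_impurity_decrease=0.69,     # Gini gain can never reach 0.69
    min_weight_fraction_leaf=0.48,  # each leaf must hold ~48% of samples
    random_state=7,
).fit(X, y)

print(stump.get_n_leaves())         # 1 -- the tree is just its root
print(np.unique(stump.predict(X)))  # a single value: the majority class
```

A root-only tree always predicts the majority class, which is exactly the one-column confusion matrix above. Narrowing the priors on the pruning parameters (see the log-uniform note earlier) and letting `max_samples` be a float fraction of the training set should make the search behave.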