Very poor results from a scikit-learn decision-tree bagging ensemble with Bayesian optimization
Without Bayesian optimization:
model = BaggingClassifier(base_estimator=DecisionTreeClassifier(min_samples_split=15), n_estimators=100, random_state=7)
Results:
Training set - Matthews correlation coefficient: 0.93
Test set - Matthews correlation coefficient: 0.45584530253849204
Model parameters -
model.get_params():
I decided to run Bayesian optimization to reduce the overfitting:
param_hyperopt = {
    'ccp_alpha': hp.uniform('ccp_alpha', 0, 1),
    'max_depth': scope.int(hp.quniform('max_depth', 5, 20, 1)),
    'n_estimators': scope.int(hp.quniform('n_estimators', 20, 200, 1)),
    'max_features': scope.int(hp.quniform('max_features', 2, 10, 1)),
    'min_samples_leaf': scope.int(hp.quniform('min_samples_leaf', 1, 40, 1)),
    'splitter': hp.choice('splitter', ['best', 'random']),
    'criterion': hp.choice('criterion', ['gini', 'entropy']),
    'max_leaf_nodes': scope.int(hp.quniform('max_leaf_nodes', 2, 20, 1)),
    'min_impurity_decrease': hp.uniform('min_impurity_decrease', 0, 1),
    'min_samples_split': scope.int(hp.quniform('min_samples_split', 3, 40, 1)),
    'min_weight_fraction_leaf': hp.uniform('min_weight_fraction_leaf', 0, 0.5),
    'max_samples': scope.int(hp.quniform('max_samples', 1, 10, 1)),
}
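One thing worth noticing in this search space (this note is my addition, not from the original question): `hp.uniform('ccp_alpha', 0, 1)` and `hp.uniform('min_impurity_decrease', 0, 1)` put half of all probability mass above 0.5, while useful values for these pruning parameters are typically far below 0.1. A log-uniform prior (hyperopt's `hp.loguniform`, which takes log-space bounds) keeps most samples small. A minimal stdlib sketch of the difference, using a hypothetical `loguniform` helper:

```python
import math
import random

random.seed(0)

def loguniform(lo, hi):
    """Sample log-uniformly between lo and hi -- what
    hp.loguniform(label, math.log(lo), math.log(hi)) draws from."""
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

# Uniform over [0, 1]: half of the draws exceed 0.5 -- extreme pruning.
uniform_draws = [random.uniform(0, 1) for _ in range(10_000)]

# Log-uniform over [1e-5, 1]: most draws stay small, in the range where
# ccp_alpha and min_impurity_decrease are actually useful.
log_draws = [loguniform(1e-5, 1) for _ in range(10_000)]

frac_uniform_small = sum(d < 0.01 for d in uniform_draws) / len(uniform_draws)
frac_log_small = sum(d < 0.01 for d in log_draws) / len(log_draws)
print(frac_uniform_small, frac_log_small)  # roughly 0.01 vs. 0.6
```

So with the uniform prior, only about 1% of trials even try a `ccp_alpha` below 0.01, which biases the search toward heavily pruned trees.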
def objective_function(params):
    n_estimators = params["n_estimators"]
    max_samples = params["max_samples"]
    del params["n_estimators"]
    del params["max_samples"]
    clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(**params),
                            n_estimators=n_estimators,
                            max_samples=max_samples,
                            random_state=7)
    score = cross_val_score(clf, X_train, np.ravel(y_train), cv=5).mean()
    return {'loss': -score, 'status': STATUS_OK}
trials = Trials()
best_param = fmin(objective_function,
                  param_hyperopt,
                  algo=tpe.suggest,
                  max_evals=200,
                  trials=trials,
                  rstate=np.random.RandomState(1))
loss = [x['result']['loss'] for x in trials.trials]
best_param_values = [x for x in best_param.values()]
I got these results:
{'ccp_alpha': 0.5554600863908586,
'criterion': 1,
'max_depth': 15.0,
'max_features': 9,
'max_leaf_nodes': 3,
'min_impurity_decrease': 0.6896630931867213,
'min_samples_leaf': 38,
'min_samples_split': 4,
'min_weight_fraction_leaf': 0.48094992349222787,
'splitter': 1}
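Note (my addition) that `fmin` returns the search space's *raw* values: for `hp.choice` parameters it returns the index into the options list (so `'criterion': 1` means `'entropy'` and `'splitter': 1` means `'random'`), and `hp.quniform` values come back as floats (`'max_depth': 15.0`). hyperopt's `space_eval(param_hyperopt, best_param)` performs this decoding for you; a minimal manual sketch, assuming the choice lists from the search space above:

```python
# Decode fmin's raw output: hp.choice values are indices into the options
# list, hp.quniform values are floats that sklearn expects as ints.
choices = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
}
int_params = {'max_depth', 'max_features', 'max_leaf_nodes',
              'min_samples_leaf', 'min_samples_split',
              'n_estimators', 'max_samples'}

def decode(best_param):
    out = {}
    for name, value in best_param.items():
        if name in choices:
            out[name] = choices[name][int(value)]   # index -> label
        elif name in int_params:
            out[name] = int(value)                  # 15.0 -> 15
        else:
            out[name] = value
    return out

raw = {'criterion': 1, 'splitter': 1, 'max_depth': 15.0,
       'ccp_alpha': 0.5554600863908586}
print(decode(raw))
# {'criterion': 'entropy', 'splitter': 'random', 'max_depth': 15,
#  'ccp_alpha': 0.5554600863908586}
```

Passing the decoded dict to `DecisionTreeClassifier` avoids having to hard-code `criterion="entropy"` and `splitter="random"` by hand as done below.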
The model with the tuned parameters:
clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(
        ccp_alpha=best_param["ccp_alpha"],
        criterion="entropy",
        max_depth=best_param["max_depth"],
        max_features=best_param["max_features"],
        max_leaf_nodes=best_param["max_leaf_nodes"],
        min_impurity_decrease=best_param["min_impurity_decrease"],
        min_samples_leaf=best_param["min_samples_leaf"],
        min_samples_split=best_param["min_samples_split"],
        min_weight_fraction_leaf=best_param["min_weight_fraction_leaf"],
        splitter="random",
    ), n_estimators=int(n_estimators), max_samples=int(max_samples), random_state=702120)
clf.fit(X_train, np.ravel(y_train))
And this is the confusion matrix I get:
array([[ 0, 5897],
[ 0, 5974]])
It puts everything into the same class! Why does this happen?
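A likely mechanism (my reading, not from the original question): the tuned values are degenerate. `min_weight_fraction_leaf` ≈ 0.48 requires every leaf to hold about 48% of the samples, `min_impurity_decrease` ≈ 0.69 forbids any split whose impurity gain is below 0.69 (Gini impurity never exceeds 0.5, so no split qualifies), and `ccp_alpha` ≈ 0.55 would prune almost any split anyway. Each of these alone collapses the tree to its root, which predicts the majority class for everything. On top of that, `max_samples` drawn as an int between 1 and 10 means each tree in the bag sees at most 10 training samples. A small sketch on synthetic data (the `make_classification` dataset is an assumption, not the asker's data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The "tuned" regularization values: each one alone is enough to prune
# the tree back to a single root leaf.
stump = DecisionTreeClassifier(
    ccp_alpha=0.55,                 # prunes any split with gain < 0.55
    min_impurity_decrease=0.69,     # Gini gain can never reach 0.69
    min_weight_fraction_leaf=0.48,  # each leaf must hold ~48% of samples
    random_state=7,
).fit(X, y)

print(stump.get_n_leaves())         # 1 -- the tree is just its root
print(np.unique(stump.predict(X)))  # a single value: the majority class
```

A root-only tree always predicts the majority class, which is exactly the one-column confusion matrix above. Narrowing the priors on the pruning parameters (see the log-uniform note earlier) and letting `max_samples` be a float fraction of the training set should make the search behave.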