Unstable results when tuning an XGBoost model with Bayesian optimization [Python]

Tags: python, pandas, machine-learning, xgboost, forecasting

I am running into a problem when tuning my XGBoost model with Bayesian optimization: every time I run the model on the same input data I get different results, and the best hyperparameter settings change on every re-run.

Can you tell me how to fix these unstable results?

Thanks in advance.

Here is the script:

# Imports implied by the snippet (BayesianOptimization comes from the bayes_opt package)
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from bayes_opt import BayesianOptimization

def mean_absolute_percentage_error(y_true, y_pred):
    """Calculates MAPE given y_true and y_pred, ignoring rows where y_true == 0."""
    dat_ = pd.DataFrame()
    dat_["y_true"] = list(y_true)
    dat_["y_pred"] = y_pred
    dat_ = dat_[dat_["y_true"] != 0]    # drop zero targets to avoid division by zero
    y_true, y_pred = np.array(dat_["y_true"]), np.array(dat_["y_pred"])
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def xgb_evaluate(max_depth, gamma, colsample_bytree, subsample, eta):
    """Objective for the Bayesian optimizer: train one model and score it on the test set."""
    params = {'eval_metric': "rmse",
              'max_depth': int(max_depth),
              'subsample': subsample,
              'eta': eta,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree}
    # Used around 1000 boosting rounds in the full model
    #cv_result = xgb.cv(params, dtrain, num_boost_round=100, nfold=3)

    model = XGBRegressor(**params)
    model2 = model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
                       verbose=False, early_stopping_rounds=20)

    # Bayesian optimization only knows how to maximize, not minimize, so return the negative MAPE
    return -mean_absolute_percentage_error(y_test, model2.predict(X_test))


X_train = pd.DataFrame(data_train.drop(['Pieces'], axis=1))
y_train = data_train['Pieces']

X_test = pd.DataFrame(data_test.drop(['Pieces'], axis=1))
y_test = data_test['Pieces']
    
xgb_bo = BayesianOptimization(xgb_evaluate, {'max_depth': (3, 200),
                                             'gamma': (0, 1),
                                             'colsample_bytree': (0.3, 0.9),
                                             'subsample': (0.5, 0.9),
                                             'eta': (0.1, 0.5)})
# Use the expected improvement acquisition function to handle negative numbers
# Ideally needs quite a few more initialization points and iterations
xgb_bo.maximize(init_points=3, n_iter=100, acq='ei')
    
# Refit a final model with the best hyperparameters found by the search
params = xgb_bo.max['params']
params['max_depth'] = int(params['max_depth'])

#model2 = xgb.train(params, dtrain, num_boost_round=250)
model_impl = XGBRegressor(**params)
model_Fitted = model_impl.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
                              verbose=False, early_stopping_rounds=10)
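
One common source of run-to-run variation in a setup like this is that neither the optimizer nor the booster has a fixed random state. Below is a minimal sketch of a seeded variant of the same search; it assumes the bayes_opt package (whose constructor accepts a random_state argument) and XGBRegressor's random_state parameter, and the seed value 42 is arbitrary. As the answer below notes, XGBoost itself may still not be perfectly reproducible in every configuration.

# Sketch: pin the random sources so repeated runs probe the same points.
# Reuses mean_absolute_percentage_error, X_train, y_train, X_test, y_test from above.
def xgb_evaluate_seeded(max_depth, gamma, colsample_bytree, subsample, eta):
    params = {'eval_metric': "rmse",
              'max_depth': int(max_depth),
              'subsample': subsample,
              'eta': eta,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree,
              'random_state': 42}   # fix the booster's own RNG (value is arbitrary)
    model = XGBRegressor(**params).fit(
        X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=False, early_stopping_rounds=20)
    return -mean_absolute_percentage_error(y_test, model.predict(X_test))

xgb_bo_seeded = BayesianOptimization(
    xgb_evaluate_seeded,
    {'max_depth': (3, 200), 'gamma': (0, 1), 'colsample_bytree': (0.3, 0.9),
     'subsample': (0.5, 0.9), 'eta': (0.1, 0.5)},
    random_state=42)   # fix the optimizer's sampling of probe points
xgb_bo_seeded.maximize(init_points=3, n_iter=100, acq='ei')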

You may have too many correlated features. Because XGBoost does not follow the seed exactly on every run, and because you may have correlated (or simply too many) explanatory features, the Bayesian optimization can end up building a different model each time. I use optuna, and that is what I have found as well. Oddly enough, simple averaging (with or without different feature sets) works very well: if you have a feature set you believe in, train on the core features plus the optional good ones, then average the results.
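
A minimal sketch of that averaging idea, assuming two hypothetical feature lists (core_features and extended_features, names invented for illustration) and the X_train/X_test/y_train/y_test frames from the question; each model is trained on one feature subset and the test-set forecasts are averaged:

# Hypothetical feature groups -- substitute the columns you actually trust.
core_features = ['feat_a', 'feat_b', 'feat_c']
extended_features = core_features + ['feat_d', 'feat_e']

predictions = []
for cols in (core_features, extended_features):
    model = XGBRegressor(n_estimators=500, random_state=42)
    model.fit(X_train[cols], y_train,
              eval_set=[(X_test[cols], y_test)],
              verbose=False, early_stopping_rounds=20)
    predictions.append(model.predict(X_test[cols]))

# Simple average of the two models' forecasts.
y_pred_avg = np.mean(predictions, axis=0)
print(mean_absolute_percentage_error(y_test, y_pred_avg))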