Unstable results when tuning an XGBoost model with Bayesian optimization [Python]

Tags: python, pandas, machine-learning, xgboost, forecasting

I am running into a problem when tuning my XGBoost model with Bayesian optimization: every time I run the model on the same input data I get different results, and the best hyperparameter settings change on every re-run.

Can you tell me how to fix these unstable results?

Thanks in advance.

Here is the script:

# Imports implied by the snippet (BayesianOptimization comes from the bayes_opt package)
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from bayes_opt import BayesianOptimization

def mean_absolute_percentage_error(y_true, y_pred):
    """Calculates MAPE given y_true and y_pred, ignoring rows where y_true == 0."""
    dat_ = pd.DataFrame()
    dat_["y_true"] = list(y_true)
    dat_["y_pred"] = y_pred
    dat_ = dat_[dat_["y_true"] != 0]    # drop zero targets to avoid division by zero
    y_true, y_pred = np.array(dat_["y_true"]), np.array(dat_["y_pred"])
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def xgb_evaluate(max_depth, gamma, colsample_bytree, subsample, eta):
    """Objective for the Bayesian optimizer: train one model and score it on the test set."""
    params = {'eval_metric': "rmse",
              'max_depth': int(max_depth),
              'subsample': subsample,
              'eta': eta,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree}
    # Used around 1000 boosting rounds in the full model
    #cv_result = xgb.cv(params, dtrain, num_boost_round=100, nfold=3)

    model = XGBRegressor(**params)
    model2 = model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
                       verbose=False, early_stopping_rounds=20)

    # Bayesian optimization only knows how to maximize, not minimize, so return the negative MAPE
    return -mean_absolute_percentage_error(y_test, model2.predict(X_test))


X_train = pd.DataFrame(data_train.drop(['Pieces'], axis=1))
y_train = data_train['Pieces']

X_test = pd.DataFrame(data_test.drop(['Pieces'], axis=1))
y_test = data_test['Pieces']
    
xgb_bo = BayesianOptimization(xgb_evaluate, {'max_depth': (3, 200),
                                             'gamma': (0, 1),
                                             'colsample_bytree': (0.3, 0.9),
                                             'subsample': (0.5, 0.9),
                                             'eta': (0.1, 0.5)})
# Use the expected improvement acquisition function to handle negative numbers
# Ideally needs quite a few more initialization points and iterations
xgb_bo.maximize(init_points=3, n_iter=100, acq='ei')
    
# Refit a final model with the best hyperparameters found by the search
params = xgb_bo.max['params']
params['max_depth'] = int(params['max_depth'])

#model2 = xgb.train(params, dtrain, num_boost_round=250)
model_impl = XGBRegressor(**params)
model_Fitted = model_impl.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
                              verbose=False, early_stopping_rounds=10)
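
One common source of run-to-run variation in a setup like this is that neither the optimizer nor the booster has a fixed random state. Below is a minimal sketch of a seeded variant of the same search; it assumes the bayes_opt package (whose constructor accepts a random_state argument) and XGBRegressor's random_state parameter, and the seed value 42 is arbitrary. As the answer below notes, XGBoost itself may still not be perfectly reproducible in every configuration.

# Sketch: pin the random sources so repeated runs probe the same points.
# Reuses mean_absolute_percentage_error, X_train, y_train, X_test, y_test from above.
def xgb_evaluate_seeded(max_depth, gamma, colsample_bytree, subsample, eta):
    params = {'eval_metric': "rmse",
              'max_depth': int(max_depth),
              'subsample': subsample,
              'eta': eta,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree,
              'random_state': 42}   # fix the booster's own RNG (value is arbitrary)
    model = XGBRegressor(**params).fit(
        X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=False, early_stopping_rounds=20)
    return -mean_absolute_percentage_error(y_test, model.predict(X_test))

xgb_bo_seeded = BayesianOptimization(
    xgb_evaluate_seeded,
    {'max_depth': (3, 200), 'gamma': (0, 1), 'colsample_bytree': (0.3, 0.9),
     'subsample': (0.5, 0.9), 'eta': (0.1, 0.5)},
    random_state=42)   # fix the optimizer's sampling of probe points
xgb_bo_seeded.maximize(init_points=3, n_iter=100, acq='ei')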

You may have too many correlated features. Because XGBoost does not follow the seed exactly on every run, and because you may have correlated (or simply too many) explanatory features, the Bayesian optimization can end up building a different model each time. I use optuna, and that is what I have found as well. Oddly enough, simple averaging (with or without different feature sets) works very well: if you have a feature set you believe in, train on the core features plus the optional good ones, then average the results.
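
A minimal sketch of that averaging idea, assuming two hypothetical feature lists (core_features and extended_features, names invented for illustration) and the X_train/X_test/y_train/y_test frames from the question; each model is trained on one feature subset and the test-set forecasts are averaged:

# Hypothetical feature groups -- substitute the columns you actually trust.
core_features = ['feat_a', 'feat_b', 'feat_c']
extended_features = core_features + ['feat_d', 'feat_e']

predictions = []
for cols in (core_features, extended_features):
    model = XGBRegressor(n_estimators=500, random_state=42)
    model.fit(X_train[cols], y_train,
              eval_set=[(X_test[cols], y_test)],
              verbose=False, early_stopping_rounds=20)
    predictions.append(model.predict(X_test[cols]))

# Simple average of the two models' forecasts.
y_pred_avg = np.mean(predictions, axis=0)
print(mean_absolute_percentage_error(y_test, y_pred_avg))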