Bayesian optimization applied to CatBoost in Python

python python-3.x pandas

Here is my attempt at applying BayesSearchCV to CatBoost:

# numpy and pandas are used by the status callback further down
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from skopt import BayesSearchCV
from sklearn.model_selection import StratifiedKFold


# Classifier
bayes_cv_tuner = BayesSearchCV(
    estimator=CatBoostClassifier(
        silent=True
    ),
    search_spaces={
        'depth': (2, 16),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    },
    scoring='roc_auc',
    cv=StratifiedKFold(
        n_splits=2,
        shuffle=True,
        random_state=72
    ),
    n_jobs=1,
    n_iter=100,
    verbose=1,
    refit=True,
    random_state=72
)

Tracking the results:

def status_print(optim_result):
    """Status callback during Bayesian hyperparameter search"""

    # Get all the models tested so far in DataFrame format
    all_models = pd.DataFrame(bayes_cv_tuner.cv_results_)

    # Get current parameters and the best parameters
    best_params = pd.Series(bayes_cv_tuner.best_params_)
    print('Model #{}\nBest ROC-AUC: {}\nBest params: {}\n'.format(
        len(all_models),
        np.round(bayes_cv_tuner.best_score_, 4),
        bayes_cv_tuner.best_params_
    ))

Fitting BayesSearchCV:

resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=status_print)

Results

The first 3 iterations run fine, but after that I get a non-stop stream of messages like:

Iteration with suspicious time 7.55 sec ignored in overall statistics.

Iteration with suspicious time 739 sec ignored in overall statistics.

(...)

Do you have any idea where I went wrong, or how I could improve this?

Regards,

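Judging by the timings CatBoost has recorded so far, one of the iterations in the set of experiments scheduled by skopt is taking far too long to complete.

If you raise the verbosity of the classifier to see when this happens, and use a callback to see which parameter combinations skopt is trying, you will probably find that the culprit is the depth parameter: the search slows down whenever CatBoost is asked to build deeper trees.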
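One way to check this is to time each optimization step and print the duration next to the parameters that were just evaluated. The sketch below is a minimal, hypothetical helper (`StepTimer` is not part of skopt or CatBoost). It only assumes that the callback receives a result object with an `x_iters` list, as the callbacks in this thread do.

```python
import time


class StepTimer:
    """Hypothetical callback: records wall-clock time between successive calls."""

    def __init__(self):
        self._last = time.perf_counter()
        self.durations = []  # seconds spent on each optimization step

    def __call__(self, res):
        now = time.perf_counter()
        elapsed = now - self._last
        self._last = now
        self.durations.append(elapsed)
        # x_iters[-1] holds the most recently evaluated parameter values
        print('Step {} took {:.1f} s - params: {}'.format(
            len(self.durations), elapsed, res.x_iters[-1]))
        # Deliberately returns None: a callback returning True stops the search


# Create the timer right before calling fit so the first step is timed fairly
step_timer = StepTimer()
```

It could be passed as `callback=step_timer` to `bayes_cv_tuner.fit`. If depth is indeed the cause, the slow steps should be the ones with large `depth` values.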

You can also try debugging with this custom callback:

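import joblib

counter = 0

def onstep(res):
    """Print the latest evaluation and checkpoint every point tried so far."""
    global counter
    args = res.x          # best parameters found so far
    x0 = res.x_iters      # all parameter sets evaluated so far
    y0 = res.func_vals    # objective value for each evaluated set
    print('Last eval: ', x0[-1],
          ' - Score ', y0[-1])
    print('Current iter: ', counter,
          ' - Score ', res.fun,
          ' - Args: ', args)
    # Save progress so an interrupted search can be inspected later
    joblib.dump((x0, y0), 'checkpoint.pkl')
    counter = counter + 1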

You can call it like this:

resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=[onstep, status_print])
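Because `onstep` writes every evaluated point to `checkpoint.pkl`, the file can be inspected while the search is still running or after it has been interrupted. This is a minimal sketch, assuming the checkpoint holds the `(x_iters, func_vals)` tuple dumped by `onstep` and that lower values are better (skopt minimizes, so the values are assumed here to be negated scores). `best_so_far` and `load_checkpoint` are hypothetical helper names.

```python
def best_so_far(x_iters, func_vals):
    """Return (index, params, value) of the lowest objective value seen."""
    if len(x_iters) != len(func_vals) or len(func_vals) == 0:
        raise ValueError('expected two non-empty sequences of equal length')
    best_idx = min(range(len(func_vals)), key=lambda i: func_vals[i])
    return best_idx, x_iters[best_idx], func_vals[best_idx]


def load_checkpoint(path='checkpoint.pkl'):
    """Read back the (x_iters, func_vals) tuple written by the callback."""
    import joblib  # imported lazily; only needed when a file is actually read
    return joblib.load(path)
```

For example, `best_so_far(*load_checkpoint())` would report which parameter set has scored best so far.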

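In fact, I noticed the same problem in my own experiments: complexity grows non-linearly with depth (CatBoost's default symmetric trees have 2^depth leaves), so CatBoost takes much longer to complete its iterations. A simple fix is to search a smaller space: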

'depth':(2, 8)

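A depth of 8 is usually enough. In any case, you can first run skopt with the maximum depth set to 8, then repeat the search with a larger maximum depth.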
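The two-pass idea can be sketched as a small helper that rebuilds the question's search space with a configurable upper bound on `depth`. `make_search_space` is a hypothetical name, the other ranges are copied from the question, and the second-pass bound of 12 is only an illustrative choice.

```python
def make_search_space(max_depth=8, min_depth=2):
    """Build the question's search space with a configurable depth range."""
    # 16 is the largest depth used in the question, so it is kept as the cap
    if not 1 <= min_depth < max_depth <= 16:
        raise ValueError('need 1 <= min_depth < max_depth <= 16')
    return {
        'depth': (min_depth, max_depth),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    }


first_pass = make_search_space(max_depth=8)    # cheap initial search
second_pass = make_search_space(max_depth=12)  # wider range, if needed
```

Each dictionary can be passed as `search_spaces` to a fresh `BayesSearchCV`. A second pass is probably only worthwhile if the best `depth` from the first pass sits at its upper bound.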

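Did you happen to find a solution? Also, if you know, could you tell me where in your code the list of categorical variable indices can be defined?

Yes @popythesilor, lowering the maximum depth value did the trick, as suggested.

Also, how do you pass the indices of the categorical features in the code?

Don't you care about evaluating the metric on a test set? Doesn't what you are doing encourage overfitting?

Oh, I see, cross-validation takes care of that, never mind :)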