Python xgboost: some trees contain only a single leaf node (no splits)


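I fit an extreme gradient boosting model using the xgboost 0.6 package in Python 3.6.3 (running on macOS Sierra 10.12.6). When I inspected the tree dump, I noticed that many of the trees contain no splits at all; they are just single leaf nodes: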

from xgboost import XGBClassifier

params = {'colsample_bylevel': 0.25, 'gamma': 3, 'learning_rate': 0.2, 'max_depth': 2, 'n_estimators': 250, 'reg_alpha': 0.5, 'reg_lambda': 3, 'subsample': 0.5}
model = XGBClassifier(**params, seed=12345, nthread=1, silent=True)
model.fit(X, y)  # X and y are numpy arrays (13 predictors and an outcome)

tree_dump = model.get_booster().get_dump()
tree_dump[0]
Out[765]: '0:leaf=-0.387394\n'
tree_dump[1]
Out[766]: '0:leaf=-0.322484\n'
tree_dump[2]
Out[767]: '0:leaf=-0.285089\n'
tree_dump[3]
Out[768]: '0:leaf=-0.26167\n'
tree_dump[4]
Out[769]: '0:leaf=-0.240752\n'
tree_dump[5]
Out[770]: '0:leaf=-0.226565\n'
tree_dump[6]
Out[771]: '0:[f0<6.28879] yes=1,no=2,missing=1\n\t1:[f5<6.08075] yes=3,no=4,missing=3\n\t\t3:leaf=-0.21372\n\t\t4:leaf=0.00931895\n\t2:leaf=-0\n'
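To put a number on "many trees", the dump can be scanned for trees whose only node is a root leaf. The following is a minimal sketch that assumes the text dump format shown above; the helper names `is_single_leaf` and `count_single_leaf_trees` are made up for illustration, and the hard-coded list stands in for the output of `model.get_booster().get_dump()`.

```python
# Dump strings copied from the output above. In a real session this list
# would come from model.get_booster().get_dump() instead.
tree_dump = [
    '0:leaf=-0.387394\n',
    '0:leaf=-0.322484\n',
    '0:leaf=-0.285089\n',
    '0:[f0<6.28879] yes=1,no=2,missing=1\n'
    '\t1:[f5<6.08075] yes=3,no=4,missing=3\n'
    '\t\t3:leaf=-0.21372\n'
    '\t\t4:leaf=0.00931895\n'
    '\t2:leaf=-0\n',
]


def is_single_leaf(tree_text):
    """Return True if a dumped tree consists of a root leaf only (no split)."""
    lines = [ln.strip() for ln in tree_text.splitlines() if ln.strip()]
    return len(lines) == 1 and lines[0].startswith('0:leaf=')


def count_single_leaf_trees(dump):
    """Count the trees in a dump that never split."""
    return sum(is_single_leaf(tree) for tree in dump)


n_single = count_single_leaf_trees(tree_dump)
print(f"{n_single} of {len(tree_dump)} trees have no split")
```

Running the same count after each hyperparameter change makes it easy to see which setting is suppressing the splits.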

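I think I can answer my own question now. Given the hyperparameter values I used, this behavior is to be expected.

With 13 predictors and colsample_bylevel = 0.25, only 3 predictors are sampled at each tree level, and those predictors may not be important enough to produce a split. Setting colsample_bylevel = 1.0 increases the number of trees that contain splits, but some trees still consist of a single leaf node.

The parameters gamma and min_child_weight control the number of leaf nodes. With colsample_bylevel = 1.0, gamma = 0, and min_child_weight = 0, all but 1 of the 250 trees now contain splits.

Where did you get these hyperparameters? They all look bad to me.

@moshe I selected them through 10-fold cross-validation. Why do you say they are bad?

'colsample_bylevel': 0.25: I would usually go with 0.75, but it depends on how many features you have. 'learning_rate': 0.2 could probably be lowered, to say 0.1 or 0.05, with more n_estimators and early stopping. I don't know much about 'reg_alpha': 0.5 and 'reg_lambda': 3. I don't use them in my models.

@EranMoshe Thanks for the suggestions! I did try the hyperparameter values you suggested (higher column sampling, a lower learning rate, and more trees), but these were the best values based on cross-validated AUC.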
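The effect of gamma and min_child_weight described in the answer can be illustrated with the regularized split gain used in gradient-boosted trees. This is a simplified sketch, not xgboost's actual implementation: it ignores the L1 penalty (reg_alpha), the function names `split_gain` and `split_is_kept` are made up, and the gradient and Hessian sums are invented numbers chosen only to show the behavior.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    """Loss reduction of a candidate split, minus the gamma penalty.

    g_* and h_* are the sums of first- and second-order gradients of the
    loss over the rows that fall into each child. L1 regularization
    (reg_alpha) is deliberately left out of this sketch.
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


def split_is_kept(g_left, h_left, g_right, h_right,
                  reg_lambda, gamma, min_child_weight):
    """A split survives only if both children are heavy enough and gain > 0."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


# Hypothetical statistics for one candidate split (not taken from the question).
stats = dict(g_left=-4.0, h_left=10.0, g_right=6.0, h_right=12.0)

# Same split, evaluated with the question's gamma=3 and with gamma=0.
kept_with_gamma_3 = split_is_kept(**stats, reg_lambda=3, gamma=3, min_child_weight=1)
kept_with_gamma_0 = split_is_kept(**stats, reg_lambda=3, gamma=0, min_child_weight=0)
print(kept_with_gamma_3, kept_with_gamma_0)
```

With these made-up numbers the raw loss reduction is about 1.74, which is below gamma = 3, so the split is discarded in the first case and kept in the second. If every candidate split at the root is discarded this way, the tree is left as a single leaf.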