Weights column problem when tuning a GBM model with h2o.grid
I am using the h2o.grid hyperparameter search function to tune a GBM model. H2O GBM lets you add a weights column that specifies a weight for each observation. However, when I add it in h2o.grid, the run always fails with an illegal argument / missing values error, even though the weights column is populated.

Has anyone had a similar experience? Thanks.

Hyper-parameter: max_depth, 20
[2017-04-12 13:10:05] failure_details: Illegal argument(s) for GBM model: depth_grid_model_11. Details: ERRR on field: _weights_column: Weights cannot have missing values.
ERRR on field: _weights_column: Weights cannot have missing values.

hyper_params = list( max_depth = c(4,6,8,12,16,20) ) ##faster for larger datasets
grid <- h2o.grid(
  ## hyper parameters
  hyper_params = hyper_params,
  ## full Cartesian hyper-parameter search
  search_criteria = list(strategy = "Cartesian"), ## default is Cartesian
  ## which algorithm to run
  algorithm="gbm",
  ## identifier for the grid, to later retrieve it
  grid_id="depth_grid",
  ## standard model parameters
  x = X, #predictors,
  y = Y, #response,
  training_frame = datadev, #train,
  validation_frame = dataval, #valid,
  ## per-observation weights (this is the argument that triggers the error)
  weights_column = "Adj_Bias_correction",
  ## more trees is better if the learning rate is small enough
  ## here, use "more than enough" trees - we have early stopping
  ntrees = 10000,
  ## smaller learning rate is better
  ## since we have learning_rate_annealing, we can afford to start with a bigger learning rate
  learn_rate = 0.05,
  ## learning rate annealing: learning_rate shrinks by 1% after every tree
  ## (use 1.00 to disable, but then lower the learning_rate)
  learn_rate_annealing = 0.99,
  ## sample 80% of rows per tree
  sample_rate = 0.8,
  ## sample 80% of columns per split
  col_sample_rate = 0.8,
  ## fix a random number generator seed for reproducibility
  seed = 1234,
  ## early stopping once the validation AUC doesn't improve by at least 0.01% for 5 consecutive scoring events
  stopping_rounds = 5, stopping_tolerance = 1e-4, stopping_metric = "AUC",
  ## score every 10 trees to make early stopping reproducible (it depends on the scoring interval)
  score_tree_interval = 10
)
## by default, display the grid search results sorted by increasing logloss (since this is a classification task)
grid
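
The error text says the weights column contains missing values, so a reasonable first check is to count the NAs in that column on the H2O side before launching the grid. The following is a minimal sketch, not a definitive fix: it assumes a running local H2O cluster, and the toy frame `datadev_toy` and its values are made up for illustration (they stand in for `datadev`). Only the column name `Adj_Bias_correction` comes from the question.

```r
library(h2o)
h2o.init()

# Toy data standing in for `datadev`; the values are invented for illustration.
toy <- data.frame(
  x1 = c(1.2, 3.4, 5.6, 7.8, 9.0),
  Adj_Bias_correction = c(1.0, NA, 0.5, 2.0, NA)
)
datadev_toy <- as.h2o(toy)
weights_col <- "Adj_Bias_correction"

# Count the rows whose weight is missing.
n_missing <- h2o.nacnt(datadev_toy[, weights_col])
print(n_missing)

# One possible remedy (an assumption, not the only option):
# keep only the rows where the weight is present.
keep <- !is.na(datadev_toy[, weights_col])
datadev_clean <- datadev_toy[keep, ]

# Confirm that no missing weights remain.
print(h2o.nacnt(datadev_clean[, weights_col]))
```

The same check would apply to the validation frame (`dataval` in the question). Whether dropping the affected rows is appropriate, or whether the upstream data should be repaired instead, depends on how the rows came to be missing.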
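
Are there missing values in the weights column? Did you designate an existing column in the dataset as the weights column, or did you create the weights column yourself?

Lauren: Thanks for the suggestion. I went back and double-checked the column, and it does have 4 rows with missing values. I had concatenated the files in HDFS, and somehow rows with missing values were generated automatically.

Related question: Gains/lift in h2o (GBM)?
I see that the H2O model performance metrics include AUC, logloss, and so on. There is also a metric called lift_top_group: is that the lift for the top decile? Can users also specify the bands of the gains chart that H2O outputs, such as top 5%, 5%-10%, 10%-15%, and so on? Thanks. The answer follows: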