Weights column problem when tuning a GBM model with h2o.grid


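I am using the h2o.grid hyperparameter search function to fine-tune a GBM model. H2O's GBM lets you add a weights column that specifies the weight of each observation. However, when I pass that column to h2o.grid, it always fails with an illegal argument / missing values error, even though the weights column is populated. Has anyone had a similar experience? Thanks.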

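Hyper-parameter: max_depth, 20 [2017-04-12 13:10:05] failure_details: Illegal argument(s) for GBM model: depth_grid_model_11. Details: ERRR on field: _weights_columns: Weights cannot have missing values. ERRR on field: _weights_columns: Weights cannot have missing values.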

============================

hyper_params = list( max_depth = c(4,6,8,12,16,20) ) ##faster for larger datasets

grid <- h2o.grid(
  ## hyper parameters
  hyper_params = hyper_params,

  ## full Cartesian hyper-parameter search
  search_criteria = list(strategy = "Cartesian"),  ## default is Cartesian

  ## which algorithm to run
  algorithm="gbm",

  ## identifier for the grid, to later retrieve it
  grid_id="depth_grid",

  ## standard model parameters
  x = X,  #predictors, 
  y = Y,  #response, 
  training_frame = datadev, #train, 
  validation_frame = dataval, #valid,
  weights_column = "Adj_Bias_correction",

  ## more trees is better if the learning rate is small enough 
  ## here, use "more than enough" trees - we have early stopping
  ntrees = 10000,                                                            

  ## smaller learning rate is better
  ## since we have learn_rate_annealing, we can afford to start with a bigger learning rate
  learn_rate = 0.05,

  ## learning rate annealing: learn_rate shrinks by 1% after every tree
  ## (use 1.00 to disable, but then lower the learn_rate)
  learn_rate_annealing = 0.99,                                               

  ## sample 80% of rows per tree
  sample_rate = 0.8,                                                       

  ## sample 80% of columns per split
  col_sample_rate = 0.8, 

  ## fix a random number generator seed for reproducibility
  seed = 1234,                                                             

  ## early stopping once the validation AUC doesn't improve by at least 0.01% for 5 consecutive scoring events
  stopping_rounds = 5,   stopping_tolerance = 1e-4,   stopping_metric = "AUC", 

  ## score every 10 trees to make early stopping reproducible (it depends on the scoring interval)
  score_tree_interval = 10                                                
)

## by default, display the grid search results sorted by increasing logloss (since this is a classification task)
grid
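
The failure message above says the weights column must not contain missing values, so a reasonable first step is to count the missing weights and drop those rows before calling h2o.grid. The following is a minimal sketch, not an official H2O recipe. The helper `drop_missing_weights` and the `toy` data are hypothetical and exist only for illustration. The helper is demonstrated on a plain data.frame so it runs without a cluster. It uses only `is.na()`, `sum()` and logical row indexing, which the h2o package is assumed to overload for H2OFrame objects, so the same call should apply to a frame such as `datadev`.

```r
## Hypothetical helper (not part of h2o): report and drop rows whose weight is missing.
## Relies only on is.na(), sum() and logical row indexing.
drop_missing_weights <- function(frame, weights_col) {
  missing_flag <- is.na(frame[, weights_col])
  n_missing <- as.numeric(sum(missing_flag))
  message(n_missing, " row(s) with a missing value in '", weights_col, "'")
  frame[!missing_flag, ]
}

## Toy data standing in for the training frame; the third row has no weight
toy <- data.frame(
  x = c(0.5, 1.2, 3.1, 4.8),
  y = c(0, 1, 0, 1),
  Adj_Bias_correction = c(1.0, 0.8, NA, 1.3)
)

toy_clean <- drop_missing_weights(toy, "Adj_Bias_correction")
nrow(toy_clean)
```

Under that assumption, the cleaned frame would then be passed as training_frame (and likewise for validation_frame). Whether dropping rows or imputing a weight is appropriate depends on why the values are missing.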

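Are there missing values in the weights column? Did you designate an existing column of the dataset as the weights column, or did you create the weights column yourself?

@Lauren: Thanks for the suggestion. I went back and double-checked the column, and it did have 4 rows with missing values. I had concatenated the files in HDFS, and somehow the system automatically generated rows with missing values.

I see that the H2O model performance metrics include AUC, logloss, and so on. There is also a metric called lift_top_group; is that the lift in the top decile? Can the user also specify the bands for the gains chart that H2O outputs, such as top 5%, 5%-10%, 10%-15%, and so on? Is this possible for gains/lift in H2O (GBM)? Thanks.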