h2o Python: the one_hot_explicit parameter raises an error


When training a model with the Python h2o library on H2O v3.10, I see an error when I try to set one_hot_explicit as the categorical_encoding option.

encoding = "enum"

gbm = H2OGradientBoostingEstimator(
        categorical_encoding = encoding)

gbm.train(x, y,train_h2o_df,test_h2o_df)
works fine, and the model uses the enum categorical_encoding. But with the following:

encoding = "one_hot_explicit"

the following error occurs:

gbm Model Build progress: | (failed)
....
OSError: Job with key $03017f00000132d4ffffffff$_bde8fcb4777df7e0be1199bf590a47f9 failed with an exception: java.lang.AssertionError
stacktrace: 
java.lang.AssertionError
at hex.ModelBuilder.init(ModelBuilder.java:958)
at hex.tree.SharedTree.init(SharedTree.java:78)
at hex.tree.gbm.GBM.init(GBM.java:57)
at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:159)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1203)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Am I missing some dependency, or is this a bug?

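Although you may want to update to the latest stable release of H2O, your encoding choice should work. Here is a code snippet you can run to test whether it works for you. If it does, you can try to work out what differs between your earlier code and the example below.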

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed 'YES' or not "NO"
# original data can be found at http://www.transtats.bts.gov/
airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# convert columns to factors
airlines["Year"]= airlines["Year"].asfactor()
airlines["Month"]= airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "DayOfWeek", "Month", "Distance"]
response = "IsDepDelayed"

# split into train and validation sets
train, valid= airlines.split_frame(ratios = [.8], seed = 1234)

# try using the `categorical_encoding` parameter:
encoding = "one_hot_explicit"

# initialize the estimator
airlines_gbm = H2OGradientBoostingEstimator(categorical_encoding = encoding, seed =1234)

# then train the model
airlines_gbm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the auc for the validation set
print(airlines_gbm.auc(valid=True))
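
One visible difference between the two snippets is that the question passes its frames to train positionally, while the example above uses keyword arguments. The sketch below shows one way to check which parameter each positional argument binds to, using the standard library's inspect module. The train function here is a hypothetical stand-in: its parameter order is an assumption made for illustration, not a confirmed copy of H2O's signature, and nothing here establishes that this is the cause of the AssertionError.

```python
import inspect


def train(x=None, y=None, training_frame=None, offset_column=None,
          fold_column=None, weights_column=None, validation_frame=None):
    """Hypothetical stand-in; the parameter order is ASSUMED, not H2O's real method."""


def bound_names(func, *args, **kwargs):
    """Return {parameter name: value} for the arguments actually supplied."""
    return dict(inspect.signature(func).bind(*args, **kwargs).arguments)


# Positional call, shaped like the one in the question.
positional = bound_names(train, "x", "y", "train_df", "test_df")

# Keyword call, shaped like the one in the example above.
keyword = bound_names(train, x="x", y="y",
                      training_frame="train_df", validation_frame="test_df")

print(positional)
print(keyword)
```

With the assumed order, the fourth positional argument does not bind to validation_frame. To check a real installation, pass the bound method of your own estimator (for example gbm.train) to bound_names in place of the stand-in, and compare the result with what you intended.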

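Thanks, Lauren. I'll try the boilerplate code with the test dataset and see whether I can narrow down the differences in my setup. For reasons outside my control I'm limited to 3.10.x for now, but that is only temporary. Looking forward to trying the AutoML module...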