Python H2O: Local server has died unexpectedly


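I'm having trouble reproducing the example from the documentation. After initializing the local H2O server with h2o.init(), I get the following output, which looks correct: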

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_181"; Java(TM) SE Runtime Environment (build 1.8.0_181-b13); Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
  Starting server from /home/cdsw/.local/lib/python3.8/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp3nh32di4
  JVM stdout: /tmp/tmp3nh32di4/h2o_cdsw_started_from_python.out
  JVM stderr: /tmp/tmp3nh32di4/h2o_cdsw_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O_cluster_uptime: 01 secs
H2O_cluster_timezone:   Etc/UTC
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:    3.32.1.3
H2O_cluster_version_age:    14 days, 20 hours and 29 minutes
H2O_cluster_name:   H2O_from_python_cdsw_cpcrap
H2O_cluster_total_nodes:    1
H2O_cluster_free_memory:    13.98 Gb
H2O_cluster_total_cores:    32
H2O_cluster_allowed_cores:  32
H2O_cluster_status: accepting new members, healthy
H2O_connection_url: http://127.0.0.1:54321
H2O_connection_proxy:   {"http": null, "https": null}
H2O_internal_security:  False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.8.5 final

Next, I import the dataset specified in the tutorial:

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

Finally, I train my AutoML model:

# Run AutoML for 20 base models (limited to 1 hour max runtime by default)
aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=x, y=y, training_frame=train)

That is when it crashes, showing the following message:

AutoML progress: |██Failed polling AutoML progress log: Local server has died unexpectedly. RIP.
Job request failed Local server has died unexpectedly. RIP., will retry after 3s.
Job request failed Local server has died unexpectedly. RIP., will retry after 3s.
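
I tried different datasets, including smaller samples in case it was a memory problem, but nothing helped. The error persists.

Does anyone know what I can do to fix this?

Thanks a lot.

Regards.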

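I think I managed to solve it. After some monitoring with the htop command, I believe the problem really was memory. I restarted H2O with memory limited to 1 GB and 2 threads (which may not be strictly necessary), and everything now seems to run fine: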

h2o.init(max_mem_size="1G", nthreads=2)

I hope this helps anyone who runs into the same problem.
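
The kind of memory check done with htop can also be approximated from Python before calling h2o.init(). The following is only a minimal sketch, assuming a Linux machine that exposes /proc/meminfo; the helper names parse_meminfo, available_gib and read_available_gib are hypothetical and are not part of H2O.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields (MemTotal, MemAvailable, ...) are reported in kB.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(text):
    """Return MemAvailable converted to GiB, or None if the field is missing."""
    kib = parse_meminfo(text).get("MemAvailable")
    return None if kib is None else kib / 1024**2


def read_available_gib(path="/proc/meminfo"):
    """Read the live value from the kernel (Linux only, assumed path)."""
    with open(path) as f:
        return available_gib(f.read())
```

Calling read_available_gib() before and during training gives a rough idea of how much memory is left for processes outside the H2O JVM.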

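Hi, does the server also die outside of AutoML? You can try a simple GBM example to check.

Thanks a lot for the input, Erin. I did try the example you suggested, and it ran fine. So it looks like the problem is related to the AutoML process?

Update: I tried running it again, but this time it started failing just like AutoML, with the same message. It is a bit erratic, and I'm not sure what determines when this error shows up.

One thing to note is that XGBoost (which is included in AutoML) uses memory outside of the H2O cluster. So if you give all the available memory on your machine to the H2O cluster, XGBoost has nothing left to work with, which may be what caused this issue. We have opened a ticket to warn users about this, which should reduce confusion in the future.

Also, you can probably set nthreads=-1 (use all threads) and increase the H2O cluster's memory to about 2/3 of the total available RAM on the machine. I think your current configuration is a bit conservative, but it was a good way to pin down the problem and fix it!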
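
The 2/3 suggestion above can be turned into a small helper. This is a rough sketch rather than an official recipe: suggest_max_mem_size, total_ram_bytes and start_h2o are hypothetical names, the RAM lookup assumes a Unix-like system where os.sysconf is available, and only the max_mem_size and nthreads arguments of h2o.init() come from the thread itself.

```python
import os

GIB = 1024**3


def total_ram_bytes():
    """Total physical RAM in bytes (assumes a Unix-like OS with os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def suggest_max_mem_size(total_bytes, num=2, den=3):
    """Return a size string such as '10G' covering num/den of total RAM.

    Integer arithmetic is used so the result is rounded down to whole GiB,
    with a floor of 1G. The remaining RAM stays free for the OS and for
    XGBoost, which allocates memory outside the H2O cluster.
    """
    gib = (total_bytes * num) // (den * GIB)
    return f"{max(gib, 1)}G"


def start_h2o():
    """Start a local H2O cluster sized to about 2/3 of RAM, using all threads."""
    import h2o  # imported lazily so the helpers above work without H2O installed

    h2o.init(max_mem_size=suggest_max_mem_size(total_ram_bytes()), nthreads=-1)
```

On a machine with 16 GiB of RAM, suggest_max_mem_size returns "10G", which leaves roughly a third of the memory for everything that runs outside the H2O JVM.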