Python 3.x: How to change the predictions in H2O GBM and DRF


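I am building a classification model with H2O DRF and GBM. I want to change the predicted probabilities so that if currently p0 …

You need to do this manually. It would be much easier if we provided a threshold parameter for the predict() method, so I have created a ticket to make this more straightforward.

See the Python example below for how to do this manually: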

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

# Import a sample binary outcome train/test set into H2O
train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Train a GBM (nfolds is not set, so no cross-validation is run here)
my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", seed=1)
my_gbm.train(x=x, y=y, training_frame=train)

# Predict on the test set using the default threshold
# (for binomial models, H2O labels rows with the max-F1 threshold, not 0.5)
pred = my_gbm.predict(test_data=test)

Take a look at the pred frame:

In [16]: pred.tail()
Out[16]:
  predict        p0        p1
---------  --------  --------
        1  0.484712  0.515288
        0  0.693893  0.306107
        1  0.319674  0.680326
        0  0.582344  0.417656
        1  0.471658  0.528342
        1  0.079922  0.920078
        1  0.150146  0.849854
        0  0.835288  0.164712
        0  0.639877  0.360123
        1  0.54377   0.45623

[10 rows x 3 columns]
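The labels in this listing already show that the default cutoff is not 0.5: the last row has p1 = 0.45623 but is predicted as 1. For binomial models, H2O's predict() uses the threshold that maximizes F1 rather than a fixed 0.5. The stdlib-only sketch below (no H2O cluster needed) copies the ten printed rows and computes the range of p1 cutoffs that would reproduce these labels, assuming the rule "predict 1 if p1 >= cutoff". The helper name cutoff_bounds is hypothetical.

```python
# Rows copied from the pred.tail() listing above: (predict, p0, p1).
ROWS = [
    (1, 0.484712, 0.515288),
    (0, 0.693893, 0.306107),
    (1, 0.319674, 0.680326),
    (0, 0.582344, 0.417656),
    (1, 0.471658, 0.528342),
    (1, 0.079922, 0.920078),
    (1, 0.150146, 0.849854),
    (0, 0.835288, 0.164712),
    (0, 0.639877, 0.360123),
    (1, 0.54377, 0.45623),
]


def cutoff_bounds(rows):
    """Hypothetical helper: return (low, high) such that any cutoff t with
    low < t <= high reproduces the labels under "predict 1 if p1 >= t"."""
    # The cutoff must lie above every p1 that was labeled 0 ...
    low = max(p1 for label, _, p1 in rows if label == 0)
    # ... and at or below every p1 that was labeled 1.
    high = min(p1 for label, _, p1 in rows if label == 1)
    if low >= high:
        raise ValueError("no single p1 cutoff explains these labels")
    return low, high


low, high = cutoff_bounds(ROWS)
print(f"consistent cutoffs on p1: ({low}, {high}]")
print("0.5 consistent?", low < 0.5 <= high)
```

It prints the interval (0.417656, 0.45623], which excludes 0.5, so the default labels come from a different cutoff.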


Here is how to manually create the desired predictions. For details on how to slice frames, see the H2O documentation.
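The exact threshold from the question is not preserved here. As a minimal sketch, assuming the rule is "predict 0 only when p0 exceeds a cutoff, otherwise predict 1", the stdlib-only code below applies that rule to the ten printed rows. The cutoff 0.75 is hypothetical: it is simply one value that reproduces the relabeled listing. The names relabel and P0_CUTOFF are illustrative, and the H2OFrame one-liner in the comment is an untested assumption rather than a confirmed recipe.

```python
# (p0, p1) pairs copied from the pred.tail() listing.
PROBS = [
    (0.484712, 0.515288),
    (0.693893, 0.306107),
    (0.319674, 0.680326),
    (0.582344, 0.417656),
    (0.471658, 0.528342),
    (0.079922, 0.920078),
    (0.150146, 0.849854),
    (0.835288, 0.164712),
    (0.639877, 0.360123),
    (0.54377, 0.45623),
]

# Hypothetical cutoff on p0; any value in [0.693893, 0.835288) would
# produce the same labels for these ten rows.
P0_CUTOFF = 0.75


def relabel(probs, p0_cutoff=P0_CUTOFF):
    """Predict class 0 only when p0 exceeds p0_cutoff, otherwise class 1."""
    return [0 if p0 > p0_cutoff else 1 for p0, _ in probs]


# On an H2OFrame the same rule could likely be written (untested) as:
#   newpred = (pred["p0"] > P0_CUTOFF).ifelse(0, 1).asfactor()
newpred = relabel(PROBS)
print(newpred)
```

With the 0.75 cutoff this prints [1, 1, 1, 1, 1, 1, 1, 0, 1, 1], which matches the predict column after relabeling. In H2O itself, newpred has to be an H2OFrame column, not a Python list, before it can be assigned to pred["predict"].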

Now you have the predictions you want. You can also replace the "predict" column with the new predicted labels:

pred["predict"] = newpred

Now check the pred frame again:


In [24]: pred.tail()
Out[24]:
  predict        p0        p1
---------  --------  --------
        1  0.484712  0.515288
        1  0.693893  0.306107
        1  0.319674  0.680326
        1  0.582344  0.417656
        1  0.471658  0.528342
        1  0.079922  0.920078
        1  0.150146  0.849854
        0  0.835288  0.164712
        1  0.639877  0.360123
        1  0.54377   0.45623

[10 rows x 3 columns]

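Thanks a lot. I have already implemented this logic manually. As you correctly mentioned, we were looking for some attribute for this. @Erin, can we get the predicted probabilities for the training data instead of the test data? Can we do pred = my_gbm.predict(test_data=train), or can we get them directly from training the model?

Thanks @Neo. We save training metrics, but not training predictions, so you have to regenerate them with my_gbm.predict(test_data=train).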