Why do I get different prediction results between XGBoost's Python and CLI versions?


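Recently, when I tried to use the CLI version of xgboost to predict on my input, I found that its results were very different from those of the Python version.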

In Python, I predict like this:

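import xgboost as xgb

# X: 2-D array of features (the "Python input" rows shown below)
# modelfile: path to the same model file given to the CLI via model_in
data = xgb.DMatrix(X)
bst = xgb.Booster()
bst.load_model(modelfile)
preds = bst.predict(data, pred_leaf=False)  # probabilities; pred_leaf=True would return leaf indices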

And I use the CLI like this:

./xgboost xgboost.conf task=pred model_in=../models/gb.model_depth4_150trees_2016-07-02

Here is my configuration file:

# General Parameters, see comment for each definition
# can be gbtree or gblinear
booster = gbtree
# choose logistic regression loss function for binary classification
objective = binary:logistic

# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight(hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 4

# Task Parameters
# the number of boosting rounds
num_round = 150
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "agaricus.txt.train"
# The path of validation data, used to monitor training process, here [test] sets name of the validation set
eval[test] = "agaricus.txt.test"
# The path of test data
test:data = "data"
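If the Python model is trained separately, it only matches the CLI model when both use the same hyperparameters. A minimal sketch, assuming one `key = value` pair per line as in the config above, that reads the file and separates booster parameters (eta, gamma, max_depth, ...) from CLI-only task settings (paths, num_round, save_period). The name `parse_cli_config` and the exact set of task keys are hypothetical choices for this illustration, not part of XGBoost; values are kept as strings.

```python
from typing import Dict, Tuple

# Keys treated as CLI task settings rather than booster parameters.
# This split is an assumption made for the sketch, not an official list.
CLI_TASK_KEYS = {"task", "num_round", "save_period", "data", "test:data",
                 "model_in", "model_out"}


def parse_cli_config(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a 'key = value' config into (booster_params, task_params)."""
    booster_params: Dict[str, str] = {}
    task_params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = (part.strip() for part in line.split("=", 1))
        value = value.strip('"')  # file paths are quoted in the config
        if key in CLI_TASK_KEYS or key.startswith("eval["):
            task_params[key] = value
        else:
            booster_params[key] = value
    return booster_params, task_params
```

The booster dict could then be passed as `params` to `xgb.train(params, dtrain, num_boost_round=int(task["num_round"]))`, so that both sides train with the same eta, gamma, min_child_weight and max_depth.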

Python input data format:

8       201     1       2       26      10000.0 8589934592      32      0       0       1000000.0       0
2       3       1       1       50      10000.0 8589934592      32      524288  8       1000000.0       0
2       3       2       2       19      10000.0 8589934592      512     512     8       1000000.0       0
4       24      1       1       23      10000.0 8589934592      8192    0       0       1000000.0       0
1       2       2       3       50      10000.0 8589934592      32      512     8       1000000.0       0
21      1       2       3       48      10000.0 8589934592      32      512     8       1000000.0       0
5       12      1       2       42      10000.0 137438953472    32      512     8       1000000.0       0
2       11      2       2       86      10000.0 0       0       0       0       1000000.0       0
1       10      2       8       99      10000.0 8589934592      32      65536   8       1000000.0       0
2       11      2       8       97      10000.0 8589934592      32      65536   8       1000000.0       0
4       5       1       1       4       10000.0 1073741824      32      0       0       1000000.0       0
...

CLI input format (LibSVM):

0 1:8 2:201 3:1 4:2 5:26 6:10000.0 7:8589934592 8:32 9:0 10:0 11:1000000.0 12:0
0 1:2 2:3 3:1 4:1 5:50 6:10000.0 7:8589934592 8:32 9:524288 10:8 11:1000000.0 12:0
0 1:2 2:3 3:2 4:2 5:19 6:10000.0 7:8589934592 8:512 9:512 10:8 11:1000000.0 12:0
0 1:4 2:24 3:1 4:1 5:23 6:10000.0 7:8589934592 8:8192 9:0 10:0 11:1000000.0 12:0
0 1:1 2:2 3:2 4:3 5:50 6:10000.0 7:8589934592 8:32 9:512 10:8 11:1000000.0 12:0
0 1:21 2:1 3:2 4:3 5:48 6:10000.0 7:8589934592 8:32 9:512 10:8 11:1000000.0 12:0
0 1:5 2:12 3:1 4:2 5:42 6:10000.0 7:137438953472 8:32 9:512 10:8 11:1000000.0 12:0
...
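One detail worth ruling out (an assumption, not something confirmed in this thread): the CLI file numbers features from 1, while `xgb.DMatrix(X)` built from a dense array numbers its columns from 0 (f0 to f11). If the LibSVM loader uses the indices as written, every feature in the CLI file lands one slot to the right of where a Python-trained model expects it. A small sketch that generates the CLI rows from the dense rows with a configurable index base; the function name `dense_to_libsvm` is hypothetical.

```python
from typing import List


def dense_to_libsvm(rows: List[str], label: str = "0", index_base: int = 1) -> List[str]:
    """Convert whitespace-separated dense rows to LibSVM 'label idx:value' lines."""
    out = []
    for row in rows:
        values = row.split()  # tabs and runs of spaces both act as separators
        feats = " ".join(f"{i + index_base}:{v}" for i, v in enumerate(values))
        out.append(f"{label} {feats}")
    return out
```

With the default `index_base=1` this reproduces the CLI rows shown above exactly; `index_base=0` would keep the feature numbering aligned with the Python DMatrix columns.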

Results from the Python version:

0.138298
0.00288907
0.0114002
0.0477143
0.00185653
0.00455882
0.000503023
0.000817317
0.00332584
0.00178041
0.0666806
0.03003
...

Results from the CLI version:

0.000100178
0.201246
0.449562
0.0506984
0.451953
0.389587
0.034748
0.992795
0.00348666
0.00661674
0.0186095
0.0260032
0.996163
0.259104
0.552341
0.972762
...

I used the same model file, yet about 40% of the CLI predictions are above 0.5, which does not match what we expect.
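To check a claim like "40% of predictions are above 0.5" directly against an output file, a minimal sketch, assuming the predictions are stored one float per line as in the listings above. `fraction_above` is a hypothetical helper name.

```python
from typing import Iterable


def fraction_above(lines: Iterable[str], threshold: float = 0.5) -> float:
    """Share of predictions strictly greater than `threshold`."""
    preds = [float(s) for s in lines if s.strip()]  # skip blank lines
    if not preds:
        return 0.0
    return sum(p > threshold for p in preds) / len(preds)
```

Because it accepts any iterable of lines, an open file object (for example a hypothetical `cli_preds.txt`) can be passed in directly. Applied to the truncated listings above, it gives 0.25 for the 16 CLI values and 0.0 for the 12 Python values.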

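Solved.

It seems that model files trained by Python and by the CLI cannot be used interchangeably. When each version uses the model it trained itself, the results still differ somewhat, as shown below: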

by python       by cli
0.169874        0.222063
0.999997        0.999554
0.00454239      0.000879413
0.0140518       0.00824018
0.0148116       0.00859811
0.000353913     0.000880754
0.0207635       0.019058
0.000916939     0.000579058
0.00109237      0.000286653
0.00247333      0.00272115
0.0650928       0.0319875
0.946068        0.965301
0.997704        0.999615
0.987644        0.991665
0.997242        0.984403
0.948666        0.909703
0.000781899     0.00079996
0.000319449     0.000138011
0.0400793       0.164134
0.00216081      0.000781626
0.023867        0.0323994
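A quick way to quantify how far apart the two columns are is sketched below, assuming the two-column text layout of the table above. `compare_columns` is a hypothetical helper: it reports the largest absolute gap, the 0-based row where it occurs, and the share of rows where both predictions fall on the same side of 0.5.

```python
from typing import List, Tuple


def compare_columns(rows: List[str], threshold: float = 0.5) -> Tuple[float, int, float]:
    """Return (largest absolute gap, 0-based row of that gap, share of rows
    where both columns fall on the same side of `threshold`)."""
    pairs = []
    for row in rows:
        parts = row.split()
        if len(parts) != 2:
            continue  # skip the header and blank lines
        pairs.append((float(parts[0]), float(parts[1])))
    if not pairs:
        raise ValueError("no numeric rows found")
    gaps = [abs(a - b) for a, b in pairs]
    worst = max(range(len(gaps)), key=gaps.__getitem__)
    agree = sum((a > threshold) == (b > threshold) for a, b in pairs) / len(pairs)
    return gaps[worst], worst, agree
```

On the 21 rows in the table, the largest gap is about 0.124 (the 19th row, 0.0400793 vs 0.164134), and both columns fall on the same side of 0.5 in every row.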
