Apache Spark: How to understand an XGBoost model dump

Tags: apache-spark, xgboost, xgbclassifier

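I noticed that the Spark version of XGBoost has no API like trees_to_dataframe() in the Python API, so I am trying to parse the result of getModelDump myself. However, I am confused about its format and about what each field means.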

 // train xgb_model in spark version of xgboost
scala> xgb_model
res18: ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel = xgbc_89286dd04aa3

scala> xgb_model.nativeBooster.getModelDump(null, true);
res19: Array[String] =
Array("0:[f1<53] yes=1,no=2,missing=2,gain=58047.7812,cover=336165
        1:[f3<53.9500008] yes=3,no=4,missing=3,gain=24677.3848,cover=63748.25
                3:leaf=-0.0531237721,cover=53626.5
                4:leaf=0.031994272,cover=10121.75
        2:[f16<1.66669905] yes=5,no=6,missing=6,gain=10181.9785,cover=272416.75
                5:leaf=-0.0937986076,cover=268367
                6:leaf=-0.0139159411,cover=4049.75
", "0:[f1<51] yes=1,no=2,missing=2,gain=52816.4062,cover=336097.594
        1:[f8<369.570007] yes=3,no=4,missing=4,gain=22681.3555,cover=60529.668
                3:leaf=-0.0121749714,cover=37363.5625
                4:leaf=-0.0751453713,cover=23166.1055
        2:[f16<1.67979908] yes=5,no=6,missing=6,gain=10274.8359,cover=275567.906
                5:leaf=-0.089068912,cover=271300.188
                6:leaf=-0.0108754979,cover=4267.74268
", "0:[f1<56] yes=1,no=2,missing=2,gain=4887...

scala> res19.size
res20: Int = 200
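I think res19.size = 200 is reasonable, because I set n_estimators to 200. What confuses me is each string in res19, which is formatted like the dump shown below. I assume f2 must refer to some specific feature, but how can I find the actual feature name? Also, what do 0, 1, 2 stand for? And what do yes=3,no=4 mean?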

Thanks in advance.

0:[f2<0.380098999] yes=1,no=2,missing=1,gain=732.850342,cover=72529.7266
        1:[f47<31.9999981] yes=3,no=4,missing=3,gain=753.887451,cover=67352.3594
                3:leaf=4.21585646e-05,cover=63820.7422
                4:leaf=0.0237709191,cover=3531.61987
        2:[f4<1050] yes=5,no=6,missing=6,gain=410.277802,cover=5177.3667
                5:leaf=0.00518732425,cover=1373.32422
                6:leaf=-0.0266880095,cover=3804.04224
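
For anyone trying to parse this by hand, here is a minimal sketch of how the dump above could be read, not an official API. It relies on the usual text dump layout: each string is one tree, each line is one node, the number before the colon is the node id inside that tree, yes/no/missing are the ids of the child nodes to follow when the condition holds, fails, or the feature value is missing, gain is the loss reduction of the split, and cover is the sum of second-order gradients of the rows reaching the node. When no feature map is passed (the null argument), fN should be the feature at index N of the feature vector. The names DumpParser, parseTree and featureName are made up for this example, and the list of feature names is assumed to be in the same order as the columns used to build the feature vector.

```scala
object DumpParser {
  // One node of a dumped tree: either an internal split or a leaf.
  sealed trait Node { def id: Int; def cover: Double }
  final case class Split(id: Int, feature: String, threshold: Double,
                         yes: Int, no: Int, missing: Int,
                         gain: Double, cover: Double) extends Node
  final case class Leaf(id: Int, value: Double, cover: Double) extends Node

  // e.g. 0:[f2<0.380098999] yes=1,no=2,missing=1,gain=732.850342,cover=72529.7266
  private val SplitRe =
    """(\d+):\[(\w+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+),gain=([^,]+),cover=(\S+)""".r
  // e.g. 3:leaf=4.21585646e-05,cover=63820.7422
  private val LeafRe = """(\d+):leaf=([^,]+),cover=(\S+)""".r

  // Parses one element of the getModelDump array (one tree) into its nodes.
  // Assumes the dump was produced with statistics enabled (gain/cover present).
  def parseTree(dump: String): Seq[Node] =
    dump.split("\n").toSeq.map(_.trim).filter(_.nonEmpty).map {
      case SplitRe(id, f, thr, y, n, m, g, c) =>
        Split(id.toInt, f, thr.toDouble, y.toInt, n.toInt, m.toInt, g.toDouble, c.toDouble)
      case LeafRe(id, v, c) =>
        Leaf(id.toInt, v.toDouble, c.toDouble)
      case other =>
        throw new IllegalArgumentException(s"Unrecognized dump line: $other")
    }

  // Maps "f2" to names(2). The names sequence is an assumption: it must be
  // in the same order as the columns that make up the feature vector.
  def featureName(feature: String, names: IndexedSeq[String]): String =
    names(feature.stripPrefix("f").toInt)
}
```

With the model from the question this would be used roughly as xgb_model.nativeBooster.getModelDump(null, true).map(DumpParser.parseTree), which gives one sequence of nodes per tree. Following yes/no/missing from node 0 until a Leaf is reached reproduces the path a row takes through that tree.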


I'm voting to close this question because it is a better fit for the Data Science Stack Exchange site.