Apache Spark: how to interpret an XGBoost model dump
I noticed that Spark XGBoost has no API like trees_to_dataframe() in the Python API, so I tried to parse the output of getModelDump, but I am confused about its format and what each field means.
// train xgb_model in spark version of xgboost
scala> xgb_model
res18: ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel = xgbc_89286dd04aa3
scala> xgb_model.nativeBooster.getModelDump(null, true);
res19: Array[String] =
Array("0:[f1<53] yes=1,no=2,missing=2,gain=58047.7812,cover=336165
1:[f3<53.9500008] yes=3,no=4,missing=3,gain=24677.3848,cover=63748.25
3:leaf=-0.0531237721,cover=53626.5
4:leaf=0.031994272,cover=10121.75
2:[f16<1.66669905] yes=5,no=6,missing=6,gain=10181.9785,cover=272416.75
5:leaf=-0.0937986076,cover=268367
6:leaf=-0.0139159411,cover=4049.75
", "0:[f1<51] yes=1,no=2,missing=2,gain=52816.4062,cover=336097.594
1:[f8<369.570007] yes=3,no=4,missing=4,gain=22681.3555,cover=60529.668
3:leaf=-0.0121749714,cover=37363.5625
4:leaf=-0.0751453713,cover=23166.1055
2:[f16<1.67979908] yes=5,no=6,missing=6,gain=10274.8359,cover=275567.906
5:leaf=-0.089068912,cover=271300.188
6:leaf=-0.0108754979,cover=4267.74268
", "0:[f1<56] yes=1,no=2,missing=2,gain=4887...
scala> res19.size
res20: Int = 200
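I think res19.size = 200 makes sense, because I set n_estimators to 200. What confuses me is the individual strings in res19; each one has a format like the sample tree shown below.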
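I assume f2 must refer to a specific feature, but how can I find the corresponding feature name? Also, what do 0, 1, and 2 stand for? And what does yes=3,no=4 mean?
Thanks in advance.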
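On the feature-name part of the question: when no feature map is passed to getModelDump, XGBoost labels features by their zero-based position in the feature vector, so f2 is the third entry. The sketch below maps such a label back to a column name. It assumes the feature vector was built from scalar columns in a known order (for example the inputCols of a VectorAssembler); FeatureNames, resolve, and the column names in the usage note are hypothetical and are not part of xgboost4j.

```scala
object FeatureNames {
  // Matches the default labels used in the dump: "f" followed by a zero-based index.
  private val Indexed = """f(\d+)""".r

  // inputCols is assumed to list the feature columns in the same order
  // in which they were assembled into the feature vector.
  def resolve(dumpName: String, inputCols: Seq[String]): Option[String] =
    dumpName match {
      case Indexed(i) => inputCols.lift(i.toInt) // None if the index is out of range
      case _          => None                    // not a default "fN" label
    }
}
```

For example, FeatureNames.resolve("f2", Seq("age", "income", "score")) looks up index 2 in the hypothetical column list. If any assembled column is itself a vector (one-hot encoded columns, for instance), the indices no longer line up one-to-one with inputCols and the mapping has to be expanded first.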
0:[f2<0.380098999] yes=1,no=2,missing=1,gain=732.850342,cover=72529.7266
1:[f47<31.9999981] yes=3,no=4,missing=3,gain=753.887451,cover=67352.3594
3:leaf=4.21585646e-05,cover=63820.7422
4:leaf=0.0237709191,cover=3531.61987
2:[f4<1050] yes=5,no=6,missing=6,gain=410.277802,cover=5177.3667
5:leaf=0.00518732425,cover=1373.32422
6:leaf=-0.0266880095,cover=3804.04224
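Reading the sample tree above: the number before the colon is the node ID within that tree, with 0 as the root. A split node such as 0:[f2<0.380098999] tests a feature against a threshold; yes, no, and missing give the ID of the child node to visit when the test is true, false, or the feature value is missing. gain and cover are per-node statistics that appear because withStats was true, and leaf is the value a leaf contributes to the raw prediction. A minimal sketch of turning one dump string into rows, loosely similar to what trees_to_dataframe() returns in Python, might look like the following. DumpParser, Node, and parseTree are hypothetical names, and the regular expressions assume the text format shown above.

```scala
object DumpParser {
  // One row per node; fields that do not apply to a node type are None.
  final case class Node(
      tree: Int,
      id: Int,
      feature: Option[String],
      threshold: Option[Double],
      yes: Option[Int],
      no: Option[Int],
      missing: Option[Int],
      leaf: Option[Double],
      gain: Option[Double],
      cover: Option[Double])

  // Split node, e.g. "0:[f2<0.38] yes=1,no=2,missing=1,gain=...,cover=..."
  private val Split = """(\d+):\[([^<]+)<([^\]]+)\] (.*)""".r
  // Leaf node, e.g. "3:leaf=4.2e-05,cover=63820.7"
  private val Leaf = """(\d+):(leaf=.*)""".r

  // Turns "a=1,b=2" into Map("a" -> "1", "b" -> "2").
  private def keyValues(s: String): Map[String, String] =
    s.split(",").iterator
      .map(_.trim.split("=", 2))
      .collect { case Array(k, v) => k -> v }
      .toMap

  // Parses the text dump of a single tree; lines that match neither pattern are skipped.
  def parseTree(treeIndex: Int, dump: String): Vector[Node] =
    dump.split("\n").iterator.map(_.trim).filter(_.nonEmpty).collect {
      case Split(id, feat, thr, rest) =>
        val kv = keyValues(rest)
        Node(treeIndex, id.toInt, Some(feat), Some(thr.toDouble),
          kv.get("yes").map(_.toInt), kv.get("no").map(_.toInt),
          kv.get("missing").map(_.toInt), None,
          kv.get("gain").map(_.toDouble), kv.get("cover").map(_.toDouble))
      case Leaf(id, rest) =>
        val kv = keyValues(rest)
        Node(treeIndex, id.toInt, None, None, None, None, None,
          kv.get("leaf").map(_.toDouble), None, kv.get("cover").map(_.toDouble))
    }.toVector
}
```

Applied to the whole model, something like res19.zipWithIndex.flatMap { case (s, i) => DumpParser.parseTree(i, s) } would give one row per node across all 200 trees.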
I'm voting to close this question because it is a better fit for Data Science Stack Exchange.