Scala: How to infer the predicted class label from the raw scores computed by Spark MLlib

While reading the Spark documentation, I came across the following sample code for binary classification prediction:

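    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.linalg.Vectors

    // weightsWithIntercept and test are defined earlier in the documentation example.
    // The last element of weightsWithIntercept is the intercept; the rest are the weights.
    val model = new LogisticRegressionModel(
      Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1)),
      weightsWithIntercept(weightsWithIntercept.size - 1))

    // Clear the default threshold.
    model.clearThreshold()

    // Compute raw scores on the test set.
    val scoreAndLabels = test.map { point =>
      val score = model.predict(point.features)
      (score, point.label)
    }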

As you can see, model.predict(point.features) returns the raw score, which is the distance to the separating hyperplane.

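My question is:

(1) Given the raw score computed above, how do I know whether the predicted class label is 0 or 1?

or

(2) In this binary classification case, how do I infer the predicted class label (0 or 1) from the raw score computed above?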

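By default the threshold is 0.5, so when using BinaryClassificationMetrics, a score below 0.5 corresponds to class label 0 and a higher score corresponds to class label 1. So you can infer the class from the score as well.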
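To make the thresholding concrete, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the scores come from a LogisticRegressionModel whose threshold has been cleared, in which case, as far as I can tell, predict returns the logistic of the margin (a value between 0 and 1) rather than the margin itself. The object ThresholdSketch and its helpers sigmoid, predict and toLabel are hypothetical names made up for illustration, not part of MLlib.

```scala
object ThresholdSketch {
  // Logistic function: maps a margin (weights . features + intercept) into (0, 1).
  def sigmoid(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

  // Hypothetical stand-in for the model's behaviour (an assumption, not MLlib code):
  // with Some(t) it returns a hard 0/1 label, with None it returns the score.
  def predict(margin: Double, threshold: Option[Double]): Double = {
    val score = sigmoid(margin)
    threshold match {
      case Some(t) => if (score > t) 1.0 else 0.0
      case None    => score
    }
  }

  // Turn an already computed score into a 0/1 label.
  // A score exactly equal to the threshold is mapped to 0.
  def toLabel(score: Double, threshold: Double = 0.5): Double =
    if (score > threshold) 1.0 else 0.0
}
```

On the RDD from the question this would be applied as scoreAndLabels.map { case (score, label) => (ThresholdSketch.toLabel(score), label) }. Alternatively, calling model.setThreshold(0.5) instead of model.clearThreshold() should make predict return the 0/1 label directly. For a margin-based model such as SVMModel, the natural cut-off would be 0.0 rather than 0.5.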

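How do I get the best threshold, as determined by the algorithm, from the computed ROC curve?

On the metrics object you can get the score of various metrics by threshold, for example: val f1Score = metrics.fMeasureByThreshold. You can then iterate over the result to find the best threshold. More details here: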
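As a rough sketch of that iteration (my own illustration, not taken from any linked material), the selection step can be written in plain Scala over a collection of (threshold, F1) pairs. The object BestThresholdSketch, the helper bestByF1 and the sample numbers are all made up for illustration; "best" here means "highest F1", which is only one possible criterion.

```scala
object BestThresholdSketch {
  // Pick the (threshold, F1) pair with the highest F1 score.
  // Returns None when the input is empty.
  def bestByF1(f1ByThreshold: Seq[(Double, Double)]): Option[(Double, Double)] =
    if (f1ByThreshold.isEmpty) None
    else Some(f1ByThreshold.maxBy { case (_, f1) => f1 })

  def main(args: Array[String]): Unit = {
    // Made-up (threshold, F1) pairs, for illustration only.
    val sample = Seq((0.9, 0.40), (0.7, 0.62), (0.5, 0.71), (0.3, 0.66))
    println(bestByF1(sample))
  }
}
```

With Spark, the pairs would come from the metrics object, for example new BinaryClassificationMetrics(scoreAndLabels).fMeasureByThreshold().collect().toSeq, assuming the list of thresholds is small enough to collect to the driver.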