Scala Spark logistic regression and metrics


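I want to run logistic regression 100 times, each time with a random split into training and test sets. I then want to save the performance metrics of each run so that I can use them later to understand the model's performance.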

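    for (index <- 1 to 100) {
      // Use the loop index as the seed so every run gets a different, reproducible split.
      val splits = training_data.randomSplit(Array(0.90, 0.10), seed = index)
      val training = splits(0).cache()
      val test = splits(1)

      val logrmodel = train_LogisticRegression_model(training)
      performLogisticRegressionRuns(logrmodel, test, index)

      // Release the cached training split before the next iteration.
      training.unpersist()
    }

    spark.stop()
  }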

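  // Imports needed by this method (RDD-based spark.mllib API).
  import org.apache.spark.mllib.classification.LogisticRegressionModel
  import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.rdd.RDD

  def performLogisticRegressionRuns(model: LogisticRegressionModel, test: RDD[LabeledPoint], iterationcount: Int): Unit = {
    // Local buffer for this run's metrics ("private" is not allowed on a local val).
    val sb = new StringBuilder

    // Compute raw scores on the test set. Once I cle
    model.clearThreshold()

    val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
      val prediction = model.predict(features)
      (prediction, label)
    }

    val bcmetrics = new BinaryClassificationMetrics(predictionAndLabels)

    // I am showing two sample metrics, but I am collecting more, including recall, area under ROC, F1 score, etc.
    val precision = bcmetrics.precisionByThreshold()

    // collect() first: appending to sb inside an RDD foreach would only change a copy on the executors.
    precision.collect().foreach { case (t, p) =>
      // If the threshold is 0.5, as we want, append the precision. The idea is: score < 0.5 is class 0, else class 1.
      // Note: the thresholds come from the observed scores, so an exact match with 0.5 may never occur.
      if (t == 0.5) {
        println(s"Threshold is: $t, Precision is: $p")
        sb ++= p.toString + "\t"
      }
    }

    val auROC = bcmetrics.areaUnderROC
    sb ++= iterationcount.toString + "\t" + auROC.toString + "\t"
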
  }

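I was able to solve this by doing the following. I converted the string to a list:

// `spark` is assumed to be the SparkContext here; with a SparkSession, call spark.sparkContext.parallelize instead.
val data = spark.parallelize(List(sb.toString))
val filename = "logreg-metrics" + iterationcount.toString + ".txt"
// saveAsTextFile creates a directory with this name that holds the part files.
data.saveAsTextFile(filename)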
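
One possible refinement of the approach above, offered as a sketch rather than as the poster's method: the thresholds returned by `precisionByThreshold()` come from the scores seen in the test set, so a check for exactly 0.5 may never match, and saving once per run produces 100 separate outputs. The plain-Scala example below picks the entry whose threshold is closest to 0.5 and builds one tab-separated row per run, so everything can be written in a single step at the end. `RunMetrics`, `closestTo`, and `toTsv` are hypothetical names, and the numbers are made-up stand-ins for what `bcmetrics.precisionByThreshold().collect()` and `bcmetrics.areaUnderROC` would return.

```scala
// Hypothetical container for the metrics of one run.
final case class RunMetrics(iteration: Int, precision: Double, auROC: Double)

object MetricsSketch {
  // Return the metric value whose threshold is nearest to `target`.
  // `curve` stands in for precisionByThreshold().collect().
  def closestTo(curve: Seq[(Double, Double)], target: Double): Option[Double] =
    if (curve.isEmpty) None
    else Some(curve.minBy { case (t, _) => math.abs(t - target) }._2)

  // Render all runs as tab-separated lines with a header row.
  def toTsv(runs: Seq[RunMetrics]): String = {
    val header = "iteration\tprecision\tauROC"
    val rows = runs.map(r => s"${r.iteration}\t${r.precision}\t${r.auROC}")
    (header +: rows).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    // Made-up (threshold, precision) pairs, for illustration only.
    val curve = Seq((0.91, 1.0), (0.52, 0.8), (0.47, 0.75), (0.10, 0.6))
    val runs = (1 to 3).map { i =>
      RunMetrics(i, closestTo(curve, 0.5).getOrElse(Double.NaN), auROC = 0.85)
    }
    println(toTsv(runs))
  }
}
```

In the Spark job, each call to `performLogisticRegressionRuns` could return a `RunMetrics` value instead of appending to a `StringBuilder`, and the string from `toTsv` could then be saved once after the loop, for instance with the same `parallelize` and `saveAsTextFile` calls shown above.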