Spark 2.4.4 metrics attribute error with Scala BinaryClassificationMetrics
I am trying to replicate this, but when I try to extract some metrics from the processed .csv file I get an error. My code snippet:
val splitSeed = 5043
val Array(trainingData, testData) = df3.randomSplit(Array(0.7, 0.3), splitSeed)
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
trainingData.show(20)
// Fit the model
val model = lr.fit(trainingData)
// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
// Run the model on the test features to get predictions
val predictions = model.transform(testData)
// The transform produced new columns: rawPrediction, probability and prediction
predictions.show()
// Use MLlib to evaluate: convert the DataFrame to an RDD
val myRdd = predictions.select("rawPrediction", "label").rdd
val predictionAndLabels = myRdd.map(x => (x(0).asInstanceOf[DenseVector](1), x(1).asInstanceOf[Double]))
// Instantiate the metrics object
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
println("area under the precision-recall curve: " + metrics.areaUnderPR)
println("area under the receiver operating characteristic (ROC) curve: " + metrics.areaUnderROC)
// A precision-recall curve plots (precision, recall) points for different threshold values, while a
// receiver operating characteristic (ROC) curve plots (recall, false positive rate) points.
// The closer the area under ROC is to 1, the better the model's predictions.
+------+---------+----+-----+----+------+----+------+----+---+----+------------+--------------------+-----+--------------------+--------------------+----------+
| id|thickness|size|shape|madh|epsize|bnuc|bchrom|nNuc|mit|clas|clasLogistic| features|label| rawPrediction| probability|prediction|
+------+---------+----+-----+----+------+----+------+----+---+----+------------+--------------------+-----+--------------------+--------------------+----------+
| 63375| 9.0| 1.0| 2.0| 6.0| 4.0|10.0| 7.0| 7.0|2.0| 4| 1|[9.0,1.0,2.0,6.0,...| 1.0|[0.36391634252951...|[0.58998813846052...| 0.0|
|128059| 1.0| 1.0| 1.0| 1.0| 2.0| 5.0| 5.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[0.81179252636135...|[0.69249134920886...| 0.0|
|145447| 8.0| 4.0| 4.0| 1.0| 2.0| 9.0| 3.0| 3.0|1.0| 4| 1|[8.0,4.0,4.0,1.0,...| 1.0|[0.06964047482828...|[0.51740308582457...| 0.0|
|183913| 1.0| 2.0| 2.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,2.0,2.0,1.0,...| 0.0|[0.96139876234944...|[0.72340177322811...| 0.0|
|342245| 1.0| 1.0| 3.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,3.0,1.0,...| 0.0|[0.95750903648839...|[0.72262279564412...| 0.0|
|434518| 3.0| 1.0| 1.0| 1.0| 2.0| 1.0| 2.0| 1.0|1.0| 2| 0|[3.0,1.0,1.0,1.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|493452| 1.0| 1.0| 3.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,3.0,1.0,...| 0.0|[0.95750903648839...|[0.72262279564412...| 0.0|
|508234| 7.0| 4.0| 5.0|10.0| 2.0|10.0| 3.0| 8.0|2.0| 4| 1|[7.0,4.0,5.0,10.0...| 1.0|[-0.0809133769755...|[0.47978268474014...| 1.0|
|521441| 5.0| 1.0| 1.0| 2.0| 2.0| 1.0| 2.0| 1.0|1.0| 2| 0|[5.0,1.0,1.0,2.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|527337| 4.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[4.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|534555| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|535331| 3.0| 1.0| 1.0| 1.0| 3.0| 1.0| 2.0| 1.0|1.0| 2| 0|[3.0,1.0,1.0,1.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|558538| 4.0| 1.0| 3.0| 3.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[4.0,1.0,3.0,3.0,...| 0.0|[0.95750903648839...|[0.72262279564412...| 0.0|
|560680| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|601265| 10.0| 4.0| 4.0| 6.0| 2.0|10.0| 2.0| 3.0|1.0| 4| 1|[10.0,4.0,4.0,6.0...| 1.0|[-0.0034290346398...|[0.49914274218002...| 1.0|
|603148| 4.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[4.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|606722| 5.0| 5.0| 7.0| 8.0| 6.0|10.0| 7.0| 4.0|1.0| 4| 1|[5.0,5.0,7.0,8.0,...| 1.0|[-0.3103173938140...|[0.42303726852941...| 1.0|
|616240| 5.0| 3.0| 4.0| 3.0| 4.0| 5.0| 4.0| 7.0|1.0| 2| 0|[5.0,3.0,4.0,3.0,...| 0.0|[0.43719456056061...|[0.60759034803682...| 0.0|
|640712| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 2.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|654546| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|8.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
+------+---------+----+-----+----+------+----+------+----+---+----+------------+--------------------+-----+--------------------+--------------------+----------+
only showing top 20 rows
When I try to read the areaUnderPR attribute, I get the following error:
20/01/10 10:41:02 WARN TaskSetManager: Lost task 0.0 in stage 56.0
(TID 246, 10.10.252.172, executor 1):
java.lang.ClassNotFoundException: prediction.TestCancerOriginal$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
My predictions.show() output is the table shown above.
One error I can see here is that you are passing the rawPrediction column to the BinaryClassificationMetrics object instead of the prediction column. rawPrediction contains a vector with a kind of "probability" for each class, while BinaryClassificationMetrics expects a Double score, as specified by its signature:

new BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)])

You can see the details. I did a quick test with this modification and it seems to work; here is the code snippet:
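As a side note on the (score, label) contract in that signature: the areaUnderROC the metrics object ultimately reports is equivalent to the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as 1/2). A minimal plain-Scala illustration of that rank formulation, independent of Spark; `aucRoc` and the sample data here are hypothetical, not part of the question's code:

```scala
// AUC-ROC via the rank-statistic (Mann-Whitney) formulation:
// the fraction of (positive, negative) pairs where the positive
// example scores higher, counting ties as 1/2.
def aucRoc(scoreAndLabels: Seq[(Double, Double)]): Double = {
  val pos = scoreAndLabels.collect { case (s, 1.0) => s }
  val neg = scoreAndLabels.collect { case (s, 0.0) => s }
  val wins = for (p <- pos; n <- neg)
    yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
  wins.sum / (pos.size.toDouble * neg.size.toDouble)
}

// Three positives (scores 0.9, 0.8, 0.3) and two negatives (0.4, 0.2):
// five of the six cross pairs rank the positive higher, so AUC = 5/6.
val sample = Seq((0.9, 1.0), (0.8, 1.0), (0.4, 0.0), (0.3, 1.0), (0.2, 0.0))
println(aucRoc(sample))
```

A perfectly separating scorer gives 1.0, and random scores hover around 0.5, which is why the comments in the question's snippet say "the closer to 1, the better".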
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

case class Obs(id: Int, thickness: Double, size: Double, shape: Double, madh: Double,
  epsize: Double, bnuc: Double, bchrom: Double, nNuc: Double, mit: Double, clas: Double)

val obsSchema = Encoders.product[Obs].schema

val spark = SparkSession.builder
  .appName("StackoverflowQuestions")
  .master("local[*]")
  .getOrCreate()

// Needed to convert a DataFrame to a Dataset with the .as[] method
import spark.implicits._

val df = spark.read
  .schema(obsSchema)
  .csv("breast-cancer-wisconsin.data") // Wisconsin breast cancer data (filename assumed; the original path was machine-translated)
  .drop("id")
  .withColumn("clas", when(col("clas").equalTo(4.0), 1.0).otherwise(0.0))
  .na.drop() // Make sure to drop nulls, otherwise the feature assembler will fail

// Define the feature columns to put in the feature vector
val featureCols = Array("thickness", "size", "shape", "madh", "epsize", "bnuc", "bchrom", "nNuc", "mit")
// Set the input and output column names
val assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
// Return a DataFrame with all of the feature columns in a vector column
val df2 = assembler.transform(df)
// Create a label column with the StringIndexer
val labelIndexer = new StringIndexer().setInputCol("clas").setOutputCol("label")
val df3 = labelIndexer.fit(df2).transform(df2)

val splitSeed = 5043
val Array(trainingData, testData) = df3.randomSplit(Array(0.7, 0.3), splitSeed)
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
trainingData.show(20)
// Fit the model
val model = lr.fit(trainingData)
// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
// Run the model on the test features to get predictions
val predictions = model.transform(testData)
// As you can see, the transform produced new columns: rawPrediction, probability and prediction
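The snippet is cut off at this point. For the metrics step itself, following the answer's point that BinaryClassificationMetrics wants plain Doubles, one sketch of the last step, assuming the predictions DataFrame built above, is to use the probability of the positive class as the score (a common alternative to passing the prediction column):

```scala
import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Both tuple elements are Doubles, matching the expected
// BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) input.
val scoreAndLabels = predictions
  .select("probability", "label")
  .rdd
  .map(row => (row(0).asInstanceOf[DenseVector](1), row(1).asInstanceOf[Double]))

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println("area under the precision-recall curve: " + metrics.areaUnderPR)
println("area under the ROC curve: " + metrics.areaUnderROC)
```

Using the continuous probability rather than the hard 0/1 prediction lets the metrics object sweep thresholds, which is what the PR and ROC curves are defined over.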