Generating a confusion matrix for an MLlib random forest in Spark Scala
I am implementing a random forest in Scala using Spark's MLlib. I want to generate a confusion matrix from the random forest algorithm. I wrote the following code, but nothing comes out. How can I get the confusion matrix? Code:
package org.test.newrandom

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
//org/apache/spark/mllib/evaluation/MulticlassMetrics

object RandomTest {
  def main(args: Array[String]) = {
    // Start the Spark context
    val conf = new SparkConf()
      .setAppName("RandomTest1")
      .setMaster("local")
    val sc = new SparkContext(conf)
    // Load and parse the data file.
    val data = MLUtils.loadLibSVMFile(sc, "sample_libsvm_data.txt")
    // Split the data into training and test sets (30% held out for testing)
    val splits = data.randomSplit(Array(0.7, 0.3))
    val (trainingData, testData) = (splits(0), splits(1))
    // Train a RandomForest model.
    // Empty categoricalFeaturesInfo indicates all features are continuous.
    val numClasses = 2
    val categoricalFeaturesInfo = Map[Int, Int]()
    val numTrees = 5 // Use more in practice.
    val featureSubsetStrategy = "auto" // Let the algorithm choose.
    val impurity = "gini"
    val maxDepth = 4
    val maxBins = 32
    val model = RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
      numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
    // Evaluate model on test instances and compute test error
    val labelAndPreds = testData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }
    val testErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testData.count()
    println("Test Error = " + testErr)
    println("Learned classification forest model:\n" + model.toDebugString)
    MultiClassMetrics metrics = new MultiClassMetrics(labelAndPreds.rdd())
    println(metrics.precision()); //prints 0.94334140435
    println(metrics.confusionMatrix()); //prints like the following
  }
}
I get an error on this line:
MultiClassMetrics metrics = new MultiClassMetrics(labelAndPreds.rdd())
It says - not found: value MultiClassMetrics, even though I added
import org.apache.spark.mllib.evaluation.MulticlassMetrics
In Scala, the type ascription of a variable comes after its name, like this:
val metrics: MultiClassMetrics = new MultiClassMetrics(labelAndPreds.rdd())
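The declaration-order difference can be seen without any Spark dependency; a minimal plain-Scala sketch (the object and value names are made up for illustration):

```scala
object DeclarationOrderDemo {
  // Java:  List<Integer> xs = new ArrayList<>();   -- type comes first; invalid in Scala
  // Scala: the (optional) ": Type" ascription comes after the name
  val explicit: List[Int] = List(1, 2, 3)
  val inferred = List(1, 2, 3) // the ascription may be omitted; the compiler infers List[Int]

  def main(args: Array[String]): Unit = {
    println(explicit == inferred) // prints true: both are List(1, 2, 3)
  }
}
```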
Perhaps you copied a snippet from a Java source file, where the type comes before the variable name? Are you running this in SBT? If so, you should add the dependency to build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0"
If that is the problem, the code should still work in the Scala shell. You can try replacing the line with
val metrics = new MulticlassMetrics(labelAndPreds)
@ChristianHirsch, it works. (y)
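For intuition, the matrix that MulticlassMetrics.confusionMatrix() returns is just a count of (actual label, predicted label) pairs, with rows indexed by the actual class and columns by the predicted class. A dependency-free sketch with made-up (label, prediction) pairs standing in for a collected labelAndPreds:

```scala
object ConfusionMatrixSketch {
  // Hypothetical (label, prediction) pairs, as labelAndPreds might look collected locally
  val labelAndPreds = Seq((1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 1.0))
  val classes = Seq(0.0, 1.0)

  // Rows = actual class, columns = predicted class
  val matrix = classes.map { actual =>
    classes.map(predicted => labelAndPreds.count { case (l, p) => l == actual && p == predicted })
  }

  // The test error is the fraction of pairs that fall off the diagonal
  val testErr = labelAndPreds.count { case (l, p) => l != p }.toDouble / labelAndPreds.size

  def main(args: Array[String]): Unit = {
    matrix.foreach(row => println(row.mkString(" "))) // prints: 2 0  /  1 2
    println("Test Error = " + testErr)                // prints: Test Error = 0.2
  }
}
```

Here one actual-1.0 example was predicted as 0.0, so the (1.0 row, 0.0 column) cell is 1 and the error rate is 1/5.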