Spark: How to use a trained data set for prediction (MLlib: SVMWithSGD)
I'm new to Spark. I was able to train on my data set, but I can't work out how to use the trained model to make predictions. Below is the code that trains on the 1800x4000 matrix data:
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/myfile.txt")
// Assumes each line is space-separated: the label first, then the feature values
val parsedData = data.map { line =>
  val parts = line.split(' ')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}
val firstDataPoint = parsedData.take(1)(0)
// Building the model
val numIterations = 100
val model = SVMWithSGD.train(parsedData, numIterations)
//val model = LinearRegressionWithSGD.train(parsedData,numIterations)
val labelAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
println("Training Error = " + trainErr)
Now I load the data I want to run predictions on. The data is a vector of 1800 values:
val test = sc.textFile("data/mllib/ridge-data/data.txt")
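But I'm not sure how to use this data to make predictions. Please help.
First load the labeled points from a text file (keep in mind that the RDD must have been saved with saveAsTextFile):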
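import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import scala.Tuple2;

JavaRDD<LabeledPoint> test = MLUtils.loadLabeledPoints(init.context, "hdfs://../test/").toJavaRDD();
// Pair each model score with the true label of the same point
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(
  new Function<LabeledPoint, Tuple2<Object, Object>>() {
    public Tuple2<Object, Object> call(LabeledPoint p) {
      Double score = model.predict(p.features());
      return new Tuple2<Object, Object>(score, p.label());
    }
  }
);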
Now collect the scores and iterate over them:
List<Tuple2<Object, Object>> scores = scoreAndLabels.collect();
for(Tuple2<Object, Object> score : scores){
System.out.println(score._1 + " \t" + score._2);
}
It's written in Java, but maybe you can convert it :)
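The answer above reads test data that already carries labels. The question instead describes a test file holding only feature values, so each line would first need to be turned into a feature array. Below is a minimal sketch in plain Java, assuming the values are separated by whitespace (as the question's training code suggests). The class and method names are hypothetical and not part of Spark.

```java
import java.util.Arrays;

public class FeatureLineParser {
    // Parse one line of whitespace-separated numbers into a feature array.
    // Assumption: the line holds only feature values, with no label column.
    // An empty line or a non-numeric token throws NumberFormatException.
    public static double[] parseFeatures(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                     .mapToDouble(Double::parseDouble)
                     .toArray();
    }

    public static void main(String[] args) {
        double[] features = parseFeatures("0.5 -1.25 3.0");
        System.out.println(Arrays.toString(features));
    }
}
```

In a Spark job, the resulting array could be wrapped with Vectors.dense(...) and passed to model.predict(...), which accepts a single feature vector.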
But the predicted values don't make sense:
-18.841544889249917 0.0
168.32916035523283 1.0
420.67763915879794 1.0
-974.1942589201286 0.0
71.73602841256813 1.0
233.13636224524993 1.0
-1000.5902168199027 0.0
Does anyone know what they mean?
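One plausible reading, offered as an assumption: the first column looks like the raw SVM margin (weights · features + intercept) and the second column is the true label from p.label(). MLlib's SVMModel returns the raw margin from predict once its threshold has been cleared with clearThreshold(). With the default threshold of 0.0 it returns 0.0 or 1.0 instead. Whether clearThreshold() was called is not shown in the thread. The sketch below is plain Java, not Spark code, and its class and method names are hypothetical. It shows how a margin maps to a label under a threshold of 0.0.

```java
public class SvmMarginSketch {
    // Raw margin of a linear model: dot(weights, features) + intercept.
    public static double margin(double[] weights, double intercept, double[] features) {
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // Turn a raw margin into a 0/1 label: 1.0 if it exceeds the threshold, else 0.0.
    public static double label(double margin, double threshold) {
        return margin > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // A few of the scores quoted in the thread, thresholded at 0.0
        double[] scores = {-18.841544889249917, 168.32916035523283, -974.1942589201286};
        for (double s : scores) {
            System.out.println(s + " \t" + label(s, 0.0));
        }
    }
}
```

Under this reading, every row quoted above is consistent: negative scores sit next to label 0.0 and positive scores next to label 1.0. The magnitude would then reflect distance from the decision boundary and is not a probability.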