
Scala: Combining two RDDs of different types

Tags: scala, apache-spark, apache-spark-mllib

Hi everyone, I want to combine an RDD[Vector] and an RDD[Int] into a single RDD[Vector]. Here is what I did: I used KMeans to predict the clusters, and the idea is to attach the corresponding cluster to each vector.

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val data = spark.sparkContext.textFile("C:/spark/data/mllib/kmeans_data.txt")

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache() // RDD[Vector]
val clusters = KMeans.train(parsedData, numClusters, numIterations)
val resultatOfprediction = clusters.predict(parsedData) // RDD[Int]
val finalData = parsedData.zip(resultatOfprediction)
finalData.collect().foreach(println)
The result is:

([0.0,0.0,0.0],0)
([0.1,0.1,0.1],0)
([0.2,0.2,0.2],0)
([9.0,9.0,9.0],1)
([9.1,9.1,9.1],1)
([9.2,9.2,9.2],1)
The output I want:

    [0.0,0.0,0.0,1.0]
    [0.1,0.1,0.1,1.0]
    [0.2,0.2,0.2,1.0]
    [9.0,9.0,9.0,0.0]
    [9.1,9.1,9.1,0.0]
    [9.2,9.2,9.2,0.0]

My goal is to save a final RDD[Vector] to a txt file in order to display it in a grid, but the result you provided is not an RDD[Vector].

To get the result you want, you need to zip the two RDDs. Here is how you do it:

// Sample cluster labels, one per vector
val parsedData = spark.sparkContext.parallelize(Seq(1.0, 1.0, 1.0, 0.0, 0.0, 0.0))

// Sample vectors, represented as tuples
val resultatOfprediction = spark.sparkContext.parallelize(Seq(
  (0.0, 0.0, 0.0),
  (0.1, 0.1, 0.1),
  (0.2, 0.2, 0.2),
  (9.0, 9.0, 9.0),
  (9.1, 9.1, 9.1),
  (9.2, 9.2, 9.2)
))

resultatOfprediction.zip(parsedData)
Since zip returns an RDD of tuples, you can flatten the result as:

resultatOfprediction.zip(parsedData)
  .map(t => (t._1._1, t._1._2, t._1._3, t._2))
To handle the columns dynamically, you can do the following, as suggested by @Rahul Sukla:

resultatOfprediction.zip(parsedData).map(t => t._1.productIterator.toList.map(_.asInstanceOf[Double]) :+ t._2)
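
Applied back to the question's actual types, where parsedData is an RDD[Vector] and the predictions are an RDD[Int], a minimal sketch of the same idea could look like this (productIterator is not needed there, since mllib's Vector already exposes toArray; the variable names are taken from the question's code):

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// parsedData: RDD[Vector] and resultatOfprediction: RDD[Int] from the question.
// Appending the cluster id as a Double works for any number of columns.
val withClusters: RDD[Vector] = parsedData
  .zip(resultatOfprediction)
  .map { case (v, cluster) => Vectors.dense(v.toArray :+ cluster.toDouble) }

withClusters.collect().foreach(println) // prints e.g. [0.0,0.0,0.0,0.0]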


Hope this helps.


Comments:

I am not getting the right output, could you provide another answer? Thanks.

Isn't this your output: (0.0,0.0,0.0,1.0) (0.1,0.1,0.1,1.0) (0.2,0.2,0.2,1.0) (9.0,9.0,9.0,0.0) (9.1,9.1,9.1,0.0) (9.2,9.2,9.2,0.0)?

How can I generalize the .map(t => ...)? I don't know the number of columns in my data, so mapping them manually as shown is not possible.

@MaherHTB Shankar's answer is accurate for the question you asked. You can accept it and open a new question for the other problem you ran into.

Try this: resultatOfprediction.zip(parsedData).map(t => t._1.productIterator.toList.map(_.asInstanceOf[Double]) :+ t._2)
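
Since the goal stated in the question is to save the cluster-annotated vectors to a txt file regardless of the column count, a minimal end-to-end sketch, assuming the parsedData and resultatOfprediction definitions from the question's code, could look like this (the output path is a hypothetical placeholder):

import org.apache.spark.mllib.linalg.Vectors

// Append each vector's predicted cluster, then write one vector per line.
// DenseVector's toString is "[0.0,0.0,0.0,0.0]", matching the desired output format.
parsedData
  .zip(resultatOfprediction)
  .map { case (v, cluster) => Vectors.dense(v.toArray :+ cluster.toDouble) }
  .saveAsTextFile("C:/spark/output/kmeans_with_clusters") // hypothetical output path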