Scala: Printing an RDD to a text file with a header

scala apache-spark

Simple question: for the RDD below, I want to write an output text file with the following format and header (UserID, MovieID, Pred_rating).

Easy, right? So I use this function:

  def print_outputfile(final_predictions_adjusted:RDD[((Int, Int), Double)])={
    val writer = new FileWriter(new File("output.txt" ))
    writer.write("UserID,MovieID,Pred_rating")
    final_predictions_adjusted.sortByKey().foreach(x=>{writer.write(x.toString())})
    writer.close()
  }

The function above does not work and fails with the following error:

caused by: java.io.NotSerializableException: java.io.FileWrite

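In your code, the function passed to foreach runs in parallel on the worker nodes, so the FileWriter object it references would have to be sent to all of them. That cannot work for a handle to a local file, and FileWriter is not serializable, which is why you get the NotSerializableException.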
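The failure can be reproduced without a cluster. Spark serializes the closure passed to foreach, together with the objects it captures, before shipping it to the executors. The sketch below only imitates that step by applying plain Java serialization directly to a FileWriter; the object name SerializationCheck, the helper serializationError, and the scratch file are illustrative assumptions, not part of the original post.

```scala
import java.io.{ByteArrayOutputStream, File, FileWriter, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Returns the exception message if `obj` cannot be Java-serialized, None otherwise.
  def serializationError(obj: AnyRef): Option[String] = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      None
    } catch {
      case e: NotSerializableException => Some(e.getMessage)
    } finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val tmp = File.createTempFile("output", ".txt") // hypothetical scratch file
    tmp.deleteOnExit()
    val writer = new FileWriter(tmp)
    // FileWriter does not implement java.io.Serializable, so this reports an error.
    try println(serializationError(writer))
    finally writer.close()
  }
}
```

Anything a Spark closure captures has to survive this kind of serialization, so resources tied to one machine (open files, sockets) should be created inside the task or used only on the driver.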

You would normally save an RDD to a file with saveAsTextFile:

final_predictions_adjusted.sortByKey().map(e=> (e._1._1,e._1._2,e._2)).saveAsTextFile("output.dir")

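This writes the output as multiple part files inside the output directory. You can add the header and combine the parts manually afterwards.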
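If the result is small enough for one task to write, the manual merge can be avoided. The following is a minimal sketch, not the answerer's method: it sorts into a single partition and prepends the header inside that partition, so Spark writes one part file that starts with the header. The names HeaderedOutput and saveWithHeader are hypothetical.

```scala
import org.apache.spark.rdd.RDD

object HeaderedOutput {
  // Writes `predictions` as CSV lines under `outputDir`, with the header as the first line.
  def saveWithHeader(predictions: RDD[((Int, Int), Double)], outputDir: String): Unit = {
    val header = "UserID,MovieID,Pred_rating"
    predictions
      .sortByKey(ascending = true, numPartitions = 1) // one sorted partition, hence one part file
      .map { case ((user, movie), rating) => s"$user,$movie,$rating" }
      .mapPartitions(lines => Iterator(header) ++ lines) // runs once, for the single partition
      .saveAsTextFile(outputDir)
  }
}
```

The output is still a directory (for example output.dir/part-00000), and forcing a single partition gives up parallelism for the final write, so this trade-off only makes sense for moderately sized results.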

The following works:

  def print_outputfile(final_predictions_adjusted:RDD[((Int, Int), Double)])={
    val writer = new FileWriter(new File("output.txt" ))
    writer.write("UserID,MovieID,Pred_rating\n")
    final_predictions_adjusted.sortByKey().collect().foreach(x=>{writer.write(x._1._1+","+x._1._2+","+x._2+"\n")})
    writer.close()
  }

Wow, even better! If I use

final_predictions_adjusted.sortByKey().collect().foreach(x=>{writer.write(x.toString())})

then it works. Could you explain the error?

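With collect, the results end up only on the driver, so the foreach runs locally and the writer never has to be serialized. That is fine as long as the result is small enough to fit in driver memory, but it will not work for large datasets.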
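When the result may not fit in driver memory all at once but a single local file is still wanted, one option is toLocalIterator, which fetches the RDD to the driver one partition at a time. This is a sketch under that assumption; the names LocalWrite and writeWithHeader are hypothetical, and PrintWriter is swapped in for FileWriter purely for convenience.

```scala
import java.io.{File, PrintWriter}
import org.apache.spark.rdd.RDD

object LocalWrite {
  // Streams the sorted RDD to a local file on the driver, header first.
  def writeWithHeader(predictions: RDD[((Int, Int), Double)], path: String): Unit = {
    val writer = new PrintWriter(new File(path))
    try {
      writer.println("UserID,MovieID,Pred_rating")
      // toLocalIterator returns a plain Iterator on the driver, so this foreach
      // runs locally and the writer is never shipped to executors.
      predictions.sortByKey().toLocalIterator.foreach { case ((user, movie), rating) =>
        writer.println(s"$user,$movie,$rating")
      }
    } finally writer.close()
  }
}
```

The driver then needs only enough memory for the largest partition, but the file still lands on the driver's local disk, and Spark runs a separate job per partition, so caching the sorted RDD first may be worthwhile.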
