Spark: how to save an array as a two-column CSV?
I have an array of predictions and labels from a logistic regression, which looks like this:
labelAndPreds: org.apache.spark.rdd.RDD[(Double, Double)] =
MapPartitionsRDD[517] at map at <console>:52
scala> labelAndPreds.collect()
res2: Array[(Double, Double)] = Array((0.004106564139257318, 0.0),
(0.3641478408865635, 0.0), (0.9999258409695498, 1.0), (0.342287288060...
How can I save this to local disk as a CSV with two columns (one for the labels and one for the predictions)?

You can use:
import org.apache.spark.sql.SQLContext

// Spark 1.x: create an SQLContext from the SparkContext, then import
// its implicits so that .toDF becomes available on the RDD
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// The RDD in the question is named labelAndPreds
val df = labelAndPreds.toDF("labels", "predictions")

df.write
  .format("com.databricks.spark.csv") // requires the spark-csv package on Spark < 2.0
  .option("header", "true")
  .save("labelsAndPreds.csv")
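If the RDD is small enough to `collect()` anyway (as the question's `labelAndPreds.collect()` suggests), a cluster-free alternative is to format the pairs yourself and write a single local file with plain Scala. This is only a sketch: the sample array below is a hypothetical stand-in for the collected data, and the output path is an assumption.

```scala
import java.nio.file.{Files, Paths}

object SaveAsCsv {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample standing in for labelAndPreds.collect()
    val labelAndPredsLocal: Array[(Double, Double)] =
      Array((0.004106564139257318, 0.0), (0.3641478408865635, 0.0), (0.9999258409695498, 1.0))

    // Build the CSV text: a header line, then one "label,prediction" row per pair
    val csvLines: Seq[String] =
      "labels,predictions" +: labelAndPredsLocal.toSeq.map {
        case (label, pred) => s"$label,$pred"
      }

    // Write everything to one local file (fine for small, collected results;
    // for large RDDs prefer the distributed DataFrame writer above)
    Files.write(Paths.get("labelsAndPreds.csv"),
      csvLines.mkString("\n").getBytes("UTF-8"))
  }
}
```

Note that unlike `df.write.save(...)`, which produces a directory of part-files, this writes a single ordinary CSV file, which is often what you want for a small result on local disk.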