Scala: appending rows to a DataFrame

Tags: scala, apache-spark, apache-spark-sql

I want to implement the following for a Spark DataFrame: keep appending new rows to it, as shown in the example below.

    for (a <- value) {
      val num = a
      val count = a + 10
      // creating a df with the above values
      val data = Seq((num.asInstanceOf[Double], count.asInstanceOf[Double]))
      val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
      val data2 = data1.union(row)
      val data1 = data2 // --> currently this assignment is not possible
    }
I also tried:

    for (a <- value) {
      val num = a
      val count = a + 10
      // creating a df with the above values
      val data = Seq((num.asInstanceOf[Double], count.asInstanceOf[Double]))
      val row = spark.sparkContext.parallelize(data).toDF("Number", "count")
      val data1 = data1.union(row) // --> union with self is not possible
    }

How can I achieve this in Spark?

Your data1 must be declared as a var:
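
A minimal sketch of that fix, assuming a SparkSession named spark is in scope and value is the Array[Double] from the question:

    import spark.implicits._

    // `var` lets the reference be rebound; the DataFrames themselves stay immutable
    var data1 = Seq.empty[(Double, Double)].toDF("Number", "count")
    for (a <- value) {
      val row = Seq((a, a + 10)).toDF("Number", "count")
      data1 = data1.union(row) // rebind data1 to the new, unioned DataFrame
    }

Each union still returns a new DataFrame; the var only makes the reassignment legal.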


Just build the DataFrame with the for loop first and union it with data1 once, as follows:

// .toSeq is needed when `value` is an Array: the implicit toDF conversion is defined for Seq
val df = ( for (a <- value) yield (a, a + 10) ).toSeq.toDF("Number", "count")
val result = data1.union(df)

This will be more efficient than performing a union on every iteration of the for loop, since each union in the loop grows the query plan and produces many tiny partitions.
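
One way to see the cost of the loop version: every parallelize call contributes its own set of partitions, and union simply concatenates the partitions of its inputs, so unioning row by row leaves many near-empty partitions. A quick check, reusing result from the snippet above:

    result.rdd.getNumPartitions // stays small with a single union; balloons when
                                // each row comes from its own parallelize + union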

DataFrames are immutable; you need to accumulate the rows in a mutable structure instead. Below is a solution that may help you:

scala> val value = Array(1.0, 2.0, 55.0)
value: Array[Double] = Array(1.0, 2.0, 55.0)

scala> import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ListBuffer

scala> var data = new ListBuffer[(Double, Double)]
data: scala.collection.mutable.ListBuffer[(Double, Double)] = ListBuffer()

scala> for(a <- value)
     | {
     | val num = a
     | val count = a+10
     | data += ((num.asInstanceOf[Double], count.asInstanceOf[Double]))
     | println(data)
     | }
ListBuffer((1.0,11.0))
ListBuffer((1.0,11.0), (2.0,12.0))
ListBuffer((1.0,11.0), (2.0,12.0), (55.0,65.0))

scala> val DF = spark.sparkContext.parallelize(data).toDF("Number","count")
DF: org.apache.spark.sql.DataFrame = [Number: double, count: double]

scala> DF.show()
+------+-----+
|Number|count|
+------+-----+
|   1.0| 11.0|
|   2.0| 12.0|
|  55.0| 65.0|
+------+-----+
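
As a side note, ListBuffer is already a Seq, so with import spark.implicits._ in scope (the shell imports it automatically) the buffer can be turned into a DataFrame directly, without going through an RDD:

    val DF = data.toDF("Number", "count")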

