Saving a 2D list to a DataFrame in Scala Spark

I have a 2D list named tuppleSlides in the following format:

List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))
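For reference, the snippets below assume that structure is bound to a Scala value (the name tuppleSlides matches the code further down):

val tuppleSlides: List[List[Int]] = List(
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7),
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7),
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7),
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7))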
I created the following schema:

val schema = StructType(
            Array(
            StructField("1", IntegerType, true), 
            StructField("2", IntegerType, true), 
            StructField("3", IntegerType, true), 
            StructField("4", IntegerType, true),  
            StructField("5", IntegerType, true), 
            StructField("6", IntegerType, true), 
            StructField("7", IntegerType, true), 
            StructField("8", IntegerType, true), 
            StructField("9", IntegerType, true), 
            StructField("10", IntegerType, true) )
        )
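As an aside, since the ten field definitions differ only in name, the same schema can also be built programmatically; a minimal sketch:

import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Ten nullable integer columns named "1" through "10", as in the question
val schema = StructType((1 to 10).map(i => StructField(i.toString, IntegerType, nullable = true)))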
I am creating a DataFrame like this:

val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema)
But it does not even compile. What is the proper way to do this?


Thank you.

You need to convert the 2D list into an RDD[Row] before creating the DataFrame. createDataFrame with an explicit schema expects an RDD[Row] (or a java.util.List[Row]), not a List[List[Int]], which is why your call does not compile:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_))  // wrap each inner List in a Row

sqlContext.createDataFrame(rdd, schema)
# res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int]
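Binding the result and printing it confirms the rows line up with the schema, e.g.:

val df = sqlContext.createDataFrame(rdd, schema)
df.show()  // each inner List appears as one row under columns 1..10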

Also note that in Spark 2.x, sqlContext is replaced by spark (the SparkSession):


spark.createDataFrame(rdd, schema)
# res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields]

Ha, I had written a toTuple10 for this; fromSeq is a much nicer choice.
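For context, the tuple-based route that comment alludes to would look roughly like the sketch below; toTuple10 is a hypothetical helper, and it only works because each inner list has exactly ten elements:

import spark.implicits._

// Hypothetical helper: unpack a 10-element List into a Tuple10
def toTuple10(xs: List[Int]) =
  (xs(0), xs(1), xs(2), xs(3), xs(4), xs(5), xs(6), xs(7), xs(8), xs(9))

val df = tuppleSlides.map(toTuple10).toDF("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")

Row.fromSeq avoids both the helper and the fixed-arity limit, which is why it is the nicer choice here.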