Saving a 2D list to a DataFrame in Scala Spark

I have a 2D list named tuppleSlides in the following format:

List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))
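For reference, the snippets below assume that structure is bound to a Scala value (the name tuppleSlides matches the code further down):

val tuppleSlides: List[List[Int]] = List(
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7),
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7),
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7),
  List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7))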
I created the following schema:

val schema = StructType(
            Array(
            StructField("1", IntegerType, true), 
            StructField("2", IntegerType, true), 
            StructField("3", IntegerType, true), 
            StructField("4", IntegerType, true),  
            StructField("5", IntegerType, true), 
            StructField("6", IntegerType, true), 
            StructField("7", IntegerType, true), 
            StructField("8", IntegerType, true), 
            StructField("9", IntegerType, true), 
            StructField("10", IntegerType, true) )
        )
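As an aside, since the ten field definitions differ only in name, the same schema can also be built programmatically; a minimal sketch:

import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Ten nullable integer columns named "1" through "10", as in the question
val schema = StructType((1 to 10).map(i => StructField(i.toString, IntegerType, nullable = true)))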
I am creating a DataFrame like this:

val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema)
But it does not even compile. What is the proper way to do this?


Thank you.

You need to convert the 2D list into an RDD[Row] before creating the DataFrame. createDataFrame with an explicit schema expects an RDD[Row] (or a java.util.List[Row]), not a List[List[Int]], which is why your call does not compile:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_))  // wrap each inner List in a Row

sqlContext.createDataFrame(rdd, schema)
# res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int]
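Binding the result and printing it confirms the rows line up with the schema, e.g.:

val df = sqlContext.createDataFrame(rdd, schema)
df.show()  // each inner List appears as one row under columns 1..10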

Also note that in Spark 2.x, sqlContext is replaced by spark (the SparkSession):


spark.createDataFrame(rdd, schema)
# res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields]

Ha, I had written a toTuple10 for this; fromSeq is a much nicer choice.
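For context, the tuple-based route that comment alludes to would look roughly like the sketch below; toTuple10 is a hypothetical helper, and it only works because each inner list has exactly ten elements:

import spark.implicits._

// Hypothetical helper: unpack a 10-element List into a Tuple10
def toTuple10(xs: List[Int]) =
  (xs(0), xs(1), xs(2), xs(3), xs(4), xs(5), xs(6), xs(7), xs(8), xs(9))

val df = tuppleSlides.map(toTuple10).toDF("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")

Row.fromSeq avoids both the helper and the fixed-arity limit, which is why it is the nicer choice here.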