Apache Spark: converting a list of tuples to a Dataset (Scala)
I am trying out ways of creating a Dataset, and the examples below work fine:
val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))
import spark.implicits._
val rddLName = spark.sparkContext.parallelize(lname)
case class Test1(name: String, age: Int, place: String)
val ds1 = lname.toDS()
val ds2 = rddLName.toDS()
val ds3 = spark.createDataset(rddLName).as("Test1")
val ds4 = rddLName.toDF().as("Test1")
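As a side note, all four variants above type the rows as the tuple (String, Int, String), which is why the column names default to _1, _2, _3. A minimal sketch, assuming a local SparkSession (in spark-shell, `spark` is predefined):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for the sketch; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("tuple-schema").getOrCreate()
import spark.implicits._

val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))

// ds1 is a Dataset[(String, Int, String)]; tuple fields carry no names of their
// own, so the schema falls back to the positional names _1, _2, _3.
val ds1 = lname.toDS()
val cols = ds1.columns
spark.stop()
```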
a) But how do I create a Dataset using as[U](implicit encoder: Encoder[U])?
I tried the code below and it gives me the error shown underneath. Can you point me to some reference material?
Error:(41, 62) Unable to find encoder for type Test1. An implicit Encoder[Test1] is needed to store Test1 instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._
val rddNew = lname map{case (x,y,z) => Test1(x,y,z)}
val ds5 = spark.sparkContext.parallelize(lname).toDF().as[Test1]
ds5.show()
The following code does not work either:
val ds5 = spark.sparkContext.parallelize(rddNew).toDF()
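For reference, a sketch of what the encoder-based version could look like, reusing the names `lname`, `rddNew` and `ds5` from the snippets above and assuming the case class sits at the top level (not inside a method), so that spark.implicits._ can supply Encoder[Test1]:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("tuples-to-ds").getOrCreate()
import spark.implicits._

// Declared at the top level, not inside a method, so the encoder can be derived.
case class Test1(name: String, age: Int, place: String)

val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))

// First turn the tuples into case-class instances, then parallelize the List itself.
val rddNew = lname.map { case (x, y, z) => Test1(x, y, z) }
val ds5 = spark.sparkContext.parallelize(rddNew).toDS()
ds5.show()

val columns = ds5.columns   // name, age, place
val rows = ds5.count()
```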
b) ds4.show() gives me a header with _1, _2 and _3, as shown below:
+-------+---+--------+
| _1| _2| _3|
+-------+---+--------+
|Krishna| 32| GWL|
| Pankaj| 37| BIHAR|
| Sunil| 29|Bangalre|
+-------+---+--------+
How do I get name, age and place as the headers, using the schema I supplied?
case class Test1(name: String, age: Int, place: String)
must be a top-level class; it cannot be declared inside a method, otherwise Spark cannot derive an implicit Encoder for it.
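A sketch of a layout that compiles, assuming a standalone application (the object name EncoderExample is illustrative): the case class sits outside the method, so the compiler can materialize Encoder[Test1] at the toDS() call site.

```scala
import org.apache.spark.sql.SparkSession

// Top level: encoder derivation works here.
case class Test1(name: String, age: Int, place: String)

object EncoderExample {
  // If Test1 were declared inside this method instead, compilation would fail
  // with "Unable to find encoder for type Test1".
  def build(): Long = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-example").getOrCreate()
    import spark.implicits._
    val ds = List(Test1("Krishna", 32, "GWL"), Test1("Pankaj", 37, "BIHAR")).toDS()
    val n = ds.count()
    spark.stop()
    n
  }
}
```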
show() gives you a header with _1, _2 and _3 because your list declaration contains tuples, not case-class objects, so Spark has no clue how to name the columns:
List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))
val ds5 = spark.sparkContext.parallelize(lname).toDF()
.withColumnRenamed("_1", "name")
.withColumnRenamed("_2", "age")
.withColumnRenamed("_3", "place").as[Test1]
ds5.show()
+-------+---+--------+
| name|age| place|
+-------+---+--------+
|Krishna| 32| GWL|
| Pankaj| 37| BIHAR|
| Sunil| 29|Bangalre|
+-------+---+--------+
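As a shorter alternative to the three withColumnRenamed calls, toDF also accepts the column names up front. A sketch under the same assumptions as above (local session, top-level Test1), which also shows the payoff of the typed Dataset: compile-checked field access in transformations.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("named-columns").getOrCreate()
import spark.implicits._

// Top-level case class so Encoder[Test1] can be derived.
case class Test1(name: String, age: Int, place: String)

val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))

// toDF takes the column names directly, so no renaming pass is needed.
val ds5 = spark.sparkContext.parallelize(lname)
  .toDF("name", "age", "place")
  .as[Test1]

// Typed API: t is a Test1, so the fields are checked at compile time.
val over30 = ds5.filter(t => t.age > 30).map(_.name).collect().sorted
spark.stop()
```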