Conversion error using RDD operations in Scala
I am new to Scala and ran into an error while practicing. I am trying to convert an RDD to a DataFrame; here is my code:
package com.sclee.examples
import com.sun.org.apache.xalan.internal.xsltc.compiler.util.IntType
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType};
object App {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("examples").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    case class Person(name: String, age: Long)

    val personRDD = sc.makeRDD(Seq(Person("A",10),Person("B",20)))
    val df = personRDD.map({
      case Row(val1: String, val2: Long) => Person(val1,val2)
    }).toDS()
    // val ds = personRDD.toDS()
  }
}
I followed the instructions in the Spark documentation and also looked at several blog posts showing how to convert an RDD to a DataFrame, but I get the error below:
Error:(20, 27) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ Support for serializing other types will be added in future releases.
val df = personRDD.map({
I tried to solve this on my own but failed. Any help would be greatly appreciated.

The following code works:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
case class Person(name: String, age: Long)
object SparkTest {
  def main(args: Array[String]): Unit = {
    // use the SparkSession entry point of Spark 2
    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    import spark.implicits._

    // this is your RDD - just a sample of how to create an RDD
    val personRDD: RDD[Person] = spark.sparkContext.parallelize(Seq(Person("A",10),Person("B",20)))

    // the SparkSession has a method to convert an RDD to a Dataset
    val ds = spark.createDataset(personRDD)
    println(ds.count())
  }
}
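Since the original goal was a DataFrame rather than a Dataset, note that a Dataset converts directly with toDF(), and an RDD of a case class converts via the implicits. A minimal sketch of that extra step, reusing the spark, ds, and personRDD values from the code above (the column names "name" and "age" are just illustrative):

    // a Dataset[Person] converts to a DataFrame directly
    val df = ds.toDF()
    df.show()

    // or convert the RDD straight to a DataFrame via spark.implicits._
    val df2 = personRDD.toDF("name", "age")
    df2.printSchema()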
I made the following changes:

- Used SparkSession instead of SparkContext and SQLContext
- Moved the case class Person out of the App object (I don't know why I had to do this)
- Used createDataset for the conversion to a Dataset
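As far as I can tell, moving Person to the top level is what fixes the "Unable to find encoder" error: Spark derives an implicit Encoder for a case class from its TypeTag, which the compiler cannot produce for a class defined inside a method. With a top-level Person and spark.implicits._ in scope, the toDS() call that was commented out in the question should also work; a minimal sketch, reusing spark and personRDD from the answer above:

    // with a top-level case class, the implicit Encoder[Person] is found
    val ds2 = personRDD.toDS()
    ds2.show()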