
Conversion error when using RDD operations in Scala


I am new to Scala and ran into an error while practicing.

I am trying to convert an RDD to a DataFrame; here is my code:

package com.sclee.examples

import com.sun.org.apache.xalan.internal.xsltc.compiler.util.IntType
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType};


object App {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("examples").setMaster("local")
    val sc = new SparkContext(conf)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    case class Person(name: String, age: Long)

    val personRDD = sc.makeRDD(Seq(Person("A",10),Person("B",20)))
    val df = personRDD.map({
      case Row(val1: String, val2: Long) => Person(val1,val2)
    }).toDS()

//    val ds = personRDD.toDS()
  }
}
I followed the instructions in the Spark documentation and also looked at some blog posts showing how to convert an RDD to a DataFrame, but I got the error below:

Error:(20, 27) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.
    val df = personRDD.map({

I tried to solve this on my own but failed. Any help would be appreciated.

The following code works:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Long)
object SparkTest {
  def main(args: Array[String]): Unit = {

    // use the SparkSession of Spark 2
    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

    import spark.implicits._

    // this is your RDD - just a sample of how to create an RDD
    val personRDD: RDD[Person] = spark.sparkContext.parallelize(Seq(Person("A", 10), Person("B", 20)))

    // the SparkSession has a method to convert an RDD to a Dataset
    val ds = spark.createDataset(personRDD)
    println(ds.count())
  }
}
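Since the question originally asked for a DataFrame, it may be worth noting that a typed Dataset can be viewed as a DataFrame with toDF(). A minimal sketch, reusing the ds value from the snippet above:

    // A Dataset[Person] can be converted to a DataFrame (Dataset[Row]);
    // the column names come from the case class fields (name, age).
    val df = ds.toDF()
    df.printSchema()
    df.show()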
I made the following changes:

  • Used SparkSession instead of SparkContext and SqlContext
  • Moved the Person case class out of the application object (I am not sure why I had to do this); see the sketch after this list
  • Used createDataset for the conversion
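As a side note, here is a minimal sketch (assuming the same top-level Person case class and the spark session from the working example above) showing that, once Person is defined at the top level rather than inside main, the implicit RDD-to-Dataset conversion also compiles:

    import spark.implicits._

    // With Person defined at the top level, Spark can derive its encoder,
    // so the implicit toDS() conversion on RDD[Person] works as well.
    val dsViaImplicits = personRDD.toDS()
    dsViaImplicits.show()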
However, I think it is rare that you actually want to do this conversion; you probably want to read your input directly into a Dataset instead.
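For example, a hedged sketch (the JSON path and file are hypothetical) that reads input straight into a typed Dataset via the DataFrameReader and as[Person]:

    import spark.implicits._

    // Read a JSON file directly into a Dataset[Person]; the path below is
    // only a placeholder and would need to point at real input data.
    val people = spark.read.json("/path/to/people.json").as[Person]
    people.show()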