Serialization of a Spark/Scala List. Task not serializable: java.io.NotSerializableException


The problem is with serializing a Spark Dataset that contains a List of Int. The Scala version is 2.10.4 and the Spark version is 1.6.

This is similar to other questions, but based on those answers I can't get it to work. To illustrate the problem, I have simplified the code.

I have a case class:

case class FlightExt(callsign: Option[String], serials: List[Int])
and my main method is:

    val (ctx, sctx) = SparkUtil.createContext() // just a helper function to build context
    val flightsDataFrame = separateFlightsMock(sctx) // reads data from Parquet file

    import sctx.implicits._
    flightsDataFrame.as[FlightExt]
      .map(flight => flight.callsign)
      .show()
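For anyone trying to reproduce this: the two helpers aren't shown, but they can be approximated with the sketch below (the local master, the FlightData wrapper, and the mock rows are stand-ins, not my real code; the real separateFlightsMock reads Parquet):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{DataFrame, SQLContext}

    // Rough stand-in for the real context builder (Spark 1.6 API)
    object SparkUtil {
      def createContext(): (SparkContext, SQLContext) = {
        val conf = new SparkConf().setAppName("flights").setMaster("local[*]")
        val ctx = new SparkContext(conf)
        (ctx, new SQLContext(ctx))
      }
    }

    object FlightData {
      // Rough stand-in for the Parquet reader: builds two rows in memory
      def separateFlightsMock(sctx: SQLContext): DataFrame = {
        import sctx.implicits._
        Seq(
          (Some("BAW256"), List(1, 2, 3)),
          (None: Option[String], List(4))
        ).toDF("callsign", "serials")
      }
    }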

Running this, I get the following error:

SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.reflect.internal.Symbols$PackageClassSymbol
Serialization stack:
    - object not serializable (class: scala.reflect.internal.Symbols$PackageClassSymbol, value: package scala)
    - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: class scala.reflect.internal.Symbols$Symbol)
    - object (class scala.reflect.internal.Types$UniqueThisType, scala.type)
    - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: class scala.reflect.internal.Types$Type)
    - object (class scala.reflect.internal.Types$TypeRef$$anon$6, scala.Int)
    - field (class: org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$5, name: elementType$2, type: class scala.reflect.api.Types$TypeApi)
    - object (class org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$5, <function1>)
    - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, name: function, type: interface scala.Function1)
    - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType))
    - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: targetObject, type: class org.apache.spark.sql.catalyst.expressions.Expression)
    - object (class org.apache.spark.sql.catalyst.expressions.Invoke, invoke(mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType),array,ObjectType(class [Ljava.lang.Object;)))
    - writeObject data (class: scala.collection.immutable.$colon$colon)
    - object (class scala.collection.immutable.$colon$colon, List(invoke(mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType),array,ObjectType(class [Ljava.lang.Object;))))
    - field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, name: arguments, type: interface scala.collection.Seq)
    - object (class org.apache.spark.sql.catalyst.expressions.StaticInvoke, staticinvoke(class scala.collection.mutable.WrappedArray$,ObjectType(interface scala.collection.Seq),make,invoke(mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType),array,ObjectType(class [Ljava.lang.Object;)),true))
    - writeObject data (class: scala.collection.immutable.$colon$colon)

If I change FlightExt to:

case class FlightExt(callsign: Option[String], other: Array[AnotherCaseClass])

it also fails with the same error.


I'm new to Scala and Spark and may be missing something, but an explanation would be appreciated.

Put the FlightExt class inside an object, as in the code below:

object Flight {
 case class FlightExt(callsign: Option[String], var serials: List[Int])
}
and use Flight.FlightExt:

    val (ctx, sctx) = SparkUtil.createContext() // just a helper function to build context
    val flightsDataFrame = separateFlightsMock(sctx) // reads data from Parquet file

    import sctx.implicits._
    flightsDataFrame.as[Flight.FlightExt]
      .map(flight => flight.callsign)
      .show()
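As far as I can tell, the underlying issue is that a case class declared inside a method (for example inside main) is an inner class that keeps a reference to its enclosing scope; when Catalyst derives an encoder for it by reflection, that captured scope drags non-serializable scala.reflect symbols (the Symbols$PackageClassSymbol in your stack trace) into the task closure. Declaring the class inside a top-level object gives it a stable path with nothing captured.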


Thanks! Now I get: Exception in thread "main" org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class org.some.package.Flight$FlightExt without access to the scope that this class was defined in. Try moving this class out of its parent class. Do I need a custom encoder now, or something else?
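For what it's worth, that AnalysisException points at the standard fix: declare the case class at the top level of a file, outside Flight and outside any method, and no custom encoder should be needed. A minimal sketch of the end-to-end shape, assuming Spark 1.6 and in-memory data in place of the Parquet reader (FlightJob is a made-up name):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Top-level declaration: no enclosing class or method, so Catalyst can
    // derive the encoder without tripping over a captured outer scope
    case class FlightExt(callsign: Option[String], serials: List[Int])

    object FlightJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("flights").setMaster("local[*]")
        val sctx = new SQLContext(new SparkContext(conf))

        import sctx.implicits._
        val flights = Seq(FlightExt(Some("BAW256"), List(1, 2, 3))).toDS()
        flights.map(_.callsign).show()
      }
    }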