Serialization of a Spark/Scala List. Task not serializable: java.io.NotSerializableException
The problem is with serializing a Spark Dataset containing a List of Int. The Scala version is 2.10.4 and the Spark version is 1.6. This is similar to other questions, but based on those answers I couldn't get it to work. I've simplified the code to illustrate the problem. I have a case class:
case class FlightExt(callsign: Option[String], serials: List[Int])
My main method is:
val (ctx, sctx) = SparkUtil.createContext() // just a helper function to build context
val flightsDataFrame = separateFlightsMock(sctx) // reads data from Parquet file
import sctx.implicits._
flightsDataFrame.as[FlightExt]
.map(flight => flight.callsign)
.show()
I get the following error:

SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.reflect.internal.Symbols$PackageClassSymbol
Serialization stack:
- object not serializable (class: scala.reflect.internal.Symbols$PackageClassSymbol, value: package scala)
- field (class: scala.reflect.internal.Types$ThisType, name: sym, type: class scala.reflect.internal.Symbols$Symbol)
- object (class scala.reflect.internal.Types$UniqueThisType, scala.type)
- field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: class scala.reflect.internal.Types$Type)
- object (class scala.reflect.internal.Types$TypeRef$$anon$6, scala.Int)
- field (class: org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$5, name: elementType$2, type: class scala.reflect.api.Types$TypeApi)
- object (class org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$5, <function1>)
- field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, name: function, type: interface scala.Function1)
- object (class org.apache.spark.sql.catalyst.expressions.MapObjects, mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType))
- field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: targetObject, type: class org.apache.spark.sql.catalyst.expressions.Expression)
- object (class org.apache.spark.sql.catalyst.expressions.Invoke, invoke(mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType),array,ObjectType(class [Ljava.lang.Object;)))
- writeObject data (class: scala.collection.immutable.$colon$colon)
- object (class scala.collection.immutable.$colon$colon, List(invoke(mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType),array,ObjectType(class [Ljava.lang.Object;))))
- field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, name: arguments, type: interface scala.collection.Seq)
- object (class org.apache.spark.sql.catalyst.expressions.StaticInvoke, staticinvoke(class scala.collection.mutable.WrappedArray$,ObjectType(interface scala.collection.Seq),make,invoke(mapobjects(<function1>,cast(serials#7 as array<int>),IntegerType),array,ObjectType(class [Ljava.lang.Object;)),true))
- writeObject data (class: scala.collection.immutable.$colon$colon)

If FlightExt is changed to:

case class FlightExt(callsign: Option[String], other: Array[AnotherCaseClass])

it also fails with the same error.
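The mechanics behind "Task not serializable" can be reproduced with plain Java serialization, which is the mechanism Spark uses when shipping closures and expressions to executors. The sketch below (names are illustrative, no Spark required) shows how a case class nested inside a class silently captures an `$outer` reference, and fails to serialize when the enclosing instance isn't serializable:

```scala
import java.io._

// A top-level class that is NOT Serializable, standing in for whatever
// driver-side object a Spark closure ends up capturing.
class Outer {
  // An inner case class keeps a hidden $outer reference to its Outer instance.
  case class Inner(x: Int)
}

// For contrast: a top-level case class has no enclosing instance to capture.
case class Standalone(x: Int)

object SerializationCheck {
  // Attempt a Java-serialization write and report whether it succeeded.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Here `SerializationCheck.serializes((new Outer).Inner(1))` returns false even though case classes extend Serializable, because the hidden `$outer` field points at the non-serializable `Outer` instance, while `Standalone(1)` serializes fine.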
I'm new to Scala and Spark and may be missing something, but an explanation would be appreciated.

Put the FlightExt case class inside an object, as in the code below:
object Flight {
case class FlightExt(callsign: Option[String], var serials: List[Int])
}
Then use Flight.FlightExt:
val (ctx, sctx) = SparkUtil.createContext() // just a helper function to build context
val flightsDataFrame = separateFlightsMock(sctx) // reads data from Parquet file
import sctx.implicits._
flightsDataFrame.as[Flight.FlightExt]
.map(flight => flight.callsign)
.show()
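As a sanity check on this fix, a case class nested in a top-level object carries no outer reference, so it round-trips cleanly through Java serialization, List[Int] field included. A minimal sketch (the wrapper object mirrors the answer's Flight, and no Spark is needed):

```scala
import java.io._

// Mirrors the answer's structure: a case class nested in a top-level object.
object Flight {
  case class FlightExt(callsign: Option[String], serials: List[Int])
}

object RoundTrip {
  // Serialize the value to bytes and deserialize it back, using plain
  // Java serialization.
  def apply[A <: Serializable](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    new ObjectOutputStream(bytes).writeObject(a)
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
      .readObject()
      .asInstanceOf[A]
  }
}
```

Because `Flight` is an object (a singleton, not an instance of a class), `FlightExt` has no `$outer` field to drag along, and the round-tripped value compares equal to the original by case-class structural equality.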
Thanks, but now I get: Exception in thread "main" org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class org.some.package.Flight$FlightExt without access to the scope that this class was defined in. Try moving this class out of its parent class. Do I need a custom encoder now, or something else?