
Why does this Spark code throw java.io.NotSerializableException


I want to access a method of a companion object inside a transformation on an RDD. Why does the following not work:

import org.apache.spark.rdd.RDD
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}

class Abc {
    def transform(x: RDD[Int]): RDD[Double] = { x.map(Abc.fn) }
}

object Abc {
  def fn(x: Int): Double = { x.toDouble }
}

implicit def abcEncoder: Encoder[Abc] = Encoders.kryo[Abc]

new Abc().transform(sc.parallelize(1 to 10)).collect
The above code throws a java.io.NotSerializableException:

org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.map(RDD.scala:369)
  at Abc.transform(<console>:19)
  ... 47 elided
Caused by: java.io.NotSerializableException: Abc
Serialization stack:
        - object not serializable (class: Abc, value: Abc@4f598dfb)
        - field (class: Abc$$anonfun$transform$1, name: $outer, type: class Abc)
        - object (class Abc$$anonfun$transform$1, <function1>)
  at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
  ... 57 more

I get a

java.io.NotSerializableException: Xyz

There is a great article that discusses "serializable" vs. "non-serializable" objects in Apache Spark:

The article makes several suggestions concerning:

  • What is going wrong in your case

  • Some alternatives so that your object does not need to be "serializable" (one such alternative is sketched below)
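
One such alternative, in line with the Spark programming guide's advice on passing functions, is to construct the closure inside a singleton object rather than inside the class, so that no Abc instance is ever captured. A minimal sketch (the object-level transform helper is my own naming, not from the question):

import org.apache.spark.rdd.RDD

class Abc {
  // Delegate to the companion: the closure is built in an object
  // context, so it holds no reference to this Abc instance.
  def transform(x: RDD[Int]): RDD[Double] = Abc.transform(x)
}

object Abc {
  def fn(x: Int): Double = x.toDouble

  def transform(x: RDD[Int]): RDD[Double] = x.map(fn)
}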


Spark's main abstraction is the RDD, which is partitioned across the nodes of the cluster. When we run a transformation on an RDD, its closure is serialized on the driver node and shipped to the appropriate worker nodes; the workers then deserialize it and execute it.

In your case, the class Abc cannot be serialized and shipped to the worker nodes. You need to make the class Abc serializable by extending Serializable:

class Abc extends Serializable {
    def transform(x: RDD[Int]): RDD[Double] = { x.map(Abc.fn) }
}
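
With that change, the original call should run without the exception; a minimal sketch, assuming a spark-shell session where sc is already in scope:

// spark-shell: sc is the ambient SparkContext
new Abc().transform(sc.parallelize(1 to 10)).collect()
// res: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)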

The work does not happen on the edge node; the class (or object) has to be serialized so that the data nodes can run it. Since you never actually defined serialization/deserialization functions, let alone implemented the correct interface, serialization by default only has access to what is publicly settable and gettable. Beyond that, you need to provide your own implementation.
class Abc extends Serializable {
    def transform(x: RDD[Int]): RDD[Double] = { x.map(Abc.fn) }
}
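
If you would rather not make the class serializable at all, note that fn uses no state from the Abc instance, so the mapping can also be written as a plain lambda that captures nothing from the enclosing class; a minimal sketch:

import org.apache.spark.rdd.RDD

class Abc {
  // The function body references no members of Abc, so the closure
  // has nothing non-serializable to drag along with it.
  def transform(x: RDD[Int]): RDD[Double] = x.map(_.toDouble)
}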