org.apache.spark.SparkException: Task not serializable when shuffling with a custom partitioner


I ran into a problem when I implemented my own partitioner and tried to shuffle the original RDD. I know the exception is caused by referencing something that is not serializable, but even after adding

extends Serializable

to every class involved, the problem is still there. What should I do? Here is the exception:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1622)

I just solved the problem! Add -Dsun.io.serialization.extendedDebugInfo=true to your VM configuration, and the serialization debug output will point you at the class that cannot be serialized.
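If it is not obvious where that flag goes: the "Task not serializable" check runs in the driver's ClosureCleaner, so the driver JVM has to receive the flag at launch time (for example in the IDE's VM-options box, or via spark-submit's --driver-java-options), while executors can pick it up from the Spark configuration. A minimal sketch, assuming spark-core is on the classpath; the object and app names here are made up:

import org.apache.spark.SparkConf

object SerializationDebug {
  val conf = new SparkConf()
    .setAppName("str-partitioner-debug") // hypothetical app name
    // Executor JVMs receive the flag through the Spark conf:
    .set("spark.executor.extraJavaOptions",
      "-Dsun.io.serialization.extendedDebugInfo=true")
  // The driver JVM must get the flag at launch time instead, e.g.
  //   spark-submit --driver-java-options \
  //     "-Dsun.io.serialization.extendedDebugInfo=true" ...
}

For reference, here is the code from the question: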

import org.apache.spark.{Partitioner, SparkConf}
import org.apache.spark.rdd.{RDD, ShuffledRDD}
import org.apache.spark.serializer.KryoSerializer

object STRPartitioner extends Serializable {
  def apply(expectedParNum: Int,
            sampleRate: Double,
            originRdd: RDD[Vertex]): Unit = {
    val bound = computeBound(originRdd)
    // Re-key every vertex by its coordinate so it can be shuffled.
    val rdd = originRdd.mapPartitions(
      iter => iter.map(row => {
        val cp = row
        (cp.coordinate, cp.copy())
      })
    )
    val partitioner = new STRPartitioner(expectedParNum, sampleRate, bound, rdd)
    val shuffled = new ShuffledRDD[Coordinate, Vertex, Vertex](rdd, partitioner)
    shuffled.setSerializer(new KryoSerializer(new SparkConf(false)))
    val result = shuffled.collect()
  }
}

class STRPartitioner(expectedParNum: Int,
                     sampleRate: Double,
                     bound: MBR,
                     rdd: RDD[_ <: Product2[Coordinate, Vertex]])
  extends Partitioner with Serializable {
  ...
}
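The extended debug output usually ends at a closure that captured its enclosing instance: referring to a field or method of an outer, non-serializable object inside an RDD closure drags the whole object into the serialized task. A generic sketch of that pattern and the usual remedy; Lookup and Job are invented names, not taken from the question:

import org.apache.spark.rdd.RDD

// Lookup stands in for any class that is not Serializable.
class Lookup { def scale: Double = 2.0 }

class Job(lookup: Lookup) {
  // Fails: lookup.scale inside the closure captures `this` (the Job),
  // which drags the non-serializable Lookup into the task.
  def bad(rdd: RDD[Double]): RDD[Double] =
    rdd.map(_ * lookup.scale)

  // Works: copy the needed value into a local val first, so the
  // closure captures only a Double.
  def good(rdd: RDD[Double]): RDD[Double] = {
    val s = lookup.scale
    rdd.map(_ * s)
  }
}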