Multithreading org.apache.spark.SparkException: Task not serializable, wh
I ran into a problem when I implemented my own partitioner and tried to shuffle the original RDD. I know it is caused by referencing a non-serializable function or class from the closure, but the problem persists even after adding extends Serializable to every class involved. What should I do?

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1622)
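The usual catch behind this exception is that marking your own classes Serializable is not enough: a lambda that reads a field through `this` captures the whole enclosing object, and serialization then fails on whichever non-serializable field that object holds. A minimal JVM-only sketch of the effect (no Spark needed; the `Config`/`Job` names are made up for illustration):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Config(val rate: Double) // deliberately NOT Serializable

class Job(config: Config) extends Serializable {
  // This lambda reads `config` through `this`, so it captures the whole
  // Job instance -- and serializing it drags in the non-serializable Config.
  def badFn: Int => Double = x => x * config.rate

  // Copying the needed value into a local val first means the closure
  // captures only a Double, which serializes fine.
  def goodFn: Int => Double = {
    val r = config.rate
    x => x * r
  }
}

object ClosureDemo {
  def trySerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val job = new Job(new Config(2.0))
    println(trySerialize(job.badFn))  // false: captured `this` holds a Config
    println(trySerialize(job.goodFn)) // true: captures only a Double
  }
}
```

This is the same mechanism the ClosureCleaner trips over: adding extends Serializable to your own class does not help when the closure also captures some other object that is not serializable.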
object STRPartitioner extends Serializable {
  def apply(expectedParNum: Int,
            sampleRate: Double,
            originRdd: RDD[Vertex]): Unit = {
    val bound = computeBound(originRdd)
    val rdd = originRdd.mapPartitions(
      iter => iter.map(row => {
        val cp = row
        (cp.coordinate, cp.copy())
      })
    )
    val partitioner = new STRPartitioner(expectedParNum, sampleRate, bound, rdd)
    val shuffled = new ShuffledRDD[Coordinate, Vertex, Vertex](rdd, partitioner)
    shuffled.setSerializer(new KryoSerializer(new SparkConf(false)))
    val result = shuffled.collect()
  }
}

class STRPartitioner(expectedParNum: Int,
                     sampleRate: Double,
                     bound: MBR,
                     rdd: RDD[_ <: Product2[Coordinate, Vertex]])
  extends Partitioner with Serializable {
  ...
}

I just solved the problem! Add -Dsun.io.serialization.extendedDebugInfo=true to your VM options, and the serialization stack trace will show you exactly which class is not serializable.
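For a job launched through spark-submit, the flag has to reach the driver JVM before it starts, since task serialization (the ClosureCleaner) runs on the driver. One way to pass it, sketched with placeholder class and jar names:

```shell
# --driver-java-options sets JVM flags for the driver, where the
# ClosureCleaner serializes tasks; class and jar names are placeholders.
spark-submit \
  --driver-java-options "-Dsun.io.serialization.extendedDebugInfo=true" \
  --class com.example.Main \
  your-app.jar
```

When running locally inside an IDE instead, the same -D flag goes into the run configuration's VM options.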