
Scala Spark Task not serializable


I have tried all the solutions I found on StackOverflow, but I still cannot solve it. I have a "MainObj" object that instantiates a "Recommendation" object. Whenever I call the "recommendationProducts" method I always get the error. Here is the code of the method:

import org.jblas.DoubleMatrix  // DoubleMatrix comes from the jblas library

def recommendationProducts(item: Int): Unit = {

  val aMatrix = new DoubleMatrix(Array(1.0, 2.0, 3.0))

  // Cosine similarity between two factor vectors
  def cosineSimilarity(vec1: DoubleMatrix, vec2: DoubleMatrix): Double = {
    vec1.dot(vec2) / (vec1.norm2() * vec2.norm2())
  }

  val itemFactor = model.productFeatures.lookup(item).head
  val itemVector = new DoubleMatrix(itemFactor)

  // Here is where I get the error:
  val sims = model.productFeatures.map { case (id, factor) =>
    val factorVector = new DoubleMatrix(factor)
    val sim = cosineSimilarity(factorVector, itemVector)
    (id, sim)
  }

  val sortedSims = sims.top(10)(Ordering.by[(Int, Double), Double] {
    case (id, similarity) => similarity
  })

  println("\nTop 10 products:")
  sortedSims.map(x => (x._1, x._2)).foreach(println)
}
And this is the error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.map(RDD.scala:369)
at RecommendationObj.recommendationProducts(RecommendationObj.scala:269)
at MainObj$.analisiIUNGO(MainObj.scala:257)
at MainObj$.menu(MainObj.scala:54)
at MainObj$.main(MainObj.scala:37)
at MainObj.main(MainObj.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@7c2312fa)
- field (class: RecommendationObj, name: sc, type: class org.apache.spark.SparkContext)
- object (class MainObj$$anon$1, MainObj$$anon$1@615bad16)
- field (class: RecommendationObj$$anonfun$37, name: $outer, type: class RecommendationObj)
- object (class RecommendationObj$$anonfun$37, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 14 more
I have tried adding:

1) "extends Serializable" (Scala) to my class
2) "extends java.io.Serializable" to my class
3) "transient" to some parts
4) Getting the model (and the other features) inside this class (right now I get them from another object and pass them to my class like parameters)

How can I solve it? I'm going crazy! Thank you in advance.

The key is here:

 field (class: RecommendationObj, name: sc, type: class org.apache.spark.SparkContext)
So you have a field named sc of type SparkContext. Spark wants to serialize the class, so it also tries to serialize all of its fields, and SparkContext is not serializable.

You should either:

  • use the @transient annotation on the field, check whether it is null, and re-create it when needed, or
  • not use the SparkContext from the field, but pass it into the method as a parameter instead. Keep in mind, though, that you must never use the SparkContext inside closures such as map, flatMap, etc. A minimal sketch of both options follows the list.
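A minimal sketch of how the class could look after applying those two points. The class, field, and method names follow the question; the MatrixFactorizationModel parameter and the simplified method body are assumptions made for illustration:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.jblas.DoubleMatrix

class RecommendationObj(@transient val sc: SparkContext,  // point 1: the driver-only field is marked @transient
                        model: MatrixFactorizationModel) extends Serializable {

  def recommendationProducts(item: Int): Unit = {
    // Driver-side work: look up the factor vector of the query item
    val itemVector = new DoubleMatrix(model.productFeatures.lookup(item).head)

    // Point 2: the closure below references only the local value itemVector,
    // so it does not need to drag in `sc` or the enclosing instance, and
    // nothing non-serializable has to be shipped to the executors.
    val sims = model.productFeatures.map { case (id, factor) =>
      val factorVector = new DoubleMatrix(factor)
      (id, factorVector.dot(itemVector) / (factorVector.norm2() * itemVector.norm2()))
    }

    sims.top(10)(Ordering.by[(Int, Double), Double](_._2)).foreach(println)
  }
}

In this particular method the closure only needs itemVector, so keeping every reference inside the closure local (rather than calling a method of the enclosing class) is usually the simplest fix.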

Thank you! It works! However, I pass something like the SparkContext (sc) as a parameter of my class and use it to load the model in the constructor. Is that wrong? @S.SP If it helped you, please upvote and accept the answer. It is not wrong, but you have to use the @transient annotation to tell the serializer not to serialize it. Ok! Thank you very much! I would like to upvote you, but I still don't have 15 reputation points. I'm sorry, because your answer helped me a lot @S.SP Thanks :)
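For the pattern discussed in the comments (receiving the SparkContext as a constructor parameter and using it there to load the model), a hedged sketch of where the @transient annotation goes; MatrixFactorizationModel.load is the standard MLlib loader, while the modelPath parameter is illustrative:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// sc is only used on the driver, inside the constructor; @transient tells the
// serializer to skip this field if the instance is ever captured by a closure.
class RecommendationObj(@transient val sc: SparkContext, modelPath: String) extends Serializable {
  val model: MatrixFactorizationModel = MatrixFactorizationModel.load(sc, modelPath)
}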