Scala: Does using functions inside transformations cause a Not Serializable exception?
I have a Breeze DenseMatrix. I compute two per-row means and put them in another DenseMatrix, one per column. However, I get a Task not serializable exception. I know that sc is not Serializable, but I think the exception occurs because I call functions inside the transformations in SafeZones.
Am I right? And how could this be done without any functions? Any help would be appreciated.
Code:
Exception:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.map(RDD.scala:369)
at ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1.apply(MotitorDetection.scala:85)
at ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1.apply(MotitorDetection.scala:82)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@6eee7027)
- field (class: ScalaApps.MotitorDetection$MonDetect, name: sc, type: class org.apache.spark.SparkContext)
- object (class ScalaApps.MotitorDetection$MonDetect, MonDetect())
- field (class: ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1, name: $outer, type: class ScalaApps.MotitorDetection$MonDetect)
- object (class ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1, <function2>)
- field (class: ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1$$anonfun$2, name: $outer, type: class ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1)
- object (class ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1$$anonfun$2, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 28 more
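The findMean method is a method of the object MotitorDetection. That object holds a SparkContext, which is not serializable. Because the closure passed to rdd.map calls findMean, it captures its enclosing instance (the $outer fields in the serialization stack) and, with it, sc. As a result, the task used in rdd.map is not serializable.
Move all matrix-related functions into a separate serializable object, MatrixUtils, for example: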
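// BDM/BDV are assumed to be the usual Breeze aliases; the original import lines are not shown.
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, _}
import breeze.stats.mean

object MatrixUtils {
  // Mean of each row of `a`, returned as a vector with one entry per row.
  def findMean(a: BDM[Double]): BDV[Double] =
    mean(a(*, ::))

  // Builds a C x 2 matrix with `x` in column 0 and `y` in column 1.
  def toMatrix(x: BDV[Double], y: BDV[Double], C: Int): BDM[Double] = {
    val m = BDM.zeros[Double](C, 2)
    m(::, 0) := x
    m(::, 1) := y
    m
  }
  ...
}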
Then use only those methods inside rdd.map(...):
object MotitorDetection {
  val sc = ...
  def SafeZones(stream: DStream[(Int, BDM[Double])]): Unit = {
    import MatrixUtils._
    ... = rdd.map( ... )
  }
}
It didn't work, but I was wondering: would doing the same computation using only transformations also cause the exception?
@mkey Did you also get rid of the counter, and did you make sure that none of these objects is contained in some (synthetically generated) object (REPL?)?
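The capture chain in the serialization stack (closure, then $outer, then sc) can be reproduced without Spark. The following is only a minimal sketch, not the asker's code: FakeContext, Detector, Helpers and ClosureDemo are hypothetical stand-ins for SparkContext, MonDetect, MatrixUtils and a driver program, and plain Java serialization is used in place of Spark's closure serializer (the stack trace shows that Spark goes through JavaSerializerInstance here). It assumes Scala 2.12 or later, where lambdas are serializable by default.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

// Hypothetical stand-in for MonDetect: a case class is Serializable itself,
// but it holds a non-serializable field, like `sc` in the stack trace.
case class Detector() {
  val ctx = new FakeContext

  // Instance method, analogous to findMean defined next to `sc`.
  def twice(x: Int): Int = x * 2

  // Calling an instance method makes the lambda capture `this`,
  // so serializing the lambda also tries to serialize `ctx`.
  def capturing: Int => Int = x => twice(x)
}

// Hypothetical stand-in for MatrixUtils: a standalone object without a context field.
object Helpers {
  def twice(x: Int): Int = x * 2
}

object ClosureDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val bad: Int => Int = Detector().capturing      // drags in Detector and its ctx
    val good: Int => Int = x => Helpers.twice(x)    // references only a standalone object
    println(s"lambda calling an instance method serializable: ${isSerializable(bad)}")
    println(s"lambda calling an object method serializable:   ${isSerializable(good)}")
  }
}
```

Under these assumptions, the point is that the function call inside the transformation is not the problem in itself. What matters is whether the closure has to carry along an instance that owns a non-serializable field. Inline computation inside the closure fails in the same way if it touches any member of that instance.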