
Scala: How to fix a Task not serializable exception in Spark Streaming


I want to use Spark Streaming to summarize internet logs. I transform the log records into maps, and the error occurs during the computation.

I set the Spark serialization config to Avro, but it did not help.

Here is the code:

...
val sc = new SparkContext(conf)
...
val lines = kafkaStream.map(_._2)
  .map { _.split("\\|") }
  .map { arr =>
    Map(
      ...
    )
  }
lines.print()   // this works
lines.map { clearMap =>   // the exception points to this line
  ...
  val filter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("^\\d+_" + uvid + ".*$"))
  val r = HBaseUtils.queryFromHBase(sc, "flux", zerotime.getBytes, nowtime.getBytes, filter)
  val uv = if (r.count() == 0) 1 else 0
  val sCount = clearMap("sCount")
  val vv = if (sCount == "0") 1 else 0
  val cip = clearMap("cip")
  val filter2 = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("^\\d+_\\d+_\\d+_\\d+_" + cip + ".*$"))
  val r2 = HBaseUtils.queryFromHBase(sc, "flux", zerotime.getBytes, nowtime.getBytes, filter2)
  val newip = if (r2.count() == 0) 1 else 0
  val filter3 = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("^\\d+_" + uvid + ".*$"))
  val r3 = HBaseUtils.queryFromHBase(sc, "flux", null, nowtime.getBytes, filter3)
  val newcust = if (r3.count() == 0) 1 else 0
  (nowtime, pv, uv, vv, newip, newcust)
}
...
Here is the exception message:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2056)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:546)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:546)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:679)
    at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:264)
    at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:545)
    at cn.tedu.flux.fluxdriver$.main(fluxdriver.scala:73)
    at cn.tedu.flux.fluxdriver.main(fluxdriver.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@3fc08eec)
    - field (class: cn.tedu.flux.fluxdriver$$anonfun$main$2, name: sc$1, type: class org.apache.spark.SparkContext)
    - object (class cn.tedu.flux.fluxdriver$$anonfun$main$2, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 12 more

I have solved this problem. The SparkContext cannot be serialized when it is defined inside the function and captured there as a value. So I tried defining it like this instead:

object Driver {

    var sc: SparkContext = null

    def main(args: Array[String]): Unit = {
        sc = new SparkContext()
        ....
    }
}

That works.

Before, it looked like this (sc was a local val inside main):

object Driver {

    def main(args: Array[String]): Unit = {
        val sc = new SparkContext()
        ....
    }
}
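For reference, a minimal self-contained sketch of the working pattern above, with placeholder names of my own (the app name, master URL and the small driver-side computation are not from the original code). Because sc is a field of the top-level object rather than a local val inside main, closures created in main reach it through the Driver singleton instead of capturing their own copy, so the closure cleaner no longer has to serialize a SparkContext:

import org.apache.spark.{SparkConf, SparkContext}

object Driver {

    // Object-level field: closures built in main() refer to sc through the
    // Driver singleton instead of capturing a local SparkContext value.
    var sc: SparkContext = null

    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("flux").setMaster("local[2]")
        sc = new SparkContext(conf)

        // Use sc on the driver side only; on the executors this field is
        // never initialized, so code that calls it from inside a map closure
        // running there would still hit a null sc.
        val total = sc.parallelize(1 to 100).sum()
        println(total)
    }
}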


The exception message tells you that your SparkContext is not serializable. Maybe add the @transient annotation before the declaration of sc:

@transient val sc = new SparkContext(conf)
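As a rough illustration of that suggestion, here is a sketch under assumptions of mine (the FluxJob class, its constructor parameters and the input path are hypothetical, not from the question). @transient matters when sc is a field of an instance that a closure ends up capturing, because Java serialization then skips the marked field when it serializes that instance:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical job class: the closure in run() references a field of `this`,
// so the whole instance is serialized with the task; marking sc @transient
// keeps the non-serializable SparkContext out of that serialized closure.
class FluxJob(conf: SparkConf, delimiter: String) extends Serializable {

    // Skipped by Java serialization; on the executors this field is null.
    @transient val sc: SparkContext = new SparkContext(conf)

    def run(): Long = {
        val rdd = sc.textFile("hdfs:///logs/flux")   // hypothetical input path
        // `delimiter` belongs to `this`, so `this` is captured and serialized,
        // but sc is left out of it because of the @transient annotation.
        rdd.map(_.split(delimiter).length).count()
    }
}

In the question, however, the trace shows sc captured as a local value (field sc$1 of the anonymous function), and @transient on a local declaration cannot prevent that capture, which may be why the comment below reports it did not help.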
Adding @transient before sc did not work either.