Apache spark SparkContext.clean java.util.zip.ZipeException:无效的LOC头(错误签名)
这个奇怪的异常终止了我的spark任务,有什么想法吗 我正在通过sc.parallelize(256项的序列)向spark上下文“提交”许多较小的任务。(不要问我为什么,但这正是我需要的)Apache spark SparkContext.clean java.util.zip.ZipeException:无效的LOC头(错误签名),apache-spark,Apache Spark,这个奇怪的异常终止了我的spark任务,有什么想法吗 我正在通过sc.parallelize(256项的序列)向spark上下文“提交”许多较小的任务。(不要问我为什么,但这正是我需要的) 我不确定这是不是你和我遇到的相同问题,但我发现如果我做了一个spark summit,当作业运行时,我开始修改同一个jar(即scp一个新的集群构建),我会得到这个错误 java.util.zip.ZipException: invalid LOC header (bad signature) at
我不确定这是不是你和我遇到的相同问题,但我发现如果我做了一个spark summit,当作业运行时,我开始修改同一个jar(即scp一个新的集群构建),我会得到这个错误
java.util.zip.ZipException: invalid LOC header (bad signature)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.Utils$$anonfun$copyStream$1.apply$mcJ$sp(Utils.scala:285)
at org.apache.spark.util.Utils$$anonfun$copyStream$1.apply(Utils.scala:253)
at org.apache.spark.util.Utils$$anonfun$copyStream$1.apply(Utils.scala:253)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
at org.apache.spark.util.Utils$.copyStream(Utils.scala:293)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:43)
at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:81)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.map(RDD.scala:323)
at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1449)
at com.xxxxxx.spark.streaming.driver.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx$$anonfun$main$2.apply(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.scala:74)
at com.xxxxxx.spark.streaming.driver.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx$$anonfun$main$2.apply(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.scala:67)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
这有点离题,但我在做同样的事情时遇到的另一个错误是:
java.io.IOException: Class not found
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:40)
at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:81)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$flatMap$1.apply(RDD.scala:333)
at org.apache.spark.rdd.RDD$$anonfun$flatMap$1.apply(RDD.scala:332)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.flatMap(RDD.scala:332)
at org.apache.spark.streaming.dstream.FlatMappedDStream$$anonfun$compute$1.apply(FlatMappedDStream.scala:35)
at org.apache.spark.streaming.dstream.FlatMappedDStream$$anonfun$compute$1.apply(FlatMappedDStream.scala:35)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:35)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:346)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
at org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:346)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47)
at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
java.io.IOException:找不到类
位于org.apache.xbean.asm5.ClassReader.a(未知源)
位于org.apache.xbean.asm5.ClassReader。(未知来源)
位于org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:40)
在org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:81)上
位于org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
位于org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
位于org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
位于org.apache.spark.rdd.rdd$$anonfun$flatMap$1.apply(rdd.scala:333)
位于org.apache.spark.rdd.rdd$$anonfun$flatMap$1.apply(rdd.scala:332)
位于org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
位于org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
位于org.apache.spark.rdd.rdd.withScope(rdd.scala:316)
位于org.apache.spark.rdd.rdd.flatMap(rdd.scala:332)
在org.apache.spark.streaming.dstream.FlatMappedDStream$$anonfun$compute$1.apply上(FlatMappedDStream.scala:35)
在org.apache.spark.streaming.dstream.FlatMappedDStream$$anonfun$compute$1.apply上(FlatMappedDStream.scala:35)
位于scala.Option.map(Option.scala:145)
位于org.apache.spark.streaming.dstream.flatmappedstream.compute(flatmappedstream.scala:35)
在org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(dstream.scala:352)
在org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(dstream.scala:352)
在scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)中
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1.apply(dstream.scala:351)
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1.apply(dstream.scala:351)
位于org.apache.spark.streaming.dstream.dstream.createRDDWithLocalProperties(dstream.scala:426)
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1.apply(dstream.scala:346)
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1.apply(dstream.scala:344)
在scala.Option.orElse(Option.scala:257)
位于org.apache.spark.streaming.dstream.dstream.getOrCompute(dstream.scala:341)
位于org.apache.spark.streaming.dstream.filteredstream.compute(filteredstream.scala:35)
在org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(dstream.scala:352)
在org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(dstream.scala:352)
在scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)中
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1.apply(dstream.scala:351)
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1$$anonfun$1.apply(dstream.scala:351)
位于org.apache.spark.streaming.dstream.dstream.createRDDWithLocalProperties(dstream.scala:426)
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1.apply(dstream.scala:346)
位于org.apache.spark.streaming.dstream.dstream$$anonfun$getOrCompute$1.apply(dstream.scala:344)
在scala.Option.orElse(Option.scala:257)
位于org.apache.spark.streaming.dstream.dstream.getOrCompute(dstream.scala:341)
位于org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47)
位于org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
位于org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114)
位于scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
位于scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
位于scala.collection.mutable.resizeblearray$class.foreach(resizeblearray.scala:59)
位于scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
位于scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
位于scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
位于org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114)
位于org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
位于org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
在scala.util.Try$.apply处(Try.scala:161)
位于org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
位于org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
位于org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
位于org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
位于org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
我遇到了同样的问题,问题是scp
管道没有复制整个jar,或者在复制期间连接被关闭
这意味着未正确复制可运行的jar
我再次运行scp来复制jar
文件,在看到完成100%
之后,我再次运行spark submit
作业,使用该jar
,成功地运行了作业
java.io.IOException: Class not found
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:40)
at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:81)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$flatMap$1.apply(RDD.scala:333)
at org.apache.spark.rdd.RDD$$anonfun$flatMap$1.apply(RDD.scala:332)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.flatMap(RDD.scala:332)
at org.apache.spark.streaming.dstream.FlatMappedDStream$$anonfun$compute$1.apply(FlatMappedDStream.scala:35)
at org.apache.spark.streaming.dstream.FlatMappedDStream$$anonfun$compute$1.apply(FlatMappedDStream.scala:35)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:35)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:346)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
at org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:346)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47)
at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)