Apache Spark: OOM exception in a Spark Streaming application


I am using Spark 1.4.1 on an HDP 2.3.2.0 cluster. I have a simple application that creates a DStream reading data from Kafka and applies a filter transformation to it; the application is launched with master yarn-client (a minimal sketch of its shape is included after the stack trace below). After roughly a day of running, the following exception is thrown:

Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)
at org.apache.spark.util.io.ByteArrayChunkOutputStream.write(ByteArrayChunkOutputStream.scala:55)
at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:273)
at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:324)
at org.apache.spark.io.SnappyOutputStreamWrapper.close(CompressionCodec.scala:203)
at com.esotericsoftware.kryo.io.Output.close(Output.java:168)
at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:162)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1291)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1426)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
[Stage 53513:>                                                      (0 + 0) / 4]Exception in thread "JobGenerator" java.lang.OutOfMemoryError: GC overhead limit exceeded
at sun.net.www.protocol.jar.Handler.openConnection(Handler.java:41)
at java.net.URL.openConnection(URL.java:972)
at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:237)
at java.lang.Class.getResourceAsStream(Class.java:2223)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:38)
at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:98)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:197)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:294)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:293)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.map(RDD.scala:293)
at org.apache.spark.streaming.dstream.MappedDStream$$anonfun$compute$1.apply(MappedDStream.scala:35)
at org.apache.spark.streaming.dstream.MappedDStream$$anonfun$compute$1.apply(MappedDStream.scala:35)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:35)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
at org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
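
For reference, the application is essentially the following. This is a sketch, assuming Spark 1.4's direct (receiver-less) Kafka API; the broker address, topic name, batch interval, and filter predicate are placeholders, not the real values:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaFilterApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaFilterApp")
    // Batch interval is a placeholder; the real value is not shown here.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical broker list and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:6667")
    val topics = Set("events")

    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Extract the message payload, then keep only the records of interest
    // (hypothetical predicate) and print a sample of each batch.
    messages.map(_._2)
            .filter(_.contains("ERROR"))
            .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

It is submitted with something like `spark-submit --master yarn-client --class KafkaFilterApp app.jar`.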
I dumped the heap of the driver process, and it appears that 486.2 MB of the 512 MB of available memory are used by an instance of the class org.apache.spark.deploy.yarn.history.YarnHistoryService, which contains a large number of instances of the class org.apache.spark.deploy.history.yarn.HandleSparkEvent. I have been trying to figure out what causes this, but so far I have not found a solution.
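
In case it is useful to others, the dump can be taken with the stock JDK tools; the exact commands below are illustrative. In yarn-client mode the driver runs inside the SparkSubmit JVM on the host where the job was launched:

```
# Locate the driver JVM (the SparkSubmit process on the submitting host):
jps -lm
# Write a binary heap dump that can be opened in Eclipse MAT or jvisualvm:
jmap -dump:live,format=b,file=driver-heap.hprof <pid>
```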

Can anyone help me solve this problem?

Thanks.

I solved the problem after exchanging emails with Hortonworks engineers on the Spark user list. Basically, there is a bug that prevents the driver from publishing org.apache.spark.deploy.history.yarn.HandleSparkEvent objects to the YARN Timeline Server, so over time these events keep accumulating and eventually exhaust the memory allocated to the driver. To avoid the problem, I removed the spark.yarn.services entry from spark-defaults.conf.
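
For reference, the entry looked roughly like this in spark-defaults.conf. The class name matches the YarnHistoryService instances seen in the heap dump above; the exact value shipped by HDP may differ, so treat this as illustrative:

```
# Removing or commenting out this line disables the YARN timeline
# integration and stops the HandleSparkEvent buildup in the driver:
# spark.yarn.services    org.apache.spark.deploy.yarn.history.YarnHistoryService
```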