Apache Spark streaming job gets killed after running for about 1 hour


I have a Spark Streaming job that reads the tweets stream from GNIP and writes it to Kafka.

Spark and Kafka run on the same cluster.

My cluster consists of 5 nodes: kafka-b01 ... kafka-b05.

The Spark master is running on kafka-b05.

Here is how we submit the Spark job:

nohup sh $SPARK_HOME/bin/spark-submit --total-executor-cores 5 --class com.test.java.gnipStreaming.GnipSparkStreamer --master spark://kafka-b05:7077 GnipStreamContainer.jar powertrack kafka-b01,kafka-b02,kafka-b03,kafka-b04,kafka-b05 gnip_live_stream 2 &

After about one hour, the Spark job gets killed.

The logs in the nohup file show the following exception:

org.apache.spark.storage.BlockFetchException: Failed to fetch block from 2 locations. Most recent failure cause: 
        at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595) 
        at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585) 
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
        at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585) 
        at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570) 
        at org.apache.spark.storage.BlockManager.get(BlockManager.scala:630) 
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:48) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
        at org.apache.spark.scheduler.Task.run(Task.scala:89) 
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
        at java.lang.Thread.run(Thread.java:745) 
Caused by: io.netty.channel.ChannelException: Unable to create Channel from class class io.netty.channel.socket.nio.NioSocketChannel 
        at io.netty.bootstrap.AbstractBootstrap$BootstrapChannelFactory.newChannel(AbstractBootstrap.java:455) 
        at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:306) 
        at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134) 
        at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116) 
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:211) 
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167) 
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90) 
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) 
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) 
        at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:99) 
        at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89) 
        at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:588) 
        ... 15 more 
Caused by: io.netty.channel.ChannelException: Failed to open a socket. 
        at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:62) 
        at io.netty.channel.socket.nio.NioSocketChannel.<init>(NioSocketChannel.java:72) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
        at java.lang.Class.newInstance(Class.java:442) 
        at io.netty.bootstrap.AbstractBootstrap$BootstrapChannelFactory.newChannel(AbstractBootstrap.java:453) 
        ... 26 more 
Caused by: java.net.SocketException: Too many open files 
        at sun.nio.ch.Net.socket0(Native Method) 
        at sun.nio.ch.Net.socket(Net.java:411) 
        at sun.nio.ch.Net.socket(Net.java:404) 
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105) 
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60) 
        at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:60) 
        ... 33 more
The second exception, which (it seems) is related to Kafka rather than Spark, is the following:

java.nio.channels.ClosedChannelException 
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) 
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:75) 
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:74) 
        at kafka.producer.SyncProducer.send(SyncProducer.scala:119) 
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59) 
        at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82) 
        at kafka.producer.BrokerPartitionInfo.getBrokerPartitionInfo(BrokerPartitionInfo.scala:49) 
        at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$getPartitionListForTopic(DefaultEventHandler.scala:188) 
        at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:152) 
        at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:151) 
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
        at kafka.producer.async.DefaultEventHandler.partitionAndCollate(DefaultEventHandler.scala:151) 
        at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:96) 
        at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:73) 
        at kafka.producer.Producer.send(Producer.scala:77) 
        at kafka.javaapi.producer.Producer.send(Producer.scala:33) 
        at com.test.java.gnipStreaming.GnipSparkStreamer$1$1.call(GnipSparkStreamer.java:59) 
        at com.test.java.gnipStreaming.GnipSparkStreamer$1$1.call(GnipSparkStreamer.java:51) 
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:225) 
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:225) 
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920) 
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920) 
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858) 
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858) 
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
        at org.apache.spark.scheduler.Task.run(Task.scala:89) 
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
        at java.lang.Thread.run(Thread.java:745)

What do you think the problem is?

EDIT

Following Yuval Itzchakov's comment, here is the code of the streamer:

The main class


The custom receiver class
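
Neither of those two code links is reproduced in this post. For orientation only, here is a minimal sketch of what a custom receiver of this kind typically looks like; it is hypothetical, not the asker's actual GnipSparkStreamer code, and assumes the GNIP stream is read line by line over an HTTP connection:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Hypothetical sketch of a GNIP receiver: reads the stream line by line and hands
// each line to Spark via store(). Not the original GnipSparkStreamer receiver.
public class GnipReceiver extends Receiver<String> {

    private final String streamUrl;

    public GnipReceiver(String streamUrl) {
        super(StorageLevel.MEMORY_AND_DISK_2());
        this.streamUrl = streamUrl;
    }

    @Override
    public void onStart() {
        // Receive on a separate thread so onStart() returns immediately.
        new Thread(new Runnable() {
            @Override
            public void run() {
                receive();
            }
        }).start();
    }

    @Override
    public void onStop() {
        // Nothing to do: receive() checks isStopped() and the thread ends on its own.
    }

    private void receive() {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(streamUrl).openConnection();
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
            try {
                String line;
                while (!isStopped() && (line = reader.readLine()) != null) {
                    store(line); // push each tweet into the DStream
                }
            } finally {
                reader.close();
                conn.disconnect();
            }
        } catch (Exception e) {
            // Ask Spark to restart the receiver on failure.
            restart("Error receiving GNIP stream", e);
        }
    }
}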

The problem is that you are instantiating a new Producer on every iteration of DStream.foreachPartition. If you have a data-intensive stream, this can lead to a huge number of producers being allocated, all trying to connect to Kafka.

The first thing I would make sure of is that you dispose of the producer once you are done sending the data, using a finally block and calling producer.close():

public Void call(JavaRDD<String> rdd) throws Exception {
    rdd.foreachPartition(new VoidFunction<Iterator<String>>() {

        @Override
        public void call(Iterator<String> itr) throws Exception {
            Producer<String, String> producer = getProducer(hosts);
            try {
                while (itr.hasNext()) {
                    try {
                        KeyedMessage<String, String> message =
                            new KeyedMessage<String, String>(topic, itr.next());
                        producer.send(message);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            } finally {
                // Always release the producer and its sockets, even if sending fails.
                producer.close();
            }
        }
    });
    return null;
}
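
The getProducer(hosts) helper referenced above is not shown in the post. Here is a minimal sketch of what it might look like with the old kafka.javaapi.producer API that appears in the stack traces; the class name and property values are assumptions, not the asker's actual settings:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

// Hypothetical stand-in for the asker's getProducer(hosts) helper,
// using the old Kafka 0.8 producer API seen in the stack traces.
public final class KafkaProducerFactory {

    private KafkaProducerFactory() {}

    public static Producer<String, String> getProducer(String hosts) {
        Properties props = new Properties();
        // "hosts" is assumed to be a broker list such as "kafka-b01:9092,kafka-b02:9092".
        props.put("metadata.broker.list", hosts);
        // String keys and values, matching KeyedMessage<String, String> in the job.
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        return new Producer<String, String>(new ProducerConfig(props));
    }
}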

If this still doesn't work and you are seeing too many connections, I would create an object pool of Kafka producers from which you can borrow on demand. That way you explicitly control how many producers are in use and how many sockets are open.
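
A minimal sketch of such a pool, using only the JDK plus the same old producer API (the class and method names here are illustrative, not an existing library API):

import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

// Illustrative fixed-size pool of Kafka producers, shared per executor JVM.
// Bounding the pool size bounds the number of sockets opened towards the brokers.
public final class ProducerPool {

    private final BlockingQueue<Producer<String, String>> pool;

    public ProducerPool(int size, String brokerList) {
        pool = new ArrayBlockingQueue<Producer<String, String>>(size);
        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList); // e.g. "kafka-b01:9092,kafka-b02:9092"
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        for (int i = 0; i < size; i++) {
            pool.add(new Producer<String, String>(new ProducerConfig(props)));
        }
    }

    // Blocks until a producer is free, so at most "size" producers are ever in use.
    public Producer<String, String> borrow() throws InterruptedException {
        return pool.take();
    }

    public void release(Producer<String, String> producer) {
        pool.offer(producer);
    }

    // Close all pooled producers when the executor shuts down.
    public void shutdown() {
        Producer<String, String> p;
        while ((p = pool.poll()) != null) {
            p.close();
        }
    }
}

Held in a static field, one such pool per executor JVM lets foreachPartition borrow and release producers instead of opening new connections for every batch.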

Comments:

You need to show us the code that writes to Kafka. The number of concurrent connections should not exceed the number of partitions per topic that you have configured in Kafka. Increasing the number of file handles is just a patch; most likely there is a bug in your code.

Thnx Yuval, I added two links to the code. By the way, when I submit the same job to a local Spark installation it works fine without any problems. I suspect that having Kafka on the same cluster as Spark is the root cause of this problem. What do you think?

I added the producer.close() line 14 hours ago and the job has been running well since. Thanks Yuval.

Yuval, closing the producer keeps the job running, but now the workers' tasks are full of "kafka.producer.ProducerClosedException: producer already closed" exceptions. Do you have any thoughts on this?

Are you closing them inside a finally block?

Yes, exactly as you mentioned. I made two changes that solved all my problems. The first was to use --executor-memory 2G when submitting the job. The other was to close the producer outside the while loop. The job has now been running for 4 hours without any exceptions in stderr.

@Fanooos Oops, I forgot we are inside an iterator. That is indeed the fix :) adding the try/finally outside the loop.
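
As a side note on the "Too many open files" root cause in the first stack trace: a quick way to confirm that descriptors are leaking is to log the open file descriptor count from inside the executors. This is a sketch using the JVM-specific com.sun.management API, not something taken from the original post:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Logs how many file descriptors the current JVM holds versus its limit.
// Calling this periodically from a task shows whether sockets are leaking.
public final class FdMonitor {

    public static void logOpenFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unixOs.getOpenFileDescriptorCount()
                + " / max: " + unixOs.getMaxFileDescriptorCount());
        }
    }
}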