Spark Mesos cluster loses connectivity between workers

I have one master and 9 slaves, with a total of 30 GB of RAM in the cluster.

It does not happen every time, but the workers intermittently lose connectivity with each other.

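The data volume is low (<500 MB), and I can run the same query on a Docker cluster on my laptop without any problem. What is the problem here, and how should I approach it?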

At some point during a complex filter, this error appears on stderr:

20/10/08 12:27:39 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from XXXXXX:34483
java.io.IOException: Failed to connect to XXXXX/XXXXX:34483
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: dub901mps501.kubikdata.aws/172.31.6.10:34483
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    ... 2 more
Caused by: java.net.ConnectException: Connection refused
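
Looking at the failed stages, the failure always seems to happen in a count. What is going on here? The dataset is small, so could it be a memory problem? Shuffle read/write is <1 KB.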

count at NativeMethodAccessorImpl.java:0

org.apache.spark.sql.Dataset.count(Dataset.scala:2835)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

I have checked the ports, and they are open.
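
For reference, a port check of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the exact check that was run: the function name `can_connect` and the host name in the usage comment are hypothetical, and 34483 is simply the port that appears in the trace above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False


# Hypothetical usage, run from one worker against another:
#   can_connect("other-worker.example", 34483)
```

A check like this only shows whether something is listening on that port at the moment it runs. A firewall rule can allow the port while the connection is still refused because no process is bound to it at that time.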