Docker Spark executor sends the result to a random port even though all ports are explicitly set

I am trying to run a Spark job through PySpark from a Jupyter notebook running in Docker. The workers are on different machines in the same network. I perform a
take
operation on an RDD:

data.take(number_of_elements)

When number_of_elements is 2000, everything works. When it is 20000, an exception occurs. As far as I can tell, it breaks once the size of the result exceeds 2 GB (or at least that is how it looks to me). The 2 GB idea comes from the fact that Spark can send a result smaller than 2 GB in a single block; when the result is larger than 2 GB, a different mechanism kicks in, and that is where it fails. Here is the exception from the executor log:

19/11/05 10:27:14 INFO CodeGenerator: Code generated in 205.7623 ms
19/11/05 10:27:40 INFO PythonRunner: Times: total = 25421, boot = 3, init = 1751, finish = 23667
19/11/05 10:27:42 INFO MemoryStore: Block taskresult_4 stored as bytes in memory (estimated size 927.7 MB, free 6.4 GB)
19/11/05 10:27:42 INFO Executor: Finished task 0.0 in stage 3.0 (TID 4). 972788748 bytes result sent via BlockManager)
19/11/05 10:27:49 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1585998572000, chunkIndex=0}, buffer=org.apache.spark.storage.BlockManagerManagedBuffer@4399ad49} to /10.0.0.9:56222; closing connection
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
    at org.apache.spark.util.io.ChunkedByteBufferFileRegion.transferTo(ChunkedByteBufferFileRegion.scala:64)
    at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:121)
    at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
    at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
    at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.DefaultChannelPipeline.flush(DefaultChannelPipeline.java:983)
    at io.netty.channel.AbstractChannel.flush(AbstractChannel.java:248)
    at io.netty.channel.nio.AbstractNioByteChannel$1.run(AbstractNioByteChannel.java:284)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
As we can see from the log, the executor tries to send the result to
10.0.0.9:56222
. It fails because that port is not opened in docker-compose.
10.0.0.9
is the IP address of the master node, but the port
56222
is random, even though I explicitly set every port I could find in order to disable random port selection:

spark = SparkSession.builder \
    .master('spark://spark.cyber.com:7077') \
    .appName('My App') \
    .config('spark.task.maxFailures', '16') \
    .config('spark.driver.port', '20002') \
    .config('spark.driver.host', 'spark.cyber.com') \
    .config('spark.driver.bindAddress', '0.0.0.0') \
    .config('spark.blockManager.port', '6060') \
    .config('spark.driver.blockManager.port', '6060') \
    .config('spark.shuffle.service.port', '7070') \
    .config('spark.driver.maxResultSize', '14g') \
    .getOrCreate()
I mapped these ports in docker-compose:

version: "3"
services:
  jupyter:
    image: jupyter/pyspark-notebook:latest
    ports:
      - "4040-4050:4040-4050"
      - "6060:6060"
      - "7070:7070"
      - "8888:8888"
      - "20000-20010:20000-20010"
I added

.config('spark.driver.memory', '14g')

as @ML_TN suggested, and now everything works.
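
For reference, a sketch of the complete builder with the new setting in place (all other values are the ones from the question above; only the spark.driver.memory line is new):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master('spark://spark.cyber.com:7077')
         .appName('My App')
         .config('spark.task.maxFailures', '16')
         .config('spark.driver.port', '20002')
         .config('spark.driver.host', 'spark.cyber.com')
         .config('spark.driver.bindAddress', '0.0.0.0')
         .config('spark.blockManager.port', '6060')
         .config('spark.driver.blockManager.port', '6060')
         .config('spark.shuffle.service.port', '7070')
         .config('spark.driver.maxResultSize', '14g')
         .config('spark.driver.memory', '14g')  # the setting that fixed the problem
         .getOrCreate())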


From my point of view, it is strange that a memory setting affects which ports Spark uses.

You should probably configure the Spark driver memory to follow the Docker container's memory settings.
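
A minimal sketch (not part of the answer) of one way to do that: read the container's memory limit and derive spark.driver.memory from it. The cgroup v1 path and the 80% headroom factor are assumptions for illustration; on cgroup v2 the limit lives at /sys/fs/cgroup/memory.max instead.

from pyspark.sql import SparkSession

def container_memory_limit_gb(default_gb=14):
    """Return the container memory limit in whole GB (cgroup v1 path; an assumption)."""
    try:
        with open('/sys/fs/cgroup/memory/memory.limit_in_bytes') as f:
            limit_bytes = int(f.read().strip())
        # An "unlimited" cgroup reports a huge sentinel value; fall back to the default.
        if limit_bytes > 1 << 50:
            return default_gb
        return max(1, limit_bytes // (1024 ** 3))
    except OSError:
        return default_gb

# Leave some headroom below the container limit for the Python process and off-heap buffers.
driver_memory_gb = max(1, int(container_memory_limit_gb() * 0.8))

spark = (SparkSession.builder
         .master('spark://spark.cyber.com:7077')
         .config('spark.driver.memory', f'{driver_memory_gb}g')
         .getOrCreate())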

Have you tried increasing the Docker container's memory, to 4Go for example?
It is 14 GB. I think the problem is not the memory but ChunkedByteBufferFileRegion.
@ML_TN Thank you, your suggestion helped me in the end. You are welcome to post an answer and I will accept it. @ML_TN