Java Spark "Cannot assign requested address": Service 'Driver' failed after 16 retries
I have three workers (worker-1, worker-2, worker-3) running Spark 2.0.2. The Spark Master is started on worker-1. I submit my application with the following script:
#!/bin/bash
sparkMaster=spark://worker-1:6066
mainClass=my.package.Main
jar=/path/to/my/jar-with-dependencies.jar
driverPort=7079
blockPort=7082
deployMode=cluster
$SPARK_HOME/bin/spark-submit \
--conf "spark.driver.port=${driverPort}"\
--conf "spark.blockManager.port=${blockPort}"\
--class $mainClass \
--master $sparkMaster \
--deploy-mode $deployMode \
$jar
When my driver is launched on worker-1 (worker + Master), everything works fine and my application executes correctly using all the workers.
However, when my driver is launched on another worker (worker-2 or worker-3), it fails with the following error:
Launch Command: "/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.2-bin-hadoop2.7/conf/:/root/spark-2.0.2-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=my.package.Main" "-Dspark.driver.port=7083" "-Dspark.blockManager.port=7082" "-Dspark.master=spark://worker-1:7077" "-Dspark.jars=file:/path/to/my/jar-with-dependencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181001132624-0001/jar-with-dependencies.jar" "my.package.Main"
========================================
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
...
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'Driver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
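As an aside, the exception's own suggestion can be followed by giving the driver more ports to try. A minimal sketch, reusing the variables from the submit script above (spark.port.maxRetries is a standard Spark setting; the value 32 is arbitrary). Note that this only adds bind attempts and does not explain why binding fails in the first place:

$SPARK_HOME/bin/spark-submit \
  --conf "spark.driver.port=${driverPort}" \
  --conf "spark.blockManager.port=${blockPort}" \
  --conf "spark.port.maxRetries=32" \
  --class $mainClass \
  --master $sparkMaster \
  --deploy-mode $deployMode \
  $jar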
My three workers are configured as follows:
SPARK_LOCAL_IP=worker-[X]
SPARK_LOCAL_DIRS=/data/spark/tmp
SPARK_WORKER_PORT=7078
SPARK_WORKER_DIR=/data/spark/work
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400 -Dspark.worker.cleanup.interval=1800"
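"Cannot assign requested address" means the JVM tried to bind to an address that no local interface owns, so a quick sanity check (a diagnostic sketch, not part of the original setup) is to confirm on each worker that the name used in SPARK_LOCAL_IP resolves to an address that host actually has:

# on worker-2, for example: what does its own name resolve to?
getent hosts worker-2
# which addresses does this host actually own?
hostname -I
# The resolved address should appear in the list above; a stale /etc/hosts
# entry that maps the worker name to another machine's IP would explain a
# bind failure whenever the driver happens to land on this worker.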
After several attempts to resolve this, I tried to force the driver to start on the master's host by adding the following option to my submit:
--conf "spark.driver.host=worker-1"
But the driver is still started on a random worker, so this does not solve my problem.
Edit:
When I submit with the spark.driver.host option, it does not appear in the launch command log (whereas spark.driver.port does), so I do not understand why my option is not picked up this time.
Edit 2:
I did some more in-depth testing:
I now have only worker-2 running, and I still submit from worker-1, which runs my master.
When I submit the application, I can see the following in the worker's log:
2018-10-04 11:27:39,794 | dispatcher-event-loop-6 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Asked to launch driver driver-20181004112739-0003
2018-10-04 11:27:39,833 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Copying user jar file:/path/to/myjar-with-depencies.jar to /data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar
2018-10-04 11:27:39,833 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Copying /path/to/myjar-with-depencies.jar to /data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar
2018-10-04 11:27:40,243 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Launch Command: "/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.0-bin-hadoop2.7/conf/:/root/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.history.fs.cleaner.interval=12h" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://worker-1:7077" "-Dspark.history.fs.cleaner.maxAge=1d" "-Dspark.app.name=my.package.Main" "-Dspark.jars=file:/path/to/myjar-with-depencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar" "my.package.Main"
2018-10-04 11:27:42,692 | dispatcher-event-loop-8 | WARN | org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Driver driver-20181004112739-0003 exited with failure
I still get the same error in my driver log.
I then tried to run the command launched by the DriverRunner manually:
"/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.0-bin-hadoop2.7/conf/:/root/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.history.fs.cleaner.interval=12h" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://worker-1:7077" "-Dspark.history.fs.cleaner.maxAge=1d" "-Dspark.app.name=my.package.Main" "-Dspark.jars=file:/path/to/myjar-with-depencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar" "my.package.Main"
When I do so, the application starts correctly (surprisingly).
What difference between a manual launch and a launch by the DriverRunner could cause this bind error? (See the diagnostic sketch after the notes below.)
Notes:
- I did not modify the DriverRunner command line in any way
- I ran the command line manually as root, and Spark also runs as root
- The behavior is the same on Spark 2.0.0 and Spark 2.0.2
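One way to see what actually differs between the two launches is to compare the environment of the running Worker daemon (which the DriverRunner's child JVM inherits) with the environment of the interactive shell where the manual launch worked. A rough sketch, assuming Linux and its /proc filesystem:

# PID of the Spark Worker daemon on worker-2
workerPid=$(pgrep -f org.apache.spark.deploy.worker.Worker)
# environment inherited by anything the DriverRunner spawns
tr '\0' '\n' < /proc/${workerPid}/environ | sort > /tmp/worker.env
# environment of the interactive shell where the manual launch succeeded
env | sort > /tmp/shell.env
diff /tmp/worker.env /tmp/shell.env

Variables exported by spark-env.sh (SPARK_LOCAL_IP in particular) are a typical source of such differences.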
To work around the problem, I created a fourth machine that runs only the Spark Master, with no spark-env.sh file, and I run spark-submit from there.

Do you mean that the "spark-env.sh" and "spark-defaults.conf" files are not needed on the Master machine, and that only the slaves file needs to be configured with the hostnames or IPs of the slaves? Correct?
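For reference, a minimal sketch of that slaves file in a standalone setup, using the hostnames from this question (spark-env.sh and spark-defaults.conf are separate, optional files and only matter for the settings you actually rely on):

# $SPARK_HOME/conf/slaves on the machine running the start scripts
worker-1
worker-2
worker-3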