
Java Spark "Cannot assign requested address: Service 'Driver' failed after 16 retries"


I have three workers (worker-1, worker-2, worker-3) running Spark 2.0.2.

The Spark Master is started on worker-1.

I submit my application with the following script:

#!/bin/bash

sparkMaster=spark://worker-1:6066
mainClass=my.package.Main

jar=/path/to/my/jar-with-dependencies.jar

driverPort=7079
blockPort=7082

deployMode=cluster

$SPARK_HOME/bin/spark-submit \
  --conf "spark.driver.port=${driverPort}"\
  --conf "spark.blockManager.port=${blockPort}"\
  --class $mainClass \
  --master $sparkMaster \
  --deploy-mode $deployMode \
  $jar
When my driver starts on worker-1 (worker + Master), everything works fine and my application executes correctly using all the workers.

However, when my driver starts on another worker (worker-2 or worker-3), it fails with the following error:

Launch Command: "/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.2-bin-hadoop2.7/conf/:/root/spark-2.0.2-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=my.package.Main" "-Dspark.driver.port=7083" "-Dspark.blockManager.port=7082" "-Dspark.master=spark://worker-1:7077" "-Dspark.jars=file:/path/to/my/jar-with-dependencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181001132624-0001/jar-with-dependencies.jar" "my.package.Main"
========================================

org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
...
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.

Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'Driver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:748)
My 3 workers are configured as follows:

SPARK_LOCAL_IP=worker-[X]
SPARK_LOCAL_DIRS=/data/spark/tmp
SPARK_WORKER_PORT=7078
SPARK_WORKER_DIR=/data/spark/work
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400 -Dspark.worker.cleanup.interval=1800"
After several attempts to solve this problem, I tried to force the driver to start on the machine hosting my Master by adding the following option to my submit:

  --conf "spark.driver.host=worker-1"
But the driver is still launched on a random worker, so it does not solve my problem.

Edit:

When I submit with the spark.driver.host option, that option does not appear in the launch command log (spark.driver.port does appear, so I don't understand why my option is not picked up this time).

Edit 2:

I did some deeper testing. I now have only one worker (worker-2) running, and I still submit from worker-1, which runs my Master.

When I submit the application, I can see the following in the worker's log:

2018-10-04 11:27:39,794 | dispatcher-event-loop-6 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Asked to launch driver driver-20181004112739-0003
2018-10-04 11:27:39,833 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Copying user jar file:/path/to/myjar-with-depencies.jar to /data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar
2018-10-04 11:27:39,833 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Copying /path/to/myjar-with-depencies.jar to /data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar
2018-10-04 11:27:40,243 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Launch Command: "/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.0-bin-hadoop2.7/conf/:/root/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.history.fs.cleaner.interval=12h" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://worker-1:7077" "-Dspark.history.fs.cleaner.maxAge=1d" "-Dspark.app.name=my.package.Main" "-Dspark.jars=file:/path/to/myjar-with-depencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar" "my.package.Main"
2018-10-04 11:27:42,692 | dispatcher-event-loop-8 | WARN | org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Driver driver-20181004112739-0003 exited with failure
I still get the same error in my driver log. I then tried to manually run the command launched by the DriverRunner:

"/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.0-bin-hadoop2.7/conf/:/root/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.history.fs.cleaner.interval=12h" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://worker-1:7077" "-Dspark.history.fs.cleaner.maxAge=1d" "-Dspark.app.name=my.package.Main" "-Dspark.jars=file:/path/to/myjar-with-depencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar" "my.package.Main"
When I do that, the application starts correctly (surprisingly).

What difference between a manual launch and a launch by the DriverRunner could cause this bind error?

Notes:

  • I made no modification to the DriverRunner command line
  • I launched the command line manually as root, and Spark also runs as root
  • The behaviour is the same with Spark 2.0.0 and Spark 2.0.2
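
Since the command line is identical and both runs are as root (see the notes above), the difference can only come from the environment the driver JVM inherits. One way to compare the two environments, assuming a Linux worker; the <driver-pid> placeholder stands for the PID of the java process spawned by the DriverRunner and has to be read while the driver is still retrying its bind:

# Environment inherited by the DriverRunner-launched driver JVM
tr '\0' '\n' < /proc/<driver-pid>/environ | grep '^SPARK_'

# Environment of the interactive root shell used for the manual launch
env | grep '^SPARK_'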

So, having found the cause of this strange behaviour, I am answering my own question.

It happens when I run spark-submit from a machine that has a spark-env.sh file, and more precisely when SPARK_LOCAL_IP is set on that machine (presumably that variable gets forwarded to the driver launched on another worker, which then tries to bind to worker-1's address and fails).


To avoid the problem, I created a fourth machine that runs only the Spark Master, without a spark-env.sh file, and I run spark-submit from that machine.
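
As a possible alternative to a dedicated fourth machine, one could keep submitting from worker-1 but point spark-submit at a conf directory that contains no spark-env.sh, since Spark's launch scripts honour SPARK_CONF_DIR as an override of $SPARK_HOME/conf. A minimal sketch under that assumption (the /root/spark-submit-conf path is hypothetical, a copy of conf/ with spark-env.sh removed):

#!/bin/bash
# Hypothetical alternate conf dir: a copy of $SPARK_HOME/conf without
# spark-env.sh, so the submitting shell no longer sets SPARK_LOCAL_IP=worker-1.
export SPARK_CONF_DIR=/root/spark-submit-conf

$SPARK_HOME/bin/spark-submit \
  --conf "spark.driver.port=7079" \
  --conf "spark.blockManager.port=7082" \
  --class my.package.Main \
  --master spark://worker-1:6066 \
  --deploy-mode cluster \
  /path/to/my/jar-with-dependencies.jar

This mirrors the fix above (submitting from a machine with no spark-env.sh) without requiring extra hardware.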

Do you mean that the "spark-env.sh" and "spark-defaults.conf" files do not need to be configured on the Master machine, and that only the slaves file needs to be configured with the hostnames or IPs of the slaves? Is that correct?