Java SparkAppHandle state is LOST after submission, but the driver runs fine
I am using the Spark Java API to submit a driver to a local Spark cluster (1 master + 1 worker).
After calling startApplication with a listener attached, the first call to stateChanged reports the LOST state.
The driver is submitted without problems and runs fine on the worker.
I have tried using a wait loop instead of a listener.
I have tried Spark versions 2.3.1 and 2.4.3.
I have tried both OSX and Ubuntu.
I have tried changing the Spark master host to the machine's IP address instead of its name.
SparkLauncher launcher = new SparkLauncher(env)
        .setAppResource(path)
        .setMainClass("full.package.name.RTADriver")
        .setMaster("spark://" + sparkMasterHost + ":" + sparkMasterPort)
        .setAppName("rta_scala_app_")
        // Cluster deploy mode: the driver runs on a worker, not in this JVM.
        .setDeployMode("cluster")
        .setConf("spark.ui.enabled", "true")
        .addAppArgs(runnerStr)
        .setVerbose(true);
SparkAppHandle handle = launcher.startApplication();
// Poll every 10 seconds until FINISHED; LOST, FAILED and KILLED never end this loop.
while (!handle.getState().equals(SparkAppHandle.State.FINISHED)) {
    System.out.println("Wait Loop: App_ID: " + handle.getAppId() + " state: " + handle.getState());
    Thread.sleep(10000);
}
Logs from System.out in my code:
First State App_ID: null state: UNKNOWN
Wait Loop: App_ID: null state: UNKNOWN
Wait Loop: App_ID: null state: LOST
Wait Loop: App_ID: null state: LOST
...
Relevant spark-submit logs:
INFO: 19/06/04 11:27:54 INFO Utils: Successfully started service 'driverClient' on port 52077.
INFO: 19/06/04 11:27:54 INFO TransportClientFactory: Successfully created connection to /10.10.0.179:7077 after 34 ms (0 ms spent in bootstraps)
INFO: 19/06/04 11:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20190604112754-0030
INFO: 19/06/04 11:27:54 INFO ClientEndpoint: ... waiting before polling master for driver state
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: ... polling master for driver state
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: State of driver-20190604112754-0030 is RUNNING
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: Driver running on 10.10.0.179:49705 (worker-20190603154544-10.10.0.179-49705)
INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Shutdown hook called
INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Deleting directory /private/var/folders/90/pgndgkk11lj0qb4q5qw_f03c0000gn/T/spark-8d8d92b9-8d0c-43a1-8bb9-3d08f1519c53
Wait Loop: App_ID: null state: LOST
...

I just ran into the same situation. My guess is that, because of the "cluster" deploy mode, the Spark driver process runs on a different host than the Spark launcher process; as a result, the launcher process "loses" its connection to the Spark application.
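
The spark-submit logs in the question show the submitting process shutting down ("Shutdown hook called") right after the master reports the driver as RUNNING, which appears consistent with the guess that the launcher simply has nothing left to stay connected to. If that is the cause, one possible workaround is to stop relying on the handle and instead pull the driver ID out of the spark-submit output, so the driver can be looked up on the master by other means. The following is only a minimal sketch: the class DriverLogParser and its methods are hypothetical names, and the regular expressions assume the ClientEndpoint log wording shown in the question, which may differ in other Spark versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Spark): reads driver info from spark-submit log lines. */
final class DriverLogParser {
    // Assumed format: "... ClientEndpoint: Driver successfully submitted as driver-<timestamp>-<seq>"
    private static final Pattern SUBMITTED =
            Pattern.compile("Driver successfully submitted as (driver-\\d+-\\d+)");
    // Assumed format: "... ClientEndpoint: State of driver-<timestamp>-<seq> is <STATE>"
    private static final Pattern STATE =
            Pattern.compile("State of (driver-\\d+-\\d+) is (\\w+)");

    private DriverLogParser() {}

    /** Returns the driver ID if the line is a "successfully submitted" message. */
    static Optional<String> driverId(String line) {
        Matcher m = SUBMITTED.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns the driver state reported by the master if the line is a "State of ..." message. */
    static Optional<String> driverState(String line) {
        Matcher m = STATE.matcher(line);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample lines copied from the logs in the question.
        String[] lines = {
            "INFO: 19/06/04 11:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20190604112754-0030",
            "INFO: 19/06/04 11:27:59 INFO ClientEndpoint: State of driver-20190604112754-0030 is RUNNING",
            "INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Shutdown hook called"
        };
        for (String line : lines) {
            driverId(line).ifPresent(id -> System.out.println("driver id: " + id));
            driverState(line).ifPresent(state -> System.out.println("driver state: " + state));
        }
    }
}
```

How the launcher output reaches this parser is left open here; it depends on how the spark-submit output is captured in your setup. Note also that the last state seen in these logs is only a snapshot taken before spark-submit exits, so it says nothing about whether the driver later finishes or fails.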