Akka 即使作业正在运行,Flink作业提交也会失败

Akka 即使作业正在运行,Flink作业提交也会失败,akka,apache-flink,flink-streaming,akka-actor,Akka,Apache Flink,Flink Streaming,Akka Actor,我使用的是独立的Flink集群,有1个jobManager和n个TaskManager。当我尝试通过命令行提交作业时,作业提交失败,错误消息如下 org.apache.flink.client.program.ProgramInvocationException:无法提交作业(作业ID:f839aefee74aa4483ce8f8fd2e49b69e) 在jobManager实例上,在作业从部署切换到运行之前,一切正常。 在这之后,一旦akka timeut过期,我会看到下面的stacktrac

我使用的是独立的Flink集群,有1个jobManager和n个TaskManager。当我尝试通过命令行提交作业时,作业提交失败,错误消息如下
org.apache.flink.client.program.ProgramInvocationException:无法提交作业(作业ID:f839aefee74aa4483ce8f8fd2e49b69e)

在jobManager实例上,在作业从部署切换到运行之前,一切正常。 在这之后,一旦akka timeut过期,我会看到下面的stacktrace

akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-177004106]] after [100000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:745)
我在github上查看了flink代码,执行作业所需的所有步骤似乎都运行良好。但是,当jobManager必须向触发作业的flink客户端发出作业提交确认时,JobSubmitMandler在akka调度程序上超时,据我所知,akka调度程序负责与作业客户端通信

Flink作业由1个源(卡夫卡)、2个操作员和1个接收器(自定义接收器)组成。 以下链接显示jobManager日志:

一旦调度程序超时,所有其他Flink UI调用也会超时,但有相同的异常

以下是用于通过命令行提交作业的flink客户端日志

2019-09-28 19:34:21,321 INFO  org.apache.flink.client.cli.CliFrontend                       - --------------------------------------------------------------------------------
2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend                       -  Starting Command Line Client (Version: 1.8.0, Rev:<unknown>, Date:<unknown>)
2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend                       -  OS current user: root
2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend                       -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend                       -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.5-b02
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -  Maximum heap size: 2677 MiBytes
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -  JAVA_HOME: (not set)
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -  No Hadoop Dependency available
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -  JVM Options:
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -     -Dlog.file=/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/log/flink-root-client-fulfillment-stream-processor-flink-task-manager-2-8047357.log
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -     -Dlog4j.configuration=file:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/conf/log4j-cli.properties
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -     -Dlogback.configurationFile=file:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/conf/logback.xml
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -  Program Arguments:
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -     run
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -     -d
2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend                       -     -c
2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend                       -     /home/fse/flink-kafka-relayer-0.2.jar
2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend                       -  Classpath: /var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/log4j-1.2.17.jar:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/slf4j-log4j12-1.7.15.jar:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/flink-dist_2.11-1.8.0.jar:::
2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend                       - --------------------------------------------------------------------------------
2019-09-28 19:34:21,328 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, <job-manager-ip>
2019-09-28 19:34:21,328 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-09-28 19:34:21,328 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.jmx.class, org.apache.flink.metrics.jmx.JMXReporter
2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.jmx.port, 8789
2019-09-28 19:34:21,333 WARN  org.apache.flink.client.cli.CliFrontend                       - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:259)
    at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1230)
    at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1190)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1115)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 5 more
2019-09-28 19:34:21,343 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-09-28 19:34:21,545 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-09-28 19:34:21,560 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-09-28 19:34:21,561 INFO  org.apache.flink.client.cli.CliFrontend                       - Running 'run' command.
2019-09-28 19:34:21,566 INFO  org.apache.flink.client.cli.CliFrontend                       - Building program from JAR file
2019-09-28 19:34:21,744 INFO  org.apache.flink.configuration.Configuration                  - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2019-09-28 19:34:21,896 INFO  org.apache.flink.runtime.rest.RestClient                      - Rest client endpoint started.
2019-09-28 19:34:21,898 INFO  org.apache.flink.client.cli.CliFrontend                       - Starting execution of program
2019-09-28 19:34:21,898 INFO  org.apache.flink.client.program.rest.RestClusterClient        - Starting program in interactive mode (detached: true)
2019-09-28 19:34:22,594 WARN  org.apache.flink.streaming.api.environment.StreamContextEnvironment  - Job was executed in detached mode, the results will be available on completion.
2019-09-28 19:34:22,632 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, <job-manager-ip>
2019-09-28 19:34:22,632 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-09-28 19:34:22,632 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.jmx.class, org.apache.flink.metrics.jmx.JMXReporter
2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.jmx.port, 8789
2019-09-28 19:34:22,635 INFO  org.apache.flink.client.program.rest.RestClusterClient        - Submitting job f839aefee74aa4483ce8f8fd2e49b69e (detached: true).
2019-09-28 19:36:04,341 INFO  org.apache.flink.runtime.rest.RestClient                      - Shutting down rest endpoint.
2019-09-28 19:36:04,343 INFO  org.apache.flink.runtime.rest.RestClient                      - Rest endpoint shutdown complete.
2019-09-28 19:36:04,343 ERROR org.apache.flink.client.cli.CliFrontend                       - Error while running the command.
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: f839aefee74aa4483ce8f8fd2e49b69e)
    at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:250)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
    at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:429)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
    at org.apache.flink.client.cli.CliFrontend$$Lambda$5/1971851377.call(Unknown Source)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
    at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:388)
    at org.apache.flink.client.program.rest.RestClusterClient$$Lambda$17/788892554.apply(Unknown Source)
    at java.util.concurrent.CompletableFuture$ExceptionCompletion.run(CompletableFuture.java:1246)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
    at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
    at java.util.concurrent.CompletableFuture$ThenCopy.run(CompletableFuture.java:1333)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361)
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:207)
    at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$34/1092254958.accept(Unknown Source)
    at java.util.concurrent.CompletableFuture$WhenCompleteCompletion.run(CompletableFuture.java:1298)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
    at java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:626)
    at java.util.concurrent.CompletableFuture$Async.run(CompletableFuture.java:428)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-177004106]] after [100000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:745)

End of exception on server side>]
    at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:389)
    at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:373)
    at org.apache.flink.runtime.rest.RestClient$$Lambda$33/1155836850.apply(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:604)
    ... 4 more
2019-09-28 19:34:21321信息org.apache.flink.client.cli.CliFrontend---------------------------------------------------------------------------------
2019-09-28 19:34:21322 INFO org.apache.flink.client.cli.CliFrontend-启动命令行客户端(版本:1.8.0,版本:,日期:)
2019-09-28 19:34:21322 INFO org.apache.flink.client.cli.CliFrontend-操作系统当前用户:root
2019-09-28 19:34:21322 INFO org.apache.flink.client.cli.CliFrontend-当前Hadoop/Kerberos用户:
2019-09-28 19:34:21322 INFO org.apache.flink.client.cli.CliFrontend-JVM:Java热点(TM)64位服务器虚拟机-Oracle公司-1.8/25.5-b02
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend-最大堆大小:2677兆字节
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend-JAVA_HOME:(未设置)
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend-没有可用的Hadoop依赖项
2019-09-28 19:34:21323信息org.apache.flink.client.cli.CliFrontend-JVM选项:
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend--Dlog.file=/var/lib/fulfillment stream processor/flink executables/flink executables/log/flink-root-client-fulfillment-stream-processor-flink-task-manager-2-8047357.log
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend--Dlog4j.configuration=file:/var/lib/fulfillment stream processor/flink executables/flink executables/conf/log4j-cli.properties
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend--Dlogback.configurationFile=file:/var/lib/fulfillment stream processor/flink executables/flink executables/conf/logback.xml
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend-程序参数:
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend-运行
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend--d
2019-09-28 19:34:21323 INFO org.apache.flink.client.cli.CliFrontend--c
2019-09-28 19:34:21324 INFO org.apache.flink.client.cli.CliFrontend-/home/fse/flink-kafka-relayer-0.2.jar
2019-09-28 19:34:21324 INFO org.apache.flink.client.cli.CliFrontend-类路径:/var/lib/fulfillment stream processor/flink executables/flink executables/lib/log4j-1.2.17.jar:/var/lib/fulfillment stream processor/flink executables/flink executables/flink executables/lib/flink-dist_2.11-1.8.0.jar::
2019-09-28 19:34:21324 INFO org.apache.flink.client.cli.CliFrontend---------------------------------------------------------------------------------
2019-09-28 19:34:21328 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:jobmanager.rpc.address,
2019-09-28 19:34:21328 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:jobmanager.rpc.port,6123
2019-09-28 19:34:21328 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:jobmanager.heap.size,1024m
2019-09-28 19:34:21329 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:taskmanager.heap.size,1024m
2019-09-28 19:34:21329 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:taskmanager.numberOfTaskSlots,4
2019-09-28 19:34:21329 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:parallelism.default,1
2019-09-28 19:34:21329信息org.apache.flink.configuration.GlobalConfiguration-加载配置属性:metrics.reporter.jmx.class,org.apache.flink.metrics.jmx.JMXReporter
2019-09-28 19:34:21329 INFO org.apache.flink.configuration.GlobalConfiguration-加载配置属性:metrics.reporter.jmx.port,8789
2019-09-28 19:34:21333警告org.apache.flink.client.cli.CliFrontend-无法加载cli类org.apache.flink.WARN.cli.FlinkYarnSessionCli。
java.lang.NoClassDefFoundError:org/apache/hadoop/thread/exceptions/YarnException
位于java.lang.Class.forName0(本机方法)
位于java.lang.Class.forName(Class.java:259)
位于org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1230)
在