Java Flink 1.9:独立群集:无法从TaskExecutor id:`AskTimeoutException'传输文件`

Java Flink 1.9:独立群集:无法从TaskExecutor id:`AskTimeoutException'传输文件`,java,timeout,akka,apache-flink,Java,Timeout,Akka,Apache Flink,背景 正在尝试创建Apache Flink独立群集 环境:AWS 工作经理:1 任务管理器:2 配置: 实例类型:t2介质(2个CPU 4 GB内存) 安全组端口已打开:6123808150100-50200 操作系统:CentOS Linux 7.6.1810版(核心版) 爪哇: 群集已启动并正常运行 UI显示所有集群详细信息 问题 任务提交不起作用 java.util.concurrent.CompletionException: akka.pattern.AskTimeoutE

背景

正在尝试创建Apache Flink独立群集

环境:AWS
工作经理:1
任务管理器:2
配置:

实例类型:t2介质(2个CPU 4 GB内存)
安全组端口已打开:6123808150100-50200

操作系统:CentOS Linux 7.6.1810版(核心版)

爪哇:

  • 群集已启动并正常运行
  • UI显示所有集群详细信息

问题

任务提交不起作用

java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn'
t send a reply.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:871)
        at akka.dispatch.OnComplete.internal(Future.scala:263)
        at akka.dispatch.OnComplete.internal(Future.scala:261)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:644)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
        ... 9 more
2020-02-04 23:25:16,125 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler  - Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
        at java.lang.Thread.run(Thread.java:748)

有人能解释一下吗?问题是否与端口/防火墙有关,或者某些设置出错?

问题在于安全组端口权限。当打开从
0到65565的整个范围时,一切都开始工作。但这对于生产系统来说还不够好,因此最终对于
flink-conf.yaml
config文件中的工作人员来说,键
taskmanager.data.port
被分配了一个特定的端口,这就成功了。通过这种方式,可以将任务管理器配置为侦听范围内的特定端口

对于这样一个小型集群,ask超时通常意味着您在创建工作期间Flink无法连接到外部系统。例如,某些服务的端口错误或被防火墙阻止。任务管理器日志显示连接到端口39493/34094上的metrics服务器的尝试失败,可能这延迟了心跳响应我添加的整个端口范围为0-65535。这就成功了。但是有没有办法配置或控制这个范围?我尝试配置'taskmanager.rpc.port:50100-50200',但没有帮助。还有其他参数吗?嗨,你得到解决方案了吗?是的。上面的回答提到了这一点。设置
taskmanager.data.port
我正在Kubernetes上运行我的flink群集,并在那里自动分配端口。此外,我尝试为其分配一个特定端口,但收到相同的错误,无法控制请求超时设置。
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode) 
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://ip:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http:/ip:8081.
org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@ip:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl  - Starting the SlotManager.
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@ip:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering all persisted jobs.
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Registering TaskManager with ResourceID f2c7f664378b40ce44463713ae98e1c4 (akka.tcp://flink@TaskManager1Ip:38566/user/taskmanager_0) at ResourceManager
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Registering TaskManager with ResourceID 354a785f637751fb3b034618a47480ed (akka.tcp://flink@TaskManager2Ip:34400/user/taskmanager_0) at ResourceManager
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn'
t send a reply.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:871)
        at akka.dispatch.OnComplete.internal(Future.scala:263)
        at akka.dispatch.OnComplete.internal(Future.scala:261)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:644)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
        ... 9 more
2020-02-04 23:25:16,125 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler  - Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
        at java.lang.Thread.run(Thread.java:748)