Migrating to Apache Flink 1.10.0: now my job fails and kills my TaskManager

Tags: apache-flink, flink-streaming

I migrated from 1.9.1 to 1.10.0 and now my program fails to run. I am running a simple computation topology, as shown below.

I tried adjusting the dependencies both in my program and in the Flink distribution folder, with no effect. I moved the hadoop dependencies into plugins/s3, but Flink does not seem to find the jars. The JobManager reports:

2020-02-14 18:52:02,376 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - --------------------------------------------------------------------------------
2020-02-14 18:52:02,376 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Starting TaskManager (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
2020-02-14 18:52:02,376 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  OS current user: ubuntu
2020-02-14 18:52:02,376 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Private Build - 1.8/25.242-b08
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 4432 MiBytes
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  No Hadoop Dependency available
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM Options:
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:+UseG1GC
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xmx4647288761
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xms4647288761
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:MaxDirectMemorySize=1090519054
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:MaxMetaspaceSize=100663296
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlog.file=/home/ubuntu/flink-1.10.0/log/flink-ubuntu-taskexecutor-0-ip-10-0-1-68.log
2020-02-14 18:52:02,377 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlog4j.configuration=file:/home/ubuntu/flink-1.10.0/conf/log4j.properties
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlogback.configurationFile=file:/home/ubuntu/flink-1.10.0/conf/logback.xml
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Program Arguments:
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     --configDir
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     /home/ubuntu/flink-1.10.0/conf
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.framework.off-heap.size=134217728b
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.network.max=956301326b
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.network.min=956301326b
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.framework.heap.size=134217728b
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,378 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.managed.size=3825205305b
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.cpu.cores=8.0
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.task.heap.size=4513071033b
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     taskmanager.memory.task.off-heap.size=0b
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Classpath: /home/ubuntu/flink-1.10.0/lib/flink-metrics-datadog-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/flink-metrics-statsd-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/flink-table-blink_2.11-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/flink-table_2.11-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/log4j-1.2.17.jar:/home/ubuntu/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar:/home/ubuntu/flink-1.10.0/lib/flink-dist_2.11-1.10.0.jar::/home/ubuntu/app/conf:
2020-02-14 18:52:02,379 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - --------------------------------------------------------------------------------
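For context (my reading of the 1.10 plugin mechanism, not something stated in the thread): in Flink 1.10 each filesystem is loaded as a plugin from its own subdirectory under plugins/, with the jar copied over from the distribution's opt/ folder, roughly like this:

```
flink-1.10.0/
├── lib/
│   └── …                               # jars here land on the main classpath
└── plugins/
    └── s3-fs-hadoop/                   # one subdirectory per plugin
        └── flink-s3-fs-hadoop-1.10.0.jar
```

The subdirectory name itself is arbitrary; what matters is that the jar sits inside a subdirectory of plugins/ rather than directly in plugins/, and that the TaskManagers are restarted afterwards so the plugin classloaders pick it up.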

This looks like a log from the JobManager ;) Can you take a look at the taskmanager log? There should be something there. – @Dominikowsiński

@Dominikowsiński The problem is that there is no more information than what I have provided here. If I set the StreamingFileSink path to a local directory, the job runs. When I switch back to s3a://…, it fails as described above. I suspect that for some reason Flink is not picking up the s3 hadoop jars. As the JobManager log states, No Hadoop Dependency available.

You could try to verify that by adding the dependency to your job and confirming it ends up in the fat jar you build. – @Dominikowsiński

@Dominikowsiński So I created a directory under ../plugins with nothing in it, and Flink complained that it was empty, which makes me think the directory is being read. What I actually put inside it was flink-s3-fs-hadoop-1.10.0.jar. If I move that file into ../lib instead, I stop getting No Hadoop Dependency available in the log and Hadoop 3.1.0 is detected. Adding flink-s3-fs-hadoop-1.10.0 explicitly to my build.sbt does not change anything.

When you add something to the lib folder it is loaded automatically, which is why that works. Adding the library as a compile dependency in your build/pom should help, since the build templates add Flink libraries as provided dependencies. Inspect the jar: if hadoop is missing from it, it was not added correctly.
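The commenter's suggestion of bundling the filesystem into the fat jar might look roughly like this in build.sbt. This is a sketch under assumptions: flink-streaming-scala stands in for whatever modules the job actually depends on, and the versions are simply the ones mentioned in the thread.

```scala
// Hypothetical build.sbt fragment, not the author's actual build.
// Core Flink stays Provided (the cluster's lib/ supplies it at runtime),
// while the S3 filesystem is compile-scoped so sbt-assembly
// packs it into the fat jar.
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.10.0" % Provided,
  "org.apache.flink" %  "flink-s3-fs-hadoop"    % "1.10.0"
)
```

Note that in 1.10 the documented approach is the plugins/ mechanism rather than bundling; compile-scoping it here is mainly useful as the diagnostic the commenter proposes, to check whether the jar is reachable at all.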
2020-02-14 20:06:24
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180)
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:484)
    at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1703)
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1252)
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1220)
    at org.apache.flink.runtime.executiongraph.Execution.fail(Execution.java:955)
    at org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.signalPayloadRelease(SingleLogicalSlot.java:173)
    at org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.release(SingleLogicalSlot.java:165)
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:732)
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537)
    at org.apache.flink.runtime.jobmaster.slotpool.AllocatedSlot.releasePayload(AllocatedSlot.java:149)
    at org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseTaskManagerInternal(SlotPoolImpl.java:815)
    at org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseTaskManager(SlotPoolImpl.java:774)
    at org.apache.flink.runtime.jobmaster.JobMaster.disconnectTaskManager(JobMaster.java:429)
    at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1147)
    at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:109)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 56a63a7493bc527e7af383dc008a4800 timed out.
    ... 26 more