Apache Flink: after migrating to Flink 1.10.0 my job fails and kills my TaskManager

Tags: apache-flink, flink-streaming

I migrated from version 1.9.1 to 1.10.0 and now my program fails to run. I am running a simple computation topology, as shown in the figure below. I have tried adjusting the dependencies both in my program and in the Flink dist folder, to no effect. I moved the hadoop dependencies to plugins/s3, but Flink does not seem to find the jars. The JobManager reports:
2020-02-14 18:52:02,376 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --------------------------------------------------------------------------------
2020-02-14 18:52:02,376 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Starting TaskManager (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
2020-02-14 18:52:02,376 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - OS current user: ubuntu
2020-02-14 18:52:02,376 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Private Build - 1.8/25.242-b08
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 4432 MiBytes
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - No Hadoop Dependency available
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM Options:
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -XX:+UseG1GC
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Xmx4647288761
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Xms4647288761
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -XX:MaxDirectMemorySize=1090519054
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -XX:MaxMetaspaceSize=100663296
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Dlog.file=/home/ubuntu/flink-1.10.0/log/flink-ubuntu-taskexecutor-0-ip-10-0-1-68.log
2020-02-14 18:52:02,377 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Dlog4j.configuration=file:/home/ubuntu/flink-1.10.0/conf/log4j.properties
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Dlogback.configurationFile=file:/home/ubuntu/flink-1.10.0/conf/logback.xml
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Program Arguments:
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --configDir
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - /home/ubuntu/flink-1.10.0/conf
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.framework.off-heap.size=134217728b
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.network.max=956301326b
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.network.min=956301326b
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.framework.heap.size=134217728b
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,378 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.managed.size=3825205305b
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.cpu.cores=8.0
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.task.heap.size=4513071033b
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -D
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - taskmanager.memory.task.off-heap.size=0b
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /home/ubuntu/flink-1.10.0/lib/flink-metrics-datadog-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/flink-metrics-statsd-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/flink-table-blink_2.11-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/flink-table_2.11-1.10.0.jar:/home/ubuntu/flink-1.10.0/lib/log4j-1.2.17.jar:/home/ubuntu/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar:/home/ubuntu/flink-1.10.0/lib/flink-dist_2.11-1.10.0.jar::/home/ubuntu/app/conf:
2020-02-14 18:52:02,379 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --------------------------------------------------------------------------------
This looks like a log from the JobManager ;) Could you take a look at the taskmanager logs? There should be something there.

@Dominikowsiński The problem is that there is no more information than what I have provided here. If I set the StreamingFileSink path to a local directory, the job runs. When I switch back to s3(a)://… it fails as described above. I suspect that for some reason Flink is not picking up the s3 hadoop jars. As the JobManager log states: no Hadoop dependency found.

You could try to verify that by adding the dependency to your job and asserting that it has been added to the fat jar that gets created.

@Dominikowsiński So I created a directory in ../plugins with nothing in it, and Flink complained that it was empty. That makes me think it is being read. However, what I put inside it was flink-s3-fs-hadoop-1.10.0.jar. If I move this file into ../lib, then I stop getting "No Hadoop Dependency available" in the logs and instead Hadoop 3.1.0 is recognized. Adding flink-s3-fs-hadoop-1.10.0 explicitly to my build.sbt does not change anything.

When you add things to the lib folder they are loaded automatically, which is why that works. Adding the library as a compile dependency to your build/pom should help you, since they add it as a provided dependency. Check the jar: if hadoop is missing from it, then the way you added it is not correct.
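The plugin layout discussed above can be sketched as follows. This is an assumption-laden sketch (FLINK_HOME and the jar location are guesses based on the log paths): in Flink 1.10 the plugin loader expects each filesystem connector in its own subdirectory under plugins/, not as a bare jar directly inside plugins/.

```shell
# Hypothetical paths -- adjust FLINK_HOME to your installation.
FLINK_HOME="${FLINK_HOME:-$HOME/flink-1.10.0}"

# Each plugin must sit in its own subdirectory under plugins/;
# a jar dropped directly into plugins/ is not picked up.
mkdir -p "$FLINK_HOME/plugins/s3-fs-hadoop"

# The connector jar ships in opt/ of the distribution; copy it over
# if it is present there.
if [ -f "$FLINK_HOME/opt/flink-s3-fs-hadoop-1.10.0.jar" ]; then
  cp "$FLINK_HOME/opt/flink-s3-fs-hadoop-1.10.0.jar" \
     "$FLINK_HOME/plugins/s3-fs-hadoop/"
fi
```

Putting the jar in lib/ instead (as the commenter notes) also works, because everything in lib/ lands on the main classpath, but it then shares a classloader with user code rather than being isolated as a plugin.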
2020-02-14 20:06:24
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:484)
at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1703)
at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1252)
at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1220)
at org.apache.flink.runtime.executiongraph.Execution.fail(Execution.java:955)
at org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.signalPayloadRelease(SingleLogicalSlot.java:173)
at org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.release(SingleLogicalSlot.java:165)
at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:732)
at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537)
at org.apache.flink.runtime.jobmaster.slotpool.AllocatedSlot.releasePayload(AllocatedSlot.java:149)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseTaskManagerInternal(SlotPoolImpl.java:815)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseTaskManager(SlotPoolImpl.java:774)
at org.apache.flink.runtime.jobmaster.JobMaster.disconnectTaskManager(JobMaster.java:429)
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1147)
at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:109)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 56a63a7493bc527e7af383dc008a4800 timed out.
... 26 more
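For reference, the build-side suggestion from the comments would look roughly like this in build.sbt. This is a sketch under assumptions: flink-s3-fs-hadoop is a plain Java artifact (no Scala-version suffix), and Flink's own documentation declares the filesystem connectors as "provided", so here it is placed in the default Compile scope so that it ends up in the fat jar.

```scala
// build.sbt (fragment) -- hypothetical: bundle the s3 filesystem
// connector into the assembled jar instead of treating it as provided.
libraryDependencies += "org.apache.flink" % "flink-s3-fs-hadoop" % "1.10.0"
```

After rebuilding, inspect the assembled jar (e.g. jar tf) to confirm the hadoop/s3 classes are actually inside it, as the last comment advises.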