Apache Flink: question about NoResourceAvailableException in Flink
Here is the error message:
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (34/40) (95aac9e47f777ddc73c7a29cc1091911) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (35/40) (5181fb35b0a2eab588dd7ed2eb902bbd) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (36/40) (bf4aac9423bdecaeeb7e6ac37001d73d) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (37/40) (31f8ee4d7adbcfd5de21b4cbb83c5e05) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (38/40) (8ba11f69e8e5ee2aacaa276136ad3bd0) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (39/40) (1a1e38ede6b8d398b50b8fe7de2c6cb2) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (40/40) (7fbb095da45b2d2392874fe4fa5c916d) switched from CREATED to SCHEDULED.
2019-10-27 05:37:57,088 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Streaming Job (4e5011eb97e695cfb2d05048534b097a) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 152, slots allocated: 150, previous allocation IDs: []
My parallelism settings:
source: 32
flatmap: 80
sink: 40
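The 152 slots in the error message equal the sum of the three parallelism settings (32 + 80 + 40). The sketch below uses a hypothetical helper class (not a Flink API) to compute that sum alongside the highest single parallelism. The second number matters because, according to the Flink documentation, a job whose operators all stay in the default slot sharing group needs only as many task slots as its highest parallelism. The question does not say whether slot sharing was in effect for this job.

```java
import java.util.Arrays;

// Hypothetical helper (not a Flink API): compares two ways of counting
// slots for the parallelism settings given in the question.
public class SlotMath {

    // Sum of all operator parallelisms, i.e. the total number of parallel subtasks.
    static int totalSubtasks(int... parallelisms) {
        return Arrays.stream(parallelisms).sum();
    }

    // Highest single operator parallelism. With Flink's default slot sharing
    // group, this is the number of task slots the job needs.
    static int maxParallelism(int... parallelisms) {
        return Arrays.stream(parallelisms).max().orElse(0);
    }

    public static void main(String[] args) {
        int source = 32, flatMap = 80, sink = 40;
        System.out.println("total subtasks:  " + totalSubtasks(source, flatMap, sink));  // 152
        System.out.println("max parallelism: " + maxParallelism(source, flatMap, sink)); // 80
    }
}
```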
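Is it the case that the jobManager tries to request 152 slots from the resourceManager, but the resourceManager does not have enough slots, which eventually causes the failure? When no more slots are available, can't the resourceManager obtain more slots from other TaskManagers?
The number of available slots is (number of TaskManagers) × taskmanager.numberOfTaskSlots (for example, 75 TaskManagers with 2 slots each yield 150 slots). Flink itself cannot trigger any kind of dynamic scaling. All you can do is start more TaskManagers manually, or change the TaskManager configuration and restart the TaskManagers.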
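The capacity formula above can be sketched as follows. The class and method names are hypothetical (not a Flink API); the ceiling division shows how many TaskManagers would have to be started manually to provide a given number of slots. The value 152 is taken from the error message and 2 slots per TaskManager from the example.

```java
// Hypothetical helper (not a Flink API) for the capacity formula:
// available slots = number of TaskManagers x taskmanager.numberOfTaskSlots.
public class SlotCapacity {

    static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Smallest number of TaskManagers whose combined slots cover requiredSlots
    // (integer ceiling division; assumes positive arguments).
    static int taskManagersNeeded(int requiredSlots, int slotsPerTaskManager) {
        return (requiredSlots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        System.out.println(availableSlots(75, 2));      // 150, as in the example
        System.out.println(taskManagersNeeded(152, 2)); // 76
        System.out.println(taskManagersNeeded(152, 4)); // 38
    }
}
```

With 2 slots per TaskManager, 76 TaskManagers would cover 152 slots; raising taskmanager.numberOfTaskSlots to 4 (and restarting the TaskManagers) would bring that down to 38.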
If a TaskManager dies while the job is running, you can define a restart strategy (keep in mind that checkpointing needs to be enabled for this):
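A minimal sketch of a fixed-delay restart strategy, assuming it is set cluster-wide in flink-conf.yaml. The attempt count and delay are illustrative values, not taken from the original answer.

```yaml
# flink-conf.yaml (sketch): restart a failed job up to 3 times,
# waiting 10 seconds between attempts. Values are illustrative.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

The same strategy can also be set per job in code, assuming `env` is the job's StreamExecutionEnvironment: `env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)))`. Checkpointing is enabled per job with `env.enableCheckpointing(intervalMillis)`.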
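If your TaskManager dies and is not restarted, that is most likely where the problem lies.
Hi, I think there are enough slots, because the error occurs after the subtasks switched from CREATED to SCHEDULED. I think the problem is that the jobManager lost its connection to the taskManager.
You cannot check in advance whether enough slots are available. But maybe I misunderstood your question. You are running your cluster on YARN, right?
Yes. I run it on YARN, and it worked fine for quite a long time.
Am I understanding correctly that the TaskManagers were restarted by YARN? If so, please check your restart strategy (see the updated answer).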