Apache Spark: resource optimization/utilization on EMR for a long-running job and multiple small jobs
My use case:
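- We have one long-running job, referred to below as LRJ. It runs once a week.
- We have multiple small jobs that can arrive at any time. These jobs have higher priority than the long-running job.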
We created YARN queues for resource management. Q1 is configured for the long-running job and Q2 for the small jobs.
Config:
Q1: capacity = 50%, and it can go up to 100%
    capacity on CORE nodes = 50%, maximum 100%
Q2: capacity = 50%, and it can go up to 100%
    capacity on CORE nodes = 50%, maximum 100%
The problem we are facing:
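While the LRJ is running, it acquires all of the cluster's resources, so the small jobs have to wait. Once the cluster scales up and new resources become available, the small jobs get them. However, scaling up takes time, so these jobs see a significant delay before they are allocated any resources.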
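Update 1:
We tried using the maximum-capacity setting, but it did not work, as I described in another question I posted.

After more analysis, including discussions with some unsung heroes, we decided to apply preemption to the YARN queues for our use case.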
Jobs on the Q1 queue are preempted when the following sequence of events occurs:
- Q1 is using more than its configured capacity (for example, the LRJ holds more resources than the queue is guaranteed).
- Jobs are then submitted to Q2 (for example, several small jobs arrive at once).
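As a rough illustration of when the preemption policy would consider taking containers away from Q1, here is a minimal sketch in Python. It is a simplified model, not YARN's actual algorithm: `QueueState` and `may_be_preempted` are hypothetical names, all quantities are percentages of one node partition, and only the per-queue `disable_preemption` flag and the `max_ignored_over_capacity` dead zone are modelled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueueState:
    """Snapshot of one queue, in percent of the partition's resources (hypothetical model)."""
    name: str
    guaranteed: float          # configured capacity of the queue
    used: float                # resources currently allocated to the queue
    pending: float             # resources requested but not yet allocated
    preemption_disabled: bool  # value of the queue's disable_preemption flag


def may_be_preempted(victim: QueueState, others: Sequence[QueueState],
                     max_ignored_over_capacity: float = 0.1) -> bool:
    """Simplified check: could `victim` lose containers to another queue?"""
    if victim.preemption_disabled:
        return False
    # Dead zone: usage only slightly above the guarantee is ignored.
    threshold = victim.guaranteed * (1.0 + max_ignored_over_capacity)
    over_capacity = victim.used > threshold
    # Some other queue must be below its guarantee and still waiting for resources.
    starved = any(q.pending > 0 and q.used < q.guaranteed for q in others)
    return over_capacity and starved


# Q1 (LRJ) has grown to 60% against a 40% guarantee; small jobs then arrive on Q2.
q1 = QueueState("Q1", guaranteed=40, used=60, pending=0, preemption_disabled=False)
q2 = QueueState("Q2", guaranteed=60, used=0, pending=20, preemption_disabled=True)
print(may_be_preempted(q1, [q2]))  # True
print(may_be_preempted(q2, [q1]))  # False
```

In this model Q2 is never a victim because its preemption is disabled, which matches the intent that small jobs should not lose resources to the LRJ.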
Capacity-scheduler and yarn-site configuration:
yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.queues: Q1,Q2
yarn.scheduler.capacity.root.Q2.capacity: 60
yarn.scheduler.capacity.root.Q1.capacity: 40
yarn.scheduler.capacity.root.Q2.accessible-node-labels: "*"
yarn.scheduler.capacity.root.Q1.accessible-node-labels: "*"
yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity: 100
yarn.scheduler.capacity.root.Q2.accessible-node-labels.CORE.capacity: 60
yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.capacity: 40
yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.maximum-capacity: 60
yarn.scheduler.capacity.root.Q2.disable_preemption: true
yarn.scheduler.capacity.root.Q1.disable_preemption: false
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
yarn.resourcemanager.scheduler.monitor.enable: true
yarn.resourcemanager.scheduler.monitor.policies: org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval: 2000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill: 3000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round: 0.5
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity: 0.1
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor: 1
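If the cluster is configured through EMR configuration classifications, the flat property list has to be split between the `capacity-scheduler` and `yarn-site` classifications. The sketch below shows one way to do that in Python. The helper name `to_emr_configurations` is hypothetical, and routing by the `yarn.scheduler.capacity.` prefix is an assumption that happens to fit the properties listed here; only a few of them are repeated in the example.

```python
import json

# Assumption: every capacity-scheduler property in this post shares this prefix.
CAPACITY_SCHEDULER_PREFIX = "yarn.scheduler.capacity."


def to_emr_configurations(properties: dict) -> list:
    """Split flat YARN properties into EMR-style configuration classifications."""
    capacity_scheduler = {}
    yarn_site = {}
    for key, value in properties.items():
        target = capacity_scheduler if key.startswith(CAPACITY_SCHEDULER_PREFIX) else yarn_site
        target[key] = str(value)  # EMR expects property values as strings
    return [
        {"Classification": "capacity-scheduler", "Properties": capacity_scheduler},
        {"Classification": "yarn-site", "Properties": yarn_site},
    ]


# A subset of the properties above, for illustration only.
sample = {
    "yarn.scheduler.capacity.root.queues": "Q1,Q2",
    "yarn.scheduler.capacity.root.Q1.capacity": 40,
    "yarn.scheduler.capacity.root.Q2.capacity": 60,
    "yarn.scheduler.capacity.root.Q2.disable_preemption": "true",
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill": 3000,
}
configs = to_emr_configurations(sample)
print(json.dumps(configs, indent=2))
```

The resulting list has the shape that EMR accepts as a `Configurations` value when launching a cluster.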
With this configuration in place, you have to submit each job to the appropriate queue for your use case.
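One way to target a queue is the `--queue` option of `spark-submit` when running on YARN. The sketch below only builds the command line in Python so it can be inspected or passed to `subprocess.run`. The helper name `spark_submit_command` and the S3 paths are hypothetical, and cluster deploy mode is an assumption.

```python
import shlex


def spark_submit_command(app_path: str, queue: str, app_args=()) -> list:
    """Build a spark-submit command that sends the application to a given YARN queue."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",  # assumption: jobs run in cluster mode
        "--queue", queue,            # YARN queue the application is submitted to
        app_path,
        *app_args,
    ]


# Hypothetical application paths: the weekly LRJ goes to Q1, small jobs go to Q2.
lrj_cmd = spark_submit_command("s3://example-bucket/jobs/weekly_lrj.py", "Q1")
small_cmd = spark_submit_command("s3://example-bucket/jobs/small_job.py", "Q2", ["--date", "today"])
print(shlex.join(lrj_cmd))
print(shlex.join(small_cmd))
```

The same effect can be had by setting the `spark.yarn.queue` property in the job's Spark configuration instead of passing `--queue`.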