
Apache Spark: Resource optimization/utilization in EMR for a long-running job and multiple small jobs


My use case:

  • We have one long-running job, hereafter referred to as the LRJ. This job runs once a week
  • We have multiple small jobs that can arrive at any time. These jobs have a higher priority than the long-running job
To handle this, we created YARN queues as follows:

        yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
        yarn.scheduler.capacity.root.queues: Q1,Q2
        yarn.scheduler.capacity.root.Q2.capacity: 60
        yarn.scheduler.capacity.root.Q1.capacity: 40
        yarn.scheduler.capacity.root.Q2.accessible-node-labels: "*"
        yarn.scheduler.capacity.root.Q1.accessible-node-labels: "*"
        yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity: 100
        yarn.scheduler.capacity.root.Q2.accessible-node-labels.CORE.capacity: 60
        yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.capacity: 40
        yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.maximum-capacity: 60
        yarn.scheduler.capacity.root.Q2.disable_preemption: true
        yarn.scheduler.capacity.root.Q1.disable_preemption: false
        yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
        yarn.resourcemanager.scheduler.monitor.enable: true
        yarn.resourcemanager.scheduler.monitor.policies: org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
        yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval: 2000
        yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill: 3000
        yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round: 0.5
        yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity: 0.1
        yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor: 1
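
To put these numbers in context (assuming, purely for illustration, a cluster offering 100 vCores): Q1 is guaranteed 40% of the CORE partition but can grow only to 60% (its maximum-capacity), while Q2 is guaranteed 60% and, with no explicit maximum-capacity set, can expand to the default 100% whenever Q1 is idle.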
We created these queues for resource management: the Q1 queue is configured for the long-running job and the Q2 queue for the small jobs.

Config:
     Q1 : capacity = 50% and it can go up to 100%
          capacity on CORE nodes = 50% and maximum 100%
     Q2 : capacity = 50% and it can go up to 100%
          capacity on CORE nodes = 50% and maximum 100%
The problem we are facing:

While the LRJ is running, it acquires all available resources. Multiple small jobs then have to wait, because the LRJ is holding everything. The small jobs only get resources once the cluster scales out and new capacity becomes available; since scaling takes time, this causes a significant delay in allocating resources to those jobs.
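
To see this happening, it helps to watch queue utilization directly. As a small sketch using the stock YARN CLI (the queue names match the configuration above):

        # Show configured, used, and maximum capacity for a queue
        yarn queue -status Q1
        yarn queue -status Q2

        # List running applications along with the queue each was submitted to
        yarn application -list -appStates RUNNING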

Update 1:
We tried to solve this with the maximum-capacity configuration, but it did not work as expected, as described in another question.

After further analysis, including discussions with some unsung heroes, we decided to apply preemption on our YARN queues to fit our use case.

Jobs on the Q1 queue are preempted when the following sequence of events occurs:

  • The Q1 queue uses more than its configured capacity (for example, the LRJ is consuming more resources than the queue's guaranteed share)
  • Jobs are suddenly scheduled on the Q2 queue (for example, multiple small jobs are triggered at once)
  • To understand preemption in detail, see the YARN Capacity Scheduler documentation on preemption; a short walk-through of the settings used here follows after this list
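
Concretely, going by the documented semantics of ProportionalCapacityPreemptionPolicy with the values used here: the monitor wakes every 2 seconds (monitoring_interval: 2000), ignores queues that are less than 10% over their guarantee (max_ignored_over_capacity: 0.1), caps the resources reclaimed in any single round at 50% of the cluster (total_preemption_per_round: 0.5), requests the full deficit each round rather than easing into it (natural_termination_factor: 1), and force-kills containers that have not exited within 3 seconds of being marked (max_wait_before_kill: 3000). Because disable_preemption is true for Q2 and false for Q1, only the LRJ's containers are candidates for preemption, so Q2 should recover its guaranteed share within seconds instead of waiting for the cluster to scale out.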

    Here is the sample configuration we use in our AWS CloudFormation script to launch the EMR cluster:

    Capacity scheduler configuration:

            yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
            yarn.scheduler.capacity.root.queues: Q1,Q2
            yarn.scheduler.capacity.root.Q2.capacity: 60
            yarn.scheduler.capacity.root.Q1.capacity: 40
            yarn.scheduler.capacity.root.Q2.accessible-node-labels: "*"
            yarn.scheduler.capacity.root.Q1.accessible-node-labels: "*"
            yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity: 100
            yarn.scheduler.capacity.root.Q2.accessible-node-labels.CORE.capacity: 60
            yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.capacity: 40
            yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.maximum-capacity: 60
            yarn.scheduler.capacity.root.Q2.disable_preemption: true
            yarn.scheduler.capacity.root.Q1.disable_preemption: false
    
    
    yarn-site configuration:

            yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
            yarn.resourcemanager.scheduler.monitor.enable: true
            yarn.resourcemanager.scheduler.monitor.policies: org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
            yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval: 2000
            yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill: 3000
            yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round: 0.5
            yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity: 0.1
            yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor: 1
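
    For reference, this is roughly how the two groups of properties map onto an EMR cluster definition. The snippet below is a minimal sketch of the Configurations section of an AWS::EMR::Cluster CloudFormation resource (only a subset of the properties is shown; it is not our exact template):

            "Configurations": [
              {
                "Classification": "capacity-scheduler",
                "ConfigurationProperties": {
                  "yarn.scheduler.capacity.root.queues": "Q1,Q2",
                  "yarn.scheduler.capacity.root.Q1.capacity": "40",
                  "yarn.scheduler.capacity.root.Q2.capacity": "60",
                  "yarn.scheduler.capacity.root.Q2.disable_preemption": "true"
                }
              },
              {
                "Classification": "yarn-site",
                "ConfigurationProperties": {
                  "yarn.resourcemanager.scheduler.monitor.enable": "true"
                }
              }
            ]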
    
    With the above approach, you have to submit each job to the appropriate queue for your use case, for example as shown below.
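
    As a minimal sketch (the application JARs and class names are placeholders), a Spark job can be pinned to a queue at submission time with the --queue option:

            # The long-running weekly job goes to Q1, which may be preempted
            spark-submit --master yarn --queue Q1 --class com.example.LongRunningJob lrj.jar

            # Small ad-hoc jobs go to Q2, whose containers are never preempted
            spark-submit --master yarn --queue Q2 --class com.example.SmallJob small-job.jar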