GCP Dataproc - configure the YARN Fair Scheduler

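I am trying to set up a Dataproc cluster that runs only one job at a time (or a specified maximum number of jobs), with the rest waiting in a queue.

I have already found this solution, but since I keep creating new clusters, I need to automate it. I added this to the cluster creation request: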

"softwareConfig": {
    "properties": {
        "yarn:yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
        "yarn:yarn.scheduler.fair.user-as-default-queue": "false",
        "yarn:yarn.scheduler.fair.allocation.file": "$HADOOP_CONF_DIR/fair-scheduler.xml"
    }
}
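For reference, the same kind of properties can also be passed on the command line through the `--properties` flag of `gcloud dataproc clusters create`. The following is only a sketch: the cluster name, region, and bucket path are hypothetical placeholders, and `build_properties` is a helper introduced here for illustration. The allocation file property is left out on purpose, on the assumption that it is set from the init action instead.

```shell
#!/usr/bin/env bash
# Join cluster properties into the comma-separated form used by --properties.
build_properties() {
  local IFS=','
  echo "$*"
}

PROPS=$(build_properties \
  "yarn:yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler" \
  "yarn:yarn.scheduler.fair.user-as-default-queue=false")

# Hypothetical names: my-cluster, us-central1, gs://my-bucket/fair-scheduler.sh
create_cluster() {
  gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties="${PROPS}" \
    --initialization-actions=gs://my-bucket/fair-scheduler.sh
}
```

Calling `create_cluster` would then create the cluster with the Fair Scheduler class selected and run the init action on each node.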

together with one more line in the init actions script:

sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml

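The file fair-scheduler.xml does contain the specified content (everything is on one line, but I don't think that is the problem).

After all of this, the cluster still behaves as if the Capacity Scheduler were in charge. I don't know why. Any suggestion would help.

Thanks.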

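Because the init actions script runs after the cluster has been created, the YARN service is already running by the time the script modifies yarn-site.xml.

So, after modifying the XML configuration file and creating the other XML file, you need to restart the YARN service. You can do that with the following command: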

sudo systemctl restart hadoop-yarn-resourcemanager.service
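One way to confirm which scheduler is active after the restart is to query the ResourceManager REST API and read the `type` field of the scheduler info. This is a minimal sketch: it assumes the ResourceManager web service listens on the default port 8088 on the master node, and `scheduler_type` is a helper name introduced here for illustration.

```shell
#!/usr/bin/env bash
# Print the scheduler type found in a ResourceManager REST response read from stdin.
# Expected values are along the lines of fairScheduler or capacityScheduler.
scheduler_type() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[A-Za-z]*Scheduler"' \
    | head -n 1 \
    | sed 's/.*"\([A-Za-z]*\)"$/\1/'
}

# Usage on the master node (assumes the default port 8088):
#   curl -s http://localhost:8088/ws/v1/cluster/scheduler | scheduler_type
```

If the output still names the Capacity Scheduler, the new configuration has not been picked up.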

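In addition, $HADOOP_CONF_DIR is not set (I assumed it would be), so you need to give the full path to the file. However, once you do that, the initial YARN service will not start, because it cannot find the file, which is only created later by the init actions script. So what I did was append the last few lines to yarn-site.xml from within the init actions script. The init actions script looks like this: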

# Only the master node needs the change.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
    # Write the Fair Scheduler allocation file: at most one running app per queue.
    echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
    echo "  <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
    echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

    # Delete the last line of yarn-site.xml (the closing </configuration> tag).
    sed -i '$ d' /etc/hadoop/conf/yarn-site.xml

    # Append the allocation file property, then close the document again.
    echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
    echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml

    # Restart the ResourceManager so it picks up the new configuration.
    systemctl restart hadoop-yarn-resourcemanager.service
fi
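The script above assumes that the last line of yarn-site.xml is exactly the closing `</configuration>` tag and that it runs only once. A slightly more defensive variant is sketched below: it inserts the property before the closing tag wherever that tag sits, and does nothing if the property is already present. `add_yarn_property` is a hypothetical helper name, not part of Dataproc or Hadoop, and the sketch assumes the closing tag is on a line of its own.

```shell
#!/usr/bin/env bash
# Insert a <property> block before </configuration> unless the name already exists.
add_yarn_property() {
  local file="$1" name="$2" value="$3" tmp
  # Skip the edit if the property is already defined (fixed-string match).
  if grep -qF "<name>${name}</name>" "$file"; then
    return 0
  fi
  tmp=$(mktemp)
  awk -v n="$name" -v v="$value" '
    /<\/configuration>/ {
      print "  <property>"
      print "    <name>" n "</name>"
      print "    <value>" v "</value>"
      print "  </property>"
    }
    { print }
  ' "$file" > "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

In the init actions script it could be called as `add_yarn_property /etc/hadoop/conf/yarn-site.xml yarn.scheduler.fair.allocation.file /etc/hadoop/conf/fair-scheduler.xml`, followed by the same ResourceManager restart.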
