GCP Dataproc - configuring the YARN Fair Scheduler

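I am trying to set up a Dataproc cluster that runs only one job at a time (or a specified maximum number of jobs), with the remaining jobs waiting in a queue.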

I found a solution based on the YARN Fair Scheduler, but since I create a new cluster every time, I need to automate it. I added this to the cluster creation:

"softwareConfig": {
    "properties": {
        "yarn:yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
        "yarn:yarn.scheduler.fair.user-as-default-queue":"false",
        "yarn:yarn.scheduler.fair.allocation.file":"$HADOOP_CONF_DIR/fair-scheduler.xml"
    }
}
together with this line in the init action script:

sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml
The file fair-scheduler.xml does contain the specified content (everything is on one line, but I don't think that is the problem).

After all this, the cluster still behaves as if the Capacity Scheduler were in charge. I don't know why. Any suggestions would be helpful.
Thanks.

Because the init actions script runs after the cluster has been created, the YARN service is already running by the time the script modifies yarn-site.xml.

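So, after modifying the XML configuration file and creating the other XML file, the YARN service needs to be restarted. This can be done with the following command: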

sudo systemctl restart hadoop-yarn-resourcemanager.service
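
To check whether the restart actually switched schedulers, one option is to ask the ResourceManager which scheduler it is running. The following is only a sketch: it assumes the ResourceManager REST API is reachable at localhost:8088 (the YARN default web port) and that the response of /ws/v1/cluster/scheduler reports the scheduler in a "type" field such as "fairScheduler" or "capacityScheduler". The function names scheduler_type and check_scheduler are made up for this example.

```shell
# Hypothetical helper: read the scheduler JSON on stdin and print the
# scheduler type (for example fairScheduler or capacityScheduler).
# Only "type" values that end in "Scheduler" are considered, so queue
# entries such as fairSchedulerLeafQueueInfo are ignored.
scheduler_type() {
    grep -o '"type"[[:space:]]*:[[:space:]]*"[A-Za-z]*Scheduler"' \
        | head -n 1 \
        | sed 's/.*"\([A-Za-z]*\)"$/\1/'
}

# Hypothetical helper: query the ResourceManager and succeed only if the
# Fair Scheduler is active. The default URL is an assumption; pass a
# different one as the first argument if the port or host differs.
check_scheduler() {
    rm_url="${1:-http://localhost:8088/ws/v1/cluster/scheduler}"
    sched="$(curl -s "${rm_url}" | scheduler_type)"
    echo "active scheduler: ${sched:-unknown}"
    [ "${sched}" = "fairScheduler" ]
}
```

Running check_scheduler on the master node after the restart should print the active scheduler and return a non-zero status if it is still the Capacity Scheduler.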
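Also, since $HADOOP_CONF_DIR was not set (I thought it would be), the full path to the file has to be given. With the full path set at cluster creation, however, the initial YARN service fails to start, because it cannot find the file, which is only created later by the init actions script. So what I did was append the last few lines to yarn-site.xml from within the init actions script. The init actions script looks like this: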

ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
    echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
    echo "  <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
    echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

    sed -i '$ d' /etc/hadoop/conf/yarn-site.xml

    echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
    echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml
    systemctl restart hadoop-yarn-resourcemanager.service
fi
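
The same steps can also be wrapped in a function that takes the configuration directory as a parameter, which makes it possible to try the edit on a copy of the files before putting it into an init action. This is a sketch, not the script I actually ran: the function name configure_fair_scheduler is made up, and it assumes that the closing </configuration> tag of yarn-site.xml sits on its own line. Unlike the script above, it inserts the property before the closing tag instead of deleting the last line, and it does nothing to yarn-site.xml if the property is already there.

```shell
# Hypothetical helper: write fair-scheduler.xml into the given conf
# directory and register it in yarn-site.xml.
#   $1  Hadoop conf directory (for example /etc/hadoop/conf)
#   $2  maximum number of concurrently running apps per queue (default 1)
configure_fair_scheduler() {
    conf_dir="$1"
    max_apps="${2:-1}"
    alloc_file="${conf_dir}/fair-scheduler.xml"
    site_file="${conf_dir}/yarn-site.xml"

    # Write the allocation file in one go instead of line-by-line echoes.
    cat > "${alloc_file}" <<EOF
<allocations>
  <queueMaxAppsDefault>${max_apps}</queueMaxAppsDefault>
</allocations>
EOF

    # Leave yarn-site.xml alone if the property was added on an earlier run.
    if grep -q 'yarn.scheduler.fair.allocation.file' "${site_file}"; then
        return 0
    fi

    # Print the new property just before the line holding </configuration>.
    # Assumes that tag is on a line of its own.
    tmp_file="$(mktemp)"
    awk -v path="${alloc_file}" '
        /<\/configuration>/ {
            print "  <property>"
            print "    <name>yarn.scheduler.fair.allocation.file</name>"
            print "    <value>" path "</value>"
            print "  </property>"
        }
        { print }
    ' "${site_file}" > "${tmp_file}" && cat "${tmp_file}" > "${site_file}"
    rm -f "${tmp_file}"
}
```

In an init action this would be called on the master node as configure_fair_scheduler /etc/hadoop/conf 1, followed by the same systemctl restart of hadoop-yarn-resourcemanager.service as above.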