Apache Spark: Spark on YARN resource over-utilization

I have an EMR cluster with the following configuration:

Data Nodes : 6
RAM per Node : 56 GB
Cores per Node: 32
Instance Type: M4*4xLarge

I am running the following spark-sql commands to execute 5 Hive scripts in parallel:

spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive1.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive2.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive3.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive4.hql &
spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G -f hive5.hql

However, YARN is using 270 GB of memory.

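According to the parameters in these commands, the five Spark jobs together should use only 120 GB of RAM (each job has one 20 GB executor plus a 4 GB driver):

1*20 + 4 = 24 GB RAM per job

5 jobs = 5*24 = 120 GB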
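The question's arithmetic can be reproduced with shell integer arithmetic. The sketch below also adds, as an assumption, the per-executor memory overhead that Spark requests from YARN on top of --executor-memory (by default max(384 MB, 10% of executor memory)); the driver is counted in full exactly as the question does. The function name estimate_job_mb is hypothetical.

```shell
# Rough estimate of the memory one spark-sql job asks YARN for, in MB.
# Assumption: executor overhead = max(384 MB, 10% of executor memory),
# Spark's default for spark.executor.memoryOverhead
# (spark.yarn.executor.memoryOverhead in older releases).
# The driver is counted in full, matching the question's own arithmetic.
estimate_job_mb() {
  num_executors=$1
  executor_mb=$2
  driver_mb=$3
  overhead_mb=$(( executor_mb / 10 ))
  if [ "$overhead_mb" -lt 384 ]; then
    overhead_mb=384
  fi
  echo $(( num_executors * (executor_mb + overhead_mb) + driver_mb ))
}

naive=$(( 1 * 20480 + 4096 ))              # the question's 1*20 + 4 = 24 GB
per_job=$(estimate_job_mb 1 20480 4096)    # --num-executors 1, 20G, 4G
echo "naive per job:         ${naive} MB"
echo "with overhead per job: ${per_job} MB"
echo "5 jobs, naive:         $(( 5 * naive )) MB"
echo "5 jobs, with overhead: $(( 5 * per_job )) MB"
```

Even with the overhead included, five statically sized jobs come to roughly 130 GB, still far below the observed 270 GB, so the static flags alone do not explain the usage.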

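So why is YARN using 270 GB of memory? (No other Hadoop jobs are running on the cluster.)

Do I need to include any additional parameters to limit YARN resource utilization?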

Set "spark.dynamicAllocation.enabled" to false in spark-defaults.conf (../../spark/spark-x.x/conf/spark-defaults.conf).

This will help you limit (or avoid) the dynamic allocation of resources.
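If you want to script the spark-defaults.conf change, here is a minimal sketch. The helper name set_spark_default is hypothetical, and the file path is passed in because its location depends on the installation (the answer's ../../spark/spark-x.x/conf/ is one example). It assumes whitespace-separated "key value" entries: an existing line for the key is rewritten, otherwise a new line is appended.

```shell
# Hypothetical helper: set one property in a spark-defaults.conf-style file.
# Assumes entries of the form "<key> <value>" separated by whitespace;
# "key=value" entries are not detected and would be left untouched.
set_spark_default() {
  conf_file=$1
  key=$2
  value=$3
  tmp_file="${conf_file}.tmp.$$"
  [ -f "$conf_file" ] || : > "$conf_file"   # create the file if missing
  awk -v k="$key" -v v="$value" '
    $1 == k { print k " " v; found = 1; next }   # rewrite existing entry
    { print }                                    # keep every other line
    END { if (!found) print k " " v }            # append if key was absent
  ' "$conf_file" > "$tmp_file" && mv "$tmp_file" "$conf_file"
}

# Usage (adjust the path to your installation):
# set_spark_default /path/to/spark/conf/spark-defaults.conf spark.dynamicAllocation.enabled false
```

Running it twice leaves a single entry for the key, so the change is safe to repeat.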

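Even if the executor memory and number of executors are set on the command line, with dynamic allocation enabled Spark will request additional executors (each with the configured executor memory) whenever the cluster has free resources. To limit memory usage to the executors you asked for, set spark.dynamicAllocation.enabled to false.

You can change it directly in the Spark configuration file, or pass it to the command as a configuration parameter: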

spark-sql --master yarn --num-executors 1 --executor-memory 20G --executor-cores 20 --driver-memory 4G --conf spark.dynamicAllocation.enabled=false -f hive1.hql
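The command above shows only hive1.hql. As a sketch, the same flag can be applied to all five scripts from the question with a loop. The function name run_hive_scripts and the SPARK_SQL override variable are hypothetical additions for illustration (setting SPARK_SQL=echo gives a dry run that just prints the commands).

```shell
# Launch hive1.hql .. hive5.hql in parallel with dynamic allocation off.
# SPARK_SQL is a hypothetical override; it defaults to the real CLI.
SPARK_SQL=${SPARK_SQL:-spark-sql}

run_hive_scripts() {
  for i in 1 2 3 4 5; do
    "$SPARK_SQL" --master yarn --num-executors 1 \
      --executor-memory 20G --executor-cores 20 --driver-memory 4G \
      --conf spark.dynamicAllocation.enabled=false \
      -f "hive${i}.hql" &
  done
  wait   # block until all five background jobs have exited
}

# run_hive_scripts
```

Unlike the original one-liner, where the fifth job runs in the foreground, the final wait returns only after all five jobs have finished.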

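Apparently, this is the expected behavior when dynamic allocation is enabled. Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful when multiple applications share resources in your Spark cluster.

Are you using the EMR setting maximizeResourceAllocation true on this cluster? If you launched it from the console, it is used by default.

Yes. Dynamic resource allocation is enabled by default on this cluster. Changing it to false solved my problem. Please see the answer section for more explanation.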