Apache Spark: values for Spark executors, driver, executor cores, and executor memory

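I have some questions about the values for Spark executors, the driver, executor cores, and executor memory.

1. If no applications are running on the cluster and you submit a job, what are the default values for the number of executors, executor cores, and executor memory?

2. If we want to calculate the number of executors, executor cores, and executor memory needed for a job you are about to submit, how would you calculate them?

If no applications are running on the cluster and you submit a job, what are the default values for the number of executors, executor cores, and executor memory?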

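The default values are stored in spark-defaults.conf on the cluster where Spark is installed, so you can check them there. Any property not set in that file falls back to Spark's built-in default, as listed in the Spark configuration documentation.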
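As a rough illustration of checking those values, here is a minimal sketch that parses the text of a spark-defaults.conf file (one whitespace-separated key and value per line, `#` for comments) and falls back to built-in defaults for anything not set. The helper names and the sample file contents are hypothetical, and the fallback table reflects the documented defaults for Spark on YARN as I understand them, so confirm it against the documentation for your Spark version.

```python
# Built-in fallbacks for Spark on YARN. These are an assumption for this
# sketch: verify them against the configuration page for your Spark version.
BUILTIN_DEFAULTS = {
    "spark.executor.instances": "2",  # static allocation on YARN
    "spark.executor.cores": "1",      # YARN mode
    "spark.executor.memory": "1g",
    "spark.driver.memory": "1g",
}


def parse_spark_defaults(text):
    """Parse the contents of a spark-defaults.conf file into a dict."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key and value separated by whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def effective_setting(settings, key):
    """Return the configured value, or the built-in fallback if unset."""
    return settings.get(key, BUILTIN_DEFAULTS.get(key))


# Hypothetical file contents, for illustration only.
sample = """
# cluster-wide overrides
spark.master            yarn
spark.executor.memory   4g
"""

conf = parse_spark_defaults(sample)
for key in BUILTIN_DEFAULTS:
    print(key, "=", effective_setting(conf, key))
```

In practice you would read the real file from the `conf` directory of the Spark installation instead of using the sample string.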

If we want to calculate the number of executors, executor cores, and executor memory needed for a job you are about to submit, how would you calculate them?

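It depends on the following:

The type of job, i.e. shuffle-intensive or map-only. A shuffle-heavy job will probably need more memory.

Data size: the larger the data, the higher the memory usage.

Cluster constraints: how much memory can you afford?

Based on these factors, start with some initial numbers, then look at the Spark UI to identify the bottlenecks and increase or decrease the memory footprint accordingly.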


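One caveat: keeping executor memory above 40 GB can be counterproductive, because JVM garbage collection becomes slower. Likewise, having too many cores can slow the process down.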

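Avishek's answer covers the default values. I will explain how to calculate good values, using an example.

Example: 6 nodes, each with 16 cores and 64 GB of RAM

Each executor is a JVM instance, so multiple executors can run on a single node.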

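Let's start by choosing the number of cores per executor:

The calculation below leaves 1 of the 16 cores on each node free, so 15 cores per node are available to executors, and it uses 5 cores per executor.

Now, calculate the number of executors: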

As discussed earlier, 15 cores are available on each node and we are planning for 5 cores per executor.

Thus, number of executors per node = 15/5 = 3
Total number of executors = 3*6 = 18

Of these, one executor slot is left for the YARN ApplicationMaster (AM).
Thus, final executor count = 18-1 = 17 executors.
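The executor count above can be sketched as a small function. The function and parameter names are hypothetical, and the defaults (1 core per node left free, 5 cores per executor, 1 slot for the YARN AM) are simply the assumptions of this example rather than fixed rules.

```python
def executor_count(nodes, cores_per_node, cores_per_executor=5,
                   reserved_cores_per_node=1, am_slots=1):
    """Return (executors per node, final executor count)."""
    usable_cores = cores_per_node - reserved_cores_per_node
    per_node = usable_cores // cores_per_executor  # whole executors only
    total = per_node * nodes
    return per_node, total - am_slots  # leave room for the YARN AM


per_node, final_count = executor_count(nodes=6, cores_per_node=16)
print(per_node, final_count)  # 3 17
```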

Memory per executor:

Executors per node = 3
RAM available per node = 63 GB (as 1 GB is needed for the OS and Hadoop daemons)
Memory per executor = 63/3 = 21 GB

Spark also needs some memory overhead per executor, taken here as max(384 MB, 7% of memory per executor). (Recent Spark releases default to 10% rather than 7%.)
Thus, 7% of 21 GB = 1.47 GB
As 1.47 GB > 384 MB, subtract 1.47 from 21.
Hence, 21 - 1.47 = 19.53, rounded down to 19 GB
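The same memory arithmetic, as a minimal sketch. The function and parameter names are hypothetical; `overhead_fraction` defaults to the 7% used in this answer, so adjust it to whatever your Spark version actually applies.

```python
def executor_memory_gb(ram_per_node_gb, executors_per_node, reserved_gb=1,
                       overhead_fraction=0.07, min_overhead_gb=0.384):
    """Return (memory per executor, overhead, remaining heap), all in GB."""
    per_executor = (ram_per_node_gb - reserved_gb) / executors_per_node
    # Overhead is the larger of the 384 MB floor and the fractional share.
    overhead = max(min_overhead_gb, overhead_fraction * per_executor)
    return per_executor, overhead, per_executor - overhead


per_executor, overhead, heap = executor_memory_gb(64, 3)
print(f"{per_executor:.2f} {overhead:.2f} {heap:.2f}")  # 21.00 1.47 19.53
print(int(heap))  # 19, rounded down to a whole number of GB
```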

Final numbers:

Executors - 17, Cores 5, Executor Memory - 19 GB
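One way to pass these numbers to a job on YARN is through the `--num-executors`, `--executor-cores`, and `--executor-memory` options of `spark-submit`. The sketch below only builds the command line. The helper name and the application file `my_job.py` are hypothetical placeholders.

```python
def spark_submit_args(num_executors, executor_cores, executor_memory_gb,
                      app="my_job.py"):
    """Build a spark-submit command line for a YARN cluster."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", f"{executor_memory_gb}g",
        app,  # hypothetical application file
    ]


command = spark_submit_args(17, 5, 19)
print(" ".join(command))
```

The same values can instead be set through the `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory` properties.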
Notes:

1. Sometimes you may want to allocate less than 19 GB per executor. As memory per executor decreases, the number of executors increases and the number of cores per executor decreases. As discussed earlier, 5 cores per executor is the best value. Reducing it will still give good results; just don't go above 5.

2. Memory per executor should stay below 40 GB, otherwise there will be considerable GC overhead.
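To see the trade-off in note 1 for the example cluster, the sketch below repeats the calculation for 5 down to 1 cores per executor. The constants come from the example (6 nodes, 15 usable cores and 63 GB usable RAM per node, 7% overhead with a 384 MB floor), and the `plan` function is a hypothetical helper.

```python
NODES = 6
USABLE_CORES_PER_NODE = 15    # 16 cores minus 1 left free
USABLE_RAM_PER_NODE_GB = 63   # 64 GB minus 1 GB for the OS and Hadoop daemons


def plan(cores_per_executor, overhead_fraction=0.07, min_overhead_gb=0.384):
    """Return (executor count, executor memory in whole GB)."""
    per_node = USABLE_CORES_PER_NODE // cores_per_executor
    total = per_node * NODES - 1  # one slot left for the YARN AM
    memory = USABLE_RAM_PER_NODE_GB / per_node
    heap = memory - max(min_overhead_gb, overhead_fraction * memory)
    return total, int(heap)  # memory rounded down to whole GB


for cores in range(5, 0, -1):
    executors, heap_gb = plan(cores)
    print(f"cores={cores} executors={executors} executor_memory={heap_gb}g")
```

Fewer cores per executor gives more, smaller executors. Which point on that range works best is something to confirm in the Spark UI for the actual job.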