Allocation of memory to executors and tasks in Apache Spark
My cluster configuration is as follows: 7 nodes, each with 32 cores and 252 GB of memory.

The YARN configuration is as follows:
yarn.scheduler.maximum-allocation-mb - 10GB
yarn.scheduler.minimum-allocation-mb - 2GB
yarn.nodemanager.vmem-pmem-ratio - 2.1
yarn.nodemanager.resource.memory-mb - 22GB
yarn.scheduler.maximum-allocation-vcores - 25
yarn.scheduler.minimum-allocation-vcores - 1
yarn.nodemanager.resource.cpu-vcores - 25
The MapReduce configuration is as follows:
mapreduce.map.java.opts - -Xmx1638m
mapreduce.map.memory.mb - 2GB
mapreduce.reduce.java.opts - -Xmx3276m
mapreduce.reduce.memory.mb - 4GB
The Spark configuration is as follows:
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
Now I tried running spark-shell with the master set to yarn and with different values for executor-memory, num-executors, and executor-cores.
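The exact commands are not preserved in the question; an invocation of the kind described, with purely illustrative flag values, would look like this:

spark-shell --master yarn --executor-memory 9g --num-executors 10 --executor-cores 3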
PID   USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+
8478  hdp66-ss 20 0  13.5g 1.1g 25m S  1.9  0.4 2:11.28

So the virtual memory is 13.5g and the physical (resident) memory is 1.1g.
PID   USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+
5256  hdp66-ss 20 0  13.2g 1.1g 25m S  2.6  0.4 1:25.25

So the virtual memory is 13.2g and the physical memory is 1.1g.
PID   USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+
21518 hdp66-ss 20 0  19.2g 1.4g 25m S  3.9  0.6 2:24.46

So the virtual memory is 19.2g and the physical memory is 1.4g.
Can someone explain how these memory values come about and how the executors are launched? Why is the memory shown in the Spark UI roughly 67% of the requested executor memory? And how are the virtual and physical memory for each executor decided?

Spark almost always allocates 65% to 70% of the memory a user requests for its executors. This behavior is caused by the Spark JIRA ticket SPARK-12579:
// Excerpt from Spark's UnifiedMemoryManager.getMaxMemory:
if (conf.contains("spark.executor.memory")) {
  val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
  if (executorMemory < minSystemMemory) {
    throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
      s"$minSystemMemory. Please increase executor memory using the " +
      s"--executor-memory option or spark.executor.memory in Spark configuration.")
  }
}
val usableMemory = systemMemory - reservedMemory
val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
(usableMemory * memoryFraction).toLong
}  // closes getMaxMemory (the excerpt starts mid-method)
The code above is responsible for the behaviour you are seeing. It is a safeguard for the case where the cluster may not have as much memory as the user requested.
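To make the percentage concrete, here is a minimal standalone sketch of the same arithmetic. It is not Spark's actual UnifiedMemoryManager: the 300 MB reserve, the assumption that Runtime.getRuntime.maxMemory reports about 5% less than -Xmx, and the 10 GB executor request are illustrative values, not measurements from the cluster in the question.

// Standalone sketch (assumed values) re-running the arithmetic from the excerpt above.
object MaxMemorySketch {
  // Spark subtracts a fixed reserve before applying spark.memory.fraction;
  // 300 MB is assumed here for that reserve.
  val reservedMemory: Long = 300L * 1024 * 1024

  def uiMemory(systemMemory: Long, memoryFraction: Double): Long =
    ((systemMemory - reservedMemory) * memoryFraction).toLong

  def main(args: Array[String]): Unit = {
    val requested = 10L * 1024 * 1024 * 1024   // --executor-memory 10g (illustrative)
    // The JVM usually reports Runtime.getRuntime.maxMemory slightly below -Xmx;
    // a 5% gap is assumed purely for illustration.
    val systemMemory = (requested * 0.95).toLong
    for (fraction <- Seq(0.6, 0.75)) {         // 0.6: Spark 2.x default, 0.75: Spark 1.6 default
      val shown = uiMemory(systemMemory, fraction)
      println(f"fraction=$fraction%.2f -> ${shown.toDouble / requested * 100}%.0f%% of the requested heap")
    }
  }
}

With these assumptions, the 0.75 fraction lands in the 65-70% range described above, while the 0.6 default shown in the excerpt comes out closer to 55%. Either way, what the UI reports is the heap minus the reserve, scaled by spark.memory.fraction, rather than the full amount requested.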