Apache Spark: Why does Spark Streaming receiving data from Kafka use more memory than executorMemory * executorCount + driverMemory?


apache-spark


I submitted a Spark Streaming application to a YARN cluster in client mode, as follows:

./spark-submit \
--jars $JARS \
--class $APPCLS \
--master yarn-client \
--driver-memory 64m \
--executor-memory 64m \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false  \
--num-executors 6 \
/data/apps/app.jar

executorMemory * executorCount + driverMemory = 64m * 6 + 64m = 448m

But the application actually uses 3968 MB. Why does this happen, and how can I reduce the memory usage?

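There are the Spark configuration parameters spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead, which in your case default to 384 MB: the default is a fraction of the requested heap with a 384 MB floor, and a fraction of 64 MB falls far below that floor.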
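A minimal sketch of the default overhead rule, assuming the behavior documented for Spark 1.6/2.x (overhead = max(384 MB, 10% of the heap); some older releases used 7%) and assuming neither overhead property is set explicitly. The function and constant names are hypothetical, for illustration only.

```python
# Assumed default overhead rule for Spark on YARN, used only when
# spark.yarn.executor.memoryOverhead / spark.yarn.driver.memoryOverhead are unset.
OVERHEAD_MIN_MB = 384   # floor applied to the default overhead
OVERHEAD_FACTOR = 0.10  # assumed factor (Spark 1.6 / 2.x); older releases used 0.07


def default_overhead_mb(heap_mb: int) -> int:
    """Off-heap overhead Spark adds on top of the requested heap."""
    return max(OVERHEAD_MIN_MB, int(OVERHEAD_FACTOR * heap_mb))


def container_request_mb(heap_mb: int) -> int:
    """Heap plus default overhead, before YARN rounds the request."""
    return heap_mb + default_overhead_mb(heap_mb)


if __name__ == "__main__":
    # --executor-memory 64m: 10% is only 6 MB, so the 384 MB floor applies.
    print(container_request_mb(64))  # 448
```

So each 64 MB executor already asks YARN for 448 MB before any scheduler rounding is applied.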

There is also YARN's memory allocation granularity (yarn.scheduler.increment-allocation-mb), which defaults to 512 MB, so every container size is rounded up to a multiple of it.

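There is also a minimum allocation size (yarn.scheduler.minimum-allocation-mb), which defaults to 1 GB. Either it has been set lower on your cluster, or you are not reading the memory allocation correctly.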
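The rounding described in the two paragraphs above can be sketched as follows. This is a simplified model, not YARN's actual code: it assumes a request is first raised to the minimum and then rounded up to a multiple of the increment (roughly how the Fair Scheduler normalizes requests), reuses the assumed default overhead rule (max(384 MB, 10% of heap)), and ignores the Application Master container that yarn-client mode also requests. All names are hypothetical.

```python
OVERHEAD_MIN_MB = 384   # Spark's floor for the default overhead
OVERHEAD_FACTOR = 0.10  # assumed default factor (Spark 1.6 / 2.x)


def default_overhead_mb(heap_mb: int) -> int:
    """Assumed default overhead when no explicit memoryOverhead is set."""
    return max(OVERHEAD_MIN_MB, int(OVERHEAD_FACTOR * heap_mb))


def normalize_mb(request_mb: int, minimum_mb: int, increment_mb: int) -> int:
    """Raise a request to the minimum, then round up to a multiple of the increment."""
    raised = max(request_mb, minimum_mb)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-raised // increment_mb) * increment_mb


def executors_total_mb(heap_mb: int, num_executors: int,
                       minimum_mb: int = 1024, increment_mb: int = 512) -> int:
    """Total YARN memory granted to executor containers (AM container not included)."""
    request = heap_mb + default_overhead_mb(heap_mb)
    return num_executors * normalize_mb(request, minimum_mb, increment_mb)


if __name__ == "__main__":
    # Defaults quoted in the answer: 1024 MB minimum, 512 MB increment.
    print(executors_total_mb(64, 6))                  # 6144
    # If the cluster's minimum were lowered to 512 MB instead:
    print(executors_total_mb(64, 6, minimum_mb=512))  # 3072
```

Under the quoted defaults, six 64 MB executors alone would occupy 6144 MB. The reported 3968 MB is below that, which fits the point that the minimum must be set lower on this cluster (or that the figure is being read differently).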

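All of this overhead should be negligible compared with the memory the application actually uses. You should set --executor-memory to 20 GB or more. Why configure such ridiculously low memory?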

Thanks for the detailed explanation. In my case, the executors only need to process 5 seconds of data, so I wanted to reduce the memory used.