Apache Spark WARN MemoryStore: not enough space

I am using Sparkling Water, reading data from Parquet files. The relevant part of my spark-defaults.conf:
`
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 1g
spark.driver.memory 40g
spark.executor.memory 40g
spark.driver.maxResultSize 0
spark.python.worker.memory 30g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
spark.storage.safetyFraction 0.9
spark.storage.memoryFraction 0.0
`
In practice, Spark only uses a fraction of the memory available to it, and there are many errors about failed memory allocation. Spark starts writing data to disk instead of keeping it in RAM. Why does this happen? Should I change something in the conf file? And how can I change the directory Java uses as "tmp"?
Thanks everyone!

> Spark starts writing data to disk instead of using RAM. Why does this happen?
This is most likely because somewhere your persistence is configured to use the MEMORY_AND_DISK option.
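The difference is visible in the storage levels' flags. Below is a minimal plain-Python mirror of Spark's `StorageLevel` flags, written for illustration only (it is not the `pyspark.StorageLevel` class itself):

```python
from collections import namedtuple

# Hypothetical stand-in for org.apache.spark.storage.StorageLevel's flags
StorageLevel = namedtuple("StorageLevel", "use_disk use_memory deserialized")

MEMORY_ONLY     = StorageLevel(use_disk=False, use_memory=True, deserialized=True)
MEMORY_AND_DISK = StorageLevel(use_disk=True,  use_memory=True, deserialized=True)

# With MEMORY_AND_DISK, partitions that do not fit in RAM are spilled to
# disk instead of being recomputed -- exactly the "writing to disk" the
# question describes. With MEMORY_ONLY they would be dropped and recomputed.
print(MEMORY_AND_DISK.use_disk)  # -> True
print(MEMORY_ONLY.use_disk)      # -> False
```

So if blocks end up on disk, the persistence call (or the API defaulting on your behalf) chose a disk-backed level.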
From the documentation -->
From the source code -->
And what about this part:
// Initial memory to request before unrolling any block
private val unrollMemoryThreshold: Long =
conf.get(STORAGE_UNROLL_MEMORY_THRESHOLD)
// Whether there is still enough memory for us to continue unrolling this block
var keepUnrolling = true
// Initial per-task memory to request for unrolling blocks (bytes).
val initialMemoryThreshold = unrollMemoryThreshold
// How often to check whether we need to request more memory
val memoryCheckPeriod = conf.get(UNROLL_MEMORY_CHECK_PERIOD)
// Memory currently reserved by this task for this particular unrolling operation
var memoryThreshold = initialMemoryThreshold
// Memory to request as a multiple of current vector size
val memoryGrowthFactor = conf.get(UNROLL_MEMORY_GROWTH_FACTOR)
// Keep track of unroll memory used by this particular block / putIterator() operation
var unrollMemoryUsedByThisBlock = 0L
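The fields above drive a grow-as-you-go loop: Spark requests an initial chunk of unroll memory, then every `memoryCheckPeriod` elements it checks whether the partially unrolled block has outgrown the current reservation and, if so, requests more, sized by `memoryGrowthFactor`. Here is a hedged Python sketch of that logic; the constants mirror Spark's documented defaults (1 MiB initial threshold, check every 16 elements, growth factor 1.5), while `free_memory` and the element sizes are made up for illustration:

```python
# Sketch of MemoryStore's unrolling loop; numbers are illustrative only.
INITIAL_MEMORY_THRESHOLD = 1024 * 1024  # default 1 MiB
MEMORY_CHECK_PERIOD = 16                # check every 16 elements
MEMORY_GROWTH_FACTOR = 1.5

def unroll(element_sizes, free_memory):
    """Return (kept_unrolling, reserved_bytes) after trying to unroll a block."""
    threshold = INITIAL_MEMORY_THRESHOLD
    # Request enough memory to begin unrolling
    if free_memory < threshold:
        return False, 0  # -> "Failed to reserve initial memory threshold ..."
    reserved = threshold
    used = 0
    for i, size in enumerate(element_sizes):
        used += size
        if i % MEMORY_CHECK_PERIOD == 0 and used >= threshold:
            # Request memory as a multiple of the current vector size
            needed = int(used * MEMORY_GROWTH_FACTOR) - reserved
            if free_memory - reserved < needed:
                return False, reserved  # ran out mid-unroll
            reserved += needed
            threshold = reserved
    return True, reserved

ok, reserved = unroll([512] * 10_000, free_memory=64 * 1024 * 1024)
```

With plenty of free memory the loop succeeds and ends up reserving slightly more than the block's true size; with almost none, it fails before unrolling a single element, which is the path that produces the warning below.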
Reading further down, this is where the warning you are seeing comes from:
// Request enough memory to begin unrolling
keepUnrolling =
reserveUnrollMemoryForThisTask(blockId, initialMemoryThreshold, memoryMode)
if (!keepUnrolling) {
logWarning(s"Failed to reserve initial memory threshold of " +
s"${Utils.bytesToString(initialMemoryThreshold)} for computing block $blockId in memory.")
} else {
unrollMemoryUsedByThisBlock += initialMemoryThreshold
}
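To put rough numbers on when that warning fires: under the unified memory manager (Spark 1.6+), the pool this reservation comes from is approximately (heap - 300 MB reserved) * `spark.memory.fraction`, of which `spark.memory.storageFraction` is protected for storage. A back-of-the-envelope sketch for the 40g executor from the question, assuming recent defaults (fraction 0.6, storageFraction 0.5):

```python
RESERVED_SYSTEM_MEMORY = 300 * 1024 * 1024  # fixed amount Spark sets aside
heap = 40 * 1024**3                         # spark.executor.memory 40g
memory_fraction = 0.6                       # spark.memory.fraction default
storage_fraction = 0.5                      # spark.memory.storageFraction default

usable = heap - RESERVED_SYSTEM_MEMORY
unified = usable * memory_fraction          # shared execution + storage pool
storage_floor = unified * storage_fraction  # storage share execution cannot evict

print(f"unified pool:  {unified / 1024**3:.1f} GiB")
print(f"storage floor: {storage_floor / 1024**3:.1f} GiB")
# The warning fires when even the initial 1 MiB unroll reservation cannot
# be satisfied from the storage side of this pool.
```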
So either enable OFF_HEAP at the application level, as done in this blog -->
or tune your cluster/machine configuration and enable this setting, as described here -->
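As a concrete example of the second option, off-heap storage is controlled by two settings in spark-defaults.conf (both exist in Spark 2.x+; the size value here is purely illustrative and must fit your machines):

`
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 16g
`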
Finally, if none of the above helps: in my case, simply restarting the nodes made the warning go away. If you landed on this post still wondering what happened, the answer above explains how and why you get this error. Personally, I would only look closely at something like
(computed 3.2 MB so far)
in the logs, and then start worrying.
However, to solve it: set the
spark.storage.memoryFraction
flag to 1 when creating the SparkContext,
to use up to XX GB of memory; it defaults to 0.6 of the total memory provided.
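The arithmetic behind that advice (this applies to the legacy StaticMemoryManager, i.e. pre-unified memory management, which is where `spark.storage.memoryFraction` has effect): the storage pool is executor memory * `spark.storage.memoryFraction` * `spark.storage.safetyFraction`. Note that the conf in the question explicitly sets `spark.storage.memoryFraction` to 0.0, which under this scheme leaves essentially no storage pool at all:

```python
# Legacy (StaticMemoryManager) storage-pool arithmetic.
executor_memory = 40 * 1024**3  # spark.executor.memory 40g
safety_fraction = 0.9           # spark.storage.safetyFraction (question's conf)

def storage_pool(memory_fraction):
    """Bytes of the storage pool for a given spark.storage.memoryFraction."""
    return executor_memory * memory_fraction * safety_fraction

default_pool = storage_pool(0.6)  # default fraction -> 21.6 GiB
full_pool = storage_pool(1.0)     # fraction set to 1 -> 36.0 GiB
zero_pool = storage_pool(0.0)     # the question's conf -> 0 bytes
```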
Also consider setting
spark.rdd.compress
to true
and the
StorageLevel
to MEMORY_ONLY_SER
if your data is somewhat larger than the available memory. (You can also try MEMORY_AND_DISK_SER.)
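Why MEMORY_ONLY_SER buys headroom: serialized blocks are stored as one compact byte array instead of a graph of live objects. A rough stdlib illustration, with pickle standing in for Kryo (the exact ratio depends entirely on the data):

```python
import pickle
import sys

data = list(range(100_000))

# Footprint as live Python objects: container plus per-element object headers
object_bytes = sys.getsizeof(data) + sum(sys.getsizeof(x) for x in data)

# Footprint as a single serialized blob, as MEMORY_ONLY_SER would store it
serialized_bytes = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

print(object_bytes, serialized_bytes)
```

The trade-off is CPU: serialized blocks must be deserialized every time they are read.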
Just skimming through some old mailing lists, I stumbled upon this property:
**spark.shuffle.spill.numElementsForceSpillThreshold**
We set it with --conf spark.shuffle.spill.numElementsForceSpillThreshold=50000, which solved the issue, but the value needs to be tuned for the specific use case (try lowering it to 40000 or 30000).
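For reference, the same threshold can also be placed in spark-defaults.conf instead of on the command line (the 50000 value is the one from this answer; tune it per workload):

`
spark.shuffle.spill.numElementsForceSpillThreshold 50000
`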
As of now, Spark has two new parameters:
- spark.shuffle.spill.map.maxRecordsSizeForSpillThreshold
- spark.shuffle.spill.reduce.maxRecordsSizeForSpillThreshold
Reference:
Hope this helps! Cheers. Also see this: in one of our cases, we solved a similar issue by using --conf spark.shuffle.spill.numElementsForceSpillThreshold=50000, though for large shuffles: