
Apache Spark WARN MemoryStore: Not enough space

I am using Sparkling Water and reading data from Parquet files.

Part of my spark-defaults.conf:

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max  1g
spark.driver.memory              40g
spark.executor.memory            40g
spark.driver.maxResultSize       0
spark.python.worker.memory       30g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
spark.storage.safetyFraction     0.9
spark.storage.memoryFraction     0.0

In practice, Spark uses only part of the memory it could use, and there are many errors about allocating memory. Spark starts writing data to the hard disk instead of using RAM. Why does this happen? Should I perhaps change something in the conf file? And how can I change the directory that Java uses as "tmp"?

Thanks, everyone!

Spark starts writing data to the hard disk instead of using RAM. Why does this happen?

That is most likely because, somewhere, your persistence is configured with the MEMORY_AND_DISK storage level.
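
As an illustration, here is a minimal Scala sketch (assuming a Spark 2.x setup; the Parquet path is a placeholder) of how the chosen StorageLevel decides whether cached blocks may spill to local disk:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistLevelDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("persist-level-demo").getOrCreate()

    // Placeholder path; the original question reads Parquet files.
    val df = spark.read.parquet("/path/to/data.parquet")

    // MEMORY_AND_DISK (the default used by Dataset.cache()) lets blocks that do
    // not fit in the storage pool be written to local disk instead of being dropped.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    // MEMORY_ONLY keeps blocks strictly in RAM; partitions that do not fit are
    // simply not cached and are recomputed from lineage when needed.
    // df.persist(StorageLevel.MEMORY_ONLY)

    df.count()
    spark.stop()
  }
}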

From the documentation --> From the source code ->

Now look at this part:

// Initial memory to request before unrolling any block
private val unrollMemoryThreshold: Long =
  conf.get(STORAGE_UNROLL_MEMORY_THRESHOLD)
// Whether there is still enough memory for us to continue unrolling this block
var keepUnrolling = true
// Initial per-task memory to request for unrolling blocks (bytes).
val initialMemoryThreshold = unrollMemoryThreshold
// How often to check whether we need to request more memory
val memoryCheckPeriod = conf.get(UNROLL_MEMORY_CHECK_PERIOD)
// Memory currently reserved by this task for this particular unrolling operation
var memoryThreshold = initialMemoryThreshold
// Memory to request as a multiple of current vector size
val memoryGrowthFactor = conf.get(UNROLL_MEMORY_GROWTH_FACTOR)
// Keep track of unroll memory used by this particular block / putIterator() operation
var unrollMemoryUsedByThisBlock = 0L 
Further down, you will find where the warning you are seeing comes from:

// Request enough memory to begin unrolling
keepUnrolling =
  reserveUnrollMemoryForThisTask(blockId, initialMemoryThreshold, memoryMode)

if (!keepUnrolling) {
  logWarning(s"Failed to reserve initial memory threshold of " +
    s"${Utils.bytesToString(initialMemoryThreshold)} for computing block $blockId in memory.")
} else {
  unrollMemoryUsedByThisBlock += initialMemoryThreshold
}
So either enable OFF_HEAP at the application level, as done in this blog post -->, or adjust the cluster/machine configuration and enable this setting as described here -->
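
For the first option, a minimal sketch of what enabling off-heap memory could look like (spark.memory.offHeap.enabled and spark.memory.offHeap.size are standard Spark settings, but the 16g size is only a placeholder to be tuned to the machine):

import org.apache.spark.sql.SparkSession

object OffHeapDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("off-heap-demo")
      // Allow Spark to allocate storage/execution memory outside the JVM heap.
      .config("spark.memory.offHeap.enabled", "true")
      // Size of the off-heap region; placeholder value.
      .config("spark.memory.offHeap.size", "16g")
      .getOrCreate()

    // Data can then also be persisted off-heap explicitly, e.g.
    // df.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)

    spark.stop()
  }
}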


Finally, if none of the above helps: in my case, restarting the nodes made the warning go away.

If you landed on this post and are still wondering what is going on, refer to the answer above for how and why you get this error.

Personally, I would take a good look at the "(3.2 MB computed so far)" part of that warning before starting to worry.

However, to resolve it: set the spark.storage.memoryFraction flag to 1 when creating the sparkContext, so that up to XX GB of memory can be used; the default is 0.6 of the total memory provided. Also consider setting the following (a combined sketch follows after the list):

- rdd.compression to true
- the StorageLevel to MEMORY_ONLY_SER if your data is somewhat larger than the available memory (you can also try MEMORY_AND_DISK_SER)
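
Putting those suggestions together, a minimal Scala sketch (this assumes the legacy memory manager, where spark.storage.memoryFraction still has effect; the standard config key for RDD compression is spark.rdd.compress; the input path is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object StorageTuningDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("storage-tuning-demo")
      // Give the storage pool the whole heap share instead of the 0.6 default
      // (only honoured by the legacy memory manager).
      .set("spark.storage.memoryFraction", "1")
      // Compress serialized RDD partitions to fit more data in memory.
      .set("spark.rdd.compress", "true")

    val sc = new SparkContext(conf)

    val rdd = sc.textFile("/path/to/input")  // placeholder path

    // Serialized caching is more compact than deserialized objects;
    // MEMORY_AND_DISK_SER would additionally spill what does not fit to disk.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)

    println(rdd.count())
    sc.stop()
  }
}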

I was just going through some old mails and stumbled upon the following property:

spark.shuffle.spill.numElementsForceSpillThreshold

We set it with --conf spark.shuffle.spill.numElementsForceSpillThreshold=50000, which fixed the issue, but this value needs to be tuned for the specific use case (try lowering it to 40000 or 30000).
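
A minimal sketch of setting that flag programmatically instead of on the command line (50000 is just the value quoted above and should be tuned):

import org.apache.spark.sql.SparkSession

object SpillThresholdDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spill-threshold-demo")
      // Maximum number of in-memory records before a spill is forced; equivalent to
      // --conf spark.shuffle.spill.numElementsForceSpillThreshold=50000.
      .config("spark.shuffle.spill.numElementsForceSpillThreshold", "50000")
      .getOrCreate()

    // ... shuffle-heavy job here ...
    spark.stop()
  }
}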

As of now, Spark has two newer parameters:

- spark.shuffle.spill.map.maxRecordsSizeForSpillThreshold
- spark.shuffle.spill.reduce.maxRecordsSizeForSpillThreshold

Reference:


Hope this helps! Cheers.

Please see this: in one of our cases, we solved a similar problem by using --conf spark.shuffle.spill.numElementsForceSpillThreshold=50000. Though for large shuffles: