Apache Kafka: bounding Kafka Streams off-heap memory
We are running a Kafka Streams application and frequently run into off-heap memory issues. The application is deployed and the Kubernetes pods keep restarting. I did some investigation and found that we can limit the off-heap memory by implementing RocksDBConfigSetter, as in the example below:
// Requires org.apache.kafka.streams.state.RocksDBConfigSetter and the
// org.rocksdb classes Options, BlockBasedTableConfig, Cache, LRUCache, WriteBufferManager.
// TOTAL_OFF_HEAP_MEMORY, TOTAL_MEMTABLE_MEMORY, INDEX_FILTER_BLOCK_RATIO,
// BLOCK_SIZE, N_MEMTABLES and MEMTABLE_SIZE are constants you must define yourself.
public static class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // See #1 below
    private static org.rocksdb.Cache cache =
        new org.rocksdb.LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, INDEX_FILTER_BLOCK_RATIO);
    private static org.rocksdb.WriteBufferManager writeBufferManager =
        new org.rocksdb.WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();

        // These three options in combination will limit the memory used by RocksDB
        // to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)
        tableConfig.setBlockCache(cache);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(writeBufferManager);

        // These options are recommended to be set when bounding the total memory
        // See #2 below
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);

        // See #3 below
        tableConfig.setBlockSize(BLOCK_SIZE);
        options.setMaxWriteBufferNumber(N_MEMTABLES);
        options.setWriteBufferSize(MEMTABLE_SIZE);

        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Cache and WriteBufferManager should not be closed here, as the same
        // objects are shared by every store instance.
    }
}
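For context, a config setter like the one above only takes effect once it is registered with the Streams configuration. A minimal sketch, assuming the class above is on the classpath (the application id and bootstrap server values are placeholders):

```java
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
// Register the bounded-memory RocksDB config setter shown above
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);
```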
Maximum off-heap memory consumed:

240 * ( 50 MB (block cache) + 16 MB * 3 (memtables) + filters (unknown) )
= 240 * ~110 MB
= 26400 MB
≈ 25 GB
That seems like a very large number. Is the calculation correct? I understand that in practice we should not hit this maximum, but is the math right?
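As a sanity check on the arithmetic (instance count and per-instance sizes taken from the question; the filter size is unknown and left out, which is roughly what rounding up to ~110 MB per instance allows for):

```python
n_instances = 240      # RocksDB instances (from the question)
block_cache_mb = 50    # block cache per instance
memtable_mb = 16       # size of one memtable
n_memtables = 3        # memtables per instance

# Filters are excluded here because their size is unknown
per_instance_mb = block_cache_mb + memtable_mb * n_memtables
total_mb = n_instances * per_instance_mb

print(per_instance_mb)         # 98 (the question rounds to ~110 MB to allow for filters)
print(total_mb)                # 23520
print(round(total_mb / 1024))  # 23 (GB, without filters; ~25 GB with the ~110 MB estimate)
```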
Also, if we implement RocksDBConfigSetter and set the maximum off-heap memory to 4 GB, will the application complain (crash with OOM) when RocksDB asks for more memory (since it expects about 25 GB)?
Update:

I reduced the LRU cache to 1 GB and my Streams application started throwing an "LRU cache full" exception:
2021-02-07 23:20:47,443 15448195 [dp-Corrigo-67c5563a-9e3c-4d79-bc1e-23175e2cba6c-StreamThread-2] ERROR o.a.k.s.p.internals.StreamThread - stream-thread [dp-Corrigo-67c5563a-9e3c-4d79-bc1e-23175e2cba6c-StreamThread-2] Encountered the following exception during processing and the thread is going to shut down:
org.apache.kafka.streams.errors.ProcessorStateException: stream-thread [dp-Corrigo-67c5563a-9e3c-4d79-bc1e-23175e2cba6c-StreamThread-2] task [29_4] Exception caught while trying to restore state from dp-Corrigo-InvTreeObject-Store-changelog-4
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.restore(ProcessorStateManager.java:425)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restoreChangelog(StoreChangelogReader.java:562)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:461)
at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:744)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:625)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:512)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error restoring batch to store InvTreeObject-Store
at org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.restoreAll(RocksDBStore.java:647)
at org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$0(StateRestoreCallbackAdapter.java:42)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.restore(ProcessorStateManager.java:422)
... 6 common frames omitted
Caused by: org.rocksdb.RocksDBException: Insert failed due to LRU cache being full.
at org.rocksdb.RocksDB.write0(Native Method)
at org.rocksdb.RocksDB.write(RocksDB.java:806)
at org.apache.kafka.streams.state.internals.RocksDBStore.write(RocksDBStore.java:439)
at org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.restoreAll(RocksDBStore.java:645)
... 8 common frames omitted
Not sure how many RocksDB instances you get. That depends on the structure of your program. You should check the TopologyDescription (via Topology#describe()). Sub-topologies are instantiated as tasks (based on the number of partitions), and each task will have its own RocksDB instance maintaining a shard of the overall state per store.

I would recommend the Kafka Summit talk "Performance Tuning RocksDB for Kafka Streams State Stores":
Also, if we implement RocksDBConfigSetter and set the maximum off-heap memory to 4 GB, will the application complain (crash with OOM) when RocksDB asks for more memory (since it expects about 25 GB)?

It won't crash. RocksDB will spill to disk. Being able to spill to disk is the reason we use persistent state stores (and not in-memory state stores) by default: it allows holding state that is larger than main memory. When you use Kubernetes, you should attach a correspondingly sized volume to the container (cf.). You may also want to watch the Kafka Summit talk "Deploying Kafka Streams Applications with Docker and Kubernetes":
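To give the state stores room to spill, the Streams state directory can be pointed at a mounted volume. A hypothetical pod-spec fragment (all names and sizes are illustrative; the Streams `state.dir` config must match the mount path):

```yaml
# Pod spec fragment: persistent volume for the RocksDB state directory
volumeMounts:
  - name: streams-state
    mountPath: /var/lib/kafka-streams   # set state.dir to this path
volumes:
  - name: streams-state
    persistentVolumeClaim:
      claimName: streams-state-pvc      # PVC sized for the expected state (e.g. > 25 GB here)
```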
If the state is larger than main memory and you run into performance problems, you may also want to monitor the RocksDB metrics to tune the different "buffers" accordingly:
For memory monitoring, check out KIP-607, which introduces RocksDB metrics. About your calculation: cf. the comment in the config setter you posted:

These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)

Thus, you don't need to account for memtables and filters individually; all of them count against the block cache.

My assumption was that if the cache is small, RocksDB would just write to disk, but instead it seems to start throwing exceptions and fail the Streams application. About the updated question: not entirely sure, but the error seems to happen during state restoration (RocksDBBatchingRestoreCallback.restoreAll). During restoration, RocksDB is configured differently; in particular it is opened in "bulk load" mode (at least for most Kafka Streams versions...). Could that be related?