Apache Kafka Streams: limiting off-heap memory


We are running a Kafka Streams application and frequently run into off-heap memory issues. Our application is deployed, and the Kubernetes pods keep restarting.

I did some investigation and found that we can limit the off-heap memory by implementing RocksDBConfigSetter, as in the following example:

public static class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

  // See #1 below
  private static org.rocksdb.Cache cache = new org.rocksdb.LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, INDEX_FILTER_BLOCK_RATIO);
  private static org.rocksdb.WriteBufferManager writeBufferManager = new org.rocksdb.WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache);

  @Override
  public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {

    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();

    // These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)
    tableConfig.setBlockCache(cache);
    tableConfig.setCacheIndexAndFilterBlocks(true);
    options.setWriteBufferManager(writeBufferManager);

    // These options are recommended to be set when bounding the total memory
    // See #2 below
    tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
    tableConfig.setPinTopLevelIndexAndFilter(true);
    // See #3 below
    tableConfig.setBlockSize(BLOCK_SIZE);
    options.setMaxWriteBufferNumber(N_MEMTABLES);
    options.setWriteBufferSize(MEMTABLE_SIZE);

    options.setTableFormatConfig(tableConfig);
  }

  @Override
  public void close(final String storeName, final Options options) {
    // Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.
  }
}
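For the config setter above to take effect, it has to be registered with the Streams configuration via rocksdb.config.setter. A minimal sketch, assuming the BoundedMemoryRocksDBConfig class from the snippet above (the application id and bootstrap server are hypothetical placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class RocksDBSetup {

  public static Properties streamsProps() {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "bounded-memory-app");   // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // hypothetical broker
    // Register the config setter; Streams calls setConfig() once per state store.
    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);
    return props;
  }
}
```

Because the Cache and WriteBufferManager fields are static, all store instances in the JVM share one memory budget, which is what makes the bound global.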
Maximum off-heap memory consumed:

 240 * ( 50 MB (block cache) + 16 MB * 3 (memtables) + filters (unknown) )
 = 240 * ~110 MB
 = 26,400 MB
 ≈ 25 GB
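The arithmetic can be checked directly with the figures from the question (240 RocksDB instances, a 50 MB block cache, and three 16 MB memtables per instance; the filter contribution is unknown and left out here):

```java
public class MemoryBound {

  public static void main(String[] args) {
    int instances = 240;     // number of RocksDB instances assumed in the question
    int blockCacheMb = 50;   // block cache per instance
    int memtableMb = 16;     // write buffer (memtable) size
    int memtables = 3;       // max write buffer number

    int perInstanceMb = blockCacheMb + memtableMb * memtables;  // filters excluded (unknown)
    long totalMb = (long) instances * perInstanceMb;
    System.out.println(perInstanceMb + " MB per instance, " + totalMb + " MB total");
    // 98 MB per instance -> 23,520 MB (~23 GB) before filters;
    // with filters included this approaches the ~110 MB / ~25 GB figure above.
  }
}
```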
That seems like a very large number. Is the calculation correct? I understand that in practice we should not reach this maximum, but is the math right?

Also, if we implement RocksDBConfigSetter and cap the off-heap memory at 4 GB, will the application complain (crash with OOM) when RocksDB asks for more memory (since it expects around 25 GB)?

Update: I reduced the LRU cache to 1 GB, and my Streams application started throwing an "LRU cache being full" exception:

2021-02-07 23:20:47,443 15448195 [dp-Corrigo-67c5563a-9e3c-4d79-bc1e-23175e2cba6c-StreamThread-2] ERROR o.a.k.s.p.internals.StreamThread - stream-thread [dp-Corrigo-67c5563a-9e3c-4d79-bc1e-23175e2cba6c-StreamThread-2] Encountered the following exception during processing and the thread is going to shut down: 
org.apache.kafka.streams.errors.ProcessorStateException: stream-thread [dp-Corrigo-67c5563a-9e3c-4d79-bc1e-23175e2cba6c-StreamThread-2] task [29_4] Exception caught while trying to restore state from dp-Corrigo-InvTreeObject-Store-changelog-4
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.restore(ProcessorStateManager.java:425)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restoreChangelog(StoreChangelogReader.java:562)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:461)
    at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:744)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:625)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:512)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error restoring batch to store InvTreeObject-Store
    at org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.restoreAll(RocksDBStore.java:647)
    at org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$0(StateRestoreCallbackAdapter.java:42)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.restore(ProcessorStateManager.java:422)
    ... 6 common frames omitted
Caused by: org.rocksdb.RocksDBException: Insert failed due to LRU cache being full.
    at org.rocksdb.RocksDB.write0(Native Method)
    at org.rocksdb.RocksDB.write(RocksDB.java:806)
    at org.apache.kafka.streams.state.internals.RocksDBStore.write(RocksDBStore.java:439)
    at org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.restoreAll(RocksDBStore.java:645)
    ... 8 common frames omitted
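For what it's worth, RocksDB raises "Insert failed due to LRU cache being full" when an insert into a cache created with a strict capacity limit cannot make room. The third argument of the LRUCache constructor controls this; a sketch of the two modes with a hypothetical 1 GB capacity, matching the update above:

```java
import org.rocksdb.LRUCache;
import org.rocksdb.RocksDB;

public class CacheModes {

  public static void main(String[] args) {
    RocksDB.loadLibrary();  // load the native library before creating native objects

    long capacity = 1024L * 1024L * 1024L;  // 1 GB

    // strictCapacityLimit = false: an insert that would overflow the cache
    // evicts or bypasses cached blocks instead of failing.
    LRUCache lenient = new LRUCache(capacity, -1, false, 0.1);

    // strictCapacityLimit = true: inserts fail once the cache is full, which
    // surfaces as "Insert failed due to LRU cache being full".
    LRUCache strict = new LRUCache(capacity, -1, true, 0.1);

    lenient.close();
    strict.close();
  }
}
```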

Not sure how many RocksDB instances you end up with. That depends on the structure of your program. You should look at the
TopologyDescription
(via
Topology#describe()
). Sub-topologies are instantiated as tasks (based on the number of partitions), and each task has its own RocksDB instance(s) maintaining a shard of the overall state per store.
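The description can simply be printed; a minimal sketch with a hypothetical one-store topology (topic and store names are illustrative only):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.TopologyDescription;
import org.apache.kafka.streams.kstream.Materialized;

public class DescribeTopology {

  public static TopologyDescription describeExample() {
    StreamsBuilder builder = new StreamsBuilder();
    // Hypothetical topology with a single state store, for illustration only.
    builder.table("input-topic", Materialized.as("example-store"));
    return builder.build().describe();
  }

  public static void main(String[] args) {
    // Each sub-topology listed here becomes one task per input partition,
    // and each task holds its own RocksDB instance(s) for its stores.
    System.out.println(describeExample());
  }
}
```

Multiplying the number of sub-topologies that contain stores by the partition count of their input topics gives the instance count used in the calculation above.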

I would recommend the Kafka Summit talk "Performance Tuning RocksDB for Kafka Streams State Stores":

Also, if we implement RocksDBConfigSetter and cap the off-heap memory at 4 GB, will the application complain (crash with OOM) when RocksDB asks for more memory (since it expects around 25 GB)?

It should not crash. RocksDB will spill to disk. Being able to spill to disk is the reason why we use a persistent state store (and not an in-memory state store) by default: it allows holding state larger than main memory. When using Kubernetes, you should attach corresponding volumes to your containers and size them accordingly (cf.). You may also want to watch the Kafka Summit talk "Deploying Kafka Streams Applications with Docker and Kubernetes":


If the state is larger than main memory, and if you hit performance issues, you may also want to monitor the RocksDB metrics to tune the different "buffers" accordingly:
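Note that Kafka Streams only records the RocksDB metrics when the metrics recording level is raised to DEBUG (the default INFO level omits them); a sketch:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class MetricsConfig {

  public static Properties withRocksDBMetrics(Properties props) {
    // RocksDB metrics (memtable, block cache, compaction, etc.) are only
    // exposed at DEBUG recording level.
    props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
    return props;
  }
}
```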


For memory monitoring, check out KIP-607, which introduced the RocksDB metrics. Regarding your calculation, see the comment in the config setter you posted:
"These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)"
. Thus, you don't need to account for the memtables and filters individually; all of them count against the block cache. My assumption was that if the cache is small, it would write to disk, but instead it seems to start throwing exceptions and failing the Streams application. Not completely sure about the updated question, but the error seems to happen during state restoration:
RocksDBBatchingRestoreCallback.restoreAll
. During restore, we configure RocksDB differently; in particular, we open RocksDB in "bulk load" mode (at least for most Kafka Streams versions…). Could that be related?