Apache Kafka: how to prevent my Kafka Streams application from entering the ERROR state?

I've noticed that after my Kafka Streams application is unable to communicate with Kafka for some period of time, it transitions into the ERROR state. I'd like to find a way to make Kafka Streams essentially "retry forever" rather than entering the ERROR state. So far the only fix is to restart the Kafka Streams application, which is not ideal.
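(For reference, the restart can at least be automated with a state listener that watches for the transition to ERROR and rebuilds the KafkaStreams instance. A minimal sketch, not from the original post; the topology and topic names are placeholders:)

    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;

    public class RestartOnError {

        // Stand-in for the application's real topology; topic names are placeholders.
        static Topology buildTopology() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("input-topic").to("output-topic");
            return builder.build();
        }

        static KafkaStreams createAndStart(Properties config) {
            KafkaStreams streams = new KafkaStreams(buildTopology(), config);
            streams.setStateListener((newState, oldState) -> {
                if (newState == KafkaStreams.State.ERROR) {
                    // All stream threads have died and this instance cannot recover.
                    // Close it and start a replacement on a separate thread, because
                    // close() blocks and the listener fires on a stream thread.
                    new Thread(() -> {
                        streams.close();
                        createAndStart(config);
                    }).start();
                }
            });
            streams.start();
            return streams;
        }
    }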

I have set request.timeout.ms=2147483647 in my Kafka Streams configuration. I've noticed this helps: it used to enter the ERROR state after about a minute, and now it happens less often, but it still happens eventually.

Here is my Kafka Streams configuration:

 commit.interval.ms: 10000
 cache.max.bytes.buffering: 0
 retries: 2147483647
 request.timeout.ms: 2147483647
 retry.backoff.ms: 5000
 num.stream.threads: 1
 state.dir: /tmp/kafka-streams
 producer.batch.size: 102400
 producer.max.request.size: 31457280
 producer.buffer.memory: 314572800
 producer.max.in.flight.requests.per.connection: 10
 producer.linger.ms: 0
 consumer.max.partition.fetch.bytes: 31457280
 consumer.receive.buffer.bytes: 655360
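For reference, this configuration might look as follows when assembled as a Java Properties object. The application id and bootstrap servers are placeholders, and the producer./consumer. prefixes forward a setting to the embedded client:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsProps {
        static Properties buildConfig() {
            Properties props = new Properties();
            // Placeholders: substitute the real application id and broker list.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app-stream");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            props.put("commit.interval.ms", 10_000);
            props.put("cache.max.bytes.buffering", 0);
            props.put("retries", Integer.MAX_VALUE);            // 2147483647
            props.put("request.timeout.ms", Integer.MAX_VALUE);
            props.put("retry.backoff.ms", 5_000);
            props.put("num.stream.threads", 1);
            props.put("state.dir", "/tmp/kafka-streams");

            // "producer."/"consumer." prefixes route these to the embedded clients.
            props.put("producer.batch.size", 102_400);
            props.put("producer.max.request.size", 31_457_280);
            props.put("producer.buffer.memory", 314_572_800);
            props.put("producer.max.in.flight.requests.per.connection", 10);
            props.put("producer.linger.ms", 0);
            props.put("consumer.max.partition.fetch.bytes", 31_457_280);
            props.put("consumer.receive.buffer.bytes", 655_360);
            return props;
        }
    }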
Here is the relevant portion of the Kafka Streams log:

[2019-06-07T22:18:07,223Z {StreamThread-1} WARN  org.apache.kafka.clients.NetworkClient] [Consumer clientId=StreamThread-1-consumer, groupId=app-stream] 20 partitions have leader brokers without a matching listener, including [app-stream-tmp-store-changelog-5, app-stream-tmp-store-changelog-13, app-stream-tmp-store-changelog-9, app-stream-tmp-store-changelog-1, __consumer_offsets-10, __consumer_offsets-30, __consumer_offsets-18, __consumer_offsets-22, __consumer_offsets-34, __consumer_offsets-6]
[2019-06-07T22:18:08,662Z {StreamThread-1} ERROR org.apache.kafka.streams.processor.internals.AssignedStreamsTasks] stream-thread [StreamThread-1] Failed to commit stream task 0_14 due to the following error:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {global-14=OffsetAndMetadata{offset=33038702, leaderEpoch=null, metadata=''}}
[2019-06-07T22:18:08,662Z {StreamThread-1} ERROR org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {global-2=OffsetAndMetadata{offset=25537237, leaderEpoch=null, metadata=''}}
[2019-06-07T22:18:08,662Z {StreamThread-1} INFO  org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
[2019-06-07T22:18:08,662Z {StreamThread-1} INFO  org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] Shutting down
[2019-06-07T22:18:08,704Z {StreamThread-1} INFO  org.apache.kafka.clients.consumer.KafkaConsumer] [Consumer clientId=StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
[2019-06-07T22:18:08,704Z {StreamThread-1} INFO  org.apache.kafka.clients.producer.KafkaProducer] [Producer clientId=StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
[2019-06-07T22:18:08,728Z {StreamThread-1} INFO  org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[2019-06-07T22:18:08,728Z {StreamThread-1} INFO  org.apache.kafka.streams.KafkaStreams] stream-client [usxapgutpd01-] State transition from RUNNING to ERROR
[2019-06-07T22:18:08,728Z {StreamThread-1} ERROR org.apache.kafka.streams.KafkaStreams] stream-client [usxapgutpd01-] All stream threads have died. The instance will be in error state and should be closed.
[2019-06-07T22:18:08,728Z {StreamThread-1} INFO  org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] Shutdown complete

Take a look: if any exception occurs while producing messages, or while processing/transforming them, the stream transitions into the ERROR state and stops processing. To handle this you need to implement a ProductionExceptionHandler (a sketch follows at the end of this thread).

I don't think the other question helps here. @VasiliySarzhynskyi, if I use a ProductionExceptionHandler and return ProductionExceptionHandlerResponse.CONTINUE (as in the example), won't records be skipped, causing data loss, if I'm interpreting this correctly? I want an approach that never skips records and instead just keeps retrying the Kafka connection.

You may want to increase default.api.timeout.ms?

@MatthiasJ.Sax default.api.timeout.ms looks promising (as a consumer config for Kafka Streams, I assume it should be consumer.default.api.timeout.ms). I'll give it a try.

@NathanLinebarger did you get past this issue, and did default.api.timeout help?
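For illustration, a minimal sketch of a custom ProductionExceptionHandler (the class name is hypothetical). Note that returning ProductionExceptionHandlerResponse.CONTINUE keeps the stream thread alive but drops the failed record, which is exactly the data-loss concern raised in the comments above:

    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.streams.errors.ProductionExceptionHandler;

    // Hypothetical handler; register it under the
    // "default.production.exception.handler" Streams config key.
    public class LogAndContinueProductionHandler implements ProductionExceptionHandler {

        @Override
        public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                         Exception exception) {
            // CONTINUE skips this record and keeps processing; use it only if
            // occasional data loss is acceptable. Return FAIL to stop instead.
            System.err.println("Failed to produce to " + record.topic() + ": " + exception);
            return ProductionExceptionHandlerResponse.CONTINUE;
        }

        @Override
        public void configure(Map<String, ?> configs) {
            // No additional configuration needed for this sketch.
        }
    }

It would be registered with something like props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueProductionHandler.class). Also note that this handler only covers producer send failures; it does not cover the offset-commit TimeoutException shown in the log above.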