Apache Kafka Streams - Failed to rebalance error


I have a basic Kafka Streams application that reads from in_topic, performs a rolling aggregate, and performs a join to publish to out_topic. This had been running for weeks, but this morning it crashed and no longer starts. I don't believe it is related to the code. The logs before the error are:

2019-01-21 17:46:32,803 localhost org.apache.kafka.clients.producer.KafkaProducer: [Producer clientId=rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1-0_0-producer, transactionalId=rtt-healthscore-stream-0_0] Instantiated a transactional producer.
2019-01-21 17:46:32,803 localhost org.apache.kafka.clients.producer.KafkaProducer: [Producer clientId=rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1-0_0-producer, transactionalId=rtt-healthscore-stream-0_0] Overriding the default acks to all since idempotence is enabled.
2019-01-21 17:46:32,818 localhost org.apache.kafka.common.utils.AppInfoParser: Kafka version : 2.0.0
2019-01-21 17:46:32,818 localhost org.apache.kafka.common.utils.AppInfoParser: Kafka commitId : 3402a8361b734732
2019-01-21 17:46:32,832 localhost org.apache.kafka.clients.producer.internals.TransactionManager: [Producer clientId=rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1-0_0-producer, transactionalId=rtt-healthscore-stream-0_0] ProducerId set to -1 with epoch -1
2019-01-21 17:47:32,833 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Error caught during partition assignment, will abort the current process and re-throw at the end of rebalance: {}
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
2019-01-21 17:47:32,843 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] partition assignment took 60062 ms.
    current active tasks: []
    current standby tasks: []
    previous active tasks: []

2019-01-21 17:47:32,845 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2019-01-21 17:47:32,845 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Shutting down
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.KafkaStreams: stream-client [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804] State transition from REBALANCING to ERROR
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.KafkaStreams: stream-client [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804] All stream threads have died. The instance will be in error state and should be closed.
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Shutdown complete
Exception in thread "rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Failed to rebalance.
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:870)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:810)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.

None of the Kafka settings/configs were changed, and all brokers are available. My Kafka version is 2.0. I am able to read from in_topic with a console consumer, so everything upstream of this application is fine. Thanks for any help.
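The console-consumer sanity check mentioned above looks roughly like this; the bootstrap address is a placeholder and would need to match your cluster:

```shell
# Confirm the input topic is readable outside of the Streams application.
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic in_topic \
  --from-beginning \
  --max-messages 5
```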

Our project hit the same timeout failure after upgrading to Kafka 2.1; we don't yet know the cause.


Our temporary workaround is to disable the exactly-once configuration (processing.guarantee), which skips initializing the transactional state.
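A minimal sketch of that workaround, assuming the application id from the logs above and a placeholder bootstrap address:

```java
import java.util.Properties;

public class StreamsWorkaroundConfig {
    // Builds a Streams configuration with exactly-once disabled.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "rtt-healthscore-stream"); // id taken from the logs above
        props.put("bootstrap.servers", "localhost:9092");      // placeholder address
        // "at_least_once" is the default guarantee; setting it (instead of
        // "exactly_once") skips the transactional-state initialization that timed out.
        props.put("processing.guarantee", "at_least_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("processing.guarantee"));
    }
}
```

These string keys correspond to StreamsConfig.APPLICATION_ID_CONFIG, BOOTSTRAP_SERVERS_CONFIG and PROCESSING_GUARANTEE_CONFIG; using the constants is preferable when the kafka-streams dependency is on the classpath.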

We also saw these errors after upgrading to 2.1 (and, I believe, previously when upgrading to earlier versions).

We run in a Kubernetes environment, where a broker can change IP address after a rolling upgrade. From the broker logs:

[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection 
to node 0 (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/10.233.124.181:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection to node 1 (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/10.233.122.67:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
I could see that the transaction coordinator was still using stale IP addresses for the two brokers that had been restarted after the upgrade (a day after the upgrade).

Possible options:

  • Switch off exactly-once for your streams applications, as described above. They then don't use transactions and everything seems to work. This doesn't help if you require EOS or some other client code requires transactions.
  • Restart any brokers that are reporting warnings, to force them to re-resolve the IP addresses. They would need to be restarted in a way that doesn't change their own IP addresses, which is usually not possible in Kubernetes.
Defect raised.
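One way to confirm the stale-address theory, sketched under the assumption that the pod, namespace, and headless-service names match the broker log above:

```shell
# The pod's current IP, to compare against what the coordinator logged
# (10.233.124.181 for broker 0 in the log above):
kubectl -n hyperspace-dev get pod khaki-joey-kafka-0 -o jsonpath='{.status.podIP}'

# What cluster DNS resolves the headless-service name to right now:
nslookup khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev
```

If the logged IP and the current DNS answer differ, the coordinator is holding a cached, stale address.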

Update 2019-02-20: Kafka 2.1.1 (Confluent 5.1.2), released today, may have resolved this. See the linked issue.

It's resolved after the upgrade:
https://kafka.apache.org/25/documentation/streams/developer-guide/write-streams.html

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>2.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.5.0</version>
</dependency>
<!-- Optionally include Kafka Streams DSL for Scala for Scala 2.12 -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams-scala_2.12</artifactId>
    <version>2.5.0</version>
</dependency>
Can you check the broker logs for any error or warning messages?

These are all the logs from when the application stopped processing data. I tried changing only the app_id of the broken application, and everything worked. So it seems to be an access issue tied to the app_id; perhaps it is trying to access corrupted data and stalls, not knowing to look for that data elsewhere. We have a replication factor of 2 across 4 brokers.

To follow up on your situation: I tried a full application reset (global/local) and still have the same problem. Coincidentally, one of the broker nodes went down at the same time this error occurred.

In my experience (Kafka 2.3.x, Confluent 5.3.x), this problem has not been resolved.
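The "full application reset" mentioned in the comment above can be sketched as follows, assuming the application id and topic name from this question; the bootstrap address is a placeholder:

```shell
# Global reset: clears the application's committed offsets and internal topics
# using the reset tool shipped with Kafka.
kafka-streams-application-reset.sh \
  --application-id rtt-healthscore-stream \
  --bootstrap-servers localhost:9092 \
  --input-topics in_topic
```

The local part of the reset is done in application code by calling KafkaStreams#cleanUp() before start(), which wipes the local state directory for that application id.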