Apache Kafka high availability not working


I am trying the quickstart from the Kafka documentation (the link is ). I deployed 3 brokers and created a topic:
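
For context, the quickstart's three-broker setup looks roughly like this (the ports and property names are the quickstart's defaults, stated from memory, and are my assumption about how the three servers here were started):

    cp config/server.properties config/server-1.properties
    cp config/server.properties config/server-2.properties
    # edit each copy: server-1 gets broker.id=1 and port 9093,
    # server-2 gets broker.id=2 and port 9094, plus distinct log.dirs
    bin/kafka-server-start.sh config/server.properties &      # broker 0 on 9092
    bin/kafka-server-start.sh config/server-1.properties &    # broker 1 on 9093
    bin/kafka-server-start.sh config/server-2.properties &    # broker 2 on 9094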

    ➜ kafka_2.10-0.10.1.0 bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
    Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
        Topic: my-replicated-topic  Partition: 0    Leader: 2   Replicas: 2,0,1 Isr: 2,1,0
Then I used bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic to test the producer, and bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic to test the consumer. Both the producer and the consumer worked fine, and they kept working if I killed server 1 or server 2.

But if I kill server 0 and type messages in the producer terminal, the consumer cannot read the new messages. When I kill server 0, the consumer prints these logs:

[2017-06-23 17:29:52,750] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-23 17:29:52,974] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
... (the same warning repeats roughly every 100 ms)
Then I restarted server 0, and the consumer printed the messages along with some more warnings:

hhhh
hello
[2017-06-23 17:32:32,795] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-23 17:32:32,902] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
This confuses me. Why is server 0 so special, when server 0 is not even the leader?

I noticed that the server log printed by server 0 contains the following:

[2017-06-23 17:32:33,640] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from [__consumer_offsets,23] in 38 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-23 17:32:33,641] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from [__consumer_offsets,26] (kafka.coordinator.GroupMetadataManager)
[2017-06-23 17:32:33,646] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from [__consumer_offsets,26] in 4 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-23 17:32:33,646] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from [__consumer_offsets,29] (kafka.coordinator.GroupMetadataManager)
But the logs of server 1 and server 2 contain nothing like this.

Can anyone explain this to me? Many thanks.

Solved:
The root cause was the replication factor of the __consumer_offsets topic. It is a known issue: issues.apache.org/jira/browse/KAFKA-3959

The servers share the load of managing consumer groups.

Normally each independent consumer has a unique consumer group ID; you use the same group ID when you want to split the consumption work across several consumers.

That said: for a Kafka server in the cluster, being the lead broker is only about coordinating the other brokers. The leader is not directly related to the server that currently manages your group ID and its offset commits for a particular consumer.

So whenever you subscribe, one server is designated to handle the group's offset commits, and that has nothing to do with leader election.
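
Filling in the mechanism the answer hints at: the designated server is the group coordinator, namely the leader of the __consumer_offsets partition that the group ID hashes to (abs(groupId.hashCode()) % 50 with the default partition count). On newer Kafka versions you can query it directly; note the --state flag did not exist yet in 0.10.1:

    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
        --describe --group console-consumer-97540 --state
    # The COORDINATOR (ID) column shows which broker manages this
    # group's offset commits.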


Shut that server down and your group's consumption may run into trouble until the Kafka cluster stabilizes again, either by reassigning your consumers so that group management moves to another server, or by waiting for the node to respond again... I am not expert enough to tell you exactly how the failover happens.

kafka-console-producer defaults to acks=1, so it is not fault-tolerant at all. Add the flag or config parameter to set acks=all; the test will then work correctly, provided both your topic and the __consumer_offsets topic were created with a replication factor of 3.
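
A minimal sketch of such an invocation; --request-required-acks is the console producer's flag, and -1 is the wire value for "all". Depending on your version, --producer-property acks=all may work as well:

    bin/kafka-console-producer.sh --broker-list localhost:9092 \
        --request-required-acks -1 --topic my-replicated-topic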

Possibly the replicas of the __consumer_offsets topic were set to broker 0 only. To confirm this, describe the __consumer_offsets topic:

kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets

Topic: __consumer_offsets   PartitionCount: 50  ReplicationFactor: 1    Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets   Partition: 0    Leader: 0   Replicas: 0 Isr: 0
Topic: __consumer_offsets   Partition: 1    Leader: 0   Replicas: 0 Isr: 0
Topic: __consumer_offsets   Partition: 2    Leader: 0   Replicas: 0 Isr: 0
Topic: __consumer_offsets   Partition: 3    Leader: 0   Replicas: 0 Isr: 0
Topic: __consumer_offsets   Partition: 4    Leader: 0   Replicas: 0 Isr: 0
...
Topic: __consumer_offsets   Partition: 49   Leader: 0   Replicas: 0 Isr: 0
Note the Replicas: 0 Isr: 0. That is why the consumer stops getting messages when you stop broker 0.

To fix this, you need to change the replicas of the __consumer_offsets topic to include the other brokers.

Create a JSON file like the one sketched below, e.g. config/inc-replication-factor-consumer_offsets.json, and then execute the following command:

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --zookeeper localhost:2181 --reassignment-json-file ../config/inc-replication-factor-consumer_offsets.json --execute
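
The post does not include the file body itself. Presumably it listed every __consumer_offsets partition with replicas [0,1,2] in the standard version-1 reassignment format; since writing 50 entries by hand is tedious, a small shell loop can generate such a file:

    {
      echo '{"version":1,"partitions":['
      for p in $(seq 0 49); do
        # comma after every entry except the last, to keep the JSON valid
        sep=','; [ "$p" -eq 49 ] && sep=''
        echo "  {\"topic\":\"__consumer_offsets\",\"partition\":$p,\"replicas\":[0,1,2]}$sep"
      done
      echo ']}'
    } > config/inc-replication-factor-consumer_offsets.json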

Confirm the replicas:

kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets

Topic: __consumer_offsets   PartitionCount: 50  ReplicationFactor: 3    Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets   Partition: 0    Leader: 0   Replicas: 0,1,2 Isr: 0,2,1
Topic: __consumer_offsets   Partition: 1    Leader: 0   Replicas: 0,1,2 Isr: 0,2,1
Topic: __consumer_offsets   Partition: 2    Leader: 0   Replicas: 0,1,2 Isr: 0,2,1
Topic: __consumer_offsets   Partition: 3    Leader: 0   Replicas: 0,1,2 Isr: 0,2,1
...
Topic: __consumer_offsets   Partition: 49   Leader: 0   Replicas: 0,1,2 Isr: 0,2,1
Now you can stop broker 0 alone, produce some messages, and watch the result on the consumer.
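
To double-check the failover, a describe pointed at one of the surviving brokers should show broker 0 dropping out of the Isr lists (port 9093 for broker 1 is an assumption based on the quickstart configs):

    bin/kafka-topics.sh --bootstrap-server localhost:9093 --describe --topic __consumer_offsets
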
Thanks for your answer. In my test I have only one producer and one consumer. When I shut down server 0 I waited a long time, but the consumer still could not read messages. If I start only server 1 and server 2 after all 3 servers are down, create a new topic with replication factor 2, and then start the producer and consumer scripts, the consumer still cannot read messages from the new topic unless server 0 is started. Server 0 is very special and defeats the high-availability feature. Did I deploy the brokers incorrectly? Server 0 was the first broker to join the cluster.

This is from experience rather than something I have fact-checked in Kafka's source code. We usually apply this principle: whenever a Kafka broker goes down or is rebuilt, the cluster can become unstable for consumers, and both consuming and producing (including commits) can fail. The cluster members seem to know they are in an invalid state. Using idempotent development principles and manual offset commits, we simply wait for the consumer to eventually crash/hang, and the jobs recover once the Kafka servers are done with their warnings and poorly documented exceptions.

If server 0 is down, the cluster does not work properly. What should I do if I want to remove broker 0 from the cluster permanently?

The root cause was the replication factor of the __consumer_offsets topic. It is a known issue: issues.apache.org/jira/browse/KAFKA-3959

The replication factor is a parameter set when a topic is created; a producer does not need to set it.

Sorry, I meant acks. The console producer defaults to acks=1, which also does not guarantee at-least-once delivery. When publishing you should specify acks=all to ensure all three replicas get the data, provided the topic was created with a replication factor of 3, which I believe is what you were saying. I have edited my answer to show that acks=all and replication factor = 3 are both required.

If I start only server 1 and server 2 after all 3 servers are down, create a new topic with replication factor 2, and then start the producer and consumer scripts, the consumer still cannot read messages from the new topic unless server 0 is started. Server 0 is very special and defeats the high-availability feature. Did I deploy the brokers incorrectly?

Are you putting all three brokers in the bootstrap server list, or only using --bootstrap-server localhost:9092? If the Kafka broker on server 0 is down and you run the command against server 0 alone, you cannot fetch metadata. Another thing to check is the replication factor of the __consumer_offsets topic; when you only had one node up, it may have been created incorrectly, so it is stored only on server 0.
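
For reference, a bootstrap list that names all three brokers, so that clients can still fetch metadata while broker 0 is down; ports 9093 and 9094 for brokers 1 and 2 are assumptions based on the quickstart configs:

    bin/kafka-console-consumer.sh \
        --bootstrap-server localhost:9092,localhost:9093,localhost:9094 \
        --from-beginning --topic my-replicated-topic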