Apache Kafka: fast detection of a Kafka node lost to a network failure

We run Kafka in a 3-node setup, with Kafka and Zookeeper on each node. The topics have 1 partition and 2 replicas, for example:

Topic:someTopic    PartitionCount:1    ReplicationFactor:2    Configs:retention.ms=600000
    Topic: someTopic    Partition: 0    Leader: 2    Replicas: 2,0    Isr: 2,0
We use the following settings.

Consumer settings:

fetch.min.bytes=1
enable.auto.commit=true
max.partition.fetch.bytes=1073741824
metadata.fetch.timeout.ms=1000
Producer settings:

fetch.min.bytes=1
enable.auto.commit=true
max.partition.fetch.bytes=1073741824
metadata.fetch.timeout.ms=1000
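If we stop Kafka and Zookeeper on one node with "kill -9", Kafka detects the loss of the leader within a few seconds and moves leadership to the other replica, and the consumer continues to receive messages.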

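On the other hand, if we take down the network on the same node with "ifdown eth0" (which cuts off connectivity to both Kafka and Zookeeper on that node), Kafka does not seem to detect the loss of the broker, and it takes 2 minutes before any more messages can be consumed on the affected topics.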
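
To make the difference between the two failure modes concrete, here is a minimal sketch using plain local TCP sockets. It is not Kafka code: the function names (`peer_closes`, `peer_goes_silent`, `observe`) are made up for illustration, and a loopback socket only stands in for a client-to-broker connection. The assumption behind it is that a killed process has its sockets closed by the operating system, so the other side sees the disconnect right away, while a host whose interface is down sends nothing at all, so the other side can only notice once one of its own timeouts expires.

```python
import socket
import threading
import time


def peer_closes(server_sock):
    """Accept one connection and close it at once (stand-in for a killed process)."""
    conn, _ = server_sock.accept()
    conn.close()


def peer_goes_silent(server_sock, hold_seconds):
    """Accept one connection, then send nothing (stand-in for a host with its link down)."""
    conn, _ = server_sock.accept()
    time.sleep(hold_seconds)
    conn.close()


def observe(peer, *peer_args, timeout=0.5):
    """Connect to a local peer and report how the client learns about the failure."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    worker = threading.Thread(target=peer, args=(server, *peer_args), daemon=True)
    worker.start()

    client = socket.create_connection(server.getsockname(), timeout=timeout)
    try:
        data = client.recv(1024)
        # An empty read means the peer closed the connection cleanly.
        result = "closed" if data == b"" else "data"
    except socket.timeout:
        # Nothing arrived: only the client's own timeout reveals the problem.
        result = "timeout"
    finally:
        client.close()
        worker.join()
        server.close()
    return result


if __name__ == "__main__":
    print("peer closes socket:", observe(peer_closes))
    print("peer goes silent:  ", observe(peer_goes_silent, 1.0, timeout=0.3))
```

If the same mechanism is at work in the setup described here, the 2-minute gap would be governed by whichever client or broker timeouts apply, rather than by an explicit disconnect.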

The following log can be seen on the consumer:

[2017-05-04 15:44:26,916] WARN Auto offset commit failed for group console-consumer-75510: Commit offsets failed with retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
And on the producer:

May 04 15:44:18: 15:44:18.420 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.435 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.440 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.442 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.444 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.446 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.448 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.449 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
... and this continues to be printed for a while.

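When a node goes down because of a network loss, is there a way to make Kafka detect it and rebalance, just as it does when Kafka is simply killed?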