Apache Kafka: can Kafka guarantee zero message loss?

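I have read conflicting opinions on this. I have a critical application in which every single message matters. Can Kafka guarantee zero message loss at the same level as traditional messaging systems such as IBM MQ?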

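Each topic is a particular stream of data (similar to a table in a database). A topic is split into partitions (as many as you like), and each message within a partition gets an incremental ID, called the offset, as shown below.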

Partition 0:

+---+---+---+-----+
| 0 | 1 | 2 | ... |
+---+---+---+-----+

Partition 1:

+---+---+---+---+----+
| 0 | 1 | 2 | 3 | .. |
+---+---+---+---+----+

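To make the offset numbering concrete, here is a minimal Python sketch of a partition as an append-only list. The `Partition` class and its `append` method are hypothetical names made up for illustration, not part of any Kafka client API; the only point is that each partition numbers its own messages independently, starting at 0.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        """Store the message and return the offset it was assigned."""
        self.messages.append(message)
        return len(self.messages) - 1


# A toy topic with two partitions; offsets are per partition, not per topic.
topic = {0: Partition(), 1: Partition()}
offsets_p0 = [topic[0].append(m) for m in ("a", "b", "c")]
offsets_p1 = [topic[1].append(m) for m in ("w", "x", "y", "z")]

print(offsets_p0)  # [0, 1, 2]
print(offsets_p1)  # [0, 1, 2, 3]
```

An offset therefore only identifies a message together with its partition: offset 2 exists in both partitions here and refers to two different messages.
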
A Kafka cluster consists of multiple brokers. Each broker is identified by an ID and holds some of the topic partitions.

Example with 2 topics (with 3 and 2 partitions respectively):

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 2       |
|   Partition 1     |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 2    |
|                   |
|                   |
|     Topic 2       |
|   Partition 0     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Note that the data is distributed (Broker 3 does not hold any data for Topic 2).

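Topics should have a replication factor > 1 (usually 2 or 3), so that when one broker goes down, another can serve the topic's data. For example, assume a topic with two partitions and a replication factor of 2, as shown below: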

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 1       |
|   Partition 1     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

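Now suppose Broker 2 has failed. Brokers 1 and 3 can still serve the data for Topic 1. A replication factor of 3 is therefore always a good idea, because it allows one broker to be taken down for maintenance and another to go down unexpectedly. In this sense, Apache Kafka offers strong durability and fault-tolerance guarantees.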
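The failure scenario can be checked with a small Python sketch. The placement below is an assumption made for illustration (Partition 0 on Brokers 1 and 2, Partition 1 on Brokers 2 and 3, so two copies of each partition), and `available_partitions` is a hypothetical helper, not a Kafka API. It models only which partitions still have a live copy, not how Kafka actually serves them.

```python
# Assumed replica placement: partition name -> set of broker IDs holding a copy.
replicas = {
    "topic1-p0": {1, 2},
    "topic1-p1": {2, 3},
}

# The same topic if every partition had a copy on all three brokers.
replicas_rf3 = {
    "topic1-p0": {1, 2, 3},
    "topic1-p1": {1, 2, 3},
}


def available_partitions(placement, failed_brokers):
    """Return the partitions that still have at least one live replica."""
    failed = set(failed_brokers)
    return sorted(p for p, brokers in placement.items() if brokers - failed)


# Broker 2 fails: both partitions survive on Brokers 1 and 3.
print(available_partitions(replicas, {2}))         # ['topic1-p0', 'topic1-p1']
# A second broker fails: with two copies, one partition is gone.
print(available_partitions(replicas, {2, 3}))      # ['topic1-p0']
# With three copies, any two brokers can be down at once.
print(available_partitions(replicas_rf3, {1, 2}))  # ['topic1-p0', 'topic1-p1']
```

The last call is the case described above: one broker down for maintenance plus one unexpected failure, with the data still reachable.
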

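A note on leaders:
At any time, only one broker can be the leader of a partition, and only that leader can receive and serve data for that partition. The other brokers only replicate the data (in-sync replicas). Also note that when the replication factor is set to 1, the leader cannot be moved elsewhere when its broker fails. In general, when all replicas of a partition fail or go offline, the leader is automatically set to -1.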
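The leader rule can be sketched in Python as follows. This is a deliberately simplified model: `elect_leader` is a hypothetical function that picks the first live replica in the list, whereas a real cluster delegates the choice to its controller and restricts it to replicas that are in sync. Only the `-1` convention for "no leader" is taken from the text above.

```python
NO_LEADER = -1


def elect_leader(replica_brokers, live_brokers):
    """Return the first live broker in the replica list, or -1 if none is live."""
    for broker in replica_brokers:
        if broker in live_brokers:
            return broker
    return NO_LEADER


# Replication factor 3: leadership moves as brokers fail.
print(elect_leader([1, 2, 3], live_brokers={1, 2, 3}))  # 1
print(elect_leader([1, 2, 3], live_brokers={2, 3}))     # 2
# Replication factor 1: the only replica is on Broker 1, which is down.
print(elect_leader([1], live_brokers={2, 3}))           # -1
```
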

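"Zero message loss" is very broad and vague. Can you clarify what you mean? End-to-end delivery? Durability once the message is in the system? At-least-once or exactly-once semantics?

Here I mean end-to-end delivery. Can Kafka guarantee that? My understanding is that it should. But I have read here and there that it does not, and that it is suited to use cases that can tolerate a small amount of message loss, such as logging, sensor data, and so on.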