Apache Kafka rebalance: consumer is re-reading, causing old messages to re-enter the stream
I use Kafka to pass messages between two microservices, but I ran into a problem while trying to resolve this "error". I have auto-commit enabled, and I am getting this error:
kafka.errors.CommitFailedError: CommitFailedError: Commit cannot be completed since the group has already
rebalanced and assigned the partitions to another member. This means that the time between subsequent calls
to poll() was longer than the configured max_poll_interval_ms, which typically implies that the poll loop
is spending too much time message processing. You can address this either by increasing the rebalance
timeout with max_poll_interval_ms, or by reducing the maximum size of batches returned in poll() with
max_poll_records.
Here is my understanding of the error message: the auto-commit failed, so the consumer offset was not updated, and the next time a new message arrives the previous messages (from the last committed offset onward) are delivered again, resulting in duplicates. Is that right?
I tried tweaking the max_poll_interval_ms and max_poll_records values, but it had no effect. Most likely I set them to the wrong values. Should I set auto_offset_reset to latest instead?
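One way to reason about these two settings (an illustration, not from the post): poll() has to be called again within max_poll_interval_ms, so the batch size times the per-record processing time must stay safely below it. The per-record time below is a hypothetical figure, not a measurement from the post:

```python
# Illustrative arithmetic: pick max_poll_records so a full batch fits
# comfortably inside max_poll_interval_ms (kafka-python default: 300000 ms).
per_record_ms = 2000          # hypothetical: each DB insert takes ~2 s
max_poll_interval_ms = 300000
safety_factor = 2             # headroom for GC pauses, slow queries, ...

max_records = max_poll_interval_ms // (per_record_ms * safety_factor)
print(max_records)  # 75
```

With numbers like these, max_poll_records=1 combined with max_poll_interval_ms=5000 (as in the commented-out config below) would be far tighter than necessary and could itself trigger rebalances.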
Several parameters are commented out because those settings did not solve the problem. This is what I am currently running:
consumer = KafkaConsumer(
    bootstrap_servers=[
        ...
    ],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-client',
    # max_poll_records=1,
    # max_poll_interval_ms=5000,
    # heartbeat_interval_ms=1000,
    # session_timeout_ms=5000,
    api_version=(2, 2, 1))
Update

When I print the consumed records, the offsets do appear to be increasing:
ConsumerRecord(topic='data-topic-one', partition=0, offset=62, timestamp=1601451089085, timestamp_type=0, key=None, value=b'{"val": 1}')
...
...
ConsumerRecord(topic='data-topic-one', partition=0, offset=63, timestamp=1601451089085, timestamp_type=0, key=None, value=b'{"val": 2}')
...
...
ConsumerRecord(topic='data-topic-one', partition=0, offset=64, timestamp=1601451089085, timestamp_type=0, key=None, value=b'{"val": 3}')
After processing the records and saving them to the database, this is what I end up with.
Result of the first insert:

{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 1}, {"val": 2}]}
{"id": "789", "value": [{"val": 1}, {"val": 2}, {"val": 3}]}
{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 2}]}
{"id": "789", "value": [{"val": 3}]}
Result of the second insert:

{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 1}, {"val": 2}]}
{"id": "789", "value": [{"val": 1}, {"val": 2}, {"val": 3}]}
{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 2}]}
{"id": "789", "value": [{"val": 3}]}
Result of the third insert:

{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 1}, {"val": 2}]}
{"id": "789", "value": [{"val": 1}, {"val": 2}, {"val": 3}]}
{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 2}]}
{"id": "789", "value": [{"val": 3}]}
The expected result should be:

Result of the first insert:

{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 1}, {"val": 2}]}
{"id": "789", "value": [{"val": 1}, {"val": 2}, {"val": 3}]}
{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 2}]}
{"id": "789", "value": [{"val": 3}]}
Result of the second insert:

{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 1}, {"val": 2}]}
{"id": "789", "value": [{"val": 1}, {"val": 2}, {"val": 3}]}
{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 2}]}
{"id": "789", "value": [{"val": 3}]}
Result of the third insert:

{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 1}, {"val": 2}]}
{"id": "789", "value": [{"val": 1}, {"val": 2}, {"val": 3}]}
{"id": "123", "value": [{"val": 1}]}
{"id": "456", "value": [{"val": 2}]}
{"id": "789", "value": [{"val": 3}]}
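Independent of the tuning question, duplicate inserts can be made harmless by deduplicating on the record's (topic, partition, offset) triple, which is unique per Kafka record. The sketch below is illustrative only: the dict-based records and the in-memory set stand in for real ConsumerRecords and a database unique constraint.

```python
# Sketch: deduplicate on (topic, partition, offset) so a rebalance-induced
# redelivery does not insert the same row twice. In production, `seen`
# would be a unique constraint in the database, not an in-memory set.

def process_once(record, seen, sink):
    """Append record's value to sink unless this offset was already handled."""
    key = (record["topic"], record["partition"], record["offset"])
    if key in seen:
        return False          # duplicate delivery: skip
    seen.add(key)
    sink.append(record["value"])
    return True

seen, sink = set(), []
records = [
    {"topic": "data-topic-one", "partition": 0, "offset": 62, "value": {"val": 1}},
    {"topic": "data-topic-one", "partition": 0, "offset": 62, "value": {"val": 1}},  # redelivered
    {"topic": "data-topic-one", "partition": 0, "offset": 63, "value": {"val": 2}},
]
results = [process_once(r, seen, sink) for r in records]
print(results)  # [True, False, True]
print(sink)     # [{'val': 1}, {'val': 2}]
```

Because Kafka with auto-commit only gives at-least-once delivery, some form of idempotent write like this is needed regardless of how the poll timing is tuned.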
This post might help:

That makes sense, but it is hard to figure out the right values to use. If I use max_poll_records, should I also change max_poll_interval_ms and heartbeat_interval_ms? Would changing auto_offset_reset to latest also help?

Setting auto_offset_reset to latest would only postpone the problem, I think :)

That's true. I will experiment with those values, but only as a last resort among the possible solutions. What if I disabled auto-commit and committed the offsets once the data processing is finished?
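Following up on that last comment: disabling auto-commit and committing manually after each record is processed is the usual fix for this kind of duplication. A minimal sketch with kafka-python, assuming a running broker; the broker address, topic name, and process_record() handler are placeholders, not taken from the original post:

```python
from kafka import KafkaConsumer

def process_record(record):
    """Hypothetical handler: parse record.value and insert it into the DB."""
    ...

consumer = KafkaConsumer(
    'data-topic-one',
    bootstrap_servers=['localhost:9092'],  # placeholder broker address
    auto_offset_reset='earliest',
    enable_auto_commit=False,              # take over commit responsibility
    group_id='my-client',
    max_poll_records=10,                   # smaller batches finish sooner,
                                           # keeping poll() calls inside
                                           # max_poll_interval_ms
)

for record in consumer:
    process_record(record)
    consumer.commit()  # commit only after the record is safely persisted
```

Committing after the database write narrows the duplicate window to a crash between the write and the commit; for fully exactly-once behaviour the insert itself still has to be idempotent.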