Latest records/messages in an Apache Kafka topic

Is there a way to get the most recent 1000 records/messages in a Kafka topic? Something like `tail -n 1000` on a file in Linux.

You can use the seek method of the KafkaConsumer class. You need to find the current end offset of each partition, then do the arithmetic to find the right starting offset:
consumer = KafkaConsumer()
partition = TopicPartition('foo', 0)
start = 1234
end = 2345
consumer.assign([partition])
consumer.seek(partition, start)
for msg in consumer:
    if msg.offset > end:
        break
    else:
        print(msg)
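To turn this into a proper tail of the last 1000 messages, compute the start offset from each partition's end offset (available via KafkaConsumer.end_offsets and beginning_offsets) and clamp it at the beginning of the partition. A minimal sketch of the offset arithmetic; the even split of n across partitions is a simplifying assumption, since real partitions are rarely loaded uniformly:

```python
def tail_start_offsets(beginnings, ends, n):
    """Given per-partition beginning and end offsets (dicts keyed by
    partition, as returned by KafkaConsumer.beginning_offsets /
    end_offsets), return the offset to seek to in each partition so that
    roughly the last n messages of the topic are replayed."""
    per_partition = max(n // len(ends), 1)  # naive even split across partitions
    # never seek before the partition's first available offset
    return {tp: max(ends[tp] - per_partition, beginnings[tp]) for tp in ends}

# e.g. two partitions with 2000 messages each, tailing the last 1000 overall:
starts = tail_start_offsets({0: 0, 1: 0}, {0: 2000, 1: 2000}, 1000)
# then seek(partition, starts[partition]) and iterate as in the loop above
```

The clamp against beginnings matters when the topic holds fewer than n messages (or has been compacted/retained away), which the answer below also warns about.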
This was done with kafka-python. I found the following approach to fetch the last message; it can be configured to fetch the last n messages instead, but make sure the topic actually contains enough messages and is not empty. That said, this looks like a job for stream processing, i.e. Kafka Streams or KSQL.
#!/usr/bin/env python
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'example_topic'
GROUP = 'demo'
BOOTSTRAP_SERVERS = ['bootstrap.kafka:9092']

consumer = KafkaConsumer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id=GROUP,
    # enable_auto_commit=False,
    auto_commit_interval_ms=0,
    max_poll_records=1
)

candidates = []
consumer.commit()
partitions = consumer.partitions_for_topic(TOPIC)

for p in partitions:
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    print(f"\ntopic: {TOPIC} partition: {p} committed: {committed} "
          f"last: {last_offset} lag: {last_offset - committed}")

    consumer.poll(
        timeout_ms=100,
        # max_records=1
    )
    # offsets start at 0, so the last message sits at last_offset - 1;
    # start a few messages before the end and scan forward to it
    consumer.seek(tp, max(last_offset - 4, 0))
    for message in consumer:
        print(message)
        if message.offset == last_offset - 1:
            candidates.append(message)
            # comment this out if you don't want the offsets committed
            consumer.commit()
            break

# the newest message overall is the candidate with the newest timestamp
latest_msg = candidates[0]
for msg in candidates:
    print(f'finalists:\n {msg}')
    if msg.timestamp > latest_msg.timestamp:
        latest_msg = msg

consumer.close()
print(f'\n\nlatest_message:\n{latest_msg}')
I know that in Java/Scala Kafka Streams it is possible to create a table (a KTable), i.e. a sub-topic that contains only the latest entry per key from another topic, so the Confluent Kafka library written in C (librdkafka) may offer a more elegant and efficient approach. It has Python and Java bindings as well as the kafkacat CLI.
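For ad-hoc use, kafkacat can do the tail directly: a negative value passed to -o is interpreted as an offset relative to the end of each partition. A sketch, assuming the broker address and topic name used in the answer above:

```shell
# consume the last 1000 messages of each partition, then exit at end of topic
kafkacat -C -b bootstrap.kafka:9092 -t example_topic -o -1000 -e
```

Note that the relative offset applies per partition, so on a multi-partition topic this yields 1000 messages from each partition, not 1000 in total.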