
Apache Kafka: latest records/messages in a Kafka topic


Is there any way to fetch the most recent 1000 records/messages from a Kafka topic, similar to tail -n 1000 on a file in Linux?

You can use the seek method of the KafkaConsumer class: find the current end offset of each partition, then do the arithmetic to work out the starting offset you want.

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer()
partition = TopicPartition('foo', 0)
start = 1234  # example start offset for this partition
end = 2345    # example end offset (inclusive) for this partition
consumer.assign([partition])
consumer.seek(partition, start)
for msg in consumer:
    if msg.offset > end:
        break
    print(msg)
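
If the offsets are not known ahead of time, kafka-python can look them up instead of hard-coding them: end_offsets() returns the next offset to be written for each partition, and beginning_offsets() the earliest offset still retained. A minimal sketch of "last N messages per partition" along those lines, assuming a topic named foo, a broker at localhost:9092, and a topic that is not empty:

from kafka import KafkaConsumer, TopicPartition

N = 1000  # trailing messages wanted per partition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
partitions = [TopicPartition('foo', p)
              for p in consumer.partitions_for_topic('foo')]
consumer.assign(partitions)

# end_offsets(): next offset to be written, per partition
# beginning_offsets(): earliest offset still retained, per partition
ends = consumer.end_offsets(partitions)
begins = consumer.beginning_offsets(partitions)

expected = 0
for tp in partitions:
    start = max(ends[tp] - N, begins[tp])
    expected += ends[tp] - start
    consumer.seek(tp, start)

# read exactly the computed number of messages, then stop
for msg in consumer:
    print(msg)
    expected -= 1
    if expected == 0:
        break

Note this yields the last N messages of each partition; Kafka only orders messages within a partition, so if you need a global "last 1000" you still have to merge the results by timestamp.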

I think kafka-python can do this!!! I found this way of getting the last message.

Configure it to fetch the n last messages, but make sure there are enough messages in case the topic is empty. This looks like a job for stream processing, i.e. Kafka Streams or KSQL.

#!/usr/bin/env python
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'example_topic'
GROUP = 'demo'
BOOTSTRAP_SERVERS = ['bootstrap.kafka:9092']

consumer = KafkaConsumer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id=GROUP,
    # enable_auto_commit=False,
    auto_commit_interval_ms=0,
    max_poll_records=1
)

candidates = []
consumer.commit()

msg = None
partitions = consumer.partitions_for_topic(TOPIC)

for p in partitions:
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    # committed is None if this group has never committed on this partition
    lag = (last_offset - committed) if committed is not None else None
    print(f"\ntopic: {TOPIC} partition: {p} committed: {committed} last: {last_offset} lag: {lag}")

    consumer.poll(
        timeout_ms=100,
        # max_records=1
    )

    # seek back a few messages from the end of the partition
    # (4 is arbitrary; increase it to consider more candidates)
    consumer.seek(tp, max(last_offset - 4, 0))

    for message in consumer:
        # print(f"Message is of type: {type(message)}")
        print(message)
        # print(f'message.offset: {message.offset}')

        # position() after seek_to_end() is the *next* offset to be written,
        # so the last existing message sits at last_offset - 1
        if message.offset == last_offset - 1:
            candidates.append(message)
            # print(f'  {message}')

            # comment if you don't want the messages committed
            consumer.commit()
            break

print('\n\ngooch\n\n')

# pick the newest candidate by timestamp
# (assumes every partition held at least one message; see the caveat above)
latest_msg = candidates[0]

for msg in candidates:
    print(f'finalists:\n {msg}')
    if msg.timestamp > latest_msg.timestamp:
        latest_msg = msg

consumer.close()


print(f'\n\nlatest_message:\n{latest_msg}')
I know that in Java/Scala Kafka Streams it is possible to build a table, i.e. a sub-topic containing only the latest entry of another topic, so the Confluent Kafka library (written in C) may offer a more elegant and efficient way to do this. It has Python and Java bindings, and there is also the kafkacat CLI.
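
For reference, here is a minimal sketch of "last message per partition" with confluent-kafka, the Python binding of that C library. The broker address localhost:9092, the topic name example_topic and the group id are assumptions for illustration, not anything from the answers above:

from confluent_kafka import Consumer, TopicPartition

TOPIC = 'example_topic'  # assumed topic name

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # assumed broker address
    'group.id': 'last-message-demo',
    'enable.auto.commit': False,
})

# discover the topic's partitions
metadata = consumer.list_topics(TOPIC, timeout=10)
partitions = metadata.topics[TOPIC].partitions

last_messages = []
for p in partitions:
    # watermarks: high is the next offset to be written,
    # so the last existing message sits at high - 1
    low, high = consumer.get_watermark_offsets(TopicPartition(TOPIC, p), timeout=10)
    if high <= low:
        continue  # empty partition, nothing to read
    consumer.assign([TopicPartition(TOPIC, p, high - 1)])
    msg = consumer.poll(timeout=10)
    if msg is not None and not msg.error():
        last_messages.append(msg)

consumer.close()

# newest message across all partitions, by broker timestamp
if last_messages:
    latest = max(last_messages, key=lambda m: m.timestamp()[1])
    print(latest.topic(), latest.partition(), latest.offset(), latest.value())

kafkacat can do much the same from the shell with a negative offset, e.g. kafkacat -C -b localhost:9092 -t example_topic -o -1 -e reads the last message of each partition.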