Python: analyzing messages from a Kafka consumer
I have built a Kafka producer-consumer system and need to process the messages it transmits. They are lines from a JSON file, such as:
ConsumerRecord(topic=u'json_data103052', partition=0, offset=676, timestamp=1542710197257, timestamp_type=0, key=None, value='{"Name": "Simone", "Surname": "Zimbolli", "gender": "Other", "email": "zzz@uiuc.edu", "country": "Nigeria", "date": "11/07/2018"}', checksum=354265828, serialized_key_size=-1, serialized_value_size=189)
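I'm looking for an easy-to-implement solution to:
- define a stream window
- analyze the messages in the window (e.g. count the number of unique users and similar statistics)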
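from kafka import KafkaConsumer  # kafka-python client

bootstrap_servers = ['localhost:9092']

# IPython/Jupyter magic: restore topicName, which was saved with %store by the Kafka producer session
%store -r topicName
print(topicName)

# auto_offset_reset='earliest': start from the oldest available offset when there is no committed offset
consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers,
                         auto_offset_reset='earliest')
consumer.subscribe([topicName])

# Blocks and prints each ConsumerRecord as it arrives
for message in consumer:
    print(message)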
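Both requirements (define a window, count unique users per window) can also be met in plain Python on top of this consumer. A minimal sketch, assuming 1-minute tumbling windows aligned to the epoch, millisecond record timestamps (as in the `ConsumerRecord` above), and the `email` field as the user identifier; `TumblingWindowStats`, `window_start` and `WINDOW_SIZE_MS` are hypothetical names:

```python
import json
from collections import defaultdict

WINDOW_SIZE_MS = 60_000  # assumption: 1-minute tumbling windows


def window_start(timestamp_ms, size_ms=WINDOW_SIZE_MS):
    """Align a millisecond timestamp to the start of its epoch-aligned tumbling window."""
    return timestamp_ms - (timestamp_ms % size_ms)


class TumblingWindowStats:
    """Per-window event counts and unique users, keyed by window start (ms)."""

    def __init__(self, size_ms=WINDOW_SIZE_MS, user_key="email"):
        self.size_ms = size_ms
        self.user_key = user_key  # assumption: this JSON field identifies a user
        self.events = defaultdict(int)
        self.users = defaultdict(set)

    def add(self, timestamp_ms, value):
        # json.loads accepts both str and bytes (kafka-python returns bytes by default)
        event = json.loads(value)
        start = window_start(timestamp_ms, self.size_ms)
        self.events[start] += 1
        self.users[start].add(event[self.user_key])

    def summary(self):
        """Stats for all windows still held in memory, ordered by window start."""
        return {
            start: {"events": self.events[start], "unique_users": len(self.users[start])}
            for start in sorted(self.events)
        }

    def pop_closed(self, current_ms):
        """Remove and return stats for windows that ended at or before current_ms."""
        closed = {}
        for start in sorted(self.events):  # sorted() copies the keys, so popping is safe
            if start + self.size_ms <= current_ms:
                closed[start] = {
                    "events": self.events.pop(start),
                    "unique_users": len(self.users.pop(start)),
                }
        return closed


# Example with three hand-made records (timestamps in ms)
stats = TumblingWindowStats()
stats.add(1542710197257, '{"email": "zzz@uiuc.edu"}')
stats.add(1542710199000, '{"email": "zzz@uiuc.edu"}')
stats.add(1542710230000, '{"email": "abc@uiuc.edu"}')
print(stats.summary())
```

Inside the consumer loop this would be fed with `stats.add(message.timestamp, message.value)`, calling `stats.pop_closed(message.timestamp)` from time to time to emit finished windows. Because windows are closed by record time, a record that arrives after its window was popped simply starts a new entry for that window; this sketch does not handle late data beyond that.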
I think the Kafka Streams API is what you need; it provides everything required for windowing. You can find more information about Kafka Streams here:
For your scenario, Kafka Streams seems a good fit. It supports the following 4 types of windowing:
- Tumbling time window: time-based; fixed-size, non-overlapping, gap-less windows
- Hopping time window: time-based; fixed-size, overlapping windows
- Sliding time window: time-based; fixed-size, overlapping windows that work on differences between record timestamps
- Session window: session-based; dynamically sized, non-overlapping, data-driven windows
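To make the semantics of these window types concrete, here is a plain-Python sketch (not the Kafka Streams API, which is Java) of how a record timestamp maps to hopping and tumbling windows, and how timestamps group into session windows. It assumes epoch-aligned window starts, half-open `[start, end)` time windows and inclusive session bounds, modeled on Kafka Streams' behaviour; the function names are hypothetical, and sliding windows are omitted:

```python
def hopping_windows(ts, size, advance):
    """Return (start, end) of every hopping window containing ts; end is exclusive."""
    if not 0 < advance <= size:
        raise ValueError("advance must be > 0 and <= size")
    windows = []
    start = ts - (ts % advance)  # latest epoch-aligned window start at or before ts
    # a window [start, start + size) contains ts exactly when start > ts - size
    while start >= 0 and start > ts - size:
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)


def tumbling_window(ts, size):
    """A tumbling window is a hopping window whose advance equals its size."""
    return hopping_windows(ts, size, size)[0]


def session_windows(timestamps, gap):
    """Group timestamps into sessions; a new session starts after more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts  # within the inactivity gap: extend the current session
        else:
            sessions.append([ts, ts])  # gap exceeded: open a new session
    return [tuple(s) for s in sessions]


print(hopping_windows(7, size=5, advance=2))
print(tumbling_window(7, size=5))
print(session_windows([1, 2, 10, 3, 20, 21], gap=5))
```

Hopping windows assign one record to several overlapping windows (here size/advance rounds up to 3 candidates, of which 2 contain the timestamp), whereas tumbling windows assign exactly one, and session windows have no fixed size at all: their bounds come from the data.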
For Python, there is a library:
This may be useful to you.
You could consider the Kafka Streams API; it is very useful for your windowing scenario. Added more details to the answer. The OP seems to be looking for a Python solution, though.