Python: parsing messages from a Kafka consumer


I have set up a Kafka consumer-producer system, and I need to process the transmitted messages. These are lines from a JSON file, like:

ConsumerRecord(topic=u'json_data103052', partition=0, offset=676, timestamp=1542710197257, timestamp_type=0, key=None, value='{"Name": "Simone", "Surname": "Zimbolli", "gender": "Other", "email": "zzz@uiuc.edu", "country": "Nigeria", "date": "11/07/2018"}', checksum=354265828, serialized_key_size=-1, serialized_value_size=189)
I am looking for an easy-to-implement solution that lets me:

  • define a streaming window
  • analyze the messages in that window (count the number of unique users and similar statistics)

Does anyone have suggestions on how to proceed? Thanks.

I ran into problems with Spark, so I would rather avoid it. I am writing the Python script in Jupyter.

This is my code:

from kafka import KafkaConsumer
from random import randint
from time import sleep

bootstrap_servers = ['localhost:9092']

# Get the topic name saved by the Kafka producer notebook
%store -r topicName
print(topicName)

consumer = KafkaConsumer(bootstrap_servers = bootstrap_servers,
                         auto_offset_reset='earliest'
                        )
consumer.subscribe([topicName])

for message in consumer:
    print(message)
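
For reference, here is a minimal sketch of what the processing loop could look like, assuming the record value is the UTF-8 JSON string shown in the sample record, using the record's millisecond timestamp to assign each message to a fixed tumbling window, and using the email field as the user key (the one-minute window size and the choice of key are assumptions):

import json
from collections import defaultdict

from kafka import KafkaConsumer

WINDOW_MS = 60 * 1000           # assumed window size: one minute
topic_name = 'json_data103052'  # topic from the sample record; in Jupyter, restore it with %store -r topicName

# value_deserializer turns the raw JSON value of each record into a dict
consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest',
                         value_deserializer=lambda v: json.loads(v))
consumer.subscribe([topic_name])

# window start (ms since epoch) -> set of user identifiers seen in that window
users_per_window = defaultdict(set)

for message in consumer:
    record = message.value                                    # already a dict
    window_start = message.timestamp - (message.timestamp % WINDOW_MS)
    users_per_window[window_start].add(record.get('email'))   # assumed user key
    print(window_start, len(users_per_window[window_start]))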

I think the Kafka Streams API is what you need; it has all the functionality required for windowing. You can find more information about Kafka Streams in its documentation.


Kafka Streams seems like a good fit for your scenario. It supports the following four types of windowing (a small plain-Python illustration of the window assignment is sketched below):

Tumbling time window - time-based, fixed-size, non-overlapping, gap-less windows
Hopping time window - time-based, fixed-size, overlapping windows
Sliding time window - time-based, fixed-size, overlapping windows that work on differences between record timestamps
Session window

For Python, there is a library that might work for you.
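
As a concrete illustration of tumbling vs. hopping assignment in plain Python (this is not Kafka Streams itself; the 60-second window and 20-second hop are made-up values):

# Illustration only: which window(s) a record timestamp (ms) falls into.
WINDOW_MS = 60000   # assumed window size: 60 s
HOP_MS = 20000      # assumed hop: 20 s, so hopping windows overlap

def tumbling_window(ts_ms):
    # A timestamp belongs to exactly one tumbling window.
    start = ts_ms - (ts_ms % WINDOW_MS)
    return (start, start + WINDOW_MS)

def hopping_windows(ts_ms):
    # A timestamp belongs to every hopping window that covers it.
    first_start = ts_ms - (ts_ms % HOP_MS) - (WINDOW_MS - HOP_MS)
    return [(s, s + WINDOW_MS)
            for s in range(max(first_start, 0), ts_ms + 1, HOP_MS)
            if s <= ts_ms < s + WINDOW_MS]

ts = 1542710197257  # timestamp from the sample ConsumerRecord
print(tumbling_window(ts))   # one window
print(hopping_windows(ts))   # three overlapping windows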


You could consider the Kafka Streams API; it is very useful for your windowing operations. I have added more details in an answer.
You could consider the Kafka Streams API, which is useful for your windowing scenario. However, the OP seems to be looking for a Python solution.