Apache Beam Python GroupByKey with Kafka IO streaming data
I'm trying to create 10-second fixed windows with Apache Beam 2.23, using Kafka as the data source. Even when I set an AfterProcessingTime trigger of 15, every record seems to fire individually, and when I try GroupByKey it throws the following error:

Error: KeyError: 0 [while running '[17]: FixedWindow']

Data simulation:
from kafka import KafkaProducer
import time

producer = KafkaProducer()
id_val = 1001
while(1):
    message = {}
    message['id_val'] = str(id_val)
    message['sensor_1'] = 10
    if (id_val < 1003):
        id_val = id_val + 1
    else:
        id_val = 1001
    time.sleep(2)
    print(time.time())
    producer.send('test', str(message).encode())
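Note that the producer serializes the dict with `str(message).encode()`, so the bytes on the wire are a Python dict repr rather than JSON, which is why the pipeline below parses values with `ast.literal_eval`. A quick sketch of that round trip (the `payload` value here is simulated, not read from Kafka):

```python
import ast

# Simulate what producer.send('test', str(message).encode()) puts on the wire:
# a Python dict repr, not JSON (note the single quotes).
payload = str({'id_val': '1001', 'sensor_1': 10}).encode()

# json.loads(payload) would fail on the single quotes; ast.literal_eval works.
decoded = ast.literal_eval(payload.decode())
print(decoded['sensor_1'])  # 10
```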
class AddTimestampFn(beam.DoFn):
    def process(self, element):
        timestamp = int(time.time())
        yield beam.window.TimestampedValue(element, timestamp)

pipeline_options = PipelineOptions()
pipeline_options.view_as(StandardOptions).streaming = True
p = beam.Pipeline(options=pipeline_options)
with beam.Pipeline() as p:
    lines = p | "Reading messages from Kafka" >> kafkaio.KafkaConsume(kafka_config)
    groups = (
        lines
        | 'ParseEventFn' >> beam.Map(lambda x: (ast.literal_eval(x[1])))
        | 'Add timestamp' >> beam.ParDo(AddTimestampFn())
        | 'After timestamp add ' >> beam.ParDo(PrintFn("timestamp add"))
        | 'FixedWindow' >> beam.WindowInto(
            beam.window.FixedWindows(10*1), allowed_lateness=30)
        | 'Group ' >> beam.GroupByKey()
        | 'After group' >> beam.ParDo(PrintFn("after group")))
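One possible cause of the KeyError: 0 (an assumption, since the full traceback isn't shown): GroupByKey expects every element to be a (key, value) 2-tuple, but after ParseEventFn the elements are plain dicts, so indexing an element with 0 to extract its key raises KeyError: 0. A minimal sketch of a keying step that could replace ParseEventFn; the `key_by_id` helper is hypothetical, not part of the original pipeline:

```python
import ast

def key_by_id(raw_value):
    """Hypothetical helper: parse a Kafka message value and key it by id_val,
    so downstream GroupByKey receives (key, value) 2-tuples, not bare dicts."""
    record = ast.literal_eval(raw_value)
    return (record['id_val'], record)

# In the pipeline this would be applied as, e.g.:
#   | 'KeyById' >> beam.Map(lambda x: key_by_id(x[1]))
key, value = key_by_id("{'id_val': '1001', 'sensor_1': 10}")
print(key)  # '1001'
```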
What am I doing wrong? I just started using Beam, so it could be something really silly.

Comment: I'm facing the same issue. Are you using kafkaio.KafkaConsume from a library for Beam, or is it something you defined yourself? Did you ever get to the root of the problem?