Apache Kafka: serialization error when merging Kafka streams
I have N Kafka topics that all contain the same value type. I want to merge these topics into a single topic while throttling the events per key. Here is the code I have so far:
KStream<Long, Event> allEvents = null;
for (String topic : EventsTopics.split(",")) {
    KStream<Long, Event> events = builder.stream(topic,
            Consumed.with(longAvroSerde, EventsAvroSerde));
    if (allEvents == null) {
        allEvents = events;
    } else {
        allEvents = allEvents.merge(events);
    }
}
allEvents
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofSeconds(10)).grace(Duration.ofMillis(0)))
    .reduce((value1, value2) -> value2)
    .suppress(Suppressed.untilWindowCloses(unbounded()))
    .toStream()
    .peek((key, value) -> System.out.printf("key=%s, value=%s\n", key, value.toString()))
    .to(mergeTopic);
However, as soon as more than one topic is included, I get a serialization error:
Exception in thread "merge-c30bd85c-2b6e-4460-ae3d-b7a5ffa117c5-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: stream-thread [merge-c30bd85c-2b6e-4460-ae3d-b7a5ffa117c5-StreamThread-1] task [0_0] Failed to flush state store KSTREAM-REDUCE-STATE-STORE-0000000003
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:453)
at org.apache.kafka.streams.processor.internals.StreamTask.prepareCommit(StreamTask.java:357)
at org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:955)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:851)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:714)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
Caused by: java.lang.IllegalArgumentException: Unsupported Avro type. Supported types are null, Boolean, Integer, Long, Float, Double, String, byte[] and IndexedRecord
at io.confluent.kafka.schemaregistry.avro.AvroSchemaUtils.getSchema(AvroSchemaUtils.java:121)
Here is what the merged KStreamImpl looks like in the debugger:
allEvents = {KStreamImpl@1525}
repartitionRequired = false
repartitionNode = null
name = "KSTREAM-MERGE-0000000002"
keySerde = null
valSerde = null
subTopologySourceNodes = {HashSet@1527} size = 2
streamsGraphNode = {ProcessorGraphNode@1528} "ProcessorNode{processorParameters=ProcessorParameters{processor class=class org.apache.kafka.streams.kstream.internals.PassThrough$PassThroughProcessor, processor name='KSTREAM-MERGE-0000000002'}} StreamsGraphNode{nodeName='KSTREAM-MERGE-0000000002', buildPriority=3, hasWrittenToTopology=false, keyChangingOperation=false, valueChangingOperation=false, mergeNode=true, parentNodes=[KSTREAM-SOURCE-0000000000, KSTREAM-SOURCE-0000000001]}"
builder = {InternalStreamsBuilder@1529}
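I am new to Kafka Streams, so I don't know how to investigate this. Any hints are much appreciated.

You could look into using cogroup. You group each stream separately and then create a single cogrouped stream from all of the grouped streams. You can then window and aggregate that stream. This is also more efficient than merging several streams and grouping the result.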
// After windowedBy(...), the resulting table is keyed by Windowed<K>, not K.
KTable<Windowed<K>, CG> cogrouped =
    grouped1
        .cogroup(aggregator1)
        .cogroup(grouped2, aggregator2)
        .cogroup(grouped3, aggregator3)
        .windowedBy(TimeWindows.of(Duration.ofSeconds(10)).grace(Duration.ofMillis(0)))
        .aggregate(initializer1, materialized1);
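To make the cogroup suggestion concrete for the "N topics with the same value type" case, here is a minimal sketch that builds the cogrouped stream in a loop. It is an illustration under assumptions, not a drop-in replacement: the class name CogroupLatestPerKey, the build method, and the keepLatest aggregator are hypothetical names, the windowed key is unwrapped back to the plain Long key before writing (the question's code does not do this), and the serdes are passed explicitly at every step. It uses TimeWindows.of(...).grace(...) as in the question; newer Kafka Streams releases deprecate that in favour of other factory methods.

```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Aggregator;
import org.apache.kafka.streams.kstream.CogroupedKStream;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class CogroupLatestPerKey {

    // Hypothetical helper: builds a topology that keeps the last value per key
    // and 10-second window across all input topics (topics must be non-empty).
    static <V> Topology build(List<String> topics, String mergeTopic,
                              Serde<Long> keySerde, Serde<V> valueSerde) {
        StreamsBuilder builder = new StreamsBuilder();

        // Same behaviour as reduce((v1, v2) -> v2): the newest value wins.
        Aggregator<Long, V, V> keepLatest = (key, value, aggregate) -> value;

        CogroupedKStream<Long, V> cogrouped = null;
        for (String topic : topics) {
            KStream<Long, V> events =
                    builder.stream(topic, Consumed.with(keySerde, valueSerde));
            // Serdes are given explicitly so nothing falls back to the defaults.
            KGroupedStream<Long, V> grouped =
                    events.groupByKey(Grouped.with(keySerde, valueSerde));
            if (cogrouped == null) {
                cogrouped = grouped.cogroup(keepLatest);
            } else {
                cogrouped = cogrouped.cogroup(grouped, keepLatest);
            }
        }

        KTable<Windowed<Long>, V> latest = cogrouped
                .windowedBy(TimeWindows.of(Duration.ofSeconds(10)).grace(Duration.ofMillis(0)))
                // The initial aggregate is never read by keepLatest, so null is used here.
                .aggregate(() -> null, Materialized.with(keySerde, valueSerde));

        latest
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream()
                // Assumption: the output topic should be keyed by the plain Long key.
                .selectKey((windowedKey, value) -> windowedKey.key())
                .to(mergeTopic, Produced.with(keySerde, valueSerde));

        return builder.build();
    }
}
```

With the question's names, the call would look roughly like build(Arrays.asList(EventsTopics.split(",")), mergeTopic, longAvroSerde, EventsAvroSerde).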
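As for the serdes error: when several streams are merged, Kafka Streams sometimes cannot work out which serdes to use and falls back on the configured defaults. I suggest making sure those are set correctly. That is probably why the serdes are null once you have more than one stream.

The problem seems to be related to the EventsAvroSerde you are using, and it appears to involve the schema registry, so it is not directly a Kafka Streams problem. The error reads "Unsupported Avro type" and is raised from ProcessorStateManager.flush while writing to KSTREAM-REDUCE-STATE-STORE-0000000003, so it is tied to the reduce() step.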
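If you prefer to keep merge(), one way to act on the serde advice is to pass the serdes explicitly to every operator downstream of the merge instead of relying on the configured defaults. The following is a sketch under assumptions rather than a confirmed fix for the error above: the class name MergeWithExplicitSerdes and the build method are hypothetical, and the windowed key is unwrapped to the plain Long key before writing, which the question's code does not do. It keeps TimeWindows.of(...).grace(...) as in the question.

```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class MergeWithExplicitSerdes {

    // Hypothetical helper: same pipeline as in the question, but every step
    // after merge() is told which serdes to use (topics must be non-empty).
    static <V> Topology build(List<String> topics, String mergeTopic,
                              Serde<Long> keySerde, Serde<V> valueSerde) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<Long, V> allEvents = null;
        for (String topic : topics) {
            KStream<Long, V> events =
                    builder.stream(topic, Consumed.with(keySerde, valueSerde));
            allEvents = (allEvents == null) ? events : allEvents.merge(events);
        }

        allEvents
                // The merged stream reports keySerde = null and valSerde = null in the
                // debugger output above, so the serdes are supplied again here.
                .groupByKey(Grouped.with(keySerde, valueSerde))
                .windowedBy(TimeWindows.of(Duration.ofSeconds(10)).grace(Duration.ofMillis(0)))
                // Serdes for the state store behind reduce().
                .reduce((value1, value2) -> value2, Materialized.with(keySerde, valueSerde))
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream()
                // Assumption: the output topic should be keyed by the plain Long key.
                .selectKey((windowedKey, value) -> windowedKey.key())
                .to(mergeTopic, Produced.with(keySerde, valueSerde));

        return builder.build();
    }
}
```

Passing Grouped.with and Materialized.with means the state store no longer depends on default.key.serde and default.value.serde, which is the fallback the answer describes.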