Java: many-to-one records in Kafka Streams

java apache-kafka

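I want to turn many records into a single message. I have tried many approaches, such as a custom reducer and a custom aggregator, but they still emit one output record per input record. For example, I want to combine many strings into one string. If my stream contains messages with the same key but different values, "the", "sky", "is", "blue", then I want to write a single concatenation of them, "the,sky,is,blue", to a new topic. What I receive instead are 4 messages: "the,", "the,sky,", "the,sky,is,", "the,sky,is,blue,". When I send a second message, it gets concatenated onto the previous aggregate, and I end up receiving "sky,is,blue,sky,is,blue,".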
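The behaviour described above follows from `reduce` producing a KTable: each incoming record updates the aggregate for its key, and `toStream()` forwards every update as a separate record. The plain-Java model below has no Kafka dependency; `ReduceUpdates` and `reduceAndEmit` are illustrative names, and it assumes record caching is disabled. With caching enabled, Kafka Streams may coalesce some consecutive updates for the same key, but a KTable never knows that a "batch" is finished, so it cannot promise a single final record. The model also shows a second batch with the same key being appended to the old aggregate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ReduceUpdates {

    // Simplified model of groupByKey().reduce(...).toStream() with record caching
    // disabled: every input record updates the per-key aggregate, and every
    // updated aggregate is forwarded downstream as its own output record.
    static List<String> reduceAndEmit(List<Map.Entry<String, String>> records,
                                      BinaryOperator<String> reducer) {
        Map<String, String> store = new HashMap<>(); // plays the role of "AggStore"
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, String> record : records) {
            String previous = store.get(record.getKey());
            // The first value for a key is stored as-is; later values go through the reducer.
            String updated = (previous == null)
                    ? record.getValue()
                    : reducer.apply(previous, record.getValue());
            store.put(record.getKey(), updated);
            emitted.add(updated); // one output per input: the "one-to-one" behaviour
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> input = List.of(
                Map.entry("k", "the"), Map.entry("k", "sky"),
                Map.entry("k", "is"), Map.entry("k", "blue"),
                Map.entry("k", "the")); // first word of a second batch, same key
        reduceAndEmit(input, (agg, value) -> agg + "," + value)
                .forEach(System.out::println);
        // the
        // the,sky
        // the,sky,is
        // the,sky,is,blue
        // the,sky,is,blue,the
    }
}
```

The state store is never reset between batches, which is why a second message with the same key is appended to the previous result.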

I also tried using a custom StoreBuilder and changed a lot of settings to see whether that would help:


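    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.CountDownLatch;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Reducer;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class ConcatApp {
        public static void main(String[] args) {
            // Assumed setup, not shown in the question: the application id, broker
            // address and input topic name are placeholders.
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "concat-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            final StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> source =
                    builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()));

            // Topic-level settings intended for the store's changelog topic.
            Map<String, String> changelogConfig = new HashMap<>();
            changelogConfig.put("message.down.conversion.enable", "true");
            changelogConfig.put("flush.messages", "0");
            changelogConfig.put("flush.ms", "0");

            // Note: this builder is never registered via builder.addStateStore(...), so these
            // settings do not apply to the "AggStore" created by Materialized.as(...) below.
            StoreBuilder<KeyValueStore<String, String>> aggStoreSupplier = Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("AggStore"),
                    Serdes.String(),
                    Serdes.String())
                    .withLoggingEnabled(changelogConfig);

            // Single messages get processed, and eventually I get these string results I need to concatenate.
            KStream<String, String> results = source
                    .groupByKey() // this KGroupedStream holds the N records that were sent in the message
                    .reduce(new Reducer<String>() {
                        @Override
                        public String apply(String aggValue, String value) {
                            // Append the new value so the words keep their arrival order.
                            return aggValue + "," + value;
                        }
                    }, Materialized.as("AggStore"))
                    .toStream(); // forwards the updated aggregate for every input record

            results.to("results", Produced.with(Serdes.String(), Serdes.String()));
            final Topology topology = builder.build();
            System.out.println(topology.describe()); // print the topology description
            final KafkaStreams streams = new KafkaStreams(topology, props);

            final CountDownLatch latch = new CountDownLatch(1);
            // Attach a shutdown handler to catch Ctrl-C.
            Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
                @Override
                public void run() {
                    streams.close();
                    latch.countDown();
                }
            });

            try {
                streams.cleanUp();
                streams.start();
                latch.await();
            } catch (Throwable e) {
                System.exit(1);
            }
            System.exit(0);
        }
    }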


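I have just started reading the Processor API documentation, so an example that uses it would also be very welcome.

How do you know that the first batch of messages and the second batch should produce two output messages (they both have the same key)? With the DSL, you would probably need to apply some windowing. To get only the "final" result of a windowed aggregation, you can use the suppress() operator.

I know a batch is a batch because its records originate from a single message. Before this step I do a one-to-many record step, and now I just want to merge the multiple records back into one and send it back to the API that generated the data. I just tried a windowed aggregation, but I could not get the window to close immediately. I would like to use suppress with a plain reduce. I also don't want to store the result of the reduce, because Kafka does nothing with it; only my API uses it.

Actually, following up on the question of how I know it is a batch: after processing the records, I was not re-keying them correctly. I'll fix that and retry everything.

I see. In general, the DSL may not be a good fit for your use case. Instead, you should use the Processor API, which gives you more flexibility to stitch the original record pieces back together.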
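Following the suggestion to use the Processor API, here is a minimal sketch of the stitching logic a custom processor could run, written in plain Java so it can be tried without a broker. It assumes, hypothetically, that the earlier one-to-many step tags each piece with a batch key (the id of the original message), its position, and the total number of pieces; `BatchStitcher`, `Piece` and its fields are illustrative names, not Kafka API. Inside a real `org.apache.kafka.streams.processor.api.Processor`, the `HashMap` would typically be replaced by a `KeyValueStore` fetched with `context.getStateStore(...)` in `init`, and the joined string would be sent downstream with `context.forward(...)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class BatchStitcher {

    /** Hypothetical piece produced by the earlier one-to-many step. */
    public static final class Piece {
        final String batchKey; // identifies the original message
        final int index;       // position of this piece within the batch (0-based)
        final int total;       // number of pieces the original message was split into
        final String text;

        public Piece(String batchKey, int index, int total, String text) {
            this.batchKey = batchKey;
            this.index = index;
            this.total = total;
            this.text = text;
        }
    }

    // batch key -> (piece index -> text); TreeMap keeps the pieces in index order.
    // In a real processor this would be a KeyValueStore so partial batches survive restarts.
    private final Map<String, TreeMap<Integer, String>> pending = new HashMap<>();

    /** Buffers one piece; returns the joined batch only once every piece has arrived. */
    public Optional<String> add(Piece piece) {
        TreeMap<Integer, String> parts =
                pending.computeIfAbsent(piece.batchKey, k -> new TreeMap<>());
        parts.put(piece.index, piece.text); // a redelivered piece overwrites instead of duplicating
        if (parts.size() < piece.total) {
            return Optional.empty(); // batch incomplete: emit nothing
        }
        pending.remove(piece.batchKey); // forget the batch so a later one with the same key starts fresh
        return Optional.of(String.join(",", parts.values()));
    }

    public static void main(String[] args) {
        BatchStitcher stitcher = new BatchStitcher();
        String[] words = {"blue", "the", "is", "sky"};
        int[] positions = {3, 0, 2, 1}; // pieces may arrive out of order
        for (int i = 0; i < words.length; i++) {
            stitcher.add(new Piece("msg-1", positions[i], words.length, words[i]))
                    .ifPresent(System.out::println); // prints "the,sky,is,blue" exactly once
        }
    }
}
```

Because the buffer entry is removed once a batch is emitted, a second message with the same key starts from an empty buffer instead of being appended to the previous result. The sketch does not handle batches that never complete; a real processor would likely need a timeout, for example a punctuation scheduled with `context.schedule(...)`.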