Java: Does Kafka flatMapValues split a record into multiple records when passing a JSON array object?


I am using Confluent version 5.0.0.

I have a JSON array like the following:

{ 
    "name" : "David,Corral,Babu", 
    "age" : 23
}
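Using Kafka Streams, I want to split the record above into multiple records, based on the commas in the value of the "name" key. The output should look something like: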

{ 
    "name" : "David", 
    "age" : 23
},
{ 
    "name" : "Corral", 
    "age" : 23
},
{
    "name" : "Babu", 
    "age" : 23
 }
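To do this I am using "flatMapValues", but so far I have not been able to get the expected result.

I would also like to check whether "flatMapValues" is the right function for my requirement.

I used the following code: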


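The exception occurs because your flatMapValues produces values of type String. In your code you do not pass any Produced to the KStream::to function, so it tries to use the default one (from the properties you passed in), which in your case is PersonSeder.class.

Your values are of type String, but PersonSeder.class is used to serialize them.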
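
The cast failure at the bottom of the stack trace can be reproduced without Kafka. The following is only a sketch under stated assumptions: ValueSerializer, Person, PersonSerializer and trySend are hypothetical stand-ins written for this illustration, not Kafka's own classes. It shows what happens when a serializer typed for Person is handed a String through an untyped reference, which is roughly the situation the sink node is in when it falls back to the default serde.

```java
import java.nio.charset.StandardCharsets;

public class DefaultSerdeMismatchSketch {

    /** Hypothetical stand-in for a serializer interface; this is not Kafka's own type. */
    interface ValueSerializer<T> {
        byte[] serialize(T value);
    }

    /** Assumed shape of the POJO; the question does not show the real class. */
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Plays the role of the configured default value serializer. */
    static final class PersonSerializer implements ValueSerializer<Person> {
        @Override
        public byte[] serialize(Person p) {
            return (p.name + ":" + p.age).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Mimics a sink that only holds the serializer through a raw, unchecked reference. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String trySend(ValueSerializer defaultSerializer, Object value) {
        try {
            defaultSerializer.serialize(value);
            return "sent";
        } catch (ClassCastException e) {
            // The generated bridge method casts the argument to Person and fails here.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        PersonSerializer serializer = new PersonSerializer();
        System.out.println(trySend(serializer, new Person("David", 23))); // value type matches
        System.out.println(trySend(serializer, "David"));                 // String vs. Person
    }
}
```

In the real topology there are two ways out: keep the values as Person so that the default serde fits, or keep them as String and name the serdes explicitly, for example output.to("streams-output", Produced.with(Serdes.String(), Serdes.String())).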

If you want to split it, you need something like this:

KStream<String, Person> output = source
    .flatMapValues(person ->
        Arrays.stream(person.getName().split(","))
            .map(name -> new Person(name, person.getAge()))
            .collect(Collectors.toList()));
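
The mapping logic inside that lambda can be tried on its own, without a broker. This is a minimal sketch, assuming a Person class with a (name, age) constructor and getName()/getAge() accessors; the question does not show the real class, so the one below is hypothetical. The trim() call is an addition that is not in the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PersonSplitSketch {

    /** Assumed shape of the POJO; the question does not show the real class. */
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }

        int getAge() { return age; }
    }

    /** Returns one Person per comma-separated name, each carrying the original age. */
    static List<Person> splitByName(Person person) {
        return Arrays.stream(person.getName().split(","))
                .map(String::trim) // added here: tolerate "David, Corral"
                .map(name -> new Person(name, person.getAge()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Person p : splitByName(new Person("David,Corral,Babu", 23))) {
            System.out.println(p.getName() + " " + p.getAge());
        }
    }
}
```

Because the function returns a list of Person, the same body can be used as the flatMapValues lambda while the value type of the stream stays Person.
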
I used the following code with a serializer and a deserializer that are symmetric (both use Gson), and it works:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app1");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, PersonSerdes.class);

final StreamsBuilder builder = new StreamsBuilder();
KStream<String, Person> source = builder.stream("input");
KStream<String, Person> output = source
        .flatMapValues(person ->
                Arrays.stream(person.getName()
                        .split(","))
                        .map(name -> new Person(name, person.getAge()))
                        .collect(Collectors.toList()));
output.to("output");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
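Update 1:

Regarding your question about using JSON instead of a POJO, it all depends on your Serdes. If you use a generic Serdes, you can serialize and deserialize JSON (as a Map).

Below is a simple MapSerdes that can be used for this, followed by sample code that uses it.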

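import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

import java.lang.reflect.Type;
import java.nio.charset.Charset;
import java.util.Map;

public class MapSerdes implements Serde<Map<String, String>> {

    private static final Charset CHARSET = Charset.forName("UTF-8");

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {}

    @Override
    public void close() {}

    @Override
    public Serializer<Map<String, String>> serializer() {
        return new Serializer<Map<String, String>>() {
            private Gson gson = new Gson();

            @Override
            public void configure(Map<String, ?> configs, boolean isKey) {}

            @Override
            public byte[] serialize(String topic, Map<String, String> data) {
                String line = gson.toJson(data); // Return the bytes from the String 'line'
                return line.getBytes(CHARSET);
            }

            @Override
            public void close() {}
        };
    }

    @Override
    public Deserializer<Map<String, String>> deserializer() {
        return new Deserializer<Map<String, String>>() {
            private Type type = new TypeToken<Map<String, String>>(){}.getType();
            private Gson gson = new Gson();

            @Override
            public void configure(Map<String, ?> configs, boolean isKey) {}

            @Override
            public Map<String, String> deserialize(String topic, byte[] data) {
                // Decode with the same charset the serializer uses, so both sides stay symmetric
                Map<String, String> result = gson.fromJson(new String(data, CHARSET), type);
                return result;
            }

            @Override
            public void close() {}
        };
    }
}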
Sample usage: depending on your map, you can use a different attribute in place of name.

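import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class GenericJsonSplitterApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, MapSerdes.class);

        final StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Map<String, String>> source = builder.stream("input");
        KStream<String, Map<String, String>> output = source
                .flatMapValues(map ->
                        Arrays.stream(map.get("name")
                                .split(","))
                                .map(name -> {
                                    // Copy the whole record, then overwrite only the split attribute
                                    HashMap<String, String> splittedMap = new HashMap<>(map);
                                    splittedMap.put("name", name);
                                    return splittedMap;
                                })
                                .collect(Collectors.toList()));
        output.to("output");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}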
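
The remark about using a different attribute can be sketched by taking the key as a parameter. The helper below is hypothetical (splitOn is not part of Kafka Streams or of the code above) and uses only the JDK. Passing a record through unchanged when the key is absent is an assumption made for this sketch; the lambda above would throw a NullPointerException in that case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapSplitSketch {

    /**
     * Splits the comma-separated value stored under 'key' into one copy of the
     * record per part. Records without that key are returned unchanged (an
     * assumption of this sketch, not behaviour of the code above).
     */
    static List<Map<String, String>> splitOn(Map<String, String> record, String key) {
        String joined = record.get(key);
        if (joined == null) {
            return Collections.singletonList(record);
        }
        return Arrays.stream(joined.split(","))
                .map(part -> {
                    Map<String, String> copy = new HashMap<>(record); // keep all other attributes
                    copy.put(key, part);
                    return copy;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("name", "David,Corral,Babu");
        record.put("age", "23");
        splitOn(record, "name").forEach(System.out::println);
    }
}
```

Inside the topology this would be called as flatMapValues(map -> MapSplitSketch.splitOn(map, "name")), with the attribute name swapped for whichever field holds the comma-separated values.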

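Still getting the same error: Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: myapps.PersonSerializer) is not compatible to the actual key or value type (key type: unknown because key is null / value type: java.lang)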
package test;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueMapper;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

public class RecordSplliter {

    public static void main(String[] args) throws Exception {
        System.out.println("** STARTING RecordSplliter STREAM APP **");
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "json-e44nric2315her");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, PersonSeder.class);

        final Serde<String> stringSerde = Serdes.String();
        final StreamsBuilder builder = new StreamsBuilder();

        // Consume JSON and enriches it
        KStream<String, Person> source = builder.stream("streams-plaintext-input");

        KStream<String, String> output = source
            .flatMapValues(person -> Arrays.asList(person.getName().split(",")));
        output.to("streams-output");

        final Topology topology = builder.build();
        final KafkaStreams streams = new KafkaStreams(topology, props);
        final CountDownLatch latch = new CountDownLatch(1);

        // Attach shutdown handler to catch control-c
        Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
            @Override
            public void run() {
                streams.close();
                latch.countDown();
            }
        });

        try {
            streams.start();
            latch.await();
        } catch (Throwable e) {
            System.exit(1);
        }
        System.exit(0);
    }
}
08:31:10,822 ERROR org.apache.kafka.streams.processor.internals.AssignedStreamsTasks  - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] Failed to process stream task 0_0 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=streams-plaintext-input, partition=0, offset=0
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:304)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
    at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409)
    at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:957)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: myapps.PersonSerializer) is not compatible to the actual key or value type (key type: unknown because key is null / value type: java.lang.String). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
    at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
    at org.apache.kafka.streams.kstream.internals.KStreamFlatMapValues$KStreamFlatMapValuesProcessor.process(KStreamFlatMapValues.java:42)
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:288)
    ... 6 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to myapps.Person
    at myapps.PersonSerializer.serialize(PersonSerializer.java:1)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:154)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:98)
    at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
    ... 18 more
08:31:10,827 INFO  org.apache.kafka.streams.processor.internals.StreamThread     - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
08:31:10,827 INFO  org.apache.kafka.streams.processor.internals.StreamThread     - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] Shutting down
08:31:10,833 INFO  org.apache.kafka.clients.producer.KafkaProducer               - [Producer clientId=json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
08:31:10,843 INFO  org.apache.kafka.streams.processor.internals.StreamThread     - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
08:31:10,843 INFO  org.apache.kafka.streams.KafkaStreams                         - stream-client [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387] State transition from RUNNING to ERROR
08:31:10,843 WARN  org.apache.kafka.streams.KafkaStreams                         - stream-client [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387] All stream threads have died. The instance will be in error state and should be closed.
08:31:10,843 INFO  org.apache.kafka.streams.processor.internals.StreamThread     - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] Shutdown complete
Exception in thread "json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=streams-plaintext-input, partition=0, offset=0
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:304)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
    at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409)
    at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:957)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: myapps.PersonSerializer) is not compatible to the actual key or value type (key type: unknown because key is null / value type: java.lang.String). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
    at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
    at org.apache.kafka.streams.kstream.internals.KStreamFlatMapValues$KStreamFlatMapValuesProcessor.process(KStreamFlatMapValues.java:42)
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:288)
    ... 6 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to myapps.Person
    at myapps.PersonSerializer.serialize(PersonSerializer.java:1)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:154)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:98)
    at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
    ... 18 more