Serialization: How to use the AVRO serializer with the Schema Registry from a Kafka Connect SourceTask
I have set up the Confluent platform and started developing a SourceConnector. In the corresponding SourceTask.poll() method I do the following (pseudo-Java code below):
public List<SourceRecord> poll() throws InterruptedException {
    ....
    Envelope envelope = new Envelope();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Encoder enc = EncoderFactory.get().binaryEncoder(out, null);
    DatumWriter<Envelope> dw = new ReflectDatumWriter<Envelope>(Envelope.class);
    dw.write(envelope, enc);
    enc.flush();
    out.close();
    Map<String, String> sourcePartition = new HashMap<String, String>();
    sourcePartition.put("stream", streamName);
    Map<String, Integer> sourceOffset = new HashMap<String, Integer>();
    sourceOffset.put("position", Integer.parseInt(envelope.getTimestamp()));
    records.add(new SourceRecord(sourcePartition, sourceOffset, topic, org.apache.kafka.connect.data.Schema.BYTES_SCHEMA, envelope));
    ....
I would like to use the Schema Registry so that the object to be serialized is tagged with the schema id from the registry and then serialized, and then published to the Kafka topic via poll(). If the schema of a given object is not yet in the registry, I want it to be registered and the generated id returned to the serializing process, so that the id becomes part of the serialized object and the object can later be deserialized.

What do I need to do in the code above to achieve this?

To use the Schema Registry, you have to serialize/deserialize your data with the classes provided by Confluent:
- io.confluent.kafka.serializers.KafkaAvroSerializer
- io.confluent.kafka.serializers.KafkaAvroDeserializer
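These classes implement exactly the tagging the question asks about: the serializer registers the schema if needed, then prefixes every message with a magic byte (0) and the 4-byte big-endian schema id before the Avro binary payload, and the deserializer reads the id back out to fetch the schema from the registry. A minimal plain-Java sketch of that wire format (illustrative framing logic, not the Confluent implementation; schema id 42 is made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class WireFormatSketch {
    // Serializer side: magic byte 0x0, 4-byte big-endian schema id, then the Avro payload.
    static byte[] frame(int schemaId, byte[] avroPayload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0); // magic byte marks the Confluent wire format
        out.write(ByteBuffer.allocate(4).putInt(schemaId).array());
        out.write(avroPayload);
        return out.toByteArray();
    }

    // Deserializer side: recover the schema id so the schema can be fetched from the registry.
    static int schemaId(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        if (buf.get() != 0) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = frame(42, new byte[]{1, 2, 3}); // schema id 42 is illustrative
        System.out.println(framed.length);    // 8: 1 magic byte + 4 id bytes + 3 payload bytes
        System.out.println(schemaId(framed)); // 42
    }
}
```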
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>2.0.1</version>
</dependency>
Check out the example implementation.
You will need the following dependencies from Confluent to make it work:
<dependency>
<groupId>io.confluent</groupId>
<artifactId>common-config</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>io.confluent</groupId>
<artifactId>common-utils</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-registry-client</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>3.0.0</version>
</dependency>
According to:

In the POM:
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>3.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>0.11.0.1-cp1</version>
<scope>provided</scope>
</dependency>
使用生产者:
Properties props = new Properties();
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://localhost:8081");
// Set any other properties
KafkaProducer<String, User> producer = new KafkaProducer<String, User>(props);
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// send() takes a ProducerRecord; the topic name "users" is illustrative
Future<RecordMetadata> resultFuture = producer.send(new ProducerRecord<String, User>("users", user1));
For this example, your registry needs to contain a schema for "User".
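A minimal Avro schema that would satisfy this example — the field names are inferred from the setters above; the namespace and the nullable favorite_number union follow the standard Avro quickstart schema and are assumptions:

```json
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]}
  ]
}
```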
Confluent also has the following example:
package io.confluent.examples.producer;

import JavaSessionize.avro.LogLine;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;
import java.util.Random;

public class AvroClicksProducer {
    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.out.println("Please provide command line arguments: schemaRegistryUrl");
            System.exit(-1);
        }
        String schemaUrl = args[0];

        Properties props = new Properties();
        // hardcoding the Kafka server URI for this example
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", schemaUrl);
        // Hard coding topic too.
        String topic = "clicks";
        // Hard coding wait between events so demo experience will be uniformly nice
        int wait = 500;
        Producer<String, LogLine> producer = new KafkaProducer<String, LogLine>(props);

        // We keep producing new events and waiting between them until someone ctrl-c
        while (true) {
            LogLine event = EventGenerator.getNext();
            System.out.println("Generated event " + event.toString());
            // Using IP as key, so events from same IP will go to same partition
            ProducerRecord<String, LogLine> record = new ProducerRecord<String, LogLine>(topic, event.getIp().toString(), event);
            producer.send(record);
            Thread.sleep(wait);
        }
    }
}
Hi! So where in the pseudo-code snippet I provided should I put the serialization logic so that KafkaAvroSerializer can be used? Also, I should not need to declare common-config and common-utils explicitly, since Maven pulls them in automatically.
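As an aside on the SourceTask part of the question: in Kafka Connect, serialization onto the topic is normally delegated to the worker's converter rather than done inside poll(), so one option is to return structured data from poll() and configure Confluent's Avro converter in the worker properties. A sketch of those settings (the registry URL is illustrative):

```properties
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```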