Serialization: How to use the AVRO serializer with the Schema Registry from a Kafka Connect SourceTask


I have set up the Confluent data platform and started developing a SourceConnector. In the corresponding SourceTask.poll() method I do the following (pseudo-Java code below):

public List<SourceRecord> poll() throws InterruptedException {
    ....
    Envelope envelope = new Envelope();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Encoder enc = EncoderFactory.get().binaryEncoder(out, null);
    DatumWriter<Envelope> dw = new ReflectDatumWriter<Envelope>(Envelope.class);
    dw.write(envelope, enc);
    enc.flush();
    out.close();

    Map<String, String> sourcePartition = new HashMap<>();
    sourcePartition.put("stream", streamName);
    Map<String, Integer> sourceOffset = new HashMap<>();
    sourceOffset.put("position", Integer.parseInt(envelope.getTimestamp()));

    records.add(new SourceRecord(sourcePartition, sourceOffset, topic,
            org.apache.kafka.connect.data.Schema.BYTES_SCHEMA, out.toByteArray()));
    ....
I would like to use the Schema Registry, so that the object to be serialized is tagged with the schema id from the registry, then serialized, and then published to the Kafka topic via poll(). If the schema of a given object is not yet in the registry, I would like it to be registered and the generated id returned to the serializing process, so that the id becomes part of the serialized object and the object can later be deserialized.


What do I need to do in the code above to achieve this?

To use the Schema Registry, you have to serialize/deserialize your data with the classes provided by Confluent:

  • io.confluent.kafka.serializers.KafkaAvroSerializer
  • io.confluent.kafka.serializers.KafkaAvroDeserializer

These classes contain all the logic for registering schemas with the registry and requesting them from it.
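
For example, here is a minimal sketch of using the serializer directly; the topic name, registry URL, and schema below are illustrative, not from the question. On serialize, the class looks the schema up in the registry, registers it if it is not there yet, and prepends the returned id to the Avro payload (one magic byte, then the 4-byte schema id, then the Avro binary data), which is exactly the tagging behavior the question asks for:

import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

import java.util.Collections;

public class DirectSerializeSketch {
    public static void main(String[] args) {
        KafkaAvroSerializer serializer = new KafkaAvroSerializer();
        // false = configure for record values (true would mean record keys)
        serializer.configure(
                Collections.singletonMap("schema.registry.url", "http://localhost:8081"),
                false);

        // Illustrative schema; in the question this would describe Envelope
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Envelope\"," +
                "\"fields\":[{\"name\":\"timestamp\",\"type\":\"string\"}]}");
        GenericRecord envelope = new GenericData.Record(schema);
        envelope.put("timestamp", "1234");

        // Registers the schema under subject "my-topic-value" if absent,
        // then returns magic byte + 4-byte schema id + Avro binary payload
        byte[] bytes = serializer.serialize("my-topic", envelope);
        System.out.println("Serialized " + bytes.length + " bytes");
    }
}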

If you use Maven, you can add this dependency:

<dependency>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-avro-serializer</artifactId>
  <version>2.0.1</version>
</dependency>
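
Note that Confluent's artifacts are not on Maven Central; they are served from Confluent's own Maven repository, so you may also need to declare it in your POM:

<repositories>
  <repository>
    <id>confluent</id>
    <url>https://packages.confluent.io/maven/</url>
  </repository>
</repositories>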

Check out an example implementation.

You will need the following dependencies from Confluent to make this work:

    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>common-config</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>common-utils</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-schema-registry-client</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-avro-serializer</artifactId>
        <version>3.0.0</version>
    </dependency>

As per:

In the POM:

<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-avro-serializer</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.11.0.1-cp1</version>
    <scope>provided</scope>
</dependency>
Using the producer:

Properties props = new Properties();
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://localhost:8081");
// Set any other properties

KafkaProducer<String, User> producer = new KafkaProducer<>(props);

User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);

// send() takes a ProducerRecord; the topic name "users" is illustrative
Future<RecordMetadata> resultFuture =
        producer.send(new ProducerRecord<>("users", user1));
In your registry, for this example, you will need the schema for "User".
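
If you want to manage that registration from code rather than letting the serializer do it on first send, the kafka-schema-registry-client dependency shown earlier can be used directly. A sketch, assuming the default subject naming ("<topic>-value", so "users-value" for the illustrative topic above) and an illustrative User schema:

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import org.apache.avro.Schema;

public class RegisterUserSchemaSketch {
    public static void main(String[] args) throws Exception {
        // 100 = how many schema ids the client caches locally
        SchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://localhost:8081", 100);

        // Illustrative Avro schema matching the generated User class above
        Schema userSchema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":[" +
                "{\"name\":\"name\",\"type\":\"string\"}," +
                "{\"name\":\"favorite_number\",\"type\":\"int\"}]}");

        // register() returns the registry-assigned schema id; the serializers
        // embed this id in every message they produce
        int id = client.register("users-value", userSchema);
        System.out.println("Registered User schema with id " + id);
    }
}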

Confluent also has this example:

package io.confluent.examples.producer;

import JavaSessionize.avro.LogLine;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;
import java.util.Random;

public class AvroClicksProducer {

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.out.println("Please provide command line arguments: schemaRegistryUrl");
            System.exit(-1);
        }

        String schemaUrl = args[0];

        Properties props = new Properties();
        // hardcoding the Kafka server URI for this example
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", schemaUrl);

        // Hard coding topic too.
        String topic = "clicks";

        // Hard coding wait between events so demo experience will be uniformly nice
        int wait = 500;

        Producer<String, LogLine> producer = new KafkaProducer<String, LogLine>(props);

        // We keep producing new events and waiting between them until someone ctrl-c
        while (true) {
            LogLine event = EventGenerator.getNext();
            System.out.println("Generated event " + event.toString());

            // Using IP as key, so events from same IP will go to same partition
            ProducerRecord<String, LogLine> record = new ProducerRecord<String, LogLine>(topic, event.getIp().toString(), event);
            producer.send(record);
            Thread.sleep(wait);
        }
    }
}
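
For the deserialization side mentioned in the question, the consumer configuration is symmetric; here is a sketch (group id is illustrative). Setting specific.avro.reader to true is what makes the deserializer return generated classes such as LogLine instead of GenericRecord:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class AvroClicksConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "clicks-reader");
        props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");
        // Return generated classes (e.g. LogLine) instead of GenericRecord
        props.put("specific.avro.reader", "true");

        KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("clicks"));

        while (true) {
            ConsumerRecords<String, Object> records = consumer.poll(500);
            for (ConsumerRecord<String, Object> record : records) {
                // The deserializer reads the embedded schema id and fetches
                // the writer schema from the registry automatically
                System.out.println(record.key() + " -> " + record.value());
            }
        }
    }
}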

Hi! So where in the pseudo-code snippet I provided should the serialization go, so that KafkaAvroSerializer is actually used? Also, I don't need to specify common-config and common-utils, since Maven pulls them in automatically.
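
One thing worth spelling out for the SourceTask case: in Kafka Connect, serialization is normally done by the worker's configured converter, not inside poll(). With Confluent's AvroConverter configured on the worker (key.converter/value.converter set to io.confluent.connect.avro.AvroConverter, plus the matching *.converter.schema.registry.url), poll() just returns structured data and the registry interaction happens for free. A sketch, with a hypothetical two-field Envelope layout:

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;

import java.util.Collections;

public class EnvelopeRecords {

    // Connect schema for the envelope; the AvroConverter maps it to an Avro
    // schema and registers it with the Schema Registry on first use.
    // The field names here are hypothetical placeholders.
    static final Schema ENVELOPE_SCHEMA = SchemaBuilder.struct()
            .name("Envelope")
            .field("timestamp", Schema.STRING_SCHEMA)
            .field("payload", Schema.BYTES_SCHEMA)
            .build();

    static SourceRecord toSourceRecord(String streamName, String topic,
                                       String timestamp, byte[] payload) {
        // Structured value instead of pre-serialized bytes; no Avro code
        // runs in the task itself
        Struct value = new Struct(ENVELOPE_SCHEMA)
                .put("timestamp", timestamp)
                .put("payload", payload);
        return new SourceRecord(
                Collections.singletonMap("stream", streamName),
                Collections.singletonMap("position", Integer.parseInt(timestamp)),
                topic, ENVELOPE_SCHEMA, value);
    }
}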