Java: Avro schema resolution when evolving a field from a primitive to a union


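I am working with Avro 1.7.0 through its Java API, and I have run into a problem with our current schema evolution case. The scenario is making a primitive-type field optional by changing the field to a union of `null` and that primitive type.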

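I will use a simple example. Basically, our schemas are:

Initial version: a record with one field of type `int`.

Second version: the same record and the same field name, but the type is now a union of `null` and `int`.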

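According to the Avro specification, the resolution rule for this case is:

if reader's is a union, but writer's is not
The first schema in the reader's union that matches the writer's schema is recursively resolved against it. If none match, an error is signalled.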

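My interpretation is that data serialized with the initial schema should resolve correctly, since `int` is part of the union in the reader's schema.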
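As a rough illustration of the rule quoted above, here is a toy model in plain Java. It is not Avro's resolver: the class and method names are hypothetical, schemas are reduced to primitive type names, and "matches" covers only identical types plus the numeric promotions the specification lists. It reads the rule literally (first matching branch wins); the library's own resolver may apply additional matching logic.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names): schemas are plain primitive type names.
public class UnionBranchSketch {

    // True if a value written as writerType can be read as readerType:
    // identical types, or a numeric promotion (int -> long/float/double, etc.).
    public static boolean matches(String writerType, String readerType) {
        if (writerType.equals(readerType)) {
            return true;
        }
        switch (writerType) {
            case "int":
                return Arrays.asList("long", "float", "double").contains(readerType);
            case "long":
                return Arrays.asList("float", "double").contains(readerType);
            case "float":
                return "double".equals(readerType);
            default:
                return false;
        }
    }

    // Index of the first branch in the reader's union that matches the
    // writer's type; signals an error if no branch matches.
    public static int firstMatchingBranch(String writerType, List<String> readerUnion) {
        for (int i = 0; i < readerUnion.size(); i++) {
            if (matches(writerType, readerUnion.get(i))) {
                return i;
            }
        }
        throw new IllegalArgumentException(
                "No branch of " + readerUnion + " matches " + writerType);
    }

    public static void main(String[] args) {
        // Writer wrote an int; reader expects ["null", "int"].
        System.out.println(firstMatchingBranch("int", Arrays.asList("null", "int")));
    }
}
```

Under this reading, a writer's `int` resolves against the second branch (index 1) of `["null", "int"]`, which is why the read is expected to succeed.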

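However, when I run a test that uses version 2 to read a record serialized with version 1, I get:

org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.

The test below shows exactly this: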

@Test
public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
    Schema writerSchema = new Schema.Parser().parse("{\n" +
            "    \"type\":\"record\",\n" +
            "    \"name\":\"NeighborComparisons\",\n" +
            "    \"fields\": [\n" +
            "      {\"name\": \"test\",\n" +
            "      \"type\": \"int\" }]} ");

    Schema readersSchema = new Schema.Parser().parse(" {\n" +
            "    \"type\":\"record\",\n" +
            "    \"name\":\"NeighborComparisons\",\n" +
            "    \"fields\": [ {\n" +
            "      \"name\": \"test\",\n" +
            "      \"type\": [\"null\", \"int\"],\n" +
            "      \"default\": null } ]  }");

    // Writing a record using the initial schema with the 
    // test field defined as an int
    GenericData.Record record = new GenericData.Record(writerSchema);
    record.put("test", Integer.valueOf(10));        
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    JsonEncoder jsonEncoder = EncoderFactory.get().
       jsonEncoder(writerSchema, output);
    GenericDatumWriter<GenericData.Record> writer = new 
       GenericDatumWriter<GenericData.Record>(writerSchema);
    writer.write(record, jsonEncoder);
    jsonEncoder.flush();
    output.flush();

    System.out.println(output.toString());

    // We try reading it back using the second schema 
    // version where the test field is defined as a union of null and int
    JsonDecoder jsonDecoder = DecoderFactory.get().
        jsonDecoder(readersSchema, output.toString());
    GenericDatumReader<GenericData.Record> reader =
            new GenericDatumReader<GenericData.Record>(writerSchema, 
                readersSchema);
    GenericData.Record read = reader.read(null, jsonDecoder);

    // We should be able to assert that the value is 10 but it
    // fails on reading the record before getting here
    assertEquals(10, read.get("test"));
}


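I would like to know whether my expectation is correct (should this resolve successfully?), or where I am not using Avro properly to handle this scenario.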

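Your expectation is correct: a primitive schema can be migrated to a union of null and that primitive.

The problem with the code above is how the decoder is created. The decoder needs the writer's schema, not the reader's schema.

Instead of doing this: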

JsonDecoder jsonDecoder = DecoderFactory.get().
    jsonDecoder(readersSchema, output.toString());

it should be this:

JsonDecoder jsonDecoder = DecoderFactory.get().
    jsonDecoder(writerSchema, output.toString());
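
Putting the fix together, here is a minimal sketch of the full round trip. It assumes the Avro Java library is on the classpath and uses only the calls that already appear in the question's test. The class and method names are made up for illustration. The only substantive change is that `jsonDecoder` receives the writer's schema, while `GenericDatumReader` still receives both schemas.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.io.JsonEncoder;

// Hypothetical example class; requires the Avro library on the classpath.
public class UnionEvolutionExample {

    static final String WRITER_JSON =
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\","
            + "\"fields\":[{\"name\":\"test\",\"type\":\"int\"}]}";

    static final String READER_JSON =
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\","
            + "\"fields\":[{\"name\":\"test\",\"type\":[\"null\",\"int\"],"
            + "\"default\":null}]}";

    // Writes a record with the int schema, reads it back with the union schema,
    // and returns the value of the "test" field.
    public static Object writeThenReadWithEvolvedSchema() {
        try {
            // Separate parsers, because both schemas define the same record name.
            Schema writerSchema = new Schema.Parser().parse(WRITER_JSON);
            Schema readerSchema = new Schema.Parser().parse(READER_JSON);

            GenericData.Record record = new GenericData.Record(writerSchema);
            record.put("test", Integer.valueOf(10));

            ByteArrayOutputStream output = new ByteArrayOutputStream();
            JsonEncoder encoder = EncoderFactory.get().jsonEncoder(writerSchema, output);
            new GenericDatumWriter<GenericData.Record>(writerSchema).write(record, encoder);
            encoder.flush();

            // The decoder parses the data as it was written: writer's schema here.
            JsonDecoder decoder = DecoderFactory.get()
                    .jsonDecoder(writerSchema, output.toString("UTF-8"));

            // The reader resolves the writer's schema against the reader's schema.
            GenericDatumReader<GenericData.Record> reader =
                    new GenericDatumReader<GenericData.Record>(writerSchema, readerSchema);
            GenericData.Record read = reader.read(null, decoder);
            return read.get("test");
        } catch (IOException e) {
            throw new IllegalStateException("Avro round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenReadWithEvolvedSchema());
    }
}
```

As far as I understand it, the serialized text is laid out according to the writer's schema, so the decoder has to parse it with that schema. Translating the result to the reader's schema is the job of `GenericDatumReader`, which is why it is given both.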

This answer comes from Doug Cutting on the avro-user mailing list.

The exact comment was as follows: