How to use custom Apache Avro field types in Python

python apache-kafka

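I have access to an Apache Kafka cluster and was given a file describing the Apache Avro serialization format of the messages. I am writing a small test consumer in Python, and I get the following error when trying to parse the schema: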

SchemaParseException: Type property "{u'items': u'com.myapp.avromsg.common.MilestoneField', u'type': u'array'}" not a valid Avro schema: Items schema (com.myapp.avromsg.common.MilestoneField) not a valid Avro schema: Could not make an Avro Schema object from com.myapp.avromsg.common.MilestoneField. (known names: [u'com.myapp.avromsg.runstatus.RunStatusMessage'])

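It looks to me as though the error comes from the parser not knowing the custom field type MilestoneField. How can I describe this field in my script so that the schema parses correctly?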

Here is the my_msg.avsc Avro schema file:

{
  "type": "record",
  "name": "RunStatusMessage",
  "namespace": "com.myapp.avromsg.runstatus",
  "fields": [
    {
      "name": "datasetID",
      "type": "string"
    },
    {
      "name": "runID",
      "type": ["string", "null"]
    },
    {
      "name": "registryRunID",
      "type": ["string", "null"]
    },
    {
      "name": "status",
      "type": "string"
    },
    {
      "name": "logs",
      "type": ["string", "null"]
    },
    {
      "name": "jobID",
      "type": ["string", "null"]
    },
    {
      "name": "validationsJson",
      "type": ["string", "null"]
    },
    {
      "name": "zone",
      "type": "string"
    },
    {
      "name": "milestoneFields",
      "type": {
        "type": "array",
        "items": "com.myapp.avromsg.common.MilestoneField"
      }
    },
    {
      "name": "ingestionParams",
      "type": {
        "type": "array",
        "items": "com.myapp.avromsg.common.MilestoneField"
      },
      "default": []
    },
    {
      "name": "timestamp",
      "type": [
        {
          "type": "long",
          "logicalType": "timestamp-millis"
        },
        {
          "type": "bytes",
          "logicalType": "decimal",
          "precision": 38,
          "scale": 0
        },
        "string",
        "int",
        "null"
      ]
    }
  ]
}

Here is the code I am currently using:

import avro.schema
schema = avro.schema.parse(open('my_msg.avsc', 'rb').read())
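
The diagnosis above (the schema refers to a named type that the file never defines) can be checked without the Avro library. The following is a rough sketch using only the standard library: it collects the named types a schema defines and the names it references, then reports the difference. The helper name `unresolved_type_names` and the trimmed-down schema are illustrative assumptions, and the check deliberately ignores definition order and aliases.

```python
import json

# Avro primitive type names; anything else used as a type is a named-type reference.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def unresolved_type_names(schema):
    """Return the full names a schema references but does not define itself."""
    defined, referenced = set(), set()

    def visit(node, namespace):
        if isinstance(node, str):
            if node not in PRIMITIVES:
                # A name without dots is resolved against the enclosing namespace.
                qualified = node if "." in node or not namespace else f"{namespace}.{node}"
                referenced.add(qualified)
        elif isinstance(node, list):  # union: visit every branch
            for branch in node:
                visit(branch, namespace)
        elif isinstance(node, dict):
            kind = node.get("type")
            if kind in ("record", "error", "enum", "fixed"):
                name = node["name"]
                ns = node.get("namespace", namespace)
                if "." in name:  # a dotted name carries its own namespace
                    full, ns = name, name.rsplit(".", 1)[0]
                else:
                    full = f"{ns}.{name}" if ns else name
                defined.add(full)
                for field in node.get("fields", []):
                    visit(field["type"], ns)
            elif kind == "array":
                visit(node["items"], namespace)
            elif kind == "map":
                visit(node["values"], namespace)
            else:  # e.g. {"type": "long", "logicalType": "timestamp-millis"}
                visit(kind, namespace)

    visit(schema, None)
    return referenced - defined


# Trimmed-down version of my_msg.avsc, kept inline so the sketch is self-contained.
example = json.loads("""
{
  "type": "record",
  "name": "RunStatusMessage",
  "namespace": "com.myapp.avromsg.runstatus",
  "fields": [
    {"name": "datasetID", "type": "string"},
    {"name": "milestoneFields",
     "type": {"type": "array", "items": "com.myapp.avromsg.common.MilestoneField"}},
    {"name": "timestamp",
     "type": [{"type": "long", "logicalType": "timestamp-millis"}, "null"]}
  ]
}
""")

missing = unresolved_type_names(example)
print(sorted(missing))
```

Any name this reports has to be supplied to the parser from somewhere else before the schema can be parsed.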

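I don't know how to write this in Python, but I can give you a Java version (I expect the Python equivalent to be almost identical). You have two options: include the definition of the MilestoneField object as part of the schema (which is not clean at all if you use it in several places), or add the extra types to the Schema.Parser. In this example I hard-coded the schemas, but the idea is the same when reading them from files.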
import java.util.HashMap;

import org.apache.avro.Schema;

public class SchemaExample {
    public static void main(String[] args) {
        Schema.Parser parser = new Schema.Parser();

        // Parse the shared type on its own first.
        Schema pojo = new Schema.Parser().parse("{\n" +
                "  \"namespace\": \"io.fama.pubsub.schema\",\n" +
                "  \"type\": \"record\",\n" +
                "  \"name\": \"Pojo\",\n" +
                "  \"fields\": [\n" +
                "    {\n" +
                "      \"name\": \"field\",\n" +
                "      \"type\": \"string\"\n" +
                "    }\n" +
                "  ]\n" +
                "}");

        // Register the already-parsed type so the next schema can refer to it by name.
        HashMap<String, Schema> extraTypes = new HashMap<>();
        extraTypes.put("Pojo", pojo);
        parser.addTypes(extraTypes);

        // "Pojo" in the array items below resolves to the type registered above.
        Schema schema = parser.parse("{\n" +
                "  \"namespace\": \"io.fama.pubsub.schema\",\n" +
                "  \"type\": \"record\",\n" +
                "  \"name\": \"PojoCollection\",\n" +
                "  \"fields\": [\n" +
                "    {\n" +
                "      \"name\": \"pojosCollection\",\n" +
                "      \"type\": {\n" +
                "        \"type\": \"array\",\n" +
                "        \"items\": \"Pojo\"\n" +
                "      }\n" +
                "    }, {\n" +
                "      \"name\": \"additionaField\",\n" +
                "      \"type\": [\"null\", \"string\"]\n" +
                "    }\n" +
                "  ]\n" +
                "}");
    }
}
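
The first option mentioned in this answer (embedding the type definition in the schema itself) could be sketched in Python roughly as follows. Avro allows a named type to be defined only once, so the sketch replaces just the first reference with the full definition and leaves later references as plain names. The helper `inline_first_reference` is a hypothetical name, and the fields of `milestone_field` are invented placeholders, since the real MilestoneField definition is not shown in the question.

```python
import copy

FULL_NAME = "com.myapp.avromsg.common.MilestoneField"

# Placeholder definition: the real fields of MilestoneField are unknown here.
milestone_field = {
    "type": "record",
    "name": "MilestoneField",
    "namespace": "com.myapp.avromsg.common",
    "fields": [
        {"name": "key", "type": "string"},
        {"name": "value", "type": "string"},
    ],
}


def inline_first_reference(schema, full_name, definition):
    """Return a copy of `schema` whose first reference to `full_name` is the definition."""
    replaced = False

    def visit(node):
        nonlocal replaced
        if isinstance(node, str):
            if node == full_name and not replaced:
                replaced = True
                return copy.deepcopy(definition)
            return node
        if isinstance(node, list):  # union: visit branches in order
            return [visit(branch) for branch in node]
        if isinstance(node, dict):
            out = dict(node)  # shallow copy so the input is left untouched
            kind = node.get("type")
            if kind == "record":
                out["fields"] = [
                    dict(field, type=visit(field["type"]))
                    for field in node.get("fields", [])
                ]
            elif kind == "array":
                out["items"] = visit(node["items"])
            elif kind == "map":
                out["values"] = visit(node["values"])
            return out
        return node

    return visit(schema)


# Reduced version of the message schema with the two fields that use the custom type.
main_msg = {
    "type": "record",
    "name": "RunStatusMessage",
    "namespace": "com.myapp.avromsg.runstatus",
    "fields": [
        {"name": "milestoneFields", "type": {"type": "array", "items": FULL_NAME}},
        {"name": "ingestionParams", "type": {"type": "array", "items": FULL_NAME},
         "default": []},
    ],
}

self_contained = inline_first_reference(main_msg, FULL_NAME, milestone_field)
```

The resulting dictionary can be serialized with `json.dumps` and handed to the Avro parser as a single schema. As the answer notes, this duplicates the definition in every schema that uses the type, which is why registering it separately is usually the cleaner choice.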

Assuming I have .avsc files that define the custom field and the message schema, here is how I did this with the Python avro library:
import json

import avro.schema

schema_list = []

# First add the custom field to the schema list
with open('custom_field.avsc', 'rb') as f:
    custom_json = json.load(f)
schema_list.append(custom_json)

# Then add the main message schema
with open('main_msg.avsc', 'rb') as f:
    main_json = json.load(f)
schema_list.append(main_json)

# Convert the list of schemas to a single JSON string
schema_json = json.dumps(schema_list)

# Parse the schema
full_msg_schema = avro.schema.parse(schema_json)
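
One detail worth noting about this approach: a top-level JSON list is an Avro union, so the parsed result is, as far as I can tell, a union schema with one branch per list entry rather than the message record itself. The sketch below wraps the same idea in two hypothetical helpers, `combine_schema_documents` and `parse_message_schema`, and picks the last branch as the message schema. The `schemas` attribute on the parsed union and the placeholder fields of MilestoneField are assumptions to verify against your installed `avro` version and your real files.

```python
import json


def combine_schema_documents(dependencies, main_schema):
    """Return one JSON string with the dependencies first and the main schema last."""
    # Order matters: a named type must be defined before it is referenced.
    return json.dumps(list(dependencies) + [main_schema])


def parse_message_schema(dependencies, main_schema):
    """Parse the combined list and return the schema of the last entry."""
    import avro.schema  # third-party; imported here so the helper above stays stdlib-only

    union = avro.schema.parse(combine_schema_documents(dependencies, main_schema))
    # Assumption: the union exposes its branches as `schemas`, in input order.
    return union.schemas[-1]


# Hypothetical stand-ins for the contents of custom_field.avsc and main_msg.avsc.
custom_field = {
    "type": "record",
    "name": "MilestoneField",
    "namespace": "com.myapp.avromsg.common",
    "fields": [{"name": "key", "type": "string"}],
}
main_msg = {
    "type": "record",
    "name": "RunStatusMessage",
    "namespace": "com.myapp.avromsg.runstatus",
    "fields": [
        {"name": "milestoneFields",
         "type": {"type": "array",
                  "items": "com.myapp.avromsg.common.MilestoneField"}},
    ],
}

combined = combine_schema_documents([custom_field], main_msg)
```

With the `avro` package installed, `parse_message_schema([custom_field], main_msg)` would be the schema to pass to a datum reader. Older `avro-python3` releases named the parsing function `avro.schema.Parse`, so the call may need adjusting there.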

Where do you define the MilestoneField object?

Thanks @hlagos. This pointed me in the right direction for doing it in Python.