Hive: AvroTypeException when reading from internal Hive tables with nested structs
I am working on an Azure HDInsight cluster (version 3.6). It uses Hortonworks HDP 2.6, which ships with Hive 2.1.0 (on Tez 0.8.4). I have some internal Hive tables with nested struct fields, stored in Avro format. Here is an example of one of the CREATE statements:
CREATE TABLE my_example_table(
  some_field STRING,
  some_other_field STRING,
  some_struct struct<field1: BIGINT, inner_struct: struct<field2: STRING, field3: STRING>>)
PARTITIONED BY (year INT, month INT)
STORED AS AVRO;
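To make the DDL-to-Avro mapping concrete, here is a minimal Python sketch (not Hive's actual code) of how a table like this could end up as an Avro schema. It assumes the behaviour visible in the schema extracted further down: every column is wrapped in a ["null", T] union and every struct becomes a record named record_<n> with an empty namespace. The names PRIMITIVES, hive_type_to_avro and table_schema are hypothetical, and the pre-order record numbering is illustrative only (the extracted schema uses record_0 and record_2, so Hive's own numbering evidently differs). The partition columns year and month are left out because Hive keeps them in the partition path, not in the data files.

```python
import itertools
import json

# Hypothetical, simplified mapping from Hive primitive type names to Avro types.
PRIMITIVES = {"string": "string", "bigint": "long", "int": "int"}


def hive_type_to_avro(hive_type, counter):
    """Map a Hive type to an Avro type wrapped in a ["null", T] union.

    A Hive type is either a primitive name such as "bigint" or a list of
    (field_name, hive_type) pairs describing a struct.
    """
    if isinstance(hive_type, str):
        inner = PRIMITIVES[hive_type]
    else:
        # Name the record before visiting its fields, so outer structs get
        # lower numbers than the structs nested inside them.
        record_name = f"record_{next(counter)}"
        inner = {
            "type": "record",
            "name": record_name,
            "namespace": "",
            "fields": [
                {"name": name, "type": hive_type_to_avro(t, counter), "default": None}
                for name, t in hive_type
            ],
        }
    return ["null", inner]


def table_schema(name, namespace, columns):
    """Build the top-level record for a table's non-partition columns."""
    counter = itertools.count()
    return {
        "type": "record",
        "name": name,
        "namespace": namespace,
        "fields": [
            {"name": col, "type": hive_type_to_avro(t, counter), "default": None}
            for col, t in columns
        ],
    }


# Non-partition columns of my_example_table (year and month are not stored in the files).
columns = [
    ("some_field", "string"),
    ("some_other_field", "string"),
    ("some_struct", [
        ("field1", "bigint"),
        ("inner_struct", [("field2", "string"), ("field3", "string")]),
    ]),
]

schema = table_schema("my_example_table", "my_namespace", columns)
print(json.dumps(schema, indent=2))
```

The point of the sketch is the wrapping rule: because each nested struct sits inside a ["null", record] union, any reader using this schema expects a union at every struct position.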
When I query the internal table, I get the following error:

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found core.record_0, expecting union
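The message reads like an Avro schema-resolution error of the form "Found <writer type>, expecting <reader type>": the data seems to contain a plain record named core.record_0 at a position where the reader schema has a union. The Avro specification says that when the reader's schema is a union and the writer's is not, the first matching branch of the union is used, and an error is signalled if none match. The Python sketch below models only that rule, under the assumption that record branches are matched by full name (namespace included). Under that assumption, core.record_0 would not match a branch named record_0 with an empty namespace. The helpers full_name and resolve_union_branch are hypothetical, this is not Avro's real resolver, and the writer's "core" namespace is inferred from the error text only.

```python
def full_name(schema):
    """Return "namespace.name" for a named Avro type, or just "name" when the
    namespace is empty or missing."""
    namespace = schema.get("namespace") or ""
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


def resolve_union_branch(writer, reader_union):
    """Return the index of the reader-union branch used for a writer record.

    Simplified model (an assumption, not Avro's real resolver): only record
    branches are considered, and a branch matches only if its full name,
    namespace included, equals the writer record's full name.
    """
    for index, branch in enumerate(reader_union):
        if isinstance(branch, dict) and branch.get("type") == "record":
            if full_name(branch) == full_name(writer):
                return index
    raise ValueError(f"Found {full_name(writer)}, expecting union")


# Hypothetical writer: a plain record in namespace "core" (inferred from the error).
writer = {"type": "record", "name": "record_0", "namespace": "core", "fields": []}
# Hypothetical reader: a Hive-style nullable union whose record has an empty namespace.
reader = ["null", {"type": "record", "name": "record_0", "namespace": "", "fields": []}]

try:
    resolve_union_branch(writer, reader)
except ValueError as err:
    print(err)  # Found core.record_0, expecting union
```

With a writer record whose namespace is also empty, the same call returns 1, the index of the record branch.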
I extracted the Avro schema from one of the internal tables using avro-tools and noticed that Hive creates union types from the structs I defined:
{
  "type" : "record",
  "name" : "my_example_table",
  "namespace" : "my_namespace",
  "fields" : [ {
    "name" : "some_field",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "some_other_field",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "some_struct",
    "type" : [ "null", {
      "type" : "record",
      "name" : "record_0",
      "namespace" : "",
      "doc" : "struct<field1: BIGINT, inner_struct: struct<field2: STRING, field3: STRING>>",
      "fields" : [ {
        "name" : "field1",
        "type" : [ "null", "long" ],
        "doc" : "bigint",
        "default" : null
      }, {
        "name" : "inner_struct",
        "type" : [ "null", {
          "type" : "record",
          "name" : "record_2",
          "namespace" : "",
          "doc" : "struct<field2: STRING, field3: STRING>",
          "fields" : [ {
            "name" : "field2",
            "type" : [ "null", "string" ],
            "doc" : "string",
            "default" : null
          }, {
            "name" : "field3",
            "type" : [ "null", "string" ],
            "doc" : "string",
            "default" : null
          } ]
        } ]
      } ]
    } ]
  } ]
}
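To list which fields Hive turned into unions, and under which record names, the extracted schema can be walked programmatically. A minimal Python sketch, assuming the schema has been saved as JSON (for example from the output of avro-tools getschema). The helper name nullable_records is hypothetical, and a trimmed copy of the schema (docs and some_other_field omitted, field names taken from the CREATE statement) is inlined so the example runs on its own.

```python
import json


def nullable_records(fields, path=""):
    """Yield (field_path, record_full_name) for every field whose type is a
    union containing a record branch, descending into nested records."""
    for field in fields:
        field_path = f"{path}.{field['name']}" if path else field["name"]
        field_type = field["type"]
        is_union = isinstance(field_type, list)
        branches = field_type if is_union else [field_type]
        for branch in branches:
            if isinstance(branch, dict) and branch.get("type") == "record":
                if is_union:
                    namespace = branch.get("namespace") or ""
                    name = f"{namespace}.{branch['name']}" if namespace else branch["name"]
                    yield field_path, name
                # Nested records can themselves contain nullable structs.
                yield from nullable_records(branch["fields"], field_path)


# Trimmed copy of the extracted schema (doc attributes omitted).
schema = json.loads("""
{"type": "record", "name": "my_example_table", "namespace": "my_namespace",
 "fields": [
  {"name": "some_field", "type": ["null", "string"], "default": null},
  {"name": "some_struct", "type": ["null", {
     "type": "record", "name": "record_0", "namespace": "",
     "fields": [
       {"name": "field1", "type": ["null", "long"], "default": null},
       {"name": "inner_struct", "type": ["null", {
          "type": "record", "name": "record_2", "namespace": "",
          "fields": [
            {"name": "field2", "type": ["null", "string"], "default": null},
            {"name": "field3", "type": ["null", "string"], "default": null}
          ]}], "default": null}
     ]}], "default": null}
 ]}
""")

for field_path, record_name in nullable_records(schema["fields"]):
    print(f"{field_path}: ['null', {record_name}]")
```

The printed full names (record_0 and record_2, both without a namespace) can be compared directly with the core.record_0 named in the error message.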
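What is going wrong here? I am fairly sure this worked a few days ago, so my guess is that Microsoft switched the HDInsight clusters to a different HDP patch version that ships another Avro or Hive version, but I have not found any indication of this.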
I found this: it seems to be a very similar issue (on the same Hive version).
Does anyone know what is going wrong here, and what I can do to fix it or work around it?