Hive support for Avro logicalType


I created an Avro file using TDCH. The schema was generated from the Avro file. The jar files used to generate the Avro file were paranamer-2.3.jar, avro-1.9.2.jar, and avro-mapred-1.9.2.jar.

I uploaded the Avro schema and the Avro file to HDFS and created a Hive external table on top of them:

CREATE EXTERNAL TABLE AvroTDCH 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
LOCATION '/data/sample/avro/tdch_test' 
TBLPROPERTIES ('avro.schema.url'='hdfs://NN01/data/sample/avroSchema/AVrotdchcomplex.avsc');

But when I try to read data from the Hive table, it throws the following exception:

> select * from AvroTDCH ;
    java.lang.NumberFormatException: For input string: " 631"
            at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
            at java.lang.Integer.parseInt(Integer.java:569)
            at java.lang.Integer.parseInt(Integer.java:615)
            at java.sql.Date.valueOf(Date.java:133)
            at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:447)
            at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:423)
            at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:536)
            at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
            at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:53)
            at org.apache.hive.beeline.IncrementalRowsWithNormalization.<init>(IncrementalRowsWithNormalization.java:50)
            at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1820)
            at org.apache.hive.beeline.Commands.execute(Commands.java:878)
            at org.apache.hive.beeline.Commands.sql(Commands.java:730)
            at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1000)
            at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:835)
            at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:793)
            at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:493)
            at org.apache.hive.beeline.BeeLine.main(BeeLine.java:476)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
    Error: Unrecognized column type:DATE_TYPE (state=,code=0)

I suspect that a jar version mismatch is the problem here. I also tried adding the 1.9.2 jars in beeline with the ADD JAR command, but I got the same error.

Avro jar in the HDP Hive client:

$ ls -lrt /usr/hdp/current/hive-client/lib/avro*
-rw-r--r-- 1 root root 400680 Apr 25 2019 /usr/hdp/current/hive-client/lib/avro-1.7.5.jar

Comments:

Support for DATE was added in Avro 1.8 and, in theory, ported to Hive 1.1. Which version of HDP are you using?

The HDP version, found with ls /usr/hdp/, is 2.6.5.1153-2.

So, for some reason, HDP 2.6.5 ships Hive 1.2 but without the Avro 1.8 support. You are out of luck: you will have to store dates as ISO strings and cast them explicitly in SQL. Or better yet, upgrade that old HDP version, which will soon reach the end of commercial support.

Thanks for the quick suggestion. Surprisingly, I found that the timestamp-millis logical type works fine, but the decimal and double logical types do not. Is that also because of this version?
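
The workaround suggested in the comments (storing dates as ISO strings instead of relying on the Avro date logical type) could be sketched roughly as follows. This is only an illustration and not part of the original thread: it relies on the Avro specification's definition of the date logical type as an int counting days since 1970-01-01, and the field name event_date and both helper functions are hypothetical.

```python
from datetime import date, timedelta

# The Avro `date` logical type annotates an int holding the number of days
# since the Unix epoch (1970-01-01).
EPOCH = date(1970, 1, 1)


def avro_date_to_iso(days_since_epoch: int) -> str:
    """Convert an Avro `date` value to an ISO 'YYYY-MM-DD' string."""
    return (EPOCH + timedelta(days=days_since_epoch)).isoformat()


def iso_to_avro_date(iso: str) -> int:
    """Convert an ISO 'YYYY-MM-DD' string back to days since the epoch."""
    return (date.fromisoformat(iso) - EPOCH).days


# Hypothetical field definitions: the first uses the logical type that the
# old Avro jar on the cluster cannot interpret, the second is the plain
# string form proposed as a workaround.
DATE_AS_LOGICAL_TYPE = {
    "name": "event_date",
    "type": {"type": "int", "logicalType": "date"},
}
DATE_AS_STRING = {"name": "event_date", "type": "string"}


if __name__ == "__main__":
    print(avro_date_to_iso(17897))          # 2019-01-01
    print(iso_to_avro_date("2000-01-01"))   # 10957
```

With the column stored as a string, the Hive side would then presumably convert it explicitly, for example with CAST(event_date AS DATE) in the query.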