Hive support for Avro logicalType
I created an Avro file using TDCH. The schema generated from the Avro file is as shown. JAR files used to generate the Avro file: paranamer-2.3.jar, avro-1.9.2.jar, avro-mapred-1.9.2.jar. I uploaded the Avro schema and the Avro file to HDFS and created a Hive external table on top of them:
CREATE EXTERNAL TABLE AvroTDCH
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/data/sample/avro/tdch_test'
TBLPROPERTIES ('avro.schema.url'='hdfs://NN01/data/sample/avroSchema/AVrotdchcomplex.avsc');
But when I try to read data from the Hive table, it throws the following exception:
> select * from AvroTDCH ;
java.lang.NumberFormatException: For input string: " 631"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.parseInt(Integer.java:615)
at java.sql.Date.valueOf(Date.java:133)
at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:447)
at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:423)
at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:536)
at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:53)
at org.apache.hive.beeline.IncrementalRowsWithNormalization.<init>(IncrementalRowsWithNormalization.java:50)
at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1820)
at org.apache.hive.beeline.Commands.execute(Commands.java:878)
at org.apache.hive.beeline.Commands.sql(Commands.java:730)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1000)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:835)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:793)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:493)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:476)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Error: Unrecognized column type:DATE_TYPE (state=,code=0)
I suspect that a JAR version mismatch may be the problem here. I also tried adding the 1.9.2 JARs in Beeline with the ADD JAR command, but I got the same error.
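One way to check the suspected mismatch is to read the Avro version out of a JAR file name and compare it with 1.8, the release that a commenter on this question says introduced DATE support. The following is only a minimal sketch under that assumption: the function names and the `MIN_AVRO_FOR_DATE` constant are hypothetical, and the Avro JAR version is just one factor, since the Hive version matters as well.

```python
import re

# Assumption: per the comments on this question, DATE support arrived in Avro 1.8.
MIN_AVRO_FOR_DATE = (1, 8)


def avro_jar_version(jar_name):
    """Extract (major, minor, patch) from a name like 'avro-1.7.5.jar'."""
    match = re.fullmatch(r"avro-(\d+)\.(\d+)\.(\d+)\.jar", jar_name)
    if match is None:
        # Names such as 'avro-mapred-1.9.2.jar' are deliberately rejected.
        raise ValueError(f"not a plain Avro JAR name: {jar_name!r}")
    return tuple(int(part) for part in match.groups())


def supports_date_logical_type(jar_name):
    """Return True if the JAR's major.minor version is at least 1.8."""
    return avro_jar_version(jar_name)[:2] >= MIN_AVRO_FOR_DATE


if __name__ == "__main__":
    for name in ("avro-1.7.5.jar", "avro-1.9.2.jar"):
        print(name, supports_date_logical_type(name))
```

A check like this says nothing about which JAR actually ends up on the classpath of HiveServer2; it only compares version numbers.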
Avro JAR in the HDP Hive client:
$ ls -lrt /usr/hdp/current/hive-client/lib/avro*
-rw-r--r-- 1 root root 400680 Apr 25 2019 /usr/hdp/current/hive-client/lib/avro-1.7.5.jar

Support for DATE was added in Avro 1.8 and, in theory, was ported to Hive 1.1. Which version of HDP are you using?

The HDP version found is 2.6.5.1153-2 (from ls /usr/hdp/).

So, for some reason, HDP 2.6.5 ships with Hive 1.2 but without the Avro 1.8 support. You are out of luck: you must store dates as ISO strings and convert them explicitly in SQL. Better still, upgrade your old HDP version, which will soon reach the end of commercial support.

Thanks for the quick suggestion. Surprisingly, I found that the timestamp-millis logical type works fine, but the decimal and double logical types do not. Is that also because of this version?
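The workaround suggested in the comments, storing dates as ISO strings, can be sketched as follows. Avro's `date` logical type annotates an `int` that counts days since 1970-01-01, so the conversion is a simple offset from the epoch. This is a minimal illustration, not part of TDCH or Hive; the function names are hypothetical, and where in the export pipeline such a conversion would run is left open.

```python
from datetime import date, timedelta

# Avro's `date` logical type stores an int: days since the Unix epoch.
EPOCH = date(1970, 1, 1)


def avro_date_to_iso(days_since_epoch):
    """Convert an Avro `date` value to an ISO-8601 string (YYYY-MM-DD)."""
    return (EPOCH + timedelta(days=days_since_epoch)).isoformat()


def iso_to_avro_date(iso_string):
    """Convert an ISO-8601 date string back to days since the epoch."""
    return (date.fromisoformat(iso_string) - EPOCH).days


if __name__ == "__main__":
    print(avro_date_to_iso(18262))
```

On the Hive side, a column written as a plain string in this format can then be converted explicitly in the query, for example with CAST(... AS DATE).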