Getting "Can not create a Path from an empty string" error when using the HCatalog JSON SerDe

I am trying to use a Hive table with the HCatalog JSON SerDe (from hcatalog-core-0.5.0-cdh4.7.0.jar). I am running CDH4 (Hadoop 2.0.0-cdh4.7.0 and Hive 0.10.0-cdh4.7.0).
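The HCatalog SerDe class is not on Hive's classpath by default, so the jar has to be registered in the session (or via hive.aux.jars.path) before the table is queried. A minimal sketch, where the exact jar location is an assumption and should be adjusted to wherever CDH installed it:

-- Register the HCatalog SerDe jar in the current Hive session
-- (the path below is an assumption; point it at your actual jar)
ADD JAR /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.7.0.jar;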

Table definition:

CREATE EXTERNAL TABLE some_table(
  user_id int COMMENT 'from deserializer',
  event_time int COMMENT 'from deserializer',
  some_string string COMMENT 'from deserializer',
  some_id string COMMENT 'from deserializer',
  another_id int COMMENT 'from deserializer')
PARTITIONED BY (
  year int,
  month int,
  day int)
ROW FORMAT SERDE
  'org.apache.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:8020/somedir/some_table'
TBLPROPERTIES (
  'last_modified_by'='volker',
  'last_modified_time'='1424980336',
  'transient_lastDdlTime'='1424980952')
The partitions are created like this:

alter table some_table add if not exists partition (year=2015,month=02,day=26) location '/somedir/some_table/year=2015/month=02/day=26'
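As a quick sanity check (not part of the original setup), the metastore can be asked to list the registered partitions:

-- Verify that the partition was actually added to the metastore
SHOW PARTITIONS some_table;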
This first part goes smoothly, and I can read the data when selecting all columns:

hive> select * from some_table limit 10;
OK
671764813   1424980760  fbx NtiwgY  6   2015    02  26
1632511524  1424980760  fbx AdMybO  10  2015    02  26
1201817175  1424980760  fbx GgQJEd  6   2015    02  26
1621940110  1424980760  fbx qmsXNQ  12  2015    02  26
326380277   1424980760  fbx zgVFgP  2   2015    02  26
1256744282  1424980760  fbx GeIFxq  6   2015    02  26
1741961976  1424980760  fbx CiuxZU  8   2015    02  26
2009923690  1424980760  fbx ZmGOvK  2   2015    02  26
1728798342  1424980760  fbx YikDcV  8   2015    02  26
688185292   1424980760  fbx NssSWN  7   2015    02  26
But as soon as I try to read or reference a specific field anywhere in a query, it fails:

hive> select another_id from some_table limit 10;
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:91)
    at org.apache.hadoop.fs.Path.<init>(Path.java:99)
    at org.apache.hadoop.fs.Path.<init>(Path.java:58)
    at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:745)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:849)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:774)
    at org.apache.hadoop.mapred.JobClient.access$400(JobClient.java:178)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:991)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:950)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I hope this is just a problem with my table definition. Any help is welcome.

I did not get this working with the HCatalog SerDe. However, what I wanted was to store JSON in HDFS and read it back as a Hive table, and I eventually managed to do that by using a different SerDe, which you can find here:

It worked very well for me on CDH4. The underlying JSON data looks like this:

{"another_id":6,"user_id":671764813,"some_id":"NtiwgY","event_time":1424980760,"some_string":"fbx"}
{"another_id":10,"user_id":1632511524,"some_id":"AdMybO","event_time":1424980760,"some_string":"fbx"}
{"another_id":6,"user_id":1201817175,"some_id":"GgQJEd","event_time":1424980760,"some_string":"fbx"}