Mapreduce 在PIG存储中实现动态定位-正确方式

Mapreduce 在PIG存储中实现动态定位-正确方式,mapreduce,apache-pig,cloudera,Mapreduce,Apache Pig,Cloudera,我正在尝试实现基于org.apache.pig.builtin.PigStorage的定制pig存储程序。在我的storer中,我想从发送到'INTO'word的字符串和其他作业属性中计算真正的hdfs位置 -- pig STORE data INTO 'MyObject' using ... 我扩展PigStorage并覆盖setStoreLocation功能,如下所示: public void setStoreLocation(String location, Job job) thro

我正在尝试实现基于org.apache.pig.builtin.PigStorage的定制pig存储程序。在我的storer中,我想从发送到'INTO'word的字符串和其他作业属性中计算真正的hdfs位置

-- pig
STORE data INTO 'MyObject' using ...
我扩展PigStorage并覆盖setStoreLocation功能,如下所示:

 public void setStoreLocation(String location, Job job) throws IOException {
      // location is 'MyObject' here
      // my manipulations on location
      String newLocation = basePath + '/' + someVar + '/' + location;
      super.setStoreLocation(newLocation, job);
}
它可以工作并将文件写入newLocation(/basePath/someVar/MyObject),但我在日志中得到以下消息,我如何避免它

java.io.FileNotFoundException: File hdfs://myMachine:8020/user/hdfs/MyObject does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader.getOutputSize(FileBasedOutputSizeReader.java:65)
at org.apache.pig.tools.pigstats.JobStats.getOutputSize(JobStats.java:543)
at org.apache.pig.tools.pigstats.JobStats.addOneOutputStats(JobStats.java:567)
at org.apache.pig.tools.pigstats.JobStats.addOutputStatistics(JobStats.java:516)
at org.apache.pig.tools.pigstats.PigStatsUtil.addSuccessJobStats(PigStatsUtil.java:360)
at org.apache.pig.tools.pigstats.PigStatsUtil.accumulateStats(PigStatsUtil.java:257)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:341)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
at org.apache.pig.PigServer.execute(PigServer.java:1297)
at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:478)
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:286)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:226)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

看起来覆盖org.apache.pig.StoreFuncInterface#relToAbsPathForStoreLocation解决了以下问题:

@Override
public String relToAbsPathForStoreLocation(String location, Path curDir)
        throws IOException {
    return location;
}

hdfs://myMachine:8020/user/hdfs/MyObject
您的
新位置
?否,MyObject是store语句的一部分:将其存储到“MyObject”。。。新位置是根据许多作业参数生成的路径。