How do I load files on a Hadoop cluster using Apache Pig?
I have a Pig script that needs to load files from a local Hadoop cluster. I can list the files with the hadoop command: hadoop fs -ls /repo/mydata, but when I try to load the files in the Pig script, it fails. The load statement looks like this:
in = LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray)
The error message is:
Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/repo/mydata/2012/02
Any ideas? Thanks.
My suggestion:
hadoop fs -mkdir /pigdata
hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata
(or, from the grunt shell: grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)
grunt> set debug on
grunt> set job.name 'first-p2-job'
grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS
(user:chararray, time:long, query:chararray);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'output';
hdfs://hostname:54310/pigdata/output
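I ran into the same problem. Here is my suggestion:
Remove the spaces on both sides of the "=":
in=LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray)
Got it, it should look like this:
in = LOAD 'hdfs:/repo/mydata/2012/02' USING PigStorage() AS ...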
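A note on reading the error: the file: prefix in the ERROR 2118 message suggests that Pig resolved the path against the local filesystem instead of HDFS, which typically happens when Pig runs in local mode or does not pick up the cluster configuration. The following is only a minimal sketch for checking which scheme a path carries. uri_scheme is a hypothetical helper written for illustration and is not part of Pig or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not a Pig or Hadoop command):
# print the URI scheme of a path, or "none" if the path has no scheme.
uri_scheme() {
  case "$1" in
    *:/*) printf '%s\n' "${1%%:*}" ;;  # keep everything before the first ':'
    *)    printf 'none\n' ;;
  esac
}

# Path as reported in the error message: resolved on the local filesystem.
uri_scheme 'file:/repo/mydata/2012/02'

# Fully qualified HDFS path, as used in the suggested LOAD statement.
uri_scheme 'hdfs://hostname:54310/pigdata/excite-small.log'
```

If the scheme turns out to be file, two things are worth trying: start Pig explicitly in MapReduce mode with pig -x mapreduce (rather than pig -x local), or use a fully qualified hdfs:// URI in the LOAD statement. It may also help to confirm that the Hadoop configuration directory is visible to Pig, though how that is set up depends on the installation.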