Apache Pig: processing multiple files fails with ERROR 0: Error while executing ForEach at []
I have 4 files, A, B, C and D, under the HDFS directory /user/bizlog/cpc. The records look like this:
87465422^C376832^C27786^C21161214^Ckey
Here is my Pig script:
cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_sort = order account_group by group;
account_key = foreach account_sort generate group, BagToTuple(cpc.key);
store account_key into 'last' using PigStorage('\u0003');
It should produce output like this:
376832^CKE1^CKE2
The script above is supposed to process all 4 files, but I get the following error:
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.
Pig Stack Trace
---------------
ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
================================================================================
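Strangely, if I load a single file, for example '/user/bizlog/cpc/A', the script succeeds.
If I load each file separately and then union them, it also works fine.
If I move the sort step to the end, the error goes away.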
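A minimal sketch of the per-file variant described above, assuming the four files are literally named A, B, C and D directly under /user/bizlog/cpc. The relation names cpc_a through cpc_d are hypothetical; everything after the UNION is the original script unchanged, including the ORDER on the grouped relation.

```pig
-- Load each file separately (cpc_a .. cpc_d are hypothetical relation names)
cpc_a = load '/user/bizlog/cpc/A' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc_b = load '/user/bizlog/cpc/B' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc_c = load '/user/bizlog/cpc/C' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc_d = load '/user/bizlog/cpc/D' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
-- Merge the four relations; they share the same schema
cpc_all = union cpc_a, cpc_b, cpc_c, cpc_d;
-- From here on, identical to the original script
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_sort = order account_group by group;
account_key = foreach account_sort generate group, BagToTuple(cpc.key);
store account_key into 'last' using PigStorage('\u0003');
```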
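The Hadoop version is 0.20.2 and the Pig version is 0.12.1. Any help would be greatly appreciated.

As mentioned in the comments: "If I move the sort step to the end, the error goes away." Although I have not found much on this topic, Pig does not seem to like ordering the grouped relation itself.

So the "solution" is to order the output generated from the group, rather than ordering the group itself.

One more thing: if I load each file first and then union them, it also works fine.
Do A, B, C and D have the same data types?
Yes, they have the same data types.
Strange. I moved the sort step to the end and the error disappeared.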
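A sketch of the workaround from the answer applied to the original script: the grouped relation is no longer ordered; the ORDER runs on the relation produced by the FOREACH instead. The output aliases accountid and keys are hypothetical names added for readability, and this rewrite has not been verified against Pig 0.12.1 / Hadoop 0.20.2.

```pig
cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
-- Build the per-account tuple first, without ordering the grouped relation
account_key = foreach account_group generate group as accountid, BagToTuple(cpc.key) as keys;
-- Order the generated output instead of the group itself
account_sort = order account_key by accountid;
store account_sort into 'last' using PigStorage('\u0003');
```

The result should contain the same records as the original pipeline. Only the position of the ORDER changes, so the sort is applied to flat (accountid, keys) records rather than to (group, bag) records.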