Apache Pig: error when processing multiple files: ERROR 0: Error while executing ForEach at []


I have 4 files, A, B, C, and D, in the /user/bizlog/cpc directory on HDFS. The records look like this: 87465422^C376832^C27786^C21161214^Ckey

This is my Pig script:

cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_sort = order account_group by group;
account_key = foreach account_sort generate group, BagToTuple(cpc.key);
store account_key into 'last' using PigStorage('\u0003');
It should produce a result like this: 376832^CKE1^CKE2

The script above is supposed to process all 4 files, but I get the following error:

Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.

Pig Stack Trace
---------------
ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
================================================================================
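Strangely, if I load a single file, for example '/user/bizlog/cpc/A', the script succeeds.

If I load each file separately and then union them, it also works.

If I move the sort step to the end, the error goes away.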
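
For reference, the load-then-union variant mentioned above might look like the following sketch. The aliases `a`, `b`, `c`, and `d` are hypothetical names chosen for illustration; the paths assume the four files are named A, B, C, and D under /user/bizlog/cpc, as described in the question.

```pig
-- Load each file separately with the same delimiter and schema (aliases a..d are illustrative).
a = load '/user/bizlog/cpc/A' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
b = load '/user/bizlog/cpc/B' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
c = load '/user/bizlog/cpc/C' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
d = load '/user/bizlog/cpc/D' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);

-- Combine the four relations; the rest of the script continues from cpc_all unchanged.
cpc_all = union a, b, c, d;
```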


The Hadoop version is 0.20.2 and the Pig version is 0.12.1. Any help would be appreciated.

As noted in the comments:

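I moved the sort step to the end and the error went away.

I have not found much on this topic, but Pig does not seem to like reordering the grouped relation itself.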


So the "solution" is to order the output generated from the groups, rather than sorting the grouped relation itself.
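
A minimal sketch of that reordering, based on the script in the question: the ORDER step is applied to the relation produced by the FOREACH instead of to `account_group`. The field aliases `accountid` and `keys` in the FOREACH are added here for illustration and are not in the original script; this has not been verified against Pig 0.12.1.

```pig
cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;

-- Build the per-account output first, directly from the grouped relation.
account_key = foreach account_group generate group as accountid, BagToTuple(cpc.key) as keys;

-- Sort the generated output last, instead of ordering account_group.
account_sort = order account_key by accountid;
store account_sort into 'last' using PigStorage('\u0003');
```

The only changes from the original are the position of the `order` statement and the relation passed to `store`.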

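One more thing: if I load each file separately and then union them, it also works.

Do A, B, C, and D have the same data types?

Yes, they have the same data types.

Strange: I moved the sort step to the end and the error went away.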