Strange casting error in Pig/Hadoop


Using Pig 0.10.1, I have the following script:

br = LOAD 'cfs:///somefile';

SPLIT br INTO s0 IF (sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF (adp >= 1.0), not_s1 OTHERWISE;
SPLIT not_s1 INTO s2 IF (p > 1L), not_s2 OTHERWISE;
SPLIT not_s2 INTO s3 IF (s > 0L), s4 OTHERWISE;

tmp0 = FOREACH s0 GENERATE b, 'x' as seg;
tmp1 = FOREACH s1 GENERATE b, 'y' as seg;
tmp2 = FOREACH s2 GENERATE b, 'z' as seg;
tmp3 = FOREACH s3 GENERATE b, 'w' as seg;
tmp4 = FOREACH s4 GENERATE b, 't' as seg;

out = UNION ONSCHEMA tmp0, tmp1, tmp2, tmp3, tmp4;

dump out;
where the file loaded into br was generated by a previous Pig script and has an embedded schema (a .pig_schema file).

Some unrelated fields have been edited out above (I cannot fully disclose the nature of the data at the moment).
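The schema itself is not reproduced here, but purely for illustration, an explicit declaration with the same field names would look something like this (the field names come from the script above; the types are guessed from the literals used in the SPLIT conditions and may well differ from the real schema):

-- Hypothetical illustration only; the real .pig_schema is redacted.
-- Types are guessed from the literals in the SPLIT conditions below.
br = LOAD 'cfs:///somefile'
     AS (b:chararray, sp:int, adp:double, p:long, s:long);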

The script fails with the following error:

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Integer cannot be cast to java.lang.Long
However, dumping s0, s1, s2, s3, s4, tmp0, tmp1, tmp2 and tmp4 works without error.
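One quick way to compare what types Pig has inferred for each branch is DESCRIBE (not part of the original script, just a sanity check on the aliases defined above):

-- Prints the schema Pig has inferred for each alias.
DESCRIBE br;
DESCRIBE tmp0;
DESCRIBE out;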

The Hadoop job tracker shows the following error 4 times:

java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.doComparison(EqualToExpr.java:116)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:83)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
I also tried this snippet (instead of the original dump):

x = UNION s1,s2;
y = FOREACH x GENERATE b;
dump y;

and I get a different (but, I assume, related) error:

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Double cannot be cast to java.lang.Long

Job tracker error (repeated 4 times):

java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.doComparison(GTOrEqualToExpr.java:111)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.getNext(GTOrEqualToExpr.java:78)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:141)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)

I tried to look for known bugs related to UNION, without much luck. This is quite puzzling. Any ideas?

After digging further, this looks like a bug; I have created an issue for it.

When you perform a UNION between two or more relations, you need to pay attention to the data types of the fields.


The problem above is caused by incompatible data types. To avoid it, declare your fields as bytearray; that will get rid of the error.
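A minimal sketch of that suggestion, reusing the field names from the question (this is not the asker's actual load statement): load everything as bytearray and cast explicitly wherever a typed comparison is needed.

-- Sketch only: field names from the question, all fields deliberately left as bytearray.
br = LOAD 'cfs:///somefile'
     AS (b:bytearray, sp:bytearray, adp:bytearray, p:bytearray, s:bytearray);

-- Cast at the point of comparison instead of relying on the embedded schema.
SPLIT br INTO s0 IF ((int)sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF ((double)adp >= 1.0), not_s1 OTHERWISE;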

Apparently this has since been resolved as a bug, as it no longer affects the latest versions of Pig.
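On versions that are still affected, another workaround that might sidestep the mismatch (again only a sketch, assuming b is meant to be a chararray) is to cast the unioned field to one explicit type in every branch before the UNION ONSCHEMA:

-- Sketch only: force an identical declared type for b in every branch,
-- so UNION ONSCHEMA merges fields of the same type (chararray is assumed here).
tmp0 = FOREACH s0 GENERATE (chararray)b AS b, 'x' AS seg;
tmp1 = FOREACH s1 GENERATE (chararray)b AS b, 'y' AS seg;
out  = UNION ONSCHEMA tmp0, tmp1;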