MapReduce: bundling JARs when submitting map/reduce jobs via Pig?

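I'm trying to combine Hadoop, Pig and Cassandra so that I can process data stored in Cassandra with simple Pig queries. The problem is that I can't get Pig to create map/reduce jobs that actually work with CassandraStorage.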

What I did was copy the storage-conf.xml file from one of my cluster machines into contrib/pig (of the Cassandra source distribution) and then compile everything into the cassandra_loadfunc.jar file.

Next, I adjusted example-script.pig to include all the jars:

register /opt/pig/pig-0.7.0-core.jar;
register /tmp/apache-cassandra-0.6.3-src/lib/libthrift-r917130.jar;
REGISTER /tmp/apache-cassandra-0.6.3-src/contrib/pig/build/cassandra_loadfunc.jar;
rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
cols = FOREACH rows GENERATE flatten($1);
colnames = FOREACH cols GENERATE $0;
namegroups = GROUP colnames BY $0;
namecounts = FOREACH namegroups GENERATE COUNT($1), group;
orderednames = ORDER namecounts BY $0;
topnames = LIMIT orderednames 50;
dump topnames;

So, if I'm not mistaken, the JARs should be bundled into the job that is submitted to Hadoop. But when I run the job, it just throws an exception at me:

2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
    at org.apache.pig.PigServer.openIterator(PigServer.java:521)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
    at org.apache.pig.PigServer.store(PigServer.java:577)
    at org.apache.pig.PigServer.openIterator(PigServer.java:504)
    ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.store(PigServer.java:569)
    ... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)

I don't understand this, because the thrift library is explicitly listed and should be bundled, shouldn't it?

The exception clearly states that it cannot find the TBase class:

java.lang.NoClassDefFoundError: org/apache/thrift/TBase

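Unpack the bundled jar and check whether the thrift lib jar is actually present in the right location. The thrift jar may have been bundled in a different location.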


You could also try placing the jar in the lib folder of the bundled jar. Another option is to add the jar to the classpath explicitly.
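The check suggested above can be scripted instead of unpacking the jar by hand. The following is only a minimal sketch in Python: the helper name find_class_entries is made up for illustration, and it assumes the job jar is an ordinary zip archive that may contain nested jars (for example under lib/). It reports where a class file lives, both at the top level and inside nested jars.

```python
import io
import zipfile


def find_class_entries(jar_path, class_name):
    """Return the locations of class_name inside jar_path.

    Hypothetical helper for illustration. Nested jars are searched one
    level deep and reported as "<nested jar>!/<class file>".
    """
    target = class_name.replace(".", "/") + ".class"

    def matches(entry):
        # Accept the class at the archive root or below any prefix directory.
        return entry == target or entry.endswith("/" + target)

    hits = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if matches(name):
                hits.append(name)
            elif name.endswith(".jar"):
                # Open the nested jar from memory and look inside it as well.
                with zipfile.ZipFile(io.BytesIO(jar.read(name))) as nested:
                    hits.extend(
                        name + "!/" + inner
                        for inner in nested.namelist()
                        if matches(inner)
                    )
    return hits
```

Calling find_class_entries with the path of the generated job jar and "org.apache.thrift.TBase" returns an empty list if the class was not bundled at all; otherwise the returned paths show whether it sits at the top level or inside a nested jar.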

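The classes are all present in the generated jar file, so that is not the actual problem. It could be that several jars conflict over the same class, org/apache/thrift/TBase, or that the jar was not registered correctly. Based on the exception, those are the only reasons I can think of; I mention them for anyone who finds this post while searching.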
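One way to test the "multiple jars with the same class" theory is to list class files that appear in more than one jar. This is a rough sketch in Python, not part of Pig or Hadoop: the function name find_duplicate_classes is hypothetical, and the caller supplies the list of jars to compare.

```python
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths):
    """Map each class file found in more than one jar to the jars containing it.

    Hypothetical helper for illustration; only top-level entries of each
    jar are inspected.
    """
    owners = defaultdict(list)
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            # set() guards against the same entry being listed twice in one jar.
            for name in set(jar.namelist()):
                if name.endswith(".class"):
                    owners[name].append(path)
    return {name: paths for name, paths in owners.items() if len(paths) > 1}
```

Passing the three jars registered in the script would show whether org/apache/thrift/TBase.class is provided by more than one of them. An empty result makes a class conflict between those jars unlikely and points back at the registration itself.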