Apache Pig: MongoInsertStorage STORE does not work
I am running the following simple code in a Pig script:
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar
set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;
col = LOAD 'mongodb://localhost:27017/mydb.mycollection' using com.mongodb.hadoop.pig.MongoLoader ('id:chararray, companyId:chararray, ts:chararray', 'id');
STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2' USING com.mongodb.hadoop.pig.MongoInsertStorage ('', '');
It returns the following error:
End of file pig_1396614639609.log:
...
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.IllegalArgumentException: Invalid URI Format. URI must begin with a mongodb:// protocol string.
at com.mongodb.hadoop.pig.MongoInsertStorage.setStoreLocation(MongoInsertStorage.java:159)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:576)
... 17 more
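I don't know where the error is, since the mongodb:// protocol string is written correctly.

I ran into a similar problem when running LOAD and STORE with mongo-hadoop in the same Pig script. It throws: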
java.net.UnknownHostException: localhost:27017 is not a valid Inet address
at org.apache.hadoop.net.NetUtils.verifyHostnames(NetUtils.java:587)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
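I did not investigate further, but it may be a bug or some locking-related parameter; I don't know.
If I run the same code with the LOAD and the STORE in separate scripts, it runs without problems.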
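One way to split the work into two scripts is sketched below. The answer does not say how the data was passed between the two runs, so the staging path /tmp/mycollection_staging, the script names, and the use of PigStorage with a tab delimiter are all assumptions made for illustration. The JAR paths, schema, and MongoDB URIs are copied from the question.

```pig
-- Script 1 (hypothetical name: load_from_mongo.pig)
-- Reads the collection from MongoDB and writes it to a staging directory.
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar

col = LOAD 'mongodb://localhost:27017/mydb.mycollection'
      USING com.mongodb.hadoop.pig.MongoLoader('id:chararray, companyId:chararray, ts:chararray', 'id');

-- Assumed staging location; any path the second script can read would do.
STORE col INTO '/tmp/mycollection_staging' USING PigStorage('\t');
```

```pig
-- Script 2 (hypothetical name: store_to_mongo.pig)
-- Reads the staged rows back and inserts them into the target collection.
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar

set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;

col = LOAD '/tmp/mycollection_staging' USING PigStorage('\t')
      AS (id:chararray, companyId:chararray, ts:chararray);

STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2'
      USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
```

The scripts would be run one after the other. A tab delimiter only works if no field value contains a tab; otherwise a different delimiter or storage format would be needed.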