Connecting Hadoop + Hive with MongoDB on AWS EMR (class com/mongodb/DBObject not found)
I would like to connect an EMR cluster to our MongoDB through the MongoDB connector (rather than through BSON dumps). To do this, I launched the cluster from the AWS Management Console. In the bootstrap configuration, I pointed to the following file located on S3:
#!/bin/sh
wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar
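Once the cluster was up, I SSHed into the master node and saw that the files had been downloaded successfully.
When I run this in the Hive shell: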
CREATE TABLE nicks
(
id STRING,
name STRING,
business STRING,
alias STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
TBLPROPERTIES('mongo.uri'='mongodb://54.93.123.123:27017/foo.aliases');
ADD JAR /home/hadoop/lib/mongo-hadoop-core-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-hadoop-hive-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-java-driver-2.13.0.jar;
Select * from nicks;
I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/DBObject
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByClass(MongoSplitterFactory.java:41)
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:109)
at com.mongodb.hadoop.hive.input.HiveMongoInputFormat.getSplits(HiveMongoInputFormat.java:64)
at com.mongodb.hadoop.hive.input.HiveMongoInputFormat.getSplits(HiveMongoInputFormat.java:44)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:418)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:534)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1519)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:292)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: com.mongodb.DBObject
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 20 more
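Notes:
- I have confirmed (via SSH) that all 4 libs are in the correct folder.
- The Mongo Hive connector JARs appear to be loaded, because before this I got a different exception and fixed it by running "ADD JAR…".
- I inspected the contents of the mongo-java-driver JAR. It appears to be valid (I found the DBObject class inside).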
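The first check in the notes above (are all the JARs where they should be?) can be scripted so it is easy to repeat for more than one directory. This is only a minimal sketch: the function name `missing_jars` is made up for illustration, and the directory and file names in the example call are taken from the bootstrap script in the question.

```shell
#!/bin/sh
# Hypothetical helper: print each expected jar that is absent from a directory.
# Returns 0 when every jar is present, 1 otherwise.
missing_jars() {
    dir=$1
    shift
    status=0
    for jar in "$@"; do
        if [ ! -f "$dir/$jar" ]; then
            echo "missing: $dir/$jar"
            status=1
        fi
    done
    return $status
}

# Example call with the path and file names from the question (adjust as needed):
# missing_jars /home/hadoop/lib \
#     mongo-java-driver-2.13.0.jar \
#     mongo-hadoop-core-1.3.2.jar \
#     mongo-hadoop-pig-1.3.2.jar \
#     mongo-hadoop-hive-1.3.2.jar
```

Running the same check against every directory that Hive might load libraries from shows quickly whether a JAR is present in one location but absent from another.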
How can I fix this, or how can I debug what is going wrong?

The solution is to also put the libs into /home/hadoop/hive/lib. With this script, it works:
#!/bin/sh
wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar
cp /home/hadoop/lib/mongo* /home/hadoop/hive/lib
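The final `cp` line is the step that makes the difference, so it may be worth having it fail loudly if the target directory does not exist or if no JARs were downloaded. The following is a sketch only, not part of the original answer: the function name `copy_mongo_jars` is hypothetical, and it matches `mongo*.jar` rather than the broader `mongo*` used above.

```shell
#!/bin/sh
# Hypothetical helper: copy every mongo*.jar from a source directory into a
# destination directory, reporting an error instead of silently doing nothing.
copy_mongo_jars() {
    src=$1
    dest=$2
    if [ ! -d "$dest" ]; then
        echo "destination does not exist: $dest" >&2
        return 1
    fi
    found=0
    for jar in "$src"/mongo*.jar; do
        [ -f "$jar" ] || continue   # the glob matched nothing
        cp "$jar" "$dest"/ || return 1
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no mongo*.jar files in $src" >&2
        return 1
    fi
}

# Example call with the directories from the answer:
# copy_mongo_jars /home/hadoop/lib /home/hadoop/hive/lib
```

A nonzero return status from the function can then be used to stop the bootstrap script early rather than discovering the missing class later in the Hive shell.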