Apache Spark cannot connect to Azure Cosmos DB using the MongoDB API
Tags: apache-spark, azure-cosmosdb, azure-cosmosdb-mongoapi, azure-databricks

I am trying to connect to Azure Cosmos DB using the MongoDB API (the MongoDB Spark connector) in order to export data to HDFS, but I am getting the exception below. Here is the full stack trace:
{ "_t" : "OKMongoResponse", "ok" : 0, "code" : 115, "errmsg" : "Command is not supported", "$err" : "Command is not supported" }
at com.mongodb.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:115)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:107)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:187)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:179)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:92)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:85)
at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:55)
at com.mongodb.Mongo.execute(Mongo.java:810)
at com.mongodb.Mongo$2.execute(Mongo.java:797)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:137)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:131)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at scala.util.Try$.apply(Try.scala:192)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner.partitions(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.rdd.MongoRDD.getPartitions(MongoRDD.scala:137)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:182)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
Maven dependency added:
<dependency>
<groupId>org.mongodb.spark</groupId>
<artifactId>mongo-spark-connector_2.11</artifactId>
<version>2.2.0</version>
</dependency>
Code:
SparkSession spark = SparkSession.builder()
        .getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(jsc);
Dataset<Row> implicitDS = MongoSpark.load(jsc).toDF();
For reference: implicitDS.count() returns 0.
I am using MongoSplitVectorPartitioner. Updated with the full stack trace.

Comments:

- Please paste the code you use to connect. (From the asker: with MongoShardedPartitioner my application (Spark job) runs fine, but no data is exported to HDFS.)
- It looks like one of the commands being used is not supported. A subset of MongoDB commands is not fully supported on Cosmos DB. Reach out to AskCosmosDB@microsoft.com; the Mongo API engineering team can help you.
- Thanks. One more question: could we instead export the data in Java using the Spark Azure connector?

Answer:

Cosmos DB is a different implementation from MongoDB, so server commands and behavior can differ. It looks like the splitVector command is not supported, so you need to pick a different partitioning strategy. I would not expect MongoShardedPartitioner to work either, because Cosmos DB uses a different sharding scheme. Try MongoSamplePartitioner; if that does not work, MongoPaginateByCountPartitioner or MongoPaginateBySizePartitioner should be viable (but slower) options.
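Concretely, the partitioner can be switched through the connector's read configuration rather than the default. The sketch below assumes mongo-spark-connector 2.2.x (as in the question's POM); the connection URI, database, collection, and output path are placeholders you would replace with your own values:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;

public class CosmosToHdfs {
    public static void main(String[] args) {
        // Placeholder Cosmos DB Mongo API connection string: substitute your
        // account name, key, database, and collection.
        SparkSession spark = SparkSession.builder()
                .appName("cosmos-to-hdfs")
                .config("spark.mongodb.input.uri",
                        "mongodb://<account>:<key>@<account>.documents.azure.com:10255/mydb.mycoll?ssl=true")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Override the default MongoSplitVectorPartitioner, which relies on
        // the splitVector command that Cosmos DB rejects ("Command is not
        // supported", code 115).
        Map<String, String> overrides = new HashMap<>();
        overrides.put("partitioner", "MongoSamplePartitioner");
        ReadConfig readConfig = ReadConfig.create(jsc).withOptions(overrides);

        Dataset<Row> ds = MongoSpark.load(jsc, readConfig).toDF();
        ds.write().parquet("hdfs:///path/to/output"); // export to HDFS
    }
}
```

If MongoSamplePartitioner also fails, the same `partitioner` option can name `MongoPaginateByCountPartitioner` or `MongoPaginateBySizePartitioner` instead; these avoid server-side split commands at the cost of extra queries.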