Apache Spark cannot connect to Azure Cosmos DB using the MongoDB API

I am trying to connect to Azure Cosmos DB with the MongoDB API (via the Spark MongoDB connector) in order to export data to HDFS, but I get the following exception.

Here is the full stack trace:

{ "_t" : "OKMongoResponse", "ok" : 0, "code" : 115, "errmsg" : "Command is not supported", "$err" : "Command is not supported" }
at com.mongodb.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:115)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:107)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:187)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:179)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:92)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:85)
at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:55)
at com.mongodb.Mongo.execute(Mongo.java:810)
at com.mongodb.Mongo$2.execute(Mongo.java:797)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:137)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:131)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at scala.util.Try$.apply(Try.scala:192)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner.partitions(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.rdd.MongoRDD.getPartitions(MongoRDD.scala:137)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:182)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
The Maven dependency added:

<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.2.0</version>
</dependency>

Code:

SparkSession spark = SparkSession.builder()
        .getOrCreate();

JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(jsc);
// Loads the collection configured via spark.mongodb.input.uri into a DataFrame
Dataset<Row> implicitDS = MongoSpark.load(jsc).toDF();
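
For context, MongoSpark.load(jsc) picks up its connection string from the Spark configuration. A minimal sketch of how that is typically set for a Cosmos DB MongoDB API endpoint; the account name, key, database, and collection below are placeholders, not values from the original post:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Placeholder account, key, database, and collection names.
// Cosmos DB's MongoDB API endpoint listens on port 10255 and requires SSL.
SparkSession spark = SparkSession.builder()
        .config("spark.mongodb.input.uri",
                "mongodb://myaccount:<accountKey>@myaccount.documents.azure.com:10255/"
                        + "mydb.mycollection?ssl=true&replicaSet=globaldb")
        .getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());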
For reference:

implicitDS.count() returns 0.


I am using MongoSplitVectorPartitioner. Updated with the full stack trace.

Please paste the code you use to connect.

If I use MongoShardedPartitioner, my application (the Spark job) completes fine, but no data gets exported to HDFS.

It looks like one of the commands you are using is not supported; a subset of MongoDB's commands is not fully supported on Cosmos DB. Reach out to AskCosmosDB@microsoft.com; the MongoDB API engineering team can help you.

Thanks. One question, though: can we export the data in Java using the Spark Azure connector instead?

Cosmos DB is a different implementation from MongoDB, so server commands and behavior can differ. It looks like the splitVector command is not supported, so you need to choose a different partitioning method. I would not expect MongoShardedPartitioner to work either, since Cosmos DB uses a different sharding scheme. Try MongoSamplePartitioner; if that does not work, MongoPaginateByCountPartitioner and MongoPaginateBySizePartitioner should be viable (but slower) options.
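
To try one of the suggested partitioners, the partitioner can be overridden per read via the connector's ReadConfig. A minimal sketch, assuming the mongo-spark-connector 2.2.x Java API from the dependency above and the jsc from the earlier snippet:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;

// Override the partitioner for this read only. MongoSamplePartitioner avoids
// the splitVector command that Cosmos DB rejects; if it also fails, try
// "MongoPaginateByCountPartitioner" or "MongoPaginateBySizePartitioner".
Map<String, String> overrides = new HashMap<>();
overrides.put("partitioner", "MongoSamplePartitioner");

ReadConfig readConfig = ReadConfig.create(jsc).withOptions(overrides);
Dataset<Row> ds = MongoSpark.load(jsc, readConfig).toDF();

Alternatively, the same choice can be applied globally by setting the spark.mongodb.input.partitioner configuration property when building the SparkSession.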