PySpark can't connect to MongoDB, but the command line can

Tags: mongodb, apache-spark, ubuntu, pyspark

I'm trying to load a MongoDB collection into a PySpark DataFrame. First, I can connect from the command line on the NameNode:

mongo mongodb://USER:PASSWORD@HOST/DB_NAME

MongoDB shell version v3.6.3
connecting to: mongodb://HOST/DB_NAME
MongoDB server version: 3.6.3
> 
I run the script on the cluster like this:

spark-submit \
--master yarn \
--deploy-mode client \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 3 \
--num-executors 10 \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 \
load_from_mongo.py
Now I create a SparkSession:

spark = SparkSession.builder \
        .appName("TestMongoLoad") \
        .config("spark.mongodb.input.uri", "mongodb://USER:PASSWORD@HOST:27017") \
        .config("spark.mongodb.input.database", DB_NAME) \
        .config("spark.mongodb.input.collection", COLLECTION_NAME) \
        .getOrCreate()

Then I try to read it into a DataFrame:

df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
        .load()
df.show(5, truncate=False)
The result is an authentication failure, so clearly I'm passing something in wrong:

Ivy Default Cache set to: /home/ubuntu/.ivy2/cache
The jars for the packages stored in: /home/ubuntu/.ivy2/jars
:: loading settings :: url = jar:file:/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-caedf270-dd43-42f2-a39e-e3d1b7134046;1.0
    confs: [default]
    found org.mongodb.spark#mongo-spark-connector_2.11;2.4.1 in central
    found org.mongodb#mongo-java-driver;3.10.2 in central
    [3.10.2] org.mongodb#mongo-java-driver;[3.10,3.11)
:: resolution report :: resolve 1129ms :: artifacts dl 4ms
    :: modules in use:
    org.mongodb#mongo-java-driver;3.10.2 from central in [default]
    org.mongodb.spark#mongo-spark-connector_2.11;2.4.1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   2   |   1   |   0   |   0   ||   2   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-caedf270-dd43-42f2-a39e-e3d1b7134046
    confs: [default]
    0 artifacts copied, 2 already retrieved (0kB/7ms)
20/02/29 21:26:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/home/ubuntu/server/load_from_mongo.py", line 124, in <module>
    main(args)
  File "/home/ubuntu/server/load_from_mongo.py", line 102, in main
    keyword_df = getKeywordCorpus(args.begin_dt, args.end_dt)
  File "/home/ubuntu/server/load_from_mongo.py", line 79, in getKeywordCorpus
    df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
  File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
  File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o57.load.
: com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='USER', source='admin', password=<hidden>, mechanismProperties={}}
    at com.mongodb.internal.connection.SaslAuthenticator.wrapException(SaslAuthenticator.java:173)
    at com.mongodb.internal.connection.SaslAuthenticator.access$300(SaslAuthenticator.java:40)
    at com.mongodb.internal.connection.SaslAuthenticator$1.run(SaslAuthenticator.java:70)
    at com.mongodb.internal.connection.SaslAuthenticator$1.run(SaslAuthenticator.java:47)
    at com.mongodb.internal.connection.SaslAuthenticator.doAsSubject(SaslAuthenticator.java:179)
    at com.mongodb.internal.connection.SaslAuthenticator.authenticate(SaslAuthenticator.java:47)
    at com.mongodb.internal.connection.InternalStreamConnectionInitializer.authenticateAll(InternalStreamConnectionInitializer.java:152)
    at com.mongodb.internal.connection.InternalStreamConnectionInitializer.initialize(InternalStreamConnectionInitializer.java:63)
    at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:127)
    at com.mongodb.internal.connection.UsageTrackingInternalConnection.open(UsageTrackingInternalConnection.java:50)
    at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.open(DefaultConnectionPool.java:390)
    at com.mongodb.internal.connection.DefaultConnectionPool.get(DefaultConnectionPool.java:106)
    at com.mongodb.internal.connection.DefaultConnectionPool.get(DefaultConnectionPool.java:92)
    at com.mongodb.internal.connection.DefaultServer.getConnection(DefaultServer.java:85)
    at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:115)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:212)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:116)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:109)
    at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:56)
    at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:179)
    at com.mongodb.client.internal.MongoDatabaseImpl.executeCommand(MongoDatabaseImpl.java:184)
    at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:153)
    at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:148)
    at com.mongodb.spark.MongoConnector$$anonfun$1.apply(MongoConnector.scala:237)
    at com.mongodb.spark.MongoConnector$$anonfun$1.apply(MongoConnector.scala:237)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:174)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:174)
    at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:157)
    at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:174)
    at com.mongodb.spark.MongoConnector.hasSampleAggregateOperator(MongoConnector.scala:237)
    at com.mongodb.spark.rdd.MongoRDD.hasSampleAggregateOperator$lzycompute(MongoRDD.scala:221)
    at com.mongodb.spark.rdd.MongoRDD.hasSampleAggregateOperator(MongoRDD.scala:221)
    at com.mongodb.spark.sql.MongoInferSchema$.apply(MongoInferSchema.scala:68)
    at com.mongodb.spark.sql.DefaultSource.constructRelation(DefaultSource.scala:97)
    at com.mongodb.spark.sql.DefaultSource.createRelation(DefaultSource.scala:50)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.mongodb.MongoCommandException: Command failed with error 18 (AuthenticationFailed): 'Authentication failed.' on server HOST:27017. The full response is {"ok": 0.0, "errmsg": "Authentication failed.", "code": 18, "codeName": "AuthenticationFailed"}
    at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:179)
    at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:299)
    at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:255)
    at com.mongodb.internal.connection.CommandHelper.sendAndReceive(CommandHelper.java:83)
    at com.mongodb.internal.connection.CommandHelper.executeCommand(CommandHelper.java:33)
    at com.mongodb.internal.connection.SaslAuthenticator.sendSaslStart(SaslAuthenticator.java:130)
    at com.mongodb.internal.connection.SaslAuthenticator.access$100(SaslAuthenticator.java:40)
    at com.mongodb.internal.connection.SaslAuthenticator$1.run(SaslAuthenticator.java:54)
    ... 48 more
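The stack trace points at the likely cause: the failing credential shows source='admin', meaning the Java driver tried to authenticate against the admin database. The mongo shell command worked because its URI included /DB_NAME, which MongoDB treats as the authentication database when no authSource is given; the SparkSession URI above omits the path segment, so the driver falls back to admin, where USER is not defined. Including the database in the connection URI fixes the authentication: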
MONGO_URL = "mongodb://USER:PASSWORD@HOST:27017/DB_NAME"

spark = SparkSession.builder \
        .appName("TestMongoLoad") \
        .config("spark.mongodb.input.uri", MONGO_URL) \
        .config("spark.mongodb.input.collection", COLLECTION_NAME) \
        .getOrCreate()
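With the database included in the URI, the original spark.read call authenticates and succeeds. As a minimal alternative sketch (not from the original post, reusing the same USER, PASSWORD, HOST, DB_NAME, and COLLECTION_NAME placeholders), the authentication database can also be named explicitly with the standard authSource query parameter, keeping the database and collection as separate connector settings:

from pyspark.sql import SparkSession

# Sketch: point authSource at the database where USER is defined,
# instead of relying on the path segment of the URI.
MONGO_URL = "mongodb://USER:PASSWORD@HOST:27017/?authSource=DB_NAME"

spark = SparkSession.builder \
        .appName("TestMongoLoad") \
        .config("spark.mongodb.input.uri", MONGO_URL) \
        .config("spark.mongodb.input.database", DB_NAME) \
        .config("spark.mongodb.input.collection", COLLECTION_NAME) \
        .getOrCreate()

# Same read call as in the question; the connector infers the schema by sampling.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.show(5, truncate=False)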