Apache Spark Snowflake AWS Glue integration issue

Tags: apache-spark, pyspark, aws-glue, snowflake-cloud-data-platform

I am facing an issue running a Glue job and getting an error. I used the following link as a reference.

JARs used:

  • s3://Bucket/GlueJars/snowflake-jdbc-3.9.1.jar

  • s3://bucket/GlueJars/spark-snowflake_2.12-2.5.9-spark_2.4.jar

Snowflake version (from SELECT CURRENT_VERSION()):

  • 4.5.2
What is the root cause of this issue?

Error log from Glue:

py4j.protocol.Py4JJavaError: An error occurred while calling o75.load.

: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V

at net.snowflake.spark.snowflake.Parameters$MergedParameters.<init>(Parameters.scala:263)

at net.snowflake.spark.snowflake.Parameters$.mergeParameters(Parameters.scala:257)

at net.snowflake.spark.snowflake.DefaultSource.createRelation(DefaultSource.scala:59)

at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)

at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)

at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at py4j.Gateway.invoke(Gateway.java:282)

at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

at py4j.commands.CallCommand.execute(CallCommand.java:79)

at py4j.GatewayConnection.run(GatewayConnection.java:238)

at java.lang.Thread.run(Thread.java:748)



2020-02-22 12:55:29,258 ERROR [Driver] yarn.ApplicationMaster (Logging.scala:logError(70)) - User application exited with status 1

2020-02-22 12:55:29,258 INFO [Driver] yarn.ApplicationMaster (Logging.scala:logInfo(54)) - Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)

2020-02-22 12:55:29,261 INFO [pool-4-thread-1] spark.SparkContext (Logging.scala:logInfo(54)) - Invoking stop() from shutdown hook

2020-02-22 12:55:29,264 INFO [pool-4-thread-1] server.AbstractConnector (AbstractConnector.java:doStop(318)) - Stopped Spark@dc562d7{HTTP/1.1,[http/1.1]}{0.0.0.0:0}

2020-02-22 12:55:29,266 INFO [pool-4-thread-1] ui.SparkUI (Logging.scala:logInfo(54)) - Stopped Spark web UI at http://ip-172-32-153-220.ec2.internal:41151

2020-02-22 12:55:29,268 INFO [dispatcher-event-loop-1] yarn.YarnAllocator (Logging.scala:logInfo(54)) - Driver requested a total number of 0 executor(s).

2020-02-22 12:55:29,269 INFO [pool-4-thread-1] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Shutting down all executors

2020-02-22 12:55:29,269 INFO [dispatcher-event-loop-2] cluster.YarnSchedulerBackend$YarnDriverEndpoint (Logging.scala:logInfo(54)) - Asking each executor to shut down

2020-02-22 12:55:29,272 INFO [pool-4-thread-1] cluster.SchedulerExtensionServices (Logging.scala:logInfo(54)) - Stopping SchedulerExtensionServices

(serviceOption=None,

services=List(),

started=false)

2020-02-22 12:55:29,275 INFO [dispatcher-event-loop-2] spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(54)) - MapOutputTrackerMasterEndpoint stopped!

2020-02-22 12:55:29,284 INFO [pool-4-thread-1] memory.MemoryStore (Logging.scala:logInfo(54)) - MemoryStore cleared

2020-02-22 12:55:29,284 INFO [pool-4-thread-1] storage.BlockManager (Logging.scala:logInfo(54)) - BlockManager stopped

2020-02-22 12:55:29,285 INFO [pool-4-thread-1] storage.BlockManagerMaster (Logging.scala:logInfo(54)) - BlockManagerMaster stopped

2020-02-22 12:55:29,287 INFO [dispatcher-event-loop-3] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(54)) - OutputCommitCoordinator stopped!

2020-02-22 12:55:29,290 INFO [pool-4-thread-1] spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext

2020-02-22 12:55:29,291 INFO [pool-4-thread-1] yarn.ApplicationMaster (Logging.scala:logInfo(54)) - Unregistering ApplicationMaster with FAILED (diag message: User application exited with status 1)

2020-02-22 12:55:29,300 INFO [pool-4-thread-1] impl.AMRMClientImpl (AMRMClientImpl.java:unregisterApplicationMaster(476)) - Waiting for application to be successfully unregistered.

2020-02-22 12:55:29,401 INFO [pool-4-thread-1] yarn.ApplicationMaster (Logging.scala:logInfo(54)) - Deleting staging directory hdfs://ip-172-32-141-92.ec2.internal:8020/user/root/.sparkStaging/application_1582375951668_0001

2020-02-22 12:55:29,411 INFO [pool-4-thread-1] util.ShutdownHookManager (Logging.scala:logInfo(54)) - Shutdown hook called

2020-02-22 12:55:29,412 INFO [pool-4-thread-1] util.ShutdownHookManager (Logging.scala:logInfo(54)) - Deleting directory /mnt/yarn/usercache/root/appcache/application_1582375951668_0001/spark-9c440196-6577-41ee-bf58-dfc1dcb6726a

2020-02-22 12:55:29,414 INFO [pool-4-thread-1] util.ShutdownHookManager (Logging.scala:logInfo(54)) - Deleting directory /mnt/yarn/usercache/root/appcache/application_1582375951668_0001/spark-9c440196-6577-41ee-bf58-dfc1dcb6726a/pyspark-c95bbd70-903e-488a-8bda-15c23b4bfe4f

End of LogType:stdout
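For context, the read that fails at o75.load in the stack trace above typically looks something like the sketch below. Every connection option here is a placeholder, not taken from the original post; the actual `spark.read` call needs a live SparkSession with the connector jars on the classpath, so it is shown as a comment.

```python
# Hypothetical Snowflake connector options; every value is a placeholder.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# The call that raises the error at o75.load (requires a running SparkSession
# with snowflake-jdbc and spark-snowflake on the classpath):
# df = (spark.read
#           .format("net.snowflake.spark.snowflake")
#           .options(**sf_options)
#           .option("dbtable", "<table>")
#           .load())
```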

It seems to fail when it tries to read from Snowflake. There could be several reasons for this. I suggest you enable continuous logging, re-run the job, and get a more detailed error message.

Thanks. After enabling continuous logging, below is the driver log from AWS Glue. It seems to be related to the HTTP client:

Feb 23, 2020, 9:39:21 AM 20/02/23 04:09:21 WARN ApacheUtils: NoSuchMethodException was thrown when disabling normalizeUri. This indicates you are using an old version (< 4.5.8) of Apache http client. It is recommended to use http client version >= 4.5.9 to avoid the breaking change introduced
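As an aside, a `java.lang.NoSuchMethodError: scala.Product.$init$` is the classic symptom of a Scala binary-version mismatch: the `_2.12` connector build running on a Spark runtime compiled against Scala 2.11, which (at the time of this question) is what Glue's Spark 2.4 used. The sketch below illustrates the mismatch by parsing the Scala suffix from the sbt-style jar name in the question; the `scala_binary_version` helper and the assumed runtime version are illustrative, not part of any Glue or Snowflake API.

```python
import re

def scala_binary_version(jar_name):
    """Extract the Scala binary-version suffix (e.g. '2.12') from an
    sbt-style artifact name such as spark-snowflake_2.12-2.5.9-spark_2.4.jar."""
    match = re.search(r"_(\d+\.\d+)-", jar_name)
    return match.group(1) if match else None

# The connector jar from the question targets Scala 2.12 ...
connector_scala = scala_binary_version("spark-snowflake_2.12-2.5.9-spark_2.4.jar")

# ... while Glue's Spark 2.4 runtime was (at the time) built against Scala 2.11.
runtime_scala = "2.11"

print(connector_scala)                   # 2.12
print(connector_scala == runtime_scala)  # False -> NoSuchMethodError at runtime
```

If this is the cause, using the connector build whose suffix matches the runtime's Scala version (e.g. the `_2.11` artifact for a Scala 2.11 Spark) is the usual fix.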