Google Cloud Dataproc can't access Cloud Storage bucket
I have a Cloud Dataproc Spark job that also uses the Cloud Storage API on the driver side (to pick specific files from a folder to work with). Here are the Maven dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.4</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>1.101.0</version>
    </dependency>
</dependencies>
Here is the stack trace:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
at com.google.api.gax.retrying.BasicRetryingFuture.<init>(BasicRetryingFuture.java:84)
at com.google.api.gax.retrying.DirectRetryingExecutor.createFuture(DirectRetryingExecutor.java:88)
at com.google.api.gax.retrying.DirectRetryingExecutor.createFuture(DirectRetryingExecutor.java:74)
at com.google.cloud.RetryHelper.run(RetryHelper.java:75)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:372)
at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:328)
--> at ai.mandal.cloud.dataproc.Test$.main(Test.scala:14)
at ai.mandal.cloud.dataproc.Test.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My question is: what in general can cause this, and do I need to configure separate credentials for it, given that I'm running it from a Dataproc service account that already has access to the bucket?

The solution was to add

spark.executor.userClassPathFirst = true
spark.driver.userClassPathFirst = true

to the job properties.
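On Dataproc, these properties can be passed at submission time via the `--properties` flag. A minimal sketch (the cluster name, region, and jar path are placeholders; the main class is the one from the stack trace):

```shell
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=ai.mandal.cloud.dataproc.Test \
  --jars=gs://my-bucket/my-job.jar \
  --properties=spark.driver.userClassPathFirst=true,spark.executor.userClassPathFirst=true
```

These settings make Spark resolve classes from the job's own jars before the cluster's classpath, so the Guava version your job was built against wins over the one shipped with the Dataproc image.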
The problem was caused by a conflict between the Guava version bundled with google-cloud-storage and the one in the host environment.

Google recommends shading the conflicting Guava in your dependencies. I tried that as well, but it didn't work in this case. I also tried adding the latest Guava dependency directly, which didn't help either.
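For reference, shading Guava with the maven-shade-plugin would look roughly like this (a sketch of the approach that did not resolve the issue in my case; the relocation prefix is arbitrary):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Rewrite com.google.common.* references in the fat jar
                   so they cannot clash with the cluster's Guava. -->
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>repackaged.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```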