Scala Spark S3 I/O - [S3ServiceException] S3 HEAD request failed


I want to write a Spark DataFrame to AWS S3 and read it back. I have searched a lot but found nothing useful.

The code I wrote looks like this:

val spark = SparkSession.builder().master("local").appName("test").getOrCreate()

spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "**********")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "********************")

import spark.implicits._

spark.read.textFile("s3n://myBucket/testFile").show(false)

List(1,2,3,4).toDF.write.parquet("s3n://myBucket/test/abc.parquet")
But when I run it, I get the following error:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/myBucket/testFile' - ResponseCode=403, ResponseMessage=Forbidden
[info]   at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:245)
[info]   at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:119)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:498)
[info]   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
[info]   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
[info]   at org.apache.hadoop.fs.s3native.$Proxy15.retrieveMetadata(Unknown Source)
[info]   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
[info]   ...
[info]   Cause: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/myBucket/testFile' - ResponseCode=403, ResponseMessage=Forbidden
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:477)
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718)
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599)
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535)
[info]   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987)
[info]   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332)
[info]   at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   ...
[info]   Cause: org.jets3t.service.impl.rest.HttpException:
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:475)
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718)
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599)
[info]   at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535)
[info]   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987)
[info]   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332)
[info]   at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   ...
I am using:

  • Spark: 2.1.0
  • Scala: 2.11.2
  • AWS Java SDK: 1.11.126

Any help is appreciated.

Set the secrets in the Spark configuration file using options like "spark.hadoop.fs.s3n...", so that Spark propagates them to the workers along with the work. The AWS Java SDK version that Hadoop 2.7.x was built against is 1.7.4.

Can I use aws-java-sdk v1.11.126 instead? Newer versions of the aws-java-sdk have many features that are not available in the older v1.7.4.

Sure you can use it, but if an old feature no longer exists, that could be a problem.

Which old feature are you talking about?

If we rely on a feature from v1.7.4 that is no longer present in v1.11.126, that would be a problem. You cannot mix Hadoop 2.7.3 with any version of the AWS SDK other than the one it was built with. If you try, all you will see is stack traces.
I tried the following on Spark 2.1.1 and it worked fine for me.

Step 1: Download the following jars:
    -- hadoop-aws-2.7.3.jar
    -- aws-java-sdk-1.7.4.jar
    Note:
      If you are not able to find these jars, you can get them from the hadoop-2.7.3 distribution
Step 2: Place the above jars into $SPARK_HOME/jars/
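Steps 1 and 2 can be sketched as shell commands. This assumes a local Spark installation at $SPARK_HOME and uses the usual Maven Central coordinates for the two artifacts:

```
# Download the jars (versions must match: hadoop-aws 2.7.3 was built against aws-java-sdk 1.7.4)
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

# Place them on Spark's classpath
cp hadoop-aws-2.7.3.jar aws-java-sdk-1.7.4.jar "$SPARK_HOME/jars/"
```

Alternatively, launching with `spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3` resolves hadoop-aws and its matching SDK dependency automatically, without copying jars by hand.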

Step 3: code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)

// Credentials for the s3a connector (values are placeholders)
sc.hadoopConfiguration.set("fs.s3a.access.key", "***********")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "******************")

val input = sc.textFile("s3a://mybucket/*.txt")

// toDF requires a SparkSession and its implicits
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
List(1, 2, 3, 4).toDF.write.parquet("s3a://mybucket/abc.parquet")