403 using hadoop-aws and s3proxy


I am getting a 403 error with Hadoop 3.1.2, while the same setup seems to work with Hadoop 2.7. Is this a regression or am I misunderstanding something?

s3proxy configuration

s3proxy.endpoint=http://127.0.0.1:4242
s3proxy.authorization=aws-v2
s3proxy.identity=local-identity
s3proxy.credential=local-credential
jclouds.provider=filesystem
jclouds.filesystem.basedir=/tmp/s3proxy
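
For reference, s3proxy is started by pointing it at this properties file, roughly like this (the exact jar name and flags depend on the release):

java -jar s3proxy --properties s3proxy.conf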
Simple Scala code

sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
sc.hadoopConfiguration.set("fs.s3a.endpoint", "127.0.0.1:4242")
sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled", "false")
sc.hadoopConfiguration.set("fs.s3a.access.key", "local-identity")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "local-credential")
val rdd = sc.textFile("s3a://wiki/test.json")
rdd.collect().foreach(println)
Output of tree /tmp/s3proxy:

/tmp/s3proxy/
└── wiki
    └── test.json
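Since the filesystem provider maps each bucket to a directory under jclouds.filesystem.basedir, a layout like this can be prepared directly on disk, for example (the JSON content below is only a placeholder):

mkdir -p /tmp/s3proxy/wiki
echo '{"title": "test"}' > /tmp/s3proxy/wiki/test.json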
With Hadoop 2.7, I get the correct output:

cat demo_wiki/test01.scala | ./spark-2.4.4-bin-hadoop2.7/bin/spark-shell --jars hadoop-aws-2.7.3.jar,aws-java-sdk-1.7.4.jar
But with Hadoop 3.1.2, I get a 403 error:

cat demo_wiki/test01.scala | ./spark-2.4.4-bin-hadoop3.1/bin/spark-shell --jars hadoop-aws-3.1.2.jar,aws-java-sdk-bundle-1.11.271.jar
Output:

2019-09-23 15:23:32 WARN  MetricsConfig:134 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
java.nio.file.AccessDeniedException: s3a://wiki/test.json: getFileStatus on s3a://wiki/test.json: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 4442587FB7D0A2F9; S3 Extended Request ID: null), S3 Extended Request ID: null:403 Forbidden

Any ideas?

I found the answer in a Dockerfile ().

The correct s3proxy configuration is the one below. The key change is s3proxy.authorization=aws-v2-or-v4: the S3A client in Hadoop 3 signs requests with AWS Signature Version 4 (via the bundled AWS SDK), which a proxy accepting only aws-v2 rejects with 403 Forbidden.

s3proxy.endpoint=http://127.0.0.1:4242
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-identity
s3proxy.credential=local-credential
jclouds.provider=filesystem
jclouds.filesystem.basedir=/tmp/s3proxy
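If changing the proxy configuration is not an option, a possible client-side workaround (untested here, just a sketch) is to force the legacy V2 signer from the same Scala snippet via the fs.s3a.signing-algorithm option:

sc.hadoopConfiguration.set("fs.s3a.signing-algorithm", "S3SignerType")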
And the command line with the latest (at the time) versions, spark-2.4.4 + hadoop-3.2.0 + aws-sdk-1.11.636, works fine:

cat test.scala | ./spark-2.4.4-bin-hadoop3.2/bin/spark-shell --jars hadoop-aws-3.2.0.jar,aws-java-sdk-bundle-1.11.636.jar