
PySpark: writing dataframe partitions to S3


I have been trying to write a partitioned Spark dataframe to S3, but I run into an error:

df.write.partitionBy("year","month").mode("append")\
    .parquet('s3a://bucket_name/test_folder/')
The error message is:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: 
Status Code: 403, AWS Service: Amazon S3, AWS Request ID: xxxxxx, 
AWS Error Code: SignatureDoesNotMatch, 
AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
However, when I write without partitioning, it does work:

df.write.mode("append").parquet('s3a://bucket_name/test_folder/')

What is causing this problem?

I solved this problem in spark-submit by upgrading from

aws-java-sdk:1.7.4
hadoop-aws:2.7.7

to

aws-java-sdk:1.11.199
hadoop-aws:3.0.0

I set this option in my Python file with:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.11.199,org.apache.hadoop:hadoop-aws:3.0.0 pyspark-shell'
But you can also supply them as arguments to spark-submit directly.
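As a side note, the `--packages` string is easy to mistype; a minimal sketch of assembling it programmatically (using the package coordinates from the answer above; the list itself is illustrative) before any SparkSession is created:

```python
import os

# Maven coordinates from the answer above; adjust versions to your setup.
packages = [
    "com.amazonaws:aws-java-sdk:1.11.199",
    "org.apache.hadoop:hadoop-aws:3.0.0",
]

# PYSPARK_SUBMIT_ARGS must be set *before* pyspark starts its JVM,
# i.e. before the first SparkContext/SparkSession is created.
submit_args = "--packages " + ",".join(packages) + " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
```

This produces exactly the one-line assignment shown above, just with the comma-separated list built from a Python list.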

I had to rebuild Spark, providing my own Hadoop 3.0.0 version, to avoid dependency conflicts.
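Such a rebuild can be done with Spark's bundled distribution script. This is a sketch only: the exact Maven profiles to enable differ per Spark release, so check the "Building Spark" page for your version; `-Dhadoop.version` is the flag that pins the Hadoop dependency.

```shell
# From a Spark source checkout. Profiles (-Phive, -Pyarn, ...) are examples;
# consult the "Building Spark" docs for the set that matches your release.
./dev/make-distribution.sh --name custom-hadoop3 \
    -Dhadoop.version=3.0.0 -Phive -Phive-thriftserver -Pyarn
```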

You can read some of my speculation about the root cause here: