Intermittent S3 write failures from Spark on EMR, error code 404 NoSuchKey

Tags: amazon-web-services, apache-spark, amazon-s3, amazon-emr

My Spark job (on EMR) writes to S3 every 5 minutes, but a few of those writes fail every day with nothing but the exception below. Any idea what could be causing this?

Code:
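(The original snippet did not survive the page scrape. The following is a minimal, hypothetical sketch of what such a job might look like; the bucket names, input format, and schema are assumptions, not the asker's actual code.)

import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical sketch only -- bucket names, input format and schema are
// assumptions, since the original snippet is missing from the page.
val spark = SparkSession.builder()
  .appName("five-minute-s3-writer")
  .getOrCreate()

// Whatever the job produces every five minutes (the source here is assumed).
val batch = spark.read.json("s3://my-input-bucket/incoming/")

// Overwrite the target prefix, matching the comment below that the job
// "overwrites the existing files in the folder". On EMR this path goes through
// EmrFileSystem and the Parquet output committer seen in the stack trace.
batch.write
  .mode(SaveMode.Overwrite)
  .parquet("s3://my-output-bucket/five-minute-aggregates/")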

Exception:

org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: ABC4A49392884812; S3 Extended Request ID: ed+11111OKdq5jh8qY65Atc5Q33FHxy7Lzm0aWwxiHL9hhOxN+28eEPZobvLveGoJnzCoSEXsH4=), S3 Extended Request ID: ed+11111OKdq5jh8qY65Atc5Q33FHxy7Lzm0aWwxiHL9hhOxN+28eEPZobvLveGoJnzCoSEXsH4=
at com.amazon.ws.emr.hadoop.fs.s3.MultipartCopyManager.copy(MultipartCopyManager.java:180)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.copy(Jets3tNativeFileSystemStore.java:425)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.rename(S3NativeFileSystem.java:1300)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.rename(EmrFileSystem.java:321)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:457)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:471)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:582)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:545)
at com.amazon.emr.committer.FilterParquetOutputCommitter.commitTask(FilterParquetOutputCommitter.java:119)
at com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter.commitTask(EmrOptimizedSparkSqlParquetOutputCommitter.java:9)

Surprisingly, the EMR "optimized" committer is performing a rename, even though it is supposed to be a "zero-rename committer".
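Assuming EMR 5.19 or later, one thing worth verifying is whether the S3-optimized committer is actually in effect: AWS documents that it only applies to certain Parquet writes through EMRFS, and Spark otherwise falls back to the rename-based FileOutputCommitter, which would match the rename in the stack trace. A minimal sketch of making the setting explicit (the property name should be checked against your EMR release):

import org.apache.spark.sql.SparkSession

// Sketch: make the EMR S3-optimized committer setting explicit and log it.
// The property name is taken from the EMR documentation for the S3-optimized
// committer (EMR 5.19+); verify it for your release before relying on it.
val spark = SparkSession.builder()
  .appName("committer-check")
  .config("spark.sql.parquet.fs.optimized.committer.optimization-enabled", "true")
  .getOrCreate()

// Print the effective value so the driver log shows which committer applies.
println(spark.conf.get("spark.sql.parquet.fs.optimized.committer.optimization-enabled"))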

Does the write create new files, or does it update existing ones? This could be an S3 consistency issue; try enabling the EMRFS consistent view.
@Stargazer it overwrites the existing files in the folder, if there are any.
Yes, this looks like a consistency problem. S3 only provides read-after-write consistency for new objects; updated objects can take some time to become consistent.
@ujwalbhardwaj I will try enabling S3 consistent view. Thanks.
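For completeness, the EMRFS consistent view suggested above is a cluster-level setting supplied through the emrfs-site configuration classification at cluster creation, not a Spark runtime option. A sketch in the standard EMR configuration JSON; the retry values are illustrative assumptions:

[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.retryPeriodSeconds": "10"
    }
  }
]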