Apache Spark: "No space available in any of the local directories."

Below is a simple test program. It obviously operates on a trivially small amount of test data:

from pyspark.sql import SparkSession
from pyspark.sql.types import Row, StructType, StructField, StringType, IntegerType
import pyspark.sql.functions as spark_functions

schema = StructType([
    StructField("cola", StringType()),
    StructField("colb", IntegerType()),
])

rows = [
    Row("alpha", 1),
    Row("beta", 2),
    Row("gamma", 3),
    Row("delta", 4)
]

# `spark` is predefined in the PySpark shell and EMR notebooks;
# create it explicitly when running as a standalone script
spark = SparkSession.builder.appName("test_program").getOrCreate()

data_frame = spark.createDataFrame(rows, schema)

print("count={}".format(data_frame.count()))

data_frame.write.save("s3a://my-bucket/test_data.parquet", mode="overwrite")

print("done")
This fails with:

Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:366)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:416)

This is running on Amazon EMR, writing to S3 storage. There is plenty of disk space available. Can anyone explain this?

I hit the same error with Spark 2.2 on EMR. Setting
fs.s3a.fast.upload=true
fs.s3a.buffer.dir=/home/hadoop,/tmp
(or any other folder) did not help me. My problem appeared to be related to shuffle space.
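(For reference, a minimal sketch of passing these Hadoop properties at submit time; the script name is hypothetical, and fs.s3a.* settings take the spark.hadoop. prefix when set through Spark configuration:)

spark-submit \
  --conf spark.hadoop.fs.s3a.fast.upload=true \
  --conf spark.hadoop.fs.s3a.buffer.dir=/home/hadoop,/tmp \
  test_program.py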


I had to add
--conf spark.shuffle.service.enabled=true
to my spark-submit / spark-shell invocation to resolve this error.
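A hedged sketch of the resulting invocation (the job script name is hypothetical). Note that spark.shuffle.service.enabled=true assumes the external shuffle service is running on the cluster nodes, which EMR typically configures when dynamic allocation is enabled:

spark-submit \
  --conf spark.shuffle.service.enabled=true \
  test_program.py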

I submitted an answer here - I tried this code on Cloudera pyspark2 and it worked seamlessly.