Apache Spark: why are there files/partitions with no data written by DataFrameWriter.parquet()?


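Context

I am trying to migrate from Spark 1.6.1 to Spark 2.0.0. My problem here is probably not entirely about the Spark version and has more to do with the compression format. I am reading gzip-compressed Parquet files written by Spark 1.6.1, adding an extra text column, and saving the result back to disk. That job now runs on Spark 2.0.0. I noticed that the output contains a large number of small files that hold only metadata. Previously, when I loaded these Parquet files, the number of partitions (df.rdd.partitions.size) matched the number of Parquet splits on disk. I later realized that this was because the gzip format is not splittable. What I do not understand is why Spark writes empty partitions back to disk. Spark 2.0 defaults to snappy for compressing Parquet files.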

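What I know so far

  • gzip (the default in 1.6.1) is a non-splittable format and results in as many in-memory partitions as there are splits on disk
  • Using snappy allows Spark to load the data with whatever executors are available, which explains the dynamic number of partitions
  • sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "true")
  • I also turned off schema merging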
Questions

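  • Why is Spark writing empty files to disk?
  • How can I make Spark optimize the partition size to 128 MB per file? (I know this can be done with repartition/coalesce, but I would need to compute the argument to pass to it.)
Commands to verify my claims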

HADOOP_CONF_DIR=/etc/hive/conf /home/srikar/spark-2.0.0/bin/spark-shell --master yarn --deploy-mode client --driver-class-path '/etc/hive/conf' --num-executors 100 --executor-memory 6g --driver-memory 8g

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = spark.read.parquet("/user/srikary/data/2016/07/05")
df: org.apache.spark.sql.DataFrame = [user_uuid: string, client_uuid: string ... 2 more fields]

scala> df.rdd.partitions.size
res1: Int = 95

scala> df.write.parquet("/user/srikary/test/partitions_test")
Listings of the two directories used in the test above
srikar@localhost:~$ hadoop fs -ls  /user/srikary/data/2016/07/05
Found 21 items
-rw-r--r--   3 srikar srikar          0 2016-10-24 01:09 /user/srikary/data/2016/07/05/_SUCCESS
-rw-r--r--   3 srikar srikar        473 2016-10-24 01:09 /user/srikary/data/2016/07/05/_common_metadata
-rw-r--r--   3 srikar srikar      12303 2016-10-24 01:09 /user/srikary/data/2016/07/05/_metadata
-rw-r--r--   3 srikar srikar   34576052 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00000-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34574386 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00001-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34575034 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00002-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34588117 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00003-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34578050 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00004-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34584603 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00005-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34595888 2016-10-24 01:09 /user/srikary/data/2016/07/05/part-r-00006-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34582493 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00007-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34594552 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00008-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34584819 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00009-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34601397 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00010-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34580279 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00011-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34651221 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00012-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34605249 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00013-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34561204 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00014-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34603328 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00015-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34575536 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00016-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
-rw-r--r--   3 srikar srikar   34597036 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00017-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet
Using snappy:

srikary@localhost:~$ hadoop fs -ls /user/srikary/test/partitions_test
Found 96 items
-rw-r--r--   3 srikary srikary          0 2016-12-12 00:45 /user/srikary/test/partitions_test/_SUCCESS
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00000-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00001-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59317007 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00002-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00003-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00004-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00005-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59321056 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00006-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00007-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00008-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00009-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59322819 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00010-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00011-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00012-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00013-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59313102 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00014-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00015-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00016-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00017-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59323721 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00018-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00019-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00020-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00021-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59316186 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00022-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00023-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00024-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00025-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59323141 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00026-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00027-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00028-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00029-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59322078 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00030-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00031-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00032-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00033-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59325795 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00034-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00035-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00036-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00037-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59329053 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00038-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00039-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00040-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00041-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59317677 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00042-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00043-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00044-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00045-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59324442 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00046-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00047-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00048-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00049-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59325743 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00050-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00051-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00052-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00053-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59317381 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00054-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00055-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00056-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00057-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59324735 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00058-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00059-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00060-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00061-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59320296 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00062-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00063-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00064-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00065-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59312148 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00066-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00067-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00068-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00069-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59326905 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00070-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00071-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00072-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00073-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary   59326284 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00074-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00075-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00076-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00077-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00078-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00079-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00080-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00081-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00082-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00083-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00084-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00085-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00086-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00087-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00088-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00089-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00090-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00091-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00092-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00093-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
-rw-r--r--   3 srikary srikary        516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00094-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet
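Why is Spark writing empty files to disk?

It writes out the contents of each partition; whether a partition is empty does not matter. This is normal behavior for Spark and related systems.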
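
To check how many of the written files come from empty partitions, the directory listing can be summarized by file size. The following is a minimal sketch, assuming the eight-column `hadoop fs -ls` layout shown in the question and an arbitrary 1 KB cutoff for "metadata only"; `ListingSummary` and `summarize` are hypothetical names, not part of Spark or Hadoop.

```scala
import scala.util.Try

// Hypothetical helper: splits part files in `hadoop fs -ls` output into
// footer-only files and files that actually carry data.
object ListingSummary {
  // Returns (files at or below the threshold, files above the threshold).
  // Only lines whose path contains "/part-" are counted, so _SUCCESS,
  // _metadata and the "Found N items" header are ignored.
  def summarize(lines: Seq[String], thresholdBytes: Long = 1024L): (Int, Int) = {
    val sizes = lines.flatMap { line =>
      // Assumed columns: perms, replication, owner, group, size, date, time, path
      val cols = line.trim.split("\\s+")
      if (cols.length == 8 && cols(7).contains("/part-")) Try(cols(4).toLong).toOption
      else None
    }
    val (small, large) = sizes.partition(_ <= thresholdBytes)
    (small.size, large.size)
  }
}
```

Feeding it the lines of the second listing gives the number of footer-only files next to the number of files that hold rows.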

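How can I make Spark optimize the partition size to 128 MB per file? (I know this can be done with repartition/coalesce, but I would need to compute the argument to pass to it.)

This is a hard one to answer. In general it is not possible to get an exact value for the number of partitions. Parquet applies different compression techniques to the data, so the content matters as much as the raw volume.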
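
A rough starting point is still possible: divide an estimate of the output size by the target file size and round up. The sketch below is only a heuristic under stated assumptions. `PartitionEstimate`, `estimatePartitions` and `sizeRatio` are hypothetical names, and the ratio of output bytes to input bytes (gzip in, snappy out, plus the new column) has to be measured on a sample of the real data, for the reason given above.

```scala
// Heuristic sketch, not a Spark API: number of output files needed so that
// each one is roughly targetBytes in size.
object PartitionEstimate {
  val DefaultTargetBytes: Long = 128L * 1024 * 1024 // 128 MB

  // inputBytes: total size of the input files on disk
  // sizeRatio:  assumed output/input size ratio, measured on a sample
  def estimatePartitions(inputBytes: Long,
                         sizeRatio: Double = 1.0,
                         targetBytes: Long = DefaultTargetBytes): Int = {
    require(inputBytes >= 0 && sizeRatio > 0 && targetBytes > 0)
    val estimatedOutput = inputBytes * sizeRatio
    // Always ask for at least one partition.
    math.max(1, math.ceil(estimatedOutput / targetBytes).toInt)
  }
}

// Possible use in spark-shell (assumes `sc` and `df` from the session above;
// measuredRatio is a value you determine yourself):
//   import org.apache.hadoop.fs.{FileSystem, Path}
//   val fs = FileSystem.get(sc.hadoopConfiguration)
//   val inputBytes = fs.getContentSummary(new Path("/user/srikary/data/2016/07/05")).getLength
//   val n = PartitionEstimate.estimatePartitions(inputBytes, sizeRatio = measuredRatio)
//   df.repartition(n).write.parquet("/user/srikary/test/partitions_test")
```

Because `repartition` shuffles the data evenly across the requested number of partitions, the resulting files should be of similar size, but they will only approximate 128 MB.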