Apache Spark: how to enable Parquet page index statistics in Spark


I'm using Spark 3.0.1 with Parquet MR 1.11.1 to write a Parquet file with a simple, contrived schema:

creator:     parquet-mr version 1.11.1 (build 765bd5cd7fdef2af1cecd0755000694b992bfadd) 
extra:       org.apache.spark.version = 3.0.1

message spark_schema {
  optional int32 number;
  optional binary a (STRING);
  optional binary b (STRING);
  optional binary c (STRING);
}
Using parquet-tools, I can inspect the row group statistics and see the min/max values:

hadoop jar ./parquet-tools-1.11.1.jar meta s3a://temp/test.parquet/part-00000-c9f96988-af49-4355-875e-7681b720edd4-c000.snappy.parquet

row group 1: RC:1000000 TS:4007575 OFFSET:4 
--------------------------------------------------------------------------------
number:       INT32 SNAPPY DO:0 FPO:4 SZ:4002297/4001849/1.00 VC:1000000 ENC:PLAIN,RLE,BIT_PACKED ST:[min: 0, max: 999999, num_nulls: 0]
a:            BINARY SNAPPY DO:0 FPO:4002301 SZ:2010/1908/0.95 VC:1000000 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: b3039ae8-dc3a-49a1-9079-ac912d990363, max: b3039ae8-dc3a-49a1-9079-ac912d990363, num_nulls: 0]
b:            BINARY SNAPPY DO:0 FPO:4004311 SZ:2011/1909/0.95 VC:1000000 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 72685ad3-7b36-4043-848c-437fa10da294, max: 72685ad3-7b36-4043-848c-437fa10da294, num_nulls: 0]
c:            BINARY SNAPPY DO:0 FPO:4006322 SZ:2011/1909/0.95 VC:1000000 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 8575ea39-337d-45b2-81a5-79606c01b48f, max: 8575ea39-337d-45b2-81a5-79606c01b48f, num_nulls: 0]
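To make the row group output above easier to read, here is a small sketch that splits each column line into typed fields. It is a hypothetical helper (`parse_meta_line` is not part of parquet-tools), and the field meanings are my reading of the `parquet-tools meta` output rather than something stated in the question: DO = dictionary page offset, FPO = first data page offset, SZ = compressed size / uncompressed size / ratio, VC = value count, ENC = encodings, ST = column-chunk (row group) statistics.

```python
import re

# Hypothetical helper. Field meanings are assumed from the parquet-tools `meta` output:
# DO = dictionary page offset, FPO = first data page offset,
# SZ = compressed/uncompressed/ratio, VC = value count, ENC = encodings,
# ST = column-chunk (row group) statistics, not page-level statistics.
COLUMN_LINE = re.compile(
    r"^(?P<column>\w+):\s+(?P<type>\w+)\s+(?P<codec>\w+)\s+"
    r"DO:(?P<do>\d+)\s+FPO:(?P<fpo>\d+)\s+"
    r"SZ:(?P<compressed>\d+)/(?P<uncompressed>\d+)/(?P<ratio>[\d.]+)\s+"
    r"VC:(?P<vc>\d+)\s+ENC:(?P<enc>\S+)\s+"
    r"ST:\[min: (?P<min>.*?), max: (?P<max>.*?), num_nulls: (?P<nulls>\d+)\]$"
)


def parse_meta_line(line: str) -> dict:
    """Split one column line of `parquet-tools meta` output into typed fields."""
    match = COLUMN_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised column line: {line!r}")
    fields = match.groupdict()
    for key in ("do", "fpo", "compressed", "uncompressed", "vc", "nulls"):
        fields[key] = int(fields[key])
    fields["ratio"] = float(fields["ratio"])
    fields["enc"] = fields["enc"].split(",")
    return fields


# The four column lines of row group 1, copied from the output above.
META = """\
number:       INT32 SNAPPY DO:0 FPO:4 SZ:4002297/4001849/1.00 VC:1000000 ENC:PLAIN,RLE,BIT_PACKED ST:[min: 0, max: 999999, num_nulls: 0]
a:            BINARY SNAPPY DO:0 FPO:4002301 SZ:2010/1908/0.95 VC:1000000 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: b3039ae8-dc3a-49a1-9079-ac912d990363, max: b3039ae8-dc3a-49a1-9079-ac912d990363, num_nulls: 0]
b:            BINARY SNAPPY DO:0 FPO:4004311 SZ:2011/1909/0.95 VC:1000000 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 72685ad3-7b36-4043-848c-437fa10da294, max: 72685ad3-7b36-4043-848c-437fa10da294, num_nulls: 0]
c:            BINARY SNAPPY DO:0 FPO:4006322 SZ:2011/1909/0.95 VC:1000000 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 8575ea39-337d-45b2-81a5-79606c01b48f, max: 8575ea39-337d-45b2-81a5-79606c01b48f, num_nulls: 0]
"""

columns = [parse_meta_line(line) for line in META.splitlines() if line.strip()]

for col in columns:
    print(col["column"], col["min"], col["max"], col["nulls"])

# The row group's TS value matches the sum of the uncompressed column sizes.
print(sum(col["uncompressed"] for col in columns))  # 4007575
```

The numbers are internally consistent: TS:4007575 equals the sum of the four uncompressed sizes, each column's FPO equals the previous column's FPO plus its compressed size, and the printed ratio matches uncompressed/compressed rounded to two decimals. All of these statistics describe whole column chunks; none of them are per page.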
However, when I dump column "a", the page-level statistics are not populated and instead read "no stats for this column":

The CRC output also reports "page corrupt", yet I can read the file without any problems:

+------+--------------------+--------------------+--------------------+
|number|                   a|                   b|                   c|
+------+--------------------+--------------------+--------------------+
|     0|b3039ae8-dc3a-49a...|72685ad3-7b36-404...|8575ea39-337d-45b...|
|     1|b3039ae8-dc3a-49a...|72685ad3-7b36-404...|8575ea39-337d-45b...|
|     2|b3039ae8-dc3a-49a...|72685ad3-7b36-404...|8575ea39-337d-45b...|
How can I enable page index statistics?

Thanks
