Hive: Unable to query Parquet data with nested fields in Presto

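I have data, some of which includes nested columns (arrays of arrays of objects), saved as Parquet from Spark 2.2.

I am now trying to access this data externally using Presto, and whenever I try to access any nested column I get the following exception: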

com.facebook.presto.spi.PrestoException: Error opening Hive split hdfs://name-node/parquet_path/part-00023-8d4f14b1-a3f1-4055-b931-04838701048d-c000.snappy.parquet (offset=0, length=108289): parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO
    at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:220)
    at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:115)
    at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:157)
    at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:93)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
    at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:239)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:373)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:477)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO
    at parquet.io.ColumnIOConverter.constructField(ColumnIOConverter.java:56)
    at parquet.io.ColumnIOConverter.constructField(ColumnIOConverter.java:90)
    at com.facebook.presto.hive.parquet.ParquetPageSource.<init>(ParquetPageSource.java:109)

I am using Presto version 0.208 with a local Hive metastore to create the external table.

Any help would be appreciated :)

The issue was solved by setting the property hive.parquet.use-column-names=true in catalog/hive.properties.

By default, Presto accesses Parquet data by column index, so this property must be set explicitly for Presto to use the column names in the Parquet files, as defined in CREATE TABLE.
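
As a rough sketch, a Hive catalog file with this setting might look like the fragment below. The connector name and metastore URI are assumptions for illustration and are not taken from the original setup; only the last property comes from the answer itself.

```properties
# catalog/hive.properties
# Assumed connector name for a Hadoop 2 based Hive catalog
connector.name=hive-hadoop2
# Hypothetical metastore address; replace with your own
hive.metastore.uri=thrift://localhost:9083
# Resolve Parquet columns by name instead of by position
hive.parquet.use-column-names=true
```

Catalog properties are read when Presto starts, so the change would typically need to be applied on the coordinator and every worker, followed by a restart.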

CREATE TABLE hive.tests.table_name (
not_nested_field_1 BIGINT,
not_nested_field_2 BIGINT,
not_nested_field_3 BOOLEAN,
not_nested_field_4 DOUBLE,
not_nested_field_5 ARRAY(VARCHAR),
nested_field_6 ARRAY(ROW(
    nested_level0_field1 BOOLEAN,
    nested_level0_field2 BIGINT,
    nested_level0_field3 BIGINT,
    nested_level0_field4 ARRAY(ROW(
        nested_level1_field1 BOOLEAN,
        nested_level1_field2 BIGINT,
        nested_level1_field3 VARCHAR,
        nested_level1_field4 ARRAY(ROW(
            nested_level2_field1 VARCHAR,
            nested_level2_field2 VARCHAR,
            nested_level2_field3 ARRAY(ROW(
                nested_level3_field1 VARCHAR,
                nested_level3_field2 VARCHAR)))),
        nested_level1_field5 ARRAY(ROW(
            nested_level2_field4 BIGINT,
            nested_level2_field5 BIGINT,
            nested_level2_field6 ARRAY(ROW(
                nested_level3_field3 VARCHAR,
                nested_level3_field4 VARCHAR)))))))))
WITH (
  format = 'PARQUET',
  external_location = 'hdfs://name-node/parquet_path/'
);