Hive returning NULL values in EMR 5.19 using Hive 2.3.3

Tags: amazon-web-services, hadoop, hive

I am trying to query a table in Hive 2.3.3 on EMR 5.19, and the output comes back as all NULL values:

hive> select * from ip_sandbox_dev.master_schedule limit 5 ;
OK
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 2.067 seconds, Fetched: 5 row(s)
However, when I query the same table from EMR 5.4 (Hive 2.1.1), I get the expected results:

OK
THURSDAY        ABQ     ABC     3       4       ABQABC3 MIDWEST TRUCK & AUTO PARTS      18      14      Penny Mayfield  N
TUESDAY ABQ     ABC     0       4       ABQABC0 RANGER BRAKE PRODUCTS   15      14      Penny Mayfield  N
TUESDAY ABQ     ABC     1       4       ABQABC1 RANGER BRAKE PRODUCTS   15      14      Penny Mayfield  N
TUESDAY ABQ     ABC     2       4       ABQABC2 RANGER BRAKE PRODUCTS   15      14      Penny Mayfield  N
TUESDAY ANC     ABC     0       8       ANCABC0 RANGER BRAKE PRODUCTS   27      14      Penny Mayfield  N
Time taken: 2.022 seconds, Fetched: 5 row(s)
Output of SHOW CREATE TABLE:

CREATE EXTERNAL TABLE `ip_sandbox_dev.master_schedule`(
  `schedule_day` string,
  `dc` string,
  `mfg` string,
  `subline` int,
  `weeks` int,
  `con` string,
  `supplier` string,
  `leadtime` int,
  `buyer` int,
  `buyer_name` string,
  `optimize_flag` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='2',
  'numRows'='59329',
  'rawDataSize'='38922302',
  'totalSize'='658865',
  'transient_lastDdlTime'='1569395007')
I am not sure why the results differ between the two versions. I tried dropping and recreating the table, but got the same result.
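One way to narrow this down (a suggestion, assuming the cluster can read the s3a path directly) is to dump the footer of one of the ORC files and compare the schema embedded in the file against the table DDL; a column-name mismatch between the two is a known cause of all-NULL reads under name-based ORC schema evolution. Using the file path that appears in the hive.log below:

hive --orcfiledump s3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc

If the struct printed there carries names such as _col0, _col1, ... rather than schedule_day, dc, ..., Hive 2.3.3 cannot map the file columns onto the table columns by name; setting orc.force.positional.evolution=true is the commonly cited workaround for that case.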

Here is my hive.log:

2019-10-11T08:25:55,404 ERROR [ORC_GET_SPLITS #0([])]: io.AcidUtils (AcidUtils.java:getAcidState(791)) - Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.s3a.S3AFileSystem
2019-10-11T08:25:55,411 INFO  [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,487 INFO  [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(1735)) - FooterCacheHitRatio: 0/2
2019-10-11T08:25:55,672 INFO  [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,673 INFO  [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC rows from s3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc with {include: [true, true, true, true, true, true, true, true, true, true, true, true], offset: 0, length: 648566, schema: struct<schedule_day:string,dc:string,mfg:string,subline:int,weeks:int,con:string,supplier:string,leadtime:int,buyer:int,buyer_name:string,optimize_flag:string>}
2019-10-11T08:25:55,786 WARN  [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: internal.S3AbortableInputStream (S3AbortableInputStream.java:close(178)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

Can anyone help with this issue?

Try creating the table like, for example,

CREATE TABLE some_table (col string, ...) STORED AS ORC

without specifying the SerDe.
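Applied to the schema in the question, the suggestion would look roughly like this (a sketch, not tested; the column list and LOCATION are copied from the SHOW CREATE TABLE output above):

CREATE EXTERNAL TABLE `ip_sandbox_dev.master_schedule`(
  `schedule_day` string,
  `dc` string,
  `mfg` string,
  `subline` int,
  `weeks` int,
  `con` string,
  `supplier` string,
  `leadtime` int,
  `buyer` int,
  `buyer_name` string,
  `optimize_flag` string)
STORED AS ORC
LOCATION
  's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule';

STORED AS ORC expands to the same OrcSerde / OrcInputFormat / OrcOutputFormat triple, so the effect of dropping the explicit ROW FORMAT SERDE clause is to let Hive supply the defaults for whichever version is running instead of pinning the classes by name.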