Hive: ALTER TABLE ADD COLUMNS fails because of a large number of partitions

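I have a table with more than 300k partitions. When I try to add a new column as shown below, the statement runs for many hours and then fails. The metastore RDS is on MySQL, and the partition table has more than 5 million rows. Has anyone run into this?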

alter table tablea add columns (col1 string) cascade

Error message:

        at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:638)
        at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3590)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:390)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1689)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1673)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:375)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:322)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
        at com.sun.proxy.$Proxy34.alter_table_with_environmentContext(Unknown Source)

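In the end I wrote a for loop that iterates over every partition and runs

alter table tableA add columns (col1 string)

This seems to be the safest approach.
With this many partitions, attempting the cascade at the table level leads to unpredictable behavior, not to mention how long the cascade takes to complete.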
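
The loop itself is not shown in the answer, so here is a minimal sketch of what it might look like, written as a Python script that generates one statement per partition. Everything here is an assumption rather than the author's actual code: the helper names `partition_clause` and `alter_statements` are hypothetical, the input is assumed to be partition names in the `key=value/key=value` form printed by `SHOW PARTITIONS`, the `PARTITION (...)` clause is my guess at how the per-partition statement was scoped, and every partition value is quoted as a string.

```python
from urllib.parse import unquote


def partition_clause(partition_name):
    """Turn 'dt=2020-01-01/region=us' into "PARTITION (dt='2020-01-01', region='us')"."""
    parts = []
    for piece in partition_name.strip().split("/"):
        key, _, value = piece.partition("=")
        # Assumption: special characters in partition names are %XX-escaped,
        # so decode them, then escape backslashes and quotes for the literal.
        value = unquote(value).replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"{key}='{value}'")
    return "PARTITION (" + ", ".join(parts) + ")"


def alter_statements(table, partition_names, column_ddl):
    """Yield one ALTER TABLE ... PARTITION ... ADD COLUMNS statement per partition."""
    for name in partition_names:
        if name.strip():  # skip blank lines in the partition listing
            yield f"ALTER TABLE {table} {partition_clause(name)} ADD COLUMNS ({column_ddl});"


if __name__ == "__main__":
    # Hypothetical sample; in practice the names would come from SHOW PARTITIONS.
    sample = ["dt=2020-01-01/region=us", "dt=2020-01-02/region=eu"]
    for stmt in alter_statements("tableA", sample, "col1 string"):
        print(stmt)
```

The generated statements could be written to a file and run in batches, so that a failure partway through only requires resuming from the last batch instead of restarting a single multi-hour cascade. Note that per-partition statements only touch partition metadata; the table-level schema would presumably still need its own `alter table tableA add columns (col1 string)` without the cascade keyword.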
