Hadoop: Hive ACID update and delete errors


I am testing with Hive 1.2.1 and Tez 0.7, but updates and deletes against an ACID table fail. Here is the table structure:

CREATE EXTERNAL TABLE IF NOT EXISTS working.dw_items_w
(
column definition
)
CLUSTERED BY (id) into 5000 buckets
STORED AS ORC
LOCATION '/sys/edw/working/dw_items_w2'
TBLPROPERTIES ("transactional"="true");
The update query is as follows:

update working.dw_items_w
set 
PROCESS_FLAG =(case when (
(TGT_LSTG_STATUS_ID = 1 and (to_date(SALE_END) - to_date(TGT_AUCT_END_DT) ) <> 0 )
or  (TGT_LSTG_STATUS_ID in (1,2) and NEW_LSTG_STATUS_ID in (0,4) )   
) then  'D' 
when 
((TGT_LSTG_STATUS_ID =1 and NEW_LSTG_STATUS_ID = 1 and datediff(to_date(SALE_END) ,to_date(TGT_AUCT_END_DT) 
) = 0 )
or (TGT_LSTG_STATUS_ID = 2 and NEW_LSTG_STATUS_ID = 1)) then 'X' else PROCESS_FLAG end ),
NEW_LSTG_STATUS_ID = (case when TGT_LSTG_STATUS_ID = 0  AND NEW_LSTG_STATUS_ID = 0   AND to_date(SALE_END)
 <  date_sub(to_date( from_unixtime(unix_timestamp(),'yyyy-MM-dd') ), 92)
     AND to_date(SALE_END)  <> to_date('1969-12-31') then 1 else NEW_LSTG_STATUS_ID end) 

where PROCESS_FLAG = 'U';
The error is as follows:

at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1650)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":19,"bucketid":471,"rowid":0}},"value":ignored}
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more


Add the following to hive-site.xml:

<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
</property>
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
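
For a quick experiment, the client-side settings can also be applied per session instead of in hive-site.xml. This is a minimal sketch, assuming Hive 1.x and a session that is allowed to change these properties. Note that hive.exec.dynamic.partition.mode is not part of the list above; it comes from the client-side settings in the Hive transactions documentation. The two compactor properties are read by the metastore service, so they are left out here and still need to go into the metastore's hive-site.xml.

```sql
-- Session-level equivalents of the client-side ACID settings (sketch, Hive 1.x assumed)
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
-- Assumption: not in the original answer, taken from the Hive transactions docs
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```

Running SET with just a property name (for example `SET hive.txn.manager;`) prints the current value, which is a simple way to confirm that the setting took effect.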

Then make sure you are creating an ORC table that is bucketed on the predicate column:

CREATE TABLE IF NOT EXISTS foo.tableinfo
(
 schema_name varchar(32)
,table_name varchar(64)
,department varchar(64)
,country varchar(64)
,state varchar(64)
,city varchar(64)
,granularity int
,notes varchar(256)
)
CLUSTERED BY (table_name) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB", "transactional"="true");

After that, the following will work:

DELETE FROM foo.tableinfo WHERE table_name = 'foo';
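
An UPDATE against the same table should work the same way. The sketch below assumes the foo.tableinfo table from above; the value 'checked' is a hypothetical placeholder. It updates notes rather than table_name because Hive does not support updating a bucketing column.

```sql
-- Update a non-bucketing column; table_name is the CLUSTERED BY column and cannot be updated
UPDATE foo.tableinfo SET notes = 'checked' WHERE table_name = 'foo';

-- Inspect transaction and compaction state after the write
SHOW TRANSACTIONS;
SHOW COMPACTIONS;
```

SHOW TRANSACTIONS lists open and aborted transactions, and SHOW COMPACTIONS lists compactions that are queued, running, or recently finished, which helps confirm that DbTxnManager and the compactor are actually active.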