Hive: performance of ROW_NUMBER() OVER (PARTITION BY ...)
How can I improve performance when using ROW_NUMBER() OVER (PARTITION BY ...) in a Hive query?
Tags: hive, query-performance
select *
from
(
SELECT
'123' AS run_session_id
, tbl1.transaction_id
, tbl1.src_transaction_id
, tbl1.transaction_created_epoch_time
, tbl1.currency
, tbl1.event_type
, tbl1.event_sub_type
, tbl1.estimated_total_cost
, tbl1.actual_total_cost
, tbl1.tfc_export_created_epoch_time
, tbl1.authorizer
, tbl1.acquirer
, tbl1.processor
, tbl1.company_code
, tbl1.country_of_account
, tbl1.merchant_id
, tbl1.client_id
, tbl1.ft_id
, tbl1.transaction_created_date
, tbl1.event_pst_time
, tbl1.extract_id_seq
, tbl1.src_type
, ROW_NUMBER() OVER(PARTITION by tbl1.transaction_id ORDER BY tbl1.event_pst_time DESC) AS seq_num -- while writing back to the pfit events table, write each event so that event_pst_time populates in right way
FROM nest.nest_cost_events tbl1 --<hiveFinalDB>-- -- DB variables won't work, so the DB needs to be changed accordingly for testing and PROD deployment
WHERE extract_id_seq BETWEEN 275 - 60
AND 275
AND event_type in('ACT','CBR','SKU','CAL','KIT','BXT' )) tbl1
where seq_num=1;
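This table is partitioned by src_type.
It currently takes 20 minutes to process 154 million records. I would like to get that down to 10 minutes.
Any suggestions?
Thanks
What is the storage format of this table? What is the maximum number of occurrences of a transaction_id? How did you measure the execution time?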
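A rough sketch of how the questions above could be answered, and of a few standard Hive session settings worth trying before re-running the query. The table name and the extract_id_seq range are taken from the question; that Tez is installed on the cluster and that the table is stored in a format supported by vectorized execution (ORC, historically) are assumptions. None of this is guaranteed to halve the runtime.

```sql
-- Storage format: check the InputFormat / SerDe lines in the output.
DESCRIBE FORMATTED nest.nest_cost_events;

-- The query has no src_type filter, so every partition listed here is scanned.
SHOW PARTITIONS nest.nest_cost_events;

-- Maximum number of rows per transaction_id in the same extract_id_seq range.
-- A few very large groups would point to skew in the window's reducers.
SELECT MAX(cnt) AS max_rows_per_transaction
FROM (
  SELECT transaction_id, COUNT(*) AS cnt
  FROM nest.nest_cost_events
  WHERE extract_id_seq BETWEEN 275 - 60 AND 275
  GROUP BY transaction_id
) t;

-- Session settings to try (assumption: Tez is available on this cluster).
SET hive.execution.engine=tez;
-- Vectorized execution; support depends on Hive version and storage format.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
-- Cost-based optimizer; benefits from up-to-date table statistics.
SET hive.cbo.enable=true;

-- Prefix a query with EXPLAIN to see its plan and stages without running it.
EXPLAIN
SELECT transaction_id, event_pst_time
FROM nest.nest_cost_events
WHERE extract_id_seq BETWEEN 275 - 60 AND 275;
```

If only some src_type values are actually needed, adding a filter on src_type to the WHERE clause would let Hive prune partitions instead of scanning all of them.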