Hive ROW_NUMBER() OVER (PARTITION BY ...) performance


How can I improve performance when using ROW_NUMBER() with PARTITION BY in a Hive query?

    select *
    from
    (
    SELECT
                      '123'                                                                         AS run_session_id
                    , tbl1.transaction_id
                    , tbl1.src_transaction_id
                    , tbl1.transaction_created_epoch_time
                    , tbl1.currency
                    , tbl1.event_type
                    , tbl1.event_sub_type
                    , tbl1.estimated_total_cost
                    , tbl1.actual_total_cost
                    , tbl1.tfc_export_created_epoch_time
                    , tbl1.authorizer
                    , tbl1.acquirer
                    , tbl1.processor
                    , tbl1.company_code
                    , tbl1.country_of_account
                    , tbl1.merchant_id
                    , tbl1.client_id
                    , tbl1.ft_id
                    , tbl1.transaction_created_date
                    , tbl1.event_pst_time
                    , tbl1.extract_id_seq
                    , tbl1.src_type
                    , ROW_NUMBER() OVER(PARTITION BY tbl1.transaction_id ORDER BY tbl1.event_pst_time DESC)   AS seq_num       -- when writing back to the pfit events table, write each event so that event_pst_time is populated correctly

                  FROM nest.nest_cost_events tbl1                                --<hiveFinalDB>--                           -- DB variables won't work, so change the DB accordingly for testing and PROD deployment
                  WHERE extract_id_seq     BETWEEN 275 - 60
                                           AND 275
                    AND event_type    in('ACT','CBR','SKU','CAL','KIT','BXT' )) tbl1
    where seq_num=1;                

This table is partitioned by src_type. It currently takes 20 minutes to process 154 million records, and I want to get that down to 10 minutes.
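If src_type is the only partition column, as stated, the filter in the query above (on extract_id_seq and event_type) cannot prune any partitions, so every src_type partition is read. A minimal sketch for checking this, assuming a Hive version that supports `EXPLAIN DEPENDENCY` (it lists the input tables and partitions a query would read); the column list is trimmed for brevity:

```sql
-- Lists the input tables and partitions Hive would read for this query.
-- The WHERE clause does not reference src_type (the partition column),
-- so all src_type partitions are expected to show up in input_partitions.
EXPLAIN DEPENDENCY
SELECT transaction_id, event_pst_time
FROM nest.nest_cost_events
WHERE extract_id_seq BETWEEN 275 - 60 AND 275
  AND event_type IN ('ACT', 'CBR', 'SKU', 'CAL', 'KIT', 'BXT');
```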

Any suggestions?


Thanks

What is the storage format of this table? What is the maximum number of occurrences of a single transaction_id? How did you measure the execution time?
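The first two questions can be answered directly in Hive. A minimal sketch, reusing the table name and filter window from the question; the `row_cnt` alias and the `LIMIT 10` are arbitrary choices for illustration. A transaction_id with a very large count matters because all rows of one ROW_NUMBER() partition are processed by the same reducer, so a few hot keys can dominate the runtime:

```sql
-- Shows the storage format (InputFormat / OutputFormat / SerDe),
-- the partition columns, and any table statistics.
DESCRIBE FORMATTED nest.nest_cost_events;

-- Top transaction_ids by row count within the same filter window
-- as the original query; large counts point to skewed partitions.
SELECT transaction_id, COUNT(*) AS row_cnt
FROM nest.nest_cost_events
WHERE extract_id_seq BETWEEN 275 - 60 AND 275
  AND event_type IN ('ACT', 'CBR', 'SKU', 'CAL', 'KIT', 'BXT')
GROUP BY transaction_id
ORDER BY row_cnt DESC
LIMIT 10;
```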