Hadoop 尽管添加了会话设置,查询仍需要时间

Hadoop 尽管添加了会话设置,查询仍需要时间,hadoop,hive,yarn,hiveql,apache-tez,Hadoop,Hive,Yarn,Hiveql,Apache Tez,下面是ETL生成的查询 查询- SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS SMALLINT) as a4, single_use_subq30725.a3 as a5, single_use_subq30725.a4 as a6, SU

下面是ETL生成的查询

查询-

SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS SMALLINT) as a4, single_use_subq30725.a3 as a5, single_use_subq30725.a4 as a6, SUBSTR(SUBSTR(SUBSTR(single_use_subq30725.a8, (CASE WHEN 12 < (- LENGTH(single_use_subq30725.a8)) THEN 0 ELSE 12 END), 104857600), 0, 20), 0, 20) as a7, infaNativeUDFCallString('TO_CHAR', single_use_subq30725.a5) as a8, infaHiveSysTimestamp('SS') as a9, CAST(infaNativeUDFCallDate('TRUNC', single_use_subq30725.a6, 'DD') AS DATE) as a10 FROM (SELECT (CASE WHEN 1 = t1.a1 THEN t1.a0 ELSE CAST(NULL AS TIMESTAMP) END) as a0, infaNativeUDFCallDate('TRUNC', (CASE WHEN 1 = t1.a1 THEN t1.a0 ELSE CAST(NULL AS TIMESTAMP) END), 'DD') as a1 
FROM 
    (
        SELECT MAX(t1.a0) as a0, MAX(t1.a1) as a1 
        FROM (
            SELECT mstr_load_audit.last_run_ts as a0, 1 as a1 FROM mstr_etl.mstr_load_audit WHERE interface_name='m_CTM_RAWTLogData_target_tbl'
            ) t1
        )t1
    ) single_use_subq39991
JOIN (
    SELECT w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.txACTION_ID AS STRING) as a1, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as a2, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as a3, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 20), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 20))) THEN 0 ELSE 0 END), 20) as a4, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LOYALTY_DEV_NUM AS DECIMAL(28, 0)) as a5, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_DT AS TIMESTAMP) as a6, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.ETL_LOAD_DT AS TIMESTAMP) as a7, w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_TS as a8 
FROM 
sourcedb.W5883634877684653839_Read_lcl_tlog_raw_2_VIEW__m_CTM_RAWTLogData_target_tbl
) 
single_use_subq30725 
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)]
但我们没有看到任何重大进展。 我们确实可以选择在传统的批处理模式下移动此作业,通过shell脚本运行此操作。 是否存在通过减少执行时间对查询进行任何更改的范围。我相信我们可以摆脱所有类型转换,并减少执行时间。
是否还有其他内容,我们可以尝试。

请加入您的查询:

JOIN (
    SELECT 
 ... SKIPPED ...
) 
single_use_subq30725 
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)]
添加此设置以启用映射联接:
set-hive.auto.convert.join=true
检查解释输出中是否存在映射联接

但最大的问题不是这个交叉(地图?)连接本身。当在第二个查询中读取表时,它防止谓词下推在联接之前工作

我建议完全删除join并计算第一个查询一次,并在
where
子句中提供
a0
a1
作为参数。这样,您将消除不必要的连接,谓词下推可能直接起作用

例如,PPD可应用于此列:
w5883634877684653839_read_lcl_tlog_raw_2_view_m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0


检查PPD和其他性能设置:

在查询中加入此连接:

JOIN (
    SELECT 
 ... SKIPPED ...
) 
single_use_subq30725 
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)]
添加此设置以启用映射联接:
set-hive.auto.convert.join=true
检查解释输出中是否存在映射联接

但最大的问题不是这个交叉(地图?)连接本身。当在第二个查询中读取表时,它防止谓词下推在联接之前工作

我建议完全删除join并计算第一个查询一次,并在
where
子句中提供
a0
a1
作为参数。这样,您将消除不必要的连接,谓词下推可能直接起作用

例如,PPD可应用于此列:
w5883634877684653839_read_lcl_tlog_raw_2_view_m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0

检查PPD和其他性能设置:

SELECT MAX(t1.a0) as a0, MAX(t1.a1) as a1