Hive query logic and optimization (SQL, Hadoop, Hive, HDFS, HiveQL) - Fatal编程技术网

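I have data in the following format:

Input

Output:

ID    col1      col2      col3
1     C1_abc    C1_xce    C1_fde
2     C1_sds    C1_hhh    null
3     C1_aaa    null      null
4     C1_asw    C1_eee    C1_ttt

I want to achieve this with a Hive script. I know several ways to do it, but I need the most optimized one because the data volume is large.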

Just use conditional aggregation:

select id,
       max(case when rank = 1 then col1 end) as col1,
       max(case when rank = 2 then col1 end) as col2,
       max(case when rank = 3 then col1 end) as col3
from t
where rank in (1, 2, 3)
group by id;
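
To make the pivot concrete, here is a minimal sketch run against a few made-up rows. The question's input sample did not survive, so the long layout assumed here (one row per id and rank, with the value in col1) is inferred from the query above rather than taken from the question. The inline CTE sample_t is hypothetical and only stands in for the real table t; the values are borrowed from the expected output.

```sql
-- Hypothetical sample rows: one row per (id, rank).
-- The layout is an assumption inferred from the query, not from the question.
with sample_t as (
  select 1 as id, 1 as rank, 'C1_abc' as col1 union all
  select 1, 2, 'C1_xce' union all
  select 1, 3, 'C1_fde' union all
  select 2, 1, 'C1_sds' union all
  select 2, 2, 'C1_hhh' union all
  select 3, 1, 'C1_aaa'
)
select id,
       -- each case returns col1 only for its own rank and null otherwise;
       -- max() ignores nulls, so it picks out the single matching value
       max(case when rank = 1 then col1 end) as col1,
       max(case when rank = 2 then col1 end) as col2,
       max(case when rank = 3 then col1 end) as col3
from sample_t
where rank in (1, 2, 3)
group by id;
```

This should yield one row per id, with null in any column whose rank is missing for that id. The aggregation reads the table once and groups once, whereas the join approach reads the same table three times, which is one likely reason it tends to be cheaper on large data.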

Another option is a multi-way join:

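select t1.id, t1.col1, t2.col1 as col2, t3.col1 as col3
from t t1 left join
     t t2
     on t1.id = t2.id and t2.rank = 2 left join
     t t3
     on t1.id = t3.id and t3.rank = 3
-- filter the driving table in WHERE: inside a left join's ON clause,
-- t1.rank = 1 would not remove the rank 2 and rank 3 rows of t1
where t1.rank = 1;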

You may need to try both approaches to see which one runs faster. It can vary depending on your data.

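The first option works like a charm. I had been using the second option before asking this question, but it was not optimized and took 40 minutes to complete. This one finished within 60 seconds. Thank you, Gordon.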