Hive multiple select
CID F_ID NME
1 A QR
1 B QB
2 A QR
3 B QB
4 A QR
4 B QB
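In Hive, the query should output only the CIDs that have both F_ID A and F_ID B. I can achieve the same result in Oracle using LISTAGG.
Expected result:
CID F_ID NME
1 A QR
1 B QB
4 A QR
4 B QB
This query will execute in a single MapReduce stage: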
select CID, F_ID, NME from
(
    select s.*,
           sum(A) over (partition by CID) A_cnt,
           sum(B) over (partition by CID) B_cnt
    from
    (
        select s.*,
               case when F_ID='A' then 1 else 0 end A,
               case when F_ID='B' then 1 else 0 end B
        from
        ( --replace this subquery (s) with your table
            select stack(6,
                         1, 'A', 'QR',
                         1, 'B', 'QB',
                         2, 'A', 'QR',
                         3, 'B', 'QB',
                         4, 'A', 'QR',
                         4, 'B', 'QB') as (CID, F_ID, NME)
        ) s
    ) s
) s where A_cnt>=1 and B_cnt >=1
;
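The same idea can probably be written more compactly by folding the two CASE flags directly into the window aggregates, which removes one level of nesting. This is only a sketch, not part of the original answer: the aliases t and s are arbitrary, and the inline stack() data again stands in for the real table.

```sql
-- Sketch of a compact variant (an assumption, not the original answer's query).
-- The inline stack() rows stand in for your table; replace subquery t with it.
select CID, F_ID, NME
from (
    select t.*,
           -- number of rows with F_ID = 'A' within each CID
           sum(case when F_ID = 'A' then 1 else 0 end) over (partition by CID) as A_cnt,
           -- number of rows with F_ID = 'B' within each CID
           sum(case when F_ID = 'B' then 1 else 0 end) over (partition by CID) as B_cnt
    from (
        select stack(6,
                     1, 'A', 'QR',
                     1, 'B', 'QB',
                     2, 'A', 'QR',
                     3, 'B', 'QB',
                     4, 'A', 'QR',
                     4, 'B', 'QB') as (CID, F_ID, NME)
    ) t
) s
-- keep only CIDs that have at least one A row and at least one B row
where A_cnt >= 1 and B_cnt >= 1;
```

Both counts are computed over the same partition (CID), so this variant should still need only one shuffle. Row order in the output is not guaranteed unless an ORDER BY is added.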
Demo (running the query above):
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.39 sec HDFS Read: 13549 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 390 msec
OK
1 B QB
1 A QR
4 B QB
4 A QR
Time taken: 108.779 seconds, Fetched: 4 row(s)
Comment: Thanks, leftjoin, for the response; it is very useful. Can you tell me whether performance improves if the same query is executed in Impala?
Reply: @kalis, sorry, I can't help you with improving Impala performance because I don't use Impala; I work on the AWS platform. Ask this question with execution logs etc., and people in the community will surely help you.