Hive: avoiding self-joins in Hive
I am using Hive's built-in collect_set function. The table looks like this:
cookie, events, keywords,pages
1234 1 'dress' 10
1234 1 'dress' 10
1235 2 'shoes' 14
1234 5 'socks' 22
Using collect_set, I can get the following structure:
select cookie, collect_set(events) as ev, collect_set(keywords) as kwords,
collect_set(pages)
from table1
group by cookie
What I need to do is search the collected arrays multiple times, for example:
select cookie
,array_contains(collect_set(events),2) as has_2
,array_contains(collect_set(events),4) as has_4
from table1
group by cookie
As far as I know, I cannot project a field more than once, so I end up having to do the following:
select a.cookie, a.has_2, b.has_4 from (
select cookie
,array_contains(collect_set(events),2) as has_2
from table1 group by cookie ) A
inner join (
select cookie
,array_contains(collect_set(events),4) as has_4
from table1 group by cookie ) B
on A.cookie = B.cookie
The final result looks like this:
cookie, has_2, has_4
1234 F F
1235 T T
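Is there any way to do this without a self-join? Currently I have to self-join about 30 times to get the format I need.
Thanks
You should introduce a GROUP BY in your SQL, e.g.:
-- GROUP BY added to the example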
select
cookie,
array_contains(collect_set(events),2) as has_2,
array_contains(collect_set(events),4) as has_4
from
table1
group by
cookie;
select S.cookie, array_contains(S.events_set,2), array_contains(S.events_set,4)
from
(select cookie, collect_set(events) as events_set
from table1 group by cookie ) S
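
The derived-table query above should extend to many flags without any join: collect each column once in the inner query, then test the resulting arrays as often as needed in the outer one. The following is a minimal sketch, not a tested solution. It assumes `events` is an integer column and `keywords` a string column; the aliases `keywords_set`, `has_event_2`, `has_event_4`, and `has_dress` are hypothetical names chosen for illustration.

```sql
-- Sketch only: column types and output aliases are assumptions.
-- Each column is collected once per cookie in the derived table,
-- and the outer query probes the collected arrays repeatedly.
SELECT
  s.cookie,
  array_contains(s.events_set, 2)         AS has_event_2,
  array_contains(s.events_set, 4)         AS has_event_4,
  array_contains(s.keywords_set, 'dress') AS has_dress
FROM (
  SELECT
    cookie,
    collect_set(events)   AS events_set,
    collect_set(keywords) AS keywords_set
  FROM table1
  GROUP BY cookie
) s;
```

Each additional flag is then one more `array_contains` line in the outer SELECT rather than another self-join. The value passed to `array_contains` should have the same type as the array elements (an integer for `events_set`, a string for `keywords_set`).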