Hive SQL: find the most common value across multiple columns
I have the following data:
name device operating browser
A mob l c
A mob l b
A mob l b
A web w b
B web w c
B web w c
B mob w c
B web l b
For each name, I want to find the most common value in each column, so that the result looks like this:
name device operating browser
A mob l b
B web w c
How can I do this? Thanks.
This might help.
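Note, however, that subqueries are a poor fit here: correlated subqueries in the SELECT list tend to be slow, and depending on your Hive version they may not be supported at all.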
-- For each name, each scalar subquery returns the most frequent value of one column.
SELECT
a.name,
(SELECT b.device FROM YOUR_TABLE_NAME b WHERE b.name = a.name GROUP BY device ORDER BY COUNT(b.device) DESC LIMIT 1) AS device,
(SELECT c.operating FROM YOUR_TABLE_NAME c WHERE c.name = a.name GROUP BY operating ORDER BY COUNT(c.operating) DESC LIMIT 1) AS operating,
(SELECT d.browser FROM YOUR_TABLE_NAME d WHERE d.name = a.name GROUP BY browser ORDER BY COUNT(d.browser) DESC LIMIT 1) AS browser
FROM YOUR_TABLE_NAME AS a
GROUP BY a.name
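To check what this query returns on the sample data, the sketch below loads the rows into an in-memory SQLite database through Python's standard sqlite3 module and runs the query as written. SQLite is only a stand-in here: it accepts correlated subqueries with LIMIT in the SELECT list, which says nothing about whether your Hive version does. The helper name run_subquery_answer is hypothetical.

```python
import sqlite3

# Sample rows from the question: (name, device, operating, browser).
ROWS = [
    ("A", "mob", "l", "c"),
    ("A", "mob", "l", "b"),
    ("A", "mob", "l", "b"),
    ("A", "web", "w", "b"),
    ("B", "web", "w", "c"),
    ("B", "web", "w", "c"),
    ("B", "mob", "w", "c"),
    ("B", "web", "l", "b"),
]

# The correlated-subquery answer; only whitespace differs from the text above.
SUBQUERY_ANSWER = """
SELECT
  a.name,
  (SELECT b.device FROM YOUR_TABLE_NAME b WHERE b.name = a.name
     GROUP BY device ORDER BY COUNT(b.device) DESC LIMIT 1) AS device,
  (SELECT c.operating FROM YOUR_TABLE_NAME c WHERE c.name = a.name
     GROUP BY operating ORDER BY COUNT(c.operating) DESC LIMIT 1) AS operating,
  (SELECT d.browser FROM YOUR_TABLE_NAME d WHERE d.name = a.name
     GROUP BY browser ORDER BY COUNT(d.browser) DESC LIMIT 1) AS browser
FROM YOUR_TABLE_NAME AS a
GROUP BY a.name
"""


def run_subquery_answer(rows):
    """Load rows into a throwaway SQLite table and return the sorted query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YOUR_TABLE_NAME "
        "(name TEXT, device TEXT, operating TEXT, browser TEXT)"
    )
    conn.executemany("INSERT INTO YOUR_TABLE_NAME VALUES (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(SUBQUERY_ANSWER).fetchall())
    conn.close()
    return result


print(run_subquery_answer(ROWS))
```

On the sample data this yields the two rows of the desired output, (A, mob, l, b) and (B, web, w, c).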
For Hive 0.11+, you can use windowing functions such as rank():
select name, device, operating, browser
from (
select *, rank() over (partition by name order by cnt desc) as rnk
from (
select name, device, operating, browser, count(*) as cnt
from yourtable
group by name, device, operating, browser
) t
) t
where rnk = 1
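A quick way to exercise this query is a sketch that again uses Python's sqlite3 module as a stand-in for Hive, assuming an SQLite build with window-function support (3.25 or later). The table name yourtable comes from the answer; the helper top_combinations and the two rows for name C are hypothetical, added only to show what a tie looks like.

```python
import sqlite3

# Sample rows from the question: (name, device, operating, browser).
ROWS = [
    ("A", "mob", "l", "c"),
    ("A", "mob", "l", "b"),
    ("A", "mob", "l", "b"),
    ("A", "web", "w", "b"),
    ("B", "web", "w", "c"),
    ("B", "web", "w", "c"),
    ("B", "mob", "w", "c"),
    ("B", "web", "l", "b"),
]

# The rank() answer, unchanged apart from indentation.
RANK_QUERY = """
select name, device, operating, browser
from (
  select *, rank() over (partition by name order by cnt desc) as rnk
  from (
    select name, device, operating, browser, count(*) as cnt
    from yourtable
    group by name, device, operating, browser
  ) t
) t
where rnk = 1
"""


def top_combinations(rows):
    """Return every row whose full combination ties for the highest count per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table yourtable (name text, device text, operating text, browser text)"
    )
    conn.executemany("insert into yourtable values (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(RANK_QUERY).fetchall())
    conn.close()
    return result


print(top_combinations(ROWS))

# Hypothetical tie: each combination for name C occurs once, so both get rnk = 1.
print(top_combinations([("C", "mob", "l", "b"), ("C", "web", "w", "c")]))
```

The first call reproduces the desired output; the second returns both C rows, because rank() gives tied rows the same rank.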
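Step by step:
1. The innermost query counts how often each (name, device, operating, browser) combination occurs (cnt).
2. The middle query ranks those combinations within each name by cnt, highest first (rnk).
3. The outer query keeps only the rows with rnk = 1.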
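To see the intermediate result of the rank() query (the per-combination counts and their rank within each name) before the rnk = 1 filter is applied, one can run just the inner two levels. This sketch uses Python's sqlite3 module as a stand-in for Hive and assumes SQLite 3.25 or later for rank(); the helper name ranked_combinations is hypothetical.

```python
import sqlite3

# Sample rows from the question: (name, device, operating, browser).
ROWS = [
    ("A", "mob", "l", "c"),
    ("A", "mob", "l", "b"),
    ("A", "mob", "l", "b"),
    ("A", "web", "w", "b"),
    ("B", "web", "w", "c"),
    ("B", "web", "w", "c"),
    ("B", "mob", "w", "c"),
    ("B", "web", "l", "b"),
]

# Count each combination, then rank it within its name; no rnk filter yet.
STEP_QUERY = """
select name, device, operating, browser, cnt,
       rank() over (partition by name order by cnt desc) as rnk
from (
  select name, device, operating, browser, count(*) as cnt
  from yourtable
  group by name, device, operating, browser
) t
"""


def ranked_combinations(rows):
    """Return (name, device, operating, browser, cnt, rnk) tuples, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table yourtable (name text, device text, operating text, browser text)"
    )
    conn.executemany("insert into yourtable values (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(STEP_QUERY).fetchall())
    conn.close()
    return result


for row in ranked_combinations(ROWS):
    print(row)
```

For each name, only the combination seen twice gets rnk = 1; the single-occurrence combinations share rnk = 2 and are dropped by the outer filter.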
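Note: if there is a tie within a particular name, all rows with the same count are returned. Also note that this query ranks whole (device, operating, browser) combinations, so it returns the most frequent row per name. That matches the desired output for this sample data, but in general it is not the same as picking the most common value of each column independently.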
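If each column's most common value is needed independently, one option (not taken from either answer) is to count and rank each column separately and join the per-column winners on name. The sketch below assumes a Hive version with windowing functions (0.11+) and, as before, uses Python's sqlite3 module only as a stand-in to run the SQL (SQLite 3.25 or later). It uses row_number() with the value itself as a tie-breaker, so exactly one value per column is kept and the alphabetically smallest wins a tie; that rule is an illustrative choice. The rows for name C and the helper per_column_modes are hypothetical.

```python
import sqlite3

# Sample rows from the question: (name, device, operating, browser).
ROWS = [
    ("A", "mob", "l", "c"),
    ("A", "mob", "l", "b"),
    ("A", "mob", "l", "b"),
    ("A", "web", "w", "b"),
    ("B", "web", "w", "c"),
    ("B", "web", "w", "c"),
    ("B", "mob", "w", "c"),
    ("B", "web", "l", "b"),
]

# Hypothetical rows: the most frequent full row is (web, w, c),
# but taken on its own the most common device is mob.
EXTRA_ROWS = [
    ("C", "web", "w", "c"),
    ("C", "web", "w", "c"),
    ("C", "mob", "l", "c"),
    ("C", "mob", "w", "c"),
    ("C", "mob", "l", "b"),
]

# One counted-and-ranked subquery per column, joined on name; rn = 1 keeps the winner.
PER_COLUMN_QUERY = """
select d.name, d.device, o.operating, br.browser
from (
  select name, device,
         row_number() over (partition by name order by cnt desc, device) as rn
  from (select name, device, count(*) as cnt
        from yourtable group by name, device) x
) d
join (
  select name, operating,
         row_number() over (partition by name order by cnt desc, operating) as rn
  from (select name, operating, count(*) as cnt
        from yourtable group by name, operating) x
) o on o.name = d.name
join (
  select name, browser,
         row_number() over (partition by name order by cnt desc, browser) as rn
  from (select name, browser, count(*) as cnt
        from yourtable group by name, browser) x
) br on br.name = d.name
where d.rn = 1 and o.rn = 1 and br.rn = 1
"""


def per_column_modes(rows):
    """Return (name, device, operating, browser) with each column's mode per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table yourtable (name text, device text, operating text, browser text)"
    )
    conn.executemany("insert into yourtable values (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(PER_COLUMN_QUERY).fetchall())
    conn.close()
    return result


print(per_column_modes(ROWS + EXTRA_ROWS))
```

For A and B this gives the same rows as the desired output. For the hypothetical name C it returns (C, mob, w, c), whereas the rank() query over whole rows would return (C, web, w, c).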