Hive: top N sorted rows per group?
I have the following transaction table:
I am grouping by customer_id and category to create a list of product_id-score map pairs:
SELECT
s.customer_id,
s.category,
collect_list(s.pair)
FROM
(
SELECT
customer_id,
category,
map(product_id, score) AS pair
FROM
transaction
WHERE
score > {score_threshold}
) s
GROUP BY
s.customer_id,
s.category
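To make the shape of that result concrete, here is a minimal stdlib-only Python sketch of what the query above computes. It assumes each input row is a (customer_id, category, product_id, score) tuple. The function name group_pairs and the sample values are hypothetical. Python dicts stand in for Hive maps and Python lists stand in for collect_list.

```python
from collections import defaultdict


def group_pairs(rows, score_threshold):
    """Mimic the query: keep rows with score > threshold and collect one
    {product_id: score} map per row for each (customer_id, category) group.

    Hypothetical helper; rows are (customer_id, category, product_id, score).
    """
    groups = defaultdict(list)
    for customer_id, category, product_id, score in rows:
        if score > score_threshold:  # WHERE score > {score_threshold}
            # GROUP BY customer_id, category + collect_list(map(product_id, score))
            groups[(customer_id, category)].append({product_id: score})
    return dict(groups)


rows = [
    ("c1", "books", "p1", 0.9),
    ("c1", "books", "p2", 0.1),
    ("c1", "toys", "p3", 0.5),
]
print(group_pairs(rows, 0.3))
```

Python keeps the input order inside each list. Hive makes no such promise for collect_list, which is why "top n" needs an explicit ranking step.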
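Now I want to go a step further: for each group, I want to keep only the top n pairs, sorted by score in descending order. I tried OVER (PARTITION BY … ORDER BY …), but ran into problems.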
Note: the transaction table is partitioned by category.
Thanks.
Try this:
SELECT
s.customer_id,
s.category,
collect_list(s.pair)
FROM
(
SELECT
-- number the rows within each (customer_id, category) group, highest score first
ROW_NUMBER() OVER (PARTITION BY customer_id, category ORDER BY score DESC) AS RowId,
customer_id,
category,
map(product_id, score) AS pair
FROM
transaction
WHERE
score > {score_threshold}
) s
-- RowId starts at 1, so use <= to keep exactly the top n rows per group
WHERE s.RowId <= {n}
GROUP BY
s.customer_id,
s.category
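Below is a minimal runnable sketch of the same ROW_NUMBER() top-N-per-group pattern. It uses Python's built-in sqlite3 module in place of Hive and assumes SQLite 3.25 or later, the first version with window functions. SQLite has no map() or collect_list(), so the pairs are assembled in Python. The table name txn, the function top_n_pairs, and the sample rows are hypothetical. The outer query orders by row_id, so each list comes out highest score first. Hive's documentation does not promise that collect_list preserves the order of the rows it aggregates, so if the order of the pairs matters there, sort them explicitly afterwards.

```python
import sqlite3
from collections import defaultdict


def top_n_pairs(rows, score_threshold, n):
    """Return {(customer_id, category): [{product_id: score}, ...]} keeping
    only the n highest-scoring products per group (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # Table named txn here (hypothetical) instead of the question's "transaction".
    conn.execute(
        "CREATE TABLE txn (customer_id TEXT, category TEXT, product_id TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT customer_id, category, product_id, score
        FROM (
            SELECT customer_id, category, product_id, score,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id, category
                       ORDER BY score DESC
                   ) AS row_id
            FROM txn
            WHERE score > ?
        ) s
        WHERE s.row_id <= ?
        ORDER BY customer_id, category, row_id
    """
    result = defaultdict(list)
    # Rows arrive ordered by rank, so each list is highest score first.
    for customer_id, category, product_id, score in conn.execute(
        query, (score_threshold, n)
    ):
        result[(customer_id, category)].append({product_id: score})
    conn.close()
    return dict(result)


rows = [
    ("c1", "books", "p1", 0.9),
    ("c1", "books", "p2", 0.7),
    ("c1", "books", "p3", 0.8),
    ("c1", "books", "p4", 0.2),
    ("c1", "toys", "p5", 0.6),
    ("c2", "books", "p1", 0.4),
]
print(top_n_pairs(rows, score_threshold=0.3, n=2))
```

ROW_NUMBER() breaks ties between equal scores arbitrarily and always returns exactly n rows per group. If tied products should all be kept, RANK() is the usual alternative.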