Hive: select a random row for each distinct column value from a Hive table
I'm trying to retrieve one random row for each distinct value of the column hash. I also need the dt column.
So far I've come up with this query, which doesn't work:
INSERT OVERWRITE TABLE t PARTITION(dt)
SELECT hash, dt FROM (
SELECT hash, RAND() as r, dt FROM t1
UNION
SELECT hash, RAND() as r, dt FROM t2
) result
WHERE r IN (SELECT MAX(r) FROM result WHERE hash=result.hash);
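The query fails with the error Table not found 'result', because the subquery uses result as a table in its FROM clause (FROM result).
How can I fix this query, or what other approach could I use here?

You can use row_number() to get, for each hash, the row with the largest r: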
INSERT OVERWRITE TABLE t PARTITION(dt)
SELECT hash,dt
FROM (SELECT hash, dt, row_number() over(partition by hash order by r desc) as rnum
FROM (SELECT hash, RAND() as r, dt FROM t1
UNION ALL
SELECT hash, RAND() as r, dt FROM t2
) result
) t
WHERE rnum=1
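
A more compact variant, offered as a sketch rather than a tested drop-in: since r is only used for ordering, rand() can go directly into the window's ORDER BY. This assumes the same t1 and t2 tables as above and a Hive version with windowing functions (0.11 or later); picked is just an illustrative alias.

```sql
-- Preview one randomly chosen row per hash without writing to t.
SELECT hash, dt
FROM (
  SELECT hash, dt,
         -- Number the rows of each hash group in random order.
         row_number() OVER (PARTITION BY hash ORDER BY rand()) AS rnum
  FROM (
    SELECT hash, dt FROM t1
    UNION ALL
    SELECT hash, dt FROM t2
  ) result
) picked
WHERE rnum = 1;
```

Running it as a plain SELECT first makes it easy to check that each hash appears exactly once before wrapping it in the INSERT OVERWRITE.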