Hadoop: How do I count the rows whose value is less than a given row's value in a Hive table?
Consider the following table in Hive:
+------+------+
| id | res |
+------+------+
| 1 | 55 |
| 2 | 10 |
| 3 | 89 |
| 4 | 100 |
| 5 | 80 |
| 6 | 55 |
| 7 | 70 |
| 8 | 35 |
| 9 | 46 |
| 10 | 51 |
+------+------+
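Now, for each row, I have to count the rows whose res value is less than that row's res value.
For the table above, the output should be: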
+------+------+
| id |count |
+------+------+
| 1 | 4 |
| 2 | 0 |
| 3 | 8 |
| 4 | 9 |
| 5 | 7 |
| 6 | 4 |
| 7 | 6 |
| 8 | 1 |
| 9 | 2 |
| 10 | 3 |
+------+------+
You can try the rank function. Example HiveQL:
select
id,
res,
rank() over (ORDER BY res) as rank
from
my_table
order by
res
Please read more about it.

Voilà:
+-----+------+
| id | _c1 |
+-----+------+
| 1 | 4 |
| 2 | 0 |
| 3 | 8 |
| 4 | 9 |
| 5 | 7 |
| 6 | 4 |
| 7 | 6 |
| 8 | 1 |
| 9 | 2 |
| 10 | 3 |
+-----+------+
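The result above comes from the query below. It is simple, and also crazy, because the query does a cross product. Then again, for each row you have to find all the rows with a smaller value, so something like a cross product is implicit in the problem.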
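-- For each id, count the pairs in which this row's res is the larger one.
SELECT id, SUM(IF(c.res1 > c.res2, 1, 0))
FROM (
    -- No ON clause: every row of a is paired with every res value in b.
    SELECT id, a.res AS res1, b.res AS res2
    FROM test_4 AS a
    INNER JOIN (
        SELECT res
        FROM test_4
    ) b
) c
GROUP BY id;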
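To make the cost of the cross product concrete, here is a minimal Python sketch of the same pairing logic outside Hive. It assumes the sample data from the question; the function name `count_smaller_by_cross_product` is made up for illustration.

```python
from itertools import product

# Sample (id, res) rows copied from the question's table.
ROWS = [(1, 55), (2, 10), (3, 89), (4, 100), (5, 80),
        (6, 55), (7, 70), (8, 35), (9, 46), (10, 51)]


def count_smaller_by_cross_product(rows):
    """Pair every row with every row and count pairs where the left res is larger."""
    counts = {row_id: 0 for row_id, _ in rows}
    pairs_examined = 0
    for (row_id, res1), (_, res2) in product(rows, repeat=2):
        pairs_examined += 1
        if res1 > res2:  # same test as IF(c.res1 > c.res2, 1, 0)
            counts[row_id] += 1
    return counts, pairs_examined


counts, pairs_examined = count_smaller_by_cross_product(ROWS)
print(counts)          # per-id counts, matching the expected output table
print(pairs_examined)  # 100 pairs for 10 rows
```

The number of pairs grows with the square of the row count, which is why this approach gets expensive on large tables.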
You can do the following, but remember to subtract 1 from the rank result, because we are not checking ...

Ranking may be one way to go, but here is an interesting alternative:
SELECT mt.id AS id
, mt.res AS res
, COUNT(1) OVER (PARTITION BY NULL ORDER BY mt.res ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS cnt
FROM my_table mt
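One caveat worth checking with this alternative is tied values. A `ROWS` frame counts physical rows, so the two rows with res = 55 should get different counts. The sketch below runs the same window expression on SQLite through Python's `sqlite3` module as a stand-in for Hive. That substitution is an assumption, it needs SQLite 3.25 or later for window functions, and `PARTITION BY NULL` is left out because a single partition is the default.

```python
import sqlite3

# Sample (id, res) rows copied from the question's table.
ROWS = [(1, 55), (2, 10), (3, 89), (4, 100), (5, 80),
        (6, 55), (7, 70), (8, 35), (9, 46), (10, 51)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, res INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)

# Running row count up to and including the current row, minus 1.
query = """
SELECT id,
       COUNT(1) OVER (ORDER BY res ASC
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS cnt
FROM my_table
"""
running = dict(conn.execute(query).fetchall())
conn.close()

# ids 1 and 6 both have res = 55; which one sorts first is not defined,
# so compare their counts as a sorted pair.
tied = sorted([running[1], running[6]])
print(tied)  # [4, 5], whereas the expected output has 4 for both
```

For rows without ties the running count matches the expected output, so the difference only shows up when res values repeat.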
??? Your query returns:
(2,10,1)(8,35,2)(9,46,3)(10,51,4)(1,55,5)(6,55,5)(7,70,7)(5,80,8)(3,89,9)(4,100,10)
Did you run your query? Let me know if I have missed something on this topic.
@ozw1z5rd The only difference I can see is the starting index: rank returns an index that starts from 1. Everything else is the same.
Great! I missed that. It works without the cross product.
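The point made in the comments, that rank starts at 1 and so rank minus 1 is the requested count, can be illustrated outside Hive. This is a minimal Python sketch, assuming the sample data from the question; the function name `count_smaller_by_rank` is made up for illustration and is not a Hive API. It sorts once and looks up each value's position, which mirrors why no cross product is needed.

```python
from bisect import bisect_left

# Sample (id, res) rows copied from the question's table.
ROWS = [(1, 55), (2, 10), (3, 89), (4, 100), (5, 80),
        (6, 55), (7, 70), (8, 35), (9, 46), (10, 51)]


def count_smaller_by_rank(rows):
    """Return {id: number of rows with a strictly smaller res}."""
    sorted_res = sorted(res for _, res in rows)
    # bisect_left gives the number of values strictly smaller than res.
    # Tied values share the same position, just as tied rows share a rank,
    # so this equals rank - 1.
    return {row_id: bisect_left(sorted_res, res) for row_id, res in rows}


rank_counts = count_smaller_by_rank(ROWS)
print(rank_counts)
```

On the sample data this gives 4 for both rows with res = 55, in line with the expected output in the question.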