Hadoop: How do I count the rows whose value is smaller than a particular row's value in a Hive table?

Consider the following table in Hive:

+------+------+
| id   | res  |
+------+------+
|    1 |   55 |
|    2 |   10 |
|    3 |   89 |
|    4 |  100 |
|    5 |   80 |
|    6 |   55 |
|    7 |   70 |
|    8 |   35 |
|    9 |   46 |
|   10 |   51 |
+------+------+
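
Now, for each row, I have to count the number of rows whose res value is smaller than that row's res value.

For the table above, the output should be: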

+------+------+
| id   |count |
+------+------+
|    1 |    4 |
|    2 |    0 |
|    3 |    8 |
|    4 |    9 |  
|    5 |    7 |
|    6 |    4 | 
|    7 |    6 |
|    8 |    1 |
|    9 |    2 |
|   10 |    3 |
+------+------+

You can try a ranking function.

Example HiveQL:

select
  id,
  res,
  rank() over (ORDER BY res) as rank
from
  my_table
order by
  res

See the Hive documentation on windowing and analytics functions for more information.
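
The query above returns a 1-based rank rather than the requested count. Because rank() gives tied rows the same rank and starts at 1, subtracting 1 gives the number of rows with a strictly smaller res. A minimal sketch, assuming the same my_table as above; the alias cnt is only illustrative:

```sql
SELECT
  id,
  rank() OVER (ORDER BY res) - 1 AS cnt  -- rank starts at 1; tied res values share a rank
FROM
  my_table
ORDER BY
  id;
```

For the sample data, both rows with res = 55 get rank 5, so both end up with a count of 4.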

Voilà, the output of the cross-join query that follows:

+-----+------+
| id  | _c1  |
+-----+------+
| 1   | 4    |
| 2   | 0    |
| 3   | 8    |
| 4   | 9    |
| 5   | 7    |
| 6   | 4    |
| 7   | 6    |
| 8   | 1    |
| 9   | 2    |
| 10  | 3    |
+-----+------+
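
This is simple, and also a bit crazy, because the query performs a cross product. Then again, for every row you have to find all the rows with a smaller value, so something like a cross product is implicit in the problem.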

SELECT id, SUM(IF ( c.res1 > c.res2, 1 , 0 )) 
FROM ( 
    SELECT id, a.res AS res1, b.res AS res2 
    FROM test_4 AS a 
         INNER JOIN ( 
            SELECT res
            FROM test_4 
         ) b 
) c 
GROUP BY id;
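
The same idea can be written with an explicit CROSS JOIN, which makes the Cartesian product visible in the query text. This is only a sketch, assuming the test_4 table from the query above; the alias cnt is illustrative. The commented-out SET lines are optional session settings that may be needed if your Hive is configured to reject Cartesian products, and which of them applies depends on the Hive version.

```sql
-- Optional: uncomment only if Hive refuses the Cartesian product.
-- Which property applies depends on the Hive version and cluster setup.
-- SET hive.mapred.mode=nonstrict;
-- SET hive.strict.checks.cartesian.product=false;

SELECT
  a.id,
  SUM(IF(b.res < a.res, 1, 0)) AS cnt  -- adds 1 for every row with a smaller res
FROM
  test_4 a
  CROSS JOIN test_4 b                   -- pairs every row with every row
GROUP BY
  a.id;
```

The work grows with the square of the table size, so this form is only reasonable for small tables.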

You can do the following, but remember to subtract 1 from the rank result, because we are not checking ...

Rank may be one way to go, but here is an interesting alternative:

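-- Caveat: with a ROWS frame, rows that tie on res get different counts
-- (the two rows with res = 55 yield 4 and 5 rather than 4 and 4).
SELECT      mt.id              AS id 
            , mt.res           AS res 
            , COUNT(1) OVER (PARTITION BY NULL ORDER BY mt.res ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1   AS cnt
FROM        my_table mt 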
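
A ROWS frame counts physical rows, so two rows with the same res receive different counts. One possible tie-safe variant uses a RANGE frame that ends just below the current value. This is a sketch under the assumption that res is an integer column, so that "1 PRECEDING" means "at most res - 1", i.e. strictly smaller:

```sql
SELECT
  mt.id  AS id,
  mt.res AS res,
  -- frame covers all rows whose res is <= (current res - 1);
  -- for the smallest res the frame is empty and COUNT returns 0
  COUNT(1) OVER (
    ORDER BY mt.res ASC
    RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) AS cnt
FROM
  my_table mt;
```

If res is not an integer type, rank() minus 1 is the safer choice, since it does not depend on the spacing between values.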

??? Your query returns:
(2,10,1)(8,35,2)(9,46,3)(10,51,4)(1,55,5)(6,55,5)(7,70,7)(5,80,8)(3,89,9)(4,100,10)
Did you run your query? If I am missing something on this topic, please let me know.

@ozw1z5rd The only difference I can see is the starting index: rank() returns an index that starts from 1. Everything else is the same.

Great! I missed that. It works without the cross product.