Hive: SemanticException error when using the last_value window function


I have a table with the following data:

dt  device  id  count
2018-10-05  computer    7541185957382   6
2018-10-20  computer    7541185957382   3
2018-10-14  computer    7553187775734   6
2018-10-17  computer    7553187775734   10
2018-10-21  computer    7553187775734   2
2018-10-22  computer    7549187067178   5
2018-10-20  computer    7553187757256   3
2018-10-11  computer    7549187067178   10
I want to get the last and the first dt for each id, so I used the window functions first_value and last_value, as follows:

select id,last_value(dt) over (partition by id order by dt) last_dt
from table
order by id
;
But I got this error:

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Primitve type DATE not supported in Value Boundary expression

I am unable to diagnose the problem, and I would really appreciate any help.

If you add a rows between clause to the window specification, your query will work:

hive> select id,last_value(dt) over (partition by id order by dt 
      rows between unbounded preceding and unbounded following) last_dt 
      from table order by id;
Result:

+----------------+-------------+--+
|       id       |   last_dt   |
+----------------+-------------+--+
| 7541185957382  | 2018-10-20  |
| 7541185957382  | 2018-10-20  |
| 7549187067178  | 2018-10-22  |
| 7549187067178  | 2018-10-22  |
| 7553187757256  | 2018-10-20  |
| 7553187775734  | 2018-10-21  |
| 7553187775734  | 2018-10-21  |
| 7553187775734  | 2018-10-21  |
+----------------+-------------+--+
The missing primitive type support was fixed in Hive 2.1.0.
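A likely reason the explicit frame helps: when a window has an order by but no frame clause, Hive defaults to range between unbounded preceding and current row, which is a value-based boundary on the ordering column, and that matches the "Value Boundary expression" wording in the error. A rows frame is position-based, so the DATE type of dt does not matter. Since the question asks for both the first and the last dt, here is a minimal sketch covering both. The table name device_counts is hypothetical and stands in for the real table; the column aliases first_dt and last_dt are also just illustrative.

```sql
-- Hypothetical table name: device_counts (columns dt, device, id, count as in the question).
-- The explicit ROWS frame spans the whole partition, so every row of an id
-- sees the same first and last dt, and no value-based boundary on dt is needed.
SELECT id,
       first_value(dt) OVER (PARTITION BY id ORDER BY dt
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_dt,
       last_value(dt)  OVER (PARTITION BY id ORDER BY dt
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_dt
FROM device_counts
ORDER BY id;
```

As with the query above, this returns one output row per input row, so each id appears as many times as it has rows in the table.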

Update:

To get distinct records, you can use the ROW_NUMBER window function and keep only the first row of each id from the result set:

hive> select id,last_dt from 
          (select id,last_value(dt) over (partition by id order by dt 
              rows between unbounded preceding and unbounded following) last_dt,
              ROW_NUMBER() over (partition by id order by dt)rn 
              from so )t 
           where t.rn=1;
Result:

+----------------+-------------+--+
|       id       |     dt      |
+----------------+-------------+--+
| 7541185957382  | 2018-10-20  |
| 7553187757256  | 2018-10-20  |
| 7553187775734  | 2018-10-21  |
| 7549187067178  | 2018-10-22  |
+----------------+-------------+--+

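Thanks, it worked. One question: is there a way to get only distinct rows without writing a select distinct statement at the top? As the result above shows, there are duplicates. How can I remove them and get only unique values in one go, without using a subquery?

@Raj, please check the Update section of my edited answer. The simpler way is to add distinct at the top, but we can also use the row_number window function in a subquery.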
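If only the first and last dt per id are needed, plain aggregation is another option worth considering. It is a different technique from the window-function approach in the answer: min and max with group by return exactly one row per id, so neither distinct nor a subquery is required. This is a sketch under the assumption that no other per-row columns are needed in the output; the table name device_counts is hypothetical.

```sql
-- Hypothetical table name: device_counts (same columns as in the question).
-- GROUP BY collapses each id to a single row, so no duplicates appear.
SELECT id,
       MIN(dt) AS first_dt,
       MAX(dt) AS last_dt
FROM device_counts
GROUP BY id;
```

Because this does not use a window frame at all, it also avoids the value boundary error on Hive versions before 2.1.0.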