Hadoop Hive: exception when using LAG with a window function


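I am trying to compute the time difference between two rows and applied the solution from another question. But I get an exception: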

> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: SemanticException Failed to breakup Windowing
> invocations into Groups. At least 1 group must only depend on input
> columns. Also check for circular dependencies. Underlying error:
> Expecting left window frame boundary for function
> LAG((tok_table_or_col time), 1, 0) Window
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col
> client_id)]orderColumns=[(tok_table_or_col time) ASC
> NULLS_FIRST]]window(type=ROWS, start=1 PRECEDING, end=currentRow)] as
> LAG_window_0 to be unbounded. Found : 1
HiveQL:

SELECT id, loc, LAG(time, 1, 0) OVER (PARTITION BY id, loc ORDER BY time ROWS 1 PRECEDING) - time AS response_time FROM mytable
How can I solve this? What is the problem?

Edit:

Sample data:

id  loc time
0   1   1414250523591
0   1   1414250523655
1   2   1414250523655
1   2   1414250523661
1   3   1414250523661
1   3   1414250523662
What I want is the time difference between the rows that share the same id and loc (there are always exactly 2 such rows).

Edit 2: I should also mention that I am new to the Hadoop/Hive ecosystem.

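So, as the error says, the window has to be unbounded. I simply removed the ROWS clause, and now it at least does something, but the result is still wrong. So I wanted to check what the lagged value actually is: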

SELECT id, loc, LAG(time, 1) OVER (PARTITION BY id, loc ORDER BY time) AS lag_col FROM mytable
I get this as output:

id  loc lag_col
1   2   null
1   2   -1
1   3   null
1   3   -1

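The null values make sense, since I removed the default value, but why -1? Could the large values in the time column cause some kind of overflow? The column is defined as bigint, so the values should fit without any problem, but maybe they are converted to int somewhere during the query?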
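One way to narrow this down might be to select the raw time next to the lagged value, so that both can be compared row by row. The following is only a sketch: it reuses the table and column names from the question (mytable, id, loc, time), drops the ROWS clause that LAG rejected, and the aliases prev_time and response_time are made up for illustration. It does not explain where the -1 comes from.

```sql
-- Sketch only: mytable, id, loc and time are the names used in the question;
-- prev_time and response_time are illustrative aliases.
-- No ROWS/RANGE frame is given, because LAG rejected the frame in the
-- original query. PARTITION BY and ORDER BY alone decide which row
-- counts as the "previous" one.
SELECT
  id,
  loc,
  time,
  -- previous time within the same (id, loc) group; NULL for the first row
  LAG(time, 1) OVER (PARTITION BY id, loc ORDER BY time) AS prev_time,
  -- current minus previous, so the difference comes out positive
  time - LAG(time, 1) OVER (PARTITION BY id, loc ORDER BY time) AS response_time
FROM mytable;
```

If LAG returns the previous time as intended, the second row of each pair in the sample data should show differences of 64, 6 and 1, and the first row of each pair should be NULL. If prev_time still shows -1 while time shows the full bigint value, that would suggest the problem lies in how the lagged value is produced rather than in the stored data.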

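There is a syntax error at loc and LAG. Please show us sample data and the expected result. Thanks.

That was actually an error introduced when I obfuscated the query.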