Impala LAST_VALUE() not giving the expected result
I have a table in Impala with time information as Unix time (at 1 ms frequency) and three variables, as shown below:
ts Val1 Val2 Val3
1.60669E+12 7541.76 0.55964607 267.1613
1.60669E+12 7543.04 0.5607262 267.27805
1.60669E+12 7543.04 0.5607241 267.22308
1.60669E+12 7543.6797 0.56109643 267.25974
1.60669E+12 7543.6797 0.56107396 267.30624
1.60669E+12 7543.6797 0.56170875 267.2643
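I want to resample the data and take the last value of each new time window. For example, if I resample to a 10-second frequency, the output should be the last value in each 10-second window, as shown below: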
ts val1_Last Val2_Last Val3_Last
2020-11-29 22:30:00 7541.76 0.55964607 267.1613
2020-11-29 22:30:10 7542.3994 0.5613486 267.31238
2020-11-29 22:30:20 7542.3994 0.5601791 267.22842
2020-11-29 22:30:30 7544.32 0.56069416 267.20248
To get this result, I run the following query:
select distinct *
from (
    select ts,
           last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1,
           last_value(Val2) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val2,
           last_value(Val3) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val3
    from (
        SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts,
               Val1 as Val1,
               Val2 as Val2,
               Val3 as Val3
        FROM Sensor_Data.Table
        where unit='Unit1'
          and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
    ) as ttt
) as tttt
order by ts
I read on some forums that LAST_VALUE() sometimes causes problems, so I tried FIRST_VALUE with ORDER BY DESC to achieve the same thing. The query is as follows:
select distinct *
from (
    select ts,
           first_value(Val1) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val1,
           first_value(Val2) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val2,
           first_value(Val3) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val3
    from (
        SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts,
               Val1 as Val1,
               val2 as Val2,
               Val3 as Val3
        FROM product_sofcdtw_ops.as_operated_full_backup
        where unit='FCS05-09'
          and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
    ) as ttt
) as tttt
order by ts
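But in neither case do I get the expected result. The resampled time ts comes out as expected (in 10-second windows), but for Val1, Val2, and Val3 I get seemingly random values from within the 0-9 s, 10-19 s, ... windows.
Logically the query looks fine and I can't see anything wrong with it. Can anyone explain why this query doesn't give the right answer?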
Thanks.
The problem is this line:
last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1,
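You are partitioning and ordering by the same column, ts, so there is effectively no ordering (more precisely, ordering by a value that is constant throughout the partition yields an arbitrary order). To do this, you need to keep the original ts and use it for the ordering: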
select ts,
       last_value(Val1) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val1,
       last_value(Val2) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val2,
       last_value(Val3) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val3
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts_10,
             t.*
      FROM Sensor_Data.Table t
      WHERE unit = 'Unit1' AND
            cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
     ) t
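As an aside, the issue with last_value() is that it behaves unexpectedly when you leave out the window frame (the rows or range part of the window function specification).
The problem is that the default specification is range between unbounded preceding and current row, which means that last_value() just picks up the value in the current row.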
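The default-frame behaviour described above can be seen on a tiny inline dataset. This is only a sketch: the `readings` CTE and the names `grp`, `last_default`, `last_full`, and `first_desc_default` are hypothetical and not part of the original table.

```sql
-- Hypothetical inline data: three readings that belong to the same group.
WITH readings AS (
  SELECT 1 AS grp, 1 AS ts, 10.0 AS val1
  UNION ALL SELECT 1, 2, 20.0
  UNION ALL SELECT 1, 3, 30.0
)
SELECT ts,
       -- No frame given: the default is
       -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
       -- so the "last" row of the frame is the current row itself.
       last_value(val1) OVER (PARTITION BY grp ORDER BY ts) AS last_default,
       -- Explicit frame spanning the whole partition.
       last_value(val1) OVER (PARTITION BY grp ORDER BY ts
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full,
       -- first_value with a descending order and the default frame.
       first_value(val1) OVER (PARTITION BY grp ORDER BY ts DESC) AS first_desc_default
FROM readings
ORDER BY ts;
```

Under standard window semantics, `last_default` should simply echo each row's own `val1` (10, 20, 30), while `last_full` and `first_desc_default` should both be 30 on every row.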
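On the other hand, first_value() works fine with the default frame. However, if you include an explicit frame, then both are equivalent.
@ZeeshanShareef. Hmm, that is a cryptic comment. This does fix the problem you describe in your question.
Thanks for your reply. It works, but there is one issue: each ts_10 value is repeated 10 times, and likewise the last values of val1, val2, and val3 are repeated 10 times. When I try to put DISTINCT before ts_10, I get an error saying it cannot be applied with analytic functions, because it is also applied to LAST_VALUE(Val1), LAST_VALUE(val2), and LAST_VALUE(Val3). Is there a way to apply DISTINCT only to ts_10 so that I no longer get this error?
@ZeeshanShareef. You still need select distinct. The answer was meant to show you what was wrong with the window functions.
Thanks!!! That solved the problem.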
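Putting the answer and the follow-up comments together, one possible shape for the final query is sketched below. Since the comment reports an error when DISTINCT sits in the same SELECT as the analytic functions, the sketch keeps DISTINCT in an outer query, as the question's own query did. The `sensor` CTE, its five rows, and the aliases `bucketed`, `resampled`, and `val1_last` are hypothetical stand-ins for the real table; only one value column is shown for brevity.

```sql
-- Hypothetical inline data: ts is Unix time in milliseconds.
-- The first three rows fall in one 10-second bucket, the last two in the next.
WITH sensor AS (
  SELECT 1606689000000 AS ts, 1.0 AS val1
  UNION ALL SELECT 1606689004000, 2.0
  UNION ALL SELECT 1606689009000, 3.0
  UNION ALL SELECT 1606689010000, 4.0
  UNION ALL SELECT 1606689015000, 5.0
)
SELECT DISTINCT *              -- collapse the repeated rows per bucket
FROM (
  SELECT ts_10,
         -- Partition by the bucket, but order by the ORIGINAL timestamp.
         last_value(val1) OVER (PARTITION BY ts_10 ORDER BY ts
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val1_last
  FROM (
    -- Same bucketing expression as in the question: truncate to 10 seconds.
    SELECT cast(cast(unix_timestamp(cast(ts/1000 AS TIMESTAMP))/10 AS BIGINT)*10 AS TIMESTAMP) AS ts_10,
           s.*
    FROM sensor s
  ) bucketed
) resampled
ORDER BY ts_10;
```

With this sample data the result should contain one row per ts_10 bucket, carrying the val1 of the latest reading in that bucket (3.0 for the first bucket and 5.0 for the second).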