Impala LAST_VALUE() not giving results as expected


I have a table in Impala that holds time information as Unix time (at a 1 ms frequency) along with three variables, as shown below:

ts          Val1        Val2        Val3        
1.60669E+12 7541.76     0.55964607  267.1613        
1.60669E+12 7543.04     0.5607262   267.27805       
1.60669E+12 7543.04     0.5607241   267.22308       
1.60669E+12 7543.6797   0.56109643  267.25974       
1.60669E+12 7543.6797   0.56107396  267.30624       
1.60669E+12 7543.6797   0.56170875  267.2643    
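
I want to resample the data and take the last value in each new time window. For example, to resample to a 10-second frequency, the output should be the last value of each 10-second window, like this: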

ts                      val1_Last       Val2_Last       Val3_Last   
2020-11-29 22:30:00     7541.76         0.55964607      267.1613
2020-11-29 22:30:10     7542.3994       0.5613486       267.31238
2020-11-29 22:30:20     7542.3994       0.5601791       267.22842
2020-11-29 22:30:30     7544.32         0.56069416      267.20248

To get this result, I run the following query:

select distinct *
from (
select ts,
last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1, 
last_value(Val2) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val2,
last_value(Val3) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val3 
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts , 
Val1 as Val1, 
Val2 as Val2, 
Val3 as Val3
FROM Sensor_Data.Table where unit='Unit1'  
and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as ttt) as tttt 
order by ts

I read on some forums that LAST_VALUE() sometimes causes problems, so I tried FIRST_VALUE with ORDER BY DESC to achieve the same effect. The query is as follows:

select distinct *
from (
select ts,
first_value(Val1) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val1, 
first_value(Val2) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val2,
first_value(Val3) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val3
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts , 
Val1 as Val1, 
val2 as Val2, 
Val3 as Val3
FROM product_sofcdtw_ops.as_operated_full_backup where unit='FCS05-09'  
and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as ttt) as tttt 
order by ts
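
But in both cases I don't get the expected result. The resampled time ts comes out as expected (in 10-second windows), but for Val1, Val2, and Val3 I get random values from somewhere inside the 0-9 s, 10-19 s, ... windows rather than the last value.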

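Logically the query looks fine to me and I can't find anything wrong with it. Can someone explain why I am not getting the right answer with this query?

Thanks.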

The problem is this line:

last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1, 
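
You are partitioning and ordering by the same column, ts, so there is effectively no ordering (more specifically, ordering by a value that is constant across the whole partition results in an arbitrary ordering). You need to retain the original ts for this to work, and use it for the ordering: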

select ts,
        last_value(Val1) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val1, 
        last_value(Val2) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val2,
        last_value(Val3) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val3 
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts_10, 
             t.*
      FROM Sensor_Data.Table t
      WHERE unit = 'Unit1' AND
            cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
     ) t
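
As an aside, the issue with last_value() is that it behaves unexpectedly when you leave out the window frame (the ROWS or RANGE part of the window function specification).

The default specification is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means the frame ends at the current row, so last_value() only picks up the value from the current row.

first_value(), on the other hand, works fine with the default frame. If you include an explicit frame, however, the two are equivalent.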
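
To make the frame point concrete, here is a minimal sketch that puts the two variants side by side. It reuses the table, columns, and bucketing expression from the question (Sensor_Data.Table is the asker's placeholder name); the output aliases val1_default_frame and val1_full_frame are made up for illustration. It has not been run against the asker's data.

```sql
-- Sketch only: contrasts the default frame with an explicit full-partition frame.
-- Table and column names come from the question; the two output aliases are hypothetical.
SELECT ts_10,
       ts,
       Val1,
       -- No frame given: with ORDER BY present, the frame defaults to
       -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
       -- so the "last" row of the frame is the current row (or a peer with the same ts).
       last_value(Val1) OVER (PARTITION BY ts_10 ORDER BY ts) AS val1_default_frame,
       -- Explicit frame: covers the whole 10-second partition,
       -- so this is the value at the largest ts in the bucket.
       last_value(Val1) OVER (PARTITION BY ts_10 ORDER BY ts
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val1_full_frame
FROM (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) AS ts_10,
             t.*
      FROM Sensor_Data.Table t
      WHERE unit = 'Unit1' AND
            cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' AND '2020-12-01 01:51:00'
     ) t
ORDER BY ts
```

If the original ts is unique within each bucket, val1_default_frame should simply track Val1 row by row, while val1_full_frame stays constant within each ts_10 bucket. If several rows share the same ts, the order among them is still arbitrary.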

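@ZeeshanShareef. Well, that is a cryptic comment. This will solve the problem you describe in your question.

Thanks for your reply. It is working, but there is an issue. The values of ts_10 are repeated 10 times, and the last values of val1, val2, and val3 are repeated 10 times as well. When I try to put DISTINCT before ts_10, I get an error saying it cannot be used with analytic functions, because it is also applied to LAST_VALUE(Val1), LAST_VALUE(val2), and LAST_VALUE(Val3). Is there a way to apply DISTINCT only to ts_10 so that I no longer get this error?

@ZeeshanShareef. You still need the select distinct. The query was meant to show you what is wrong with the window functions.

Thanks!!! It solved the problem.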
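
Putting the answer and the comments together, the full query might look like the sketch below. Since the comments report an error when DISTINCT sits in the same SELECT as the analytic functions, the sketch applies DISTINCT in an outer query, as the question's original query did. The aliases val1_last, val2_last, and val3_last follow the expected-output table in the question; the subquery aliases t and tt are arbitrary. This is an untested sketch, not the query the asker ended up with.

```sql
-- Sketch only: one row per 10-second bucket with the last value of each variable.
-- Names come from the question; Sensor_Data.Table is the asker's placeholder.
SELECT DISTINCT ts_10, val1_last, val2_last, val3_last
FROM (SELECT ts_10,
             -- Partition by the bucket, order by the original millisecond timestamp.
             last_value(Val1) OVER (PARTITION BY ts_10 ORDER BY ts
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val1_last,
             last_value(Val2) OVER (PARTITION BY ts_10 ORDER BY ts
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val2_last,
             last_value(Val3) OVER (PARTITION BY ts_10 ORDER BY ts
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val3_last
      FROM (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) AS ts_10,
                   t.*
            FROM Sensor_Data.Table t
            WHERE unit = 'Unit1' AND
                  cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' AND '2020-12-01 01:51:00'
           ) t
     ) tt
ORDER BY ts_10
```

Every row in a bucket carries the same three window results, so the outer DISTINCT collapses each bucket to a single row.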