Sum column values over a window based on a week range (Impala)

Tags: sql, datetime, window-functions, impala

Given the following table:

client_id   date            connections
---------------------------------------
121438297   2018-01-03      0
121438297   2018-01-08      1
121438297   2018-01-10      3
121438297   2018-01-12      1
121438297   2018-01-19      7
363863811   2018-01-18      0
363863811   2018-01-30      5
363863811   2018-02-01      4
363863811   2018-02-10      0
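I am looking for an efficient way to sum the connections that occur within the 6 days following the current row (the current row is included in the sum), partitioned by client_id. This should result in: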

client_id   date            connections     connections_within_6_days
---------------------------------------------------------------------
121438297   2018-01-03      0               1        
121438297   2018-01-08      1               5     
121438297   2018-01-10      3               4     
121438297   2018-01-12      1               1                       
121438297   2018-01-19      7               7
363863811   2018-01-18      0               0
363863811   2018-01-30      5               9
363863811   2018-02-01      4               4
363863811   2018-02-10      0               0
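
The windowing rule behind the expected column can be checked with a small reference computation. The following is a minimal Python sketch, not an Impala solution: the function name `connections_within` and the in-memory `rows` list are illustrative assumptions, and the quadratic scan is only suitable for a handful of rows.

```python
from datetime import date, timedelta

# Sample rows from the question: (client_id, date, connections).
rows = [
    (121438297, date(2018, 1, 3), 0),
    (121438297, date(2018, 1, 8), 1),
    (121438297, date(2018, 1, 10), 3),
    (121438297, date(2018, 1, 12), 1),
    (121438297, date(2018, 1, 19), 7),
    (363863811, date(2018, 1, 18), 0),
    (363863811, date(2018, 1, 30), 5),
    (363863811, date(2018, 2, 1), 4),
    (363863811, date(2018, 2, 10), 0),
]


def connections_within(rows, days=6):
    """For each row, sum the connections of the same client whose date
    lies in the inclusive range [row date, row date + days]."""
    totals = []
    for client_id, start, _ in rows:
        end = start + timedelta(days=days)
        total = sum(
            connections
            for other_client, other_date, connections in rows
            if other_client == client_id and start <= other_date <= end
        )
        totals.append(total)
    return totals


print(connections_within(rows))  # [1, 5, 4, 1, 7, 0, 9, 4, 0]
```

The printed list matches the connections_within_6_days column, which confirms that the window is inclusive on both ends: the current date through the current date plus 6 days.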
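Problems:

  • I don't want to add all the missing dates and then run a sliding window over the following 7 rows, because my table is already very large.

  • I am using Impala, and "range between interval '7' days following and current row" is not supported.

  • Edit: please also take into account that I need to be able to change the window size to a larger number (e.g. 30+ days).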

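This answers the original version of the question.

Impala does not fully support range between. Unfortunately, that doesn't leave many options. One is to use lag() with a lot of explicit logic: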

    -- Looks back over the previous 6 days: the k-th previous row is added only
    -- if its date is no earlier than (current date - 6 days).
    -- Assumes at most one row per client_id per date, so 6 lagged rows are enough.
    select t.*,
           ( (case when lag(date, 6) over (partition by client_id order by date) >= date - interval 6 day
                   then lag(connections, 6) over (partition by client_id order by date)
                   else 0
              end) +
             (case when lag(date, 5) over (partition by client_id order by date) >= date - interval 6 day
                   then lag(connections, 5) over (partition by client_id order by date)
                   else 0
              end) +
             (case when lag(date, 4) over (partition by client_id order by date) >= date - interval 6 day
                   then lag(connections, 4) over (partition by client_id order by date)
                   else 0
              end) +
             (case when lag(date, 3) over (partition by client_id order by date) >= date - interval 6 day
                   then lag(connections, 3) over (partition by client_id order by date)
                   else 0
              end) +
             (case when lag(date, 2) over (partition by client_id order by date) >= date - interval 6 day
                   then lag(connections, 2) over (partition by client_id order by date)
                   else 0
              end) +
             (case when lag(date, 1) over (partition by client_id order by date) >= date - interval 6 day
                   then lag(connections, 1) over (partition by client_id order by date)
                   else 0
              end) +
             connections
            ) as connections_within_6_days
    from t;
    

Unfortunately, this does not generalize well. If you need a wide range of days, you might want to ask another question.
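
One commonly used way to make the window size a parameter is a self-join on client_id with a date-range condition, followed by a group by. This is not part of the answer above; it is a minimal sketch that runs the idea against SQLite through Python's `sqlite3` module so it can be executed locally. The helper name `windowed_sums` is hypothetical, the SQLite date arithmetic (`date(a.date, '+6 days')`) is a stand-in for Impala's `interval` syntax, and neither the Impala form nor its cost on a very large table has been tested here.

```python
import sqlite3

# Sample rows from the question: (client_id, date as ISO text, connections).
rows = [
    (121438297, "2018-01-03", 0),
    (121438297, "2018-01-08", 1),
    (121438297, "2018-01-10", 3),
    (121438297, "2018-01-12", 1),
    (121438297, "2018-01-19", 7),
    (363863811, "2018-01-18", 0),
    (363863811, "2018-01-30", 5),
    (363863811, "2018-02-01", 4),
    (363863811, "2018-02-10", 0),
]


def windowed_sums(rows, days=6):
    """Sum connections over [date, date + days] per client via a self-join.

    Assumes at most one row per (client_id, date); duplicates would be
    merged by the group by and counted more than once.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (client_id integer, date text, connections integer)"
    )
    conn.executemany("insert into t values (?, ?, ?)", rows)
    query = """
        select a.client_id, a.date, a.connections,
               sum(b.connections) as connections_within_n_days
        from t a
        join t b
          on b.client_id = a.client_id
         and b.date between a.date and date(a.date, ?)
        group by a.client_id, a.date, a.connections
        order by a.client_id, a.date
    """
    # SQLite date modifier such as '+6 days'; ISO dates compare correctly as text.
    result = conn.execute(query, (f"+{days} days",)).fetchall()
    conn.close()
    return result


for row in windowed_sums(rows, days=6):
    print(row)
```

Changing `days` to 30 or more only changes the join bound, so the query text does not grow with the window size the way the lag() expression does. The trade-off is that each row is joined to every row of the same client inside its window, which may be expensive when clients have many rows per window.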

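Thank you @gordon linoff, I have edited my question to take these subtleties into account. Although I have not been able to reproduce it in Impala yet, I found this, which might help.

@nicholas. As this answer suggests, you should ask a new question. Your original question was answered.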