SQL: How to extract only the most recent week from a SELECT ... OVER statement in HiveQL?


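I need some help. I have created a query that keeps a running total of whether an element returns 1 or 0 for a particular measure; whenever the measure is 0, the running total resets to 0. For example: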

year_week  element  measure  running_total
2020_40    A        1        1
2020_41    A        1        2
2020_42    A        1        3
2020_43    A        0        0
2020_44    A        1        1
2020_45    A        1        2
2020_40    B        1        1
2020_41    B        1        2
2020_42    B        1        3
2020_43    B        1        4
2020_44    B        1        5
2020_45    B        1        6
The following query produces the above:

SELECT element,
       year_week,
       measure,
       SUM(measure) OVER (PARTITION BY element, flag_sum ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM (
      SELECT *,
          SUM(measure_flag) OVER (PARTITION BY element ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flag_sum
      FROM (
        SELECT *,
            CASE WHEN  measure = 1 THEN 0 ELSE 1 END AS measure_flag
        FROM database.table ) x ) y
This works well, but I only want to return the most recent week for each element. For the example above, that would be:

year_week  element  measure  running_total
2020_45    A        1        2
2020_45    B        1        6
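Essentially, I need to keep the logic unchanged but restrict the returned dataset. I tried to do this, but it changed the results from the correct running totals to just 1 or 0.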


Any help would be much appreciated.

You can add another level of nesting and use row_number() to keep only the latest record for each element.

I would suggest:

select element, year_week, measure, running_total
from (
    select t.*,
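        -- groups with grp > 0 start on a measure = 0 row, which must not be counted;
        -- the first group (grp = 0) has no such row, so nothing is subtracted there
        row_number() over(partition by element, grp order by year_week)
            - case when grp = 0 then 0 else 1 end as running_total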
    from (
        select t.*,
            sum(1 - measure) over(partition by element order by year_week) as grp,
            row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
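I simplified the query a little on the assumption that measure only takes the values 0 and 1, as shown in your sample data. If that is not the case, then: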

select element, year_week, measure, running_total
from (
    select t.*,
        sum(measure) over(partition by element, grp order by year_week) as running_total
    from (
        select t.*,
            sum(case when measure = 0 then 1 else 0 end) over(partition by element order by year_week) as grp,
            row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
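
As a quick sanity check, the second query can be run against the sample data from the question. The sketch below is only an approximation: it uses Python's built-in sqlite3 module as a stand-in for Hive (assuming the bundled SQLite is 3.25 or newer, so that window functions are available), and the helper name latest_running_totals and the final "order by element" are additions for illustration, not part of the original query.

```python
import sqlite3

# Sample rows copied from the question: (year_week, element, measure).
ROWS = [
    ("2020_40", "A", 1), ("2020_41", "A", 1), ("2020_42", "A", 1),
    ("2020_43", "A", 0), ("2020_44", "A", 1), ("2020_45", "A", 1),
    ("2020_40", "B", 1), ("2020_41", "B", 1), ("2020_42", "B", 1),
    ("2020_43", "B", 1), ("2020_44", "B", 1), ("2020_45", "B", 1),
]

# The general query from the answer. "order by element" is added here only
# to make the output order deterministic for the check.
QUERY = """
select element, year_week, measure, running_total
from (
    select t.*,
        sum(measure) over(partition by element, grp order by year_week) as running_total
    from (
        select t.*,
            sum(case when measure = 0 then 1 else 0 end)
                over(partition by element order by year_week) as grp,
            row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
order by element
"""


def latest_running_totals(rows):
    """Load rows into an in-memory table (hypothetical name: mytable) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (year_week text, element text, measure integer)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = latest_running_totals(ROWS)
for row in result:
    print(row)
```

If the engine behaves as expected, this should print one row per element for week 2020_45, with running totals that match the desired output in the question. The key design point is that rn is computed alongside grp in the innermost subquery and the filter is applied only in the outermost one, so the running total is still calculated over every week before the older rows are discarded.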

This looks great. I will try implementing it and let you know how it goes. Thanks for your help!