Amazon Redshift: Is there a way to calculate a running total in Redshift?


amazon-redshift


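I am building a volume availability model for a package center in Redshift. In the table, column B shows the hourly arrival volume. The shift starts at 17:00 and ends at midnight. During that time they can process 50K packages per hour (column C). I have a table with the first three columns, and I would like to know whether there is a way to calculate column D in Redshift.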

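I think I understand what you want, but if this doesn't answer your question, please provide more detail. To get a running total you need the SUM window function, which can sum the values of the current row and all rows before it: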

SUM("arrived packages") over ( order by timeinterval rows unbounded preceding )

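This gives you the running total of arrived packages. That is not yet what you want, but this function is the key building block, so it is worth covering first.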
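As a rough illustration of what that window frame computes, here is a minimal Python sketch. The four values are the 14:00 to 17:00 arrivals from the sample data further down; the variable names are only illustrative.

```python
from itertools import accumulate

# Hourly arrivals in time order (14:00 through 17:00 from the sample data).
arrived = [3500, 3200, 6500, 5200]

# Same idea as SUM(...) OVER (ORDER BY timeinterval ROWS UNBOUNDED PRECEDING):
# each output is the current row plus every row before it.
running_total = list(accumulate(arrived))

print(running_total)  # [3500, 6700, 13200, 18400]
```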

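The last requirement is where this gets tricky. You cannot bank unused capacity for later use; unused capacity is lost. So an hour in which 50,000 packages could have been processed is used up whether or not that many packages were waiting. This needs two steps, a query and a subquery: first find the running totals of arrived packages and of available throughput, then take the difference between the two, adding back any capacity that went unused along the way. In other words, take the simple approach and account for its error as a final adjustment. Otherwise this becomes a recursive problem, and Redshift does not like those. Sorry, the SQL below is untested, so treat it as a concept.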

select timeinterval, "arrived packages", "throughput per hour",
    run_tot_pack - run_tot_capacity + 
        sum(decode(run_tot_pack - run_tot_capacity < 0, true, run_tot_capacity - run_tot_pack, 0)) over (order by timeinterval rows unbounded preceding) as "available volume"    
from (
    select timeinterval, "arrived packages", "throughput per hour",
        sum("arrived packages") over (order by timeinterval rows unbounded preceding) as run_tot_pack,
        sum("throughput per hour") over (order by timeinterval rows unbounded preceding) as run_tot_capacity
    from <table>
)
order by timeinterval;
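To see why the plain difference of the two running totals needs an adjustment, the rule "unused capacity is lost" can be written as a loop. This is only a sketch of my reading of the requirement (backlog = previous backlog + arrivals - capacity, floored at zero); the function name `backlog` and the three-hour figures are hypothetical and do not come from the question.

```python
from itertools import accumulate

def backlog(arrivals, capacity):
    """Volume still waiting after each hour when spare capacity cannot be saved."""
    carried = 0
    result = []
    for arrived, cap in zip(arrivals, capacity):
        # Backlog grows by arrivals, shrinks by capacity, and never goes negative.
        carried = max(0, carried + arrived - cap)
        result.append(carried)
    return result

# Hypothetical three-hour shift.
arrivals = [20000, 90000, 10000]
capacity = [50000, 50000, 50000]

# Naive approach: difference of the two running totals, with no adjustment.
naive = [p - c for p, c in zip(accumulate(arrivals), accumulate(capacity))]

print(naive)                        # [-30000, 10000, -30000]
print(backlog(arrivals, capacity))  # [0, 40000, 0]
```

In the second hour the naive figure is 30,000 too low because it spends the capacity left idle in the first hour. That idle capacity is the amount that has to be added back.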

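You are right, my earlier answer was missing a term. I spent some time on a cluster today and put together a test case. Below are the revised SQL and the setup statements. The fix needs a new term that is itself a window function, and because window functions cannot be nested, it requires another layer of select. I hope this example helps; solving a recursive problem on a non-recursive database is hard.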

drop table if exists package_volume;

create table package_volume (
        A timestamp encode zstd,
        B int encode zstd,
        C int encode zstd);

insert into package_volume values
('2020-06-26 13:00', 0, 0),
('2020-06-26 14:00', 3500, 0),
('2020-06-26 15:00', 3200, 0),
('2020-06-26 16:00', 6500, 0),
('2020-06-26 17:00', 5200, 50000),
('2020-06-26 18:00', 51000, 50000),
('2020-06-26 19:00', 120000, 50000),
('2020-06-26 20:00', 30000, 50000),
('2020-06-26 21:00', 40000, 50000),
('2020-06-26 22:00', 15000, 50000),
('2020-06-26 23:00', 5500, 50000),
('2020-06-27 00:00', 0, 0);

commit;

select A, B, C,
    -- naive backlog plus all capacity that has gone unused so far
    run_tot_pack - run_tot_capacity
        + sum(unrealized_capacity) over (order by A rows unbounded preceding) as available_volume
from (
    -- keep only the increase over the previous maximum shortfall
    select A, B, C, run_tot_pack, run_tot_capacity,
        decode(unrealized_capacity - max(unrealized_capacity) over (order by A rows between unbounded preceding and 1 preceding) < 0, true, 0,
            unrealized_capacity - max(unrealized_capacity) over (order by A rows between unbounded preceding and 1 preceding)) as unrealized_capacity
    from (
        -- running totals, and how far capacity has outrun arrivals
        select A, B, C,
            sum(B) over (order by A rows unbounded preceding) as run_tot_pack,
            sum(C) over (order by A rows unbounded preceding) as run_tot_capacity,
            decode(run_tot_pack - run_tot_capacity < 0, true, run_tot_capacity - run_tot_pack, 0) as unrealized_capacity
        from package_volume
    )
)
order by A;
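For anyone who wants to check the logic outside Redshift, the query can be mirrored in a few lines of Python on the same sample data. This is a sketch under two assumptions: summing the positive increases over the previous maximum is treated as a running maximum (which holds here because the first row has no shortfall), and the first row is reported as 0, whereas the SQL may return NULL there because the frame ending at `1 preceding` is empty. The list names are illustrative.

```python
from itertools import accumulate

# Columns B (arrivals) and C (capacity) from the setup statements, 13:00 to 00:00.
B = [0, 3500, 3200, 6500, 5200, 51000, 120000, 30000, 40000, 15000, 5500, 0]
C = [0, 0, 0, 0, 50000, 50000, 50000, 50000, 50000, 50000, 50000, 0]

# Innermost select: running totals and the capacity shortfall, floored at zero.
run_tot_pack = list(accumulate(B))
run_tot_capacity = list(accumulate(C))
unrealized = [max(0, c - p) for p, c in zip(run_tot_pack, run_tot_capacity)]

# Middle and outer selects combined: the largest shortfall seen so far is the
# total capacity that went unused and has to be added back.
lost_capacity = list(accumulate(unrealized, max))

available_volume = [
    p - c + lost
    for p, c, lost in zip(run_tot_pack, run_tot_capacity, lost_capacity)
]

print(available_volume)
# [0, 3500, 6700, 13200, 0, 1000, 71000, 51000, 41000, 6000, 0, 0]
```

These figures agree with applying "previous backlog + arrivals - capacity, floored at zero" hour by hour, for example 1000 + 120000 - 50000 = 71000 at 19:00.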

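Welcome to Stack Overflow. Please add more detail to your question so that others in the community can help you more easily.

Hi Victor, I have made the question clearer. I want to calculate column D here. For example D8=IFB8+D7-C8

Hi Bill, thank you for your reply. I tried this logic outside of Redshift and I think it is close to what I am trying to achieve. The code produces the first 5 rows correctly, but it starts producing wrong numbers after the first zero in column D.

Wow! That is clever. This works for me. Thank you, Bill, I really appreciate your help.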