SQL conditional aggregation - once per key


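I have an aggregation problem that is best described with some sample data.

Below is a data set of transports, each identified by trp_no. Every transport is loaded into a container. One container can hold several transports, and in this example any given transport is loaded into exactly one container.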

TRP_NO TRANSPORT_VOLUME COUNTRY CONTAINER_ID CONTAINER_MAX
------ ---------------- ------- ------------ -------------
     1               10   SE         A            80
     2               20   SE         A            80
     3               30   SE         A            80
The following keys (or functional dependencies) exist in the data set:

trp_no       -> {transport_volume, country, container_id}
container_id -> {container_max}
I want to calculate the fill rate per country, computed as transport volume divided by capacity. Translated to SQL, that becomes:

with sample_data as(
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual
)  
select country
      ,sum(transport_volume) / sum(container_max)
  from sample_data
 group 
    by country; 
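...which returns (10+20+30) / (80+80+80) = 25%. That is not what I want: all three transports use the same container_id, so my query counts the capacity three times.

The result I want is (10+20+30) / 80 = 75%. In other words, I want to sum container_max only once per container_id within the group.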
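
To make the double counting concrete, here is a small Python sketch (not part of the original post) that reproduces both numbers from the three sample rows. The variable names are made up for illustration, and it assumes container_max is constant per container_id, as the functional dependency above states.

```python
# Sample rows from the question:
# (trp_no, transport_volume, country, container_id, container_max)
rows = [
    (1, 10, "SE", "A", 80),
    (2, 20, "SE", "A", 80),
    (3, 30, "SE", "A", 80),
]

total_volume = sum(r[1] for r in rows)

# Naive aggregation: container_max is added once per transport row.
naive_capacity = sum(r[4] for r in rows)

# Intended aggregation: container_max is added once per container_id.
capacity_by_container = {r[3]: r[4] for r in rows}
dedup_capacity = sum(capacity_by_container.values())

naive_rate = total_volume / naive_capacity    # 60 / 240
wanted_rate = total_volume / dedup_capacity   # 60 / 80
print(naive_rate, wanted_rate)
```

The dictionary keyed by container_id plays the role of "once per key": however many transports share a container, its capacity enters the sum a single time.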

Any ideas on how to fix the query?

I tried the following:

with sample_data as(
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual
)  
select country
      ,sum(transport_volume) / container_max
  from sample_data
 group 
    by country, container_id, container_max; 
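The result is as expected.

PS: a kind soul reminded me that we should also group by container_id. That does not affect the result in this example, but it may be needed in other cases :-)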

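This approach uses analytic functions, although other approaches are simpler. I am only editing it in because, while jonearles' answer gives the correct output, you replied that you wanted an approach that uses analytic functions.

However, you cannot use aggregate functions or a GROUP BY clause together with analytic functions without adding a second layer to the query (the idea itself doesn't make sense). Depending on the other similar queries you want to run, this may be easier to use as a template query, but without knowing what those other queries are it is hard to say.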

with sample_data as(
    select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
)
, sub as(
select x.*, sum(x.cont_mx_n) over (partition by country order by country, container_id, trp_no) as cont_mx
from(
select country
      , container_id
      , trp_no
      , sum(transport_volume) over (partition by country order by country, container_id, trp_no) as transp_vol
      , case when lead(container_id,1) over (partition by country order by country, container_id, trp_no) = container_id
             then null
             else container_max end as cont_mx_n
      , row_number() over (partition by country order by country, container_id, trp_no) as maxchk
  from sample_data
order by country, container_id, trp_no) x)
select country, transp_vol / cont_mx as rate
from sub y
where y.maxchk = (select max(x.maxchk) from sub x where x.country = y.country);
The result of the above is:

AU  0.9
SE  0.666666666666667

I have added more sample data to illustrate a small fix in your query that solves the problem -

with sample_data as(
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
   select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
   select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
   select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
   select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
   select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
)  
select country
      ,sum(transport_volume / container_max) -- Note the change here
  from sample_data
 group 
    by country; 
Output:

COUNTRY SUM(TRANSPORT_VOLUME/CONTAINER_MAX)
------- -----------------------------------
SE                                     1.35
AU                                       .9
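
For what it's worth, the 1.35 reported for SE is the sum of the per-row ratios, i.e. the fill rates of containers A and B added together (60/80 + 60/100), not the fill rate of the country. A quick Python check of that arithmetic on the nine sample rows above (an illustrative sketch, not part of the original answer; the names are made up):

```python
# (trp_no, transport_volume, country, container_id, container_max)
rows = [
    (1, 10, "SE", "A", 80), (2, 20, "SE", "A", 80), (3, 30, "SE", "A", 80),
    (4, 10, "SE", "B", 100), (5, 20, "SE", "B", 100), (6, 30, "SE", "B", 100),
    (7, 10, "AU", "C", 50), (8, 15, "AU", "C", 50), (9, 20, "AU", "C", 50),
]

# Equivalent of sum(transport_volume / container_max) grouped by country.
ratio_sum = {}
for _trp_no, volume, country, _container_id, container_max in rows:
    ratio_sum[country] = ratio_sum.get(country, 0.0) + volume / container_max

# SE comes out at roughly 0.75 + 0.60, i.e. one fill rate per container, added up.
print(ratio_sum)
```

With a single container per country (AU here) the two calculations coincide, which is why only SE looks off.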
EDIT:

Looking at your sample data, I think you need some normalization in your database. The container columns and the transport trip columns should live in separate tables, like this:

TABLE CONTAINER (
    container_id        VARCHAR2 / INTEGER,
    container_max       INTEGER,
    country             VARCHAR2
)

TABLE trip (
    trp_no              INTEGER,
    transport_volume    INTEGER,
    container_id        VARCHAR2 / INTEGER REFERENCES container.container_id
)
EDIT 2:

If you want to sum the transport volumes specifically against the capacities of the containers, you can use something like the following query (using the same sample data table sample_data):

Output:

COUNTRY        COL1
------- -----------
SE      0.666666667
AU              0.9

This uses Rachcha's larger sample set, which I think is necessary to really test the problem.

with sample_data as(
    select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
),
country_container_sum as
(
    select country, sum(container_max) sum_container_max
    from
    (
        select distinct country, container_id, container_max
        from sample_data
    )
    group by country
),
country_transport_volume_sum as
(
    select country, sum(transport_volume) sum_transport_volume
    from sample_data
    group by country
)
select country, sum_transport_volume / sum_container_max rate
from country_container_sum
join country_transport_volume_sum using (country);
Result:

COUNTRY   RATE
-------   ----
SE        0.666666666666667
AU        0.9
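
The same "count each container once per group" idea as the DISTINCT subquery above, sketched in Python so that only the grouping key changes between runs. This is an illustration of the logic rather than a replacement for the SQL; fill_rate and its key parameter are hypothetical names, and the sketch assumes container_max is fixed per container_id.

```python
from collections import defaultdict

COLUMNS = ("trp_no", "transport_volume", "country", "container_id", "container_max")
rows = [dict(zip(COLUMNS, values)) for values in [
    (1, 10, "SE", "A", 80), (2, 20, "SE", "A", 80), (3, 30, "SE", "A", 80),
    (4, 10, "SE", "B", 100), (5, 20, "SE", "B", 100), (6, 30, "SE", "B", 100),
    (7, 10, "AU", "C", 50), (8, 15, "AU", "C", 50), (9, 20, "AU", "C", 50),
]]


def fill_rate(rows, key):
    """Fill rate per group, adding each container's capacity once per group."""
    volume = defaultdict(int)
    capacity = defaultdict(dict)  # group -> {container_id: container_max}
    for row in rows:
        group = key(row)
        volume[group] += row["transport_volume"]
        # Re-assigning the same container_id overwrites instead of adding twice.
        capacity[group][row["container_id"]] = row["container_max"]
    return {g: volume[g] / sum(capacity[g].values()) for g in volume}


# Same function, different grouping keys.
rates = fill_rate(rows, key=lambda r: r["country"])
by_container = fill_rate(rows, key=lambda r: (r["country"], r["container_id"]))
print(rates)
print(by_container)
```

Grouping by country gives 120/180 for SE and 45/50 for AU, matching the result above; grouping by (country, container_id) gives one rate per container.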

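Thanks. I have to wrap this statement in another SELECT/GROUP BY. Try changing one 'A' to 'B' and you'll see what I mean.

@Ronnis If you change only one of the 3 rows to B, you are adding another container, so you want the container shown in the select list, that's all. That way you can see which row reflects which container. Is that what you mean? I don't see why the query would need a second layer. If I'm not understanding what you're saying, please edit in another set of sample rows to illustrate. Thanks

I updated the query, if this is what you mean. I thought you were saying you needed a select statement on top of the select statement, which I don't believe is true.

@ShWiVeL, the query above repeats the country once per container. So to get a result set of country -> fill rate, I need to wrap this query in another query.

@Ronnis The above uses analytic functions to calculate the fill rate by country. (edited)

I expect 0.667 for SE: (10+20+30+10+20+30)/(80+100). Nice that you added more sample data, that will help clarify the requirements.

@jonearles: Edited my answer, please check.

@Rachcha, the first output for country=SE is wrong. I will check your second edit as soon as I can. Thanks for the comment on normalization: the real model has separate entities, but because you have to join them, the double-counting problem is the same regardless of normalization :)

@Ronnis - Well, I think you can derive the code from my EDIT 2 section by simply replacing the subquery with the table names. I would still say it would be easier and more accurate if you posted the sample data as two separate tables. Anyway, you know what to do; let us know if there are problems.

I think this is the best solution to the given problem, given the denormalized nature of the sample data provided by the OP.

Thanks, this gives the desired result. I was hoping to find an analytic-function solution that would let me keep the same SUM() construct and simply change the GROUP BY. With the construct you suggest, I have to rewrite several parts of the query depending on which grouping I choose. I still think this is a good answer as a simplicity/performance trade-off.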