SQL conditional aggregation: once per key (SQL, Oracle)


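I have an aggregation problem that is best described with some sample data.

Below is a data set of transports, each identified by trp_no. Every transport is loaded into a container. A container can hold several transports, but in this example any given transport is loaded into exactly one container.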

TRP_NO TRANSPORT_VOLUME COUNTRY CONTAINER_ID CONTAINER_MAX
------ ---------------- ------- ------------ -------------
     1               10   SE         A            80
     2               20   SE         A            80
     3               30   SE         A            80

The following keys (functional dependencies) hold in the data set:

trp_no       -> {transport_volume, country, container_id}
container_id -> {container_max}

I want to compute the fill rate per country, defined as transport volume divided by capacity. Translated to SQL, that becomes:

with sample_data as(
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual
)  
select country
      ,sum(transport_volume) / sum(container_max)
  from sample_data
 group 
    by country;

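...which returns (10+20+30) / (80+80+80) = 25%. That is not what I want: all three transports use the same container_id, so my query counts the capacity three times.

The result I want is (10+20+30) / 80 = 75%. In other words, I want to sum container_max only once per container_id within each group.

Any ideas on how to fix the query?

I tried the following: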

with sample_data as(
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual
)  
select country
      ,sum(transport_volume) / container_max
  from sample_data
 group 
    by country, container_max;

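The result is as expected.

P.S. Some kind people reminded me that we could also group by container_id. It does not change the result in this example, but it may be needed in other cases :-)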
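Once a country uses more than one container, a query grouped this way returns one row per container rather than one row per country, so it needs a second aggregation on top. The following is a minimal sketch of that idea, not part of the answer above. The sample rows (transport 3 moved to a container B with capacity 100) and the names per_container, volume, capacity and fill_rate are assumptions made for illustration.

```sql
with sample_data as (
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2, 20, 'SE', 'A', 80  from dual union all
   select 3, 30, 'SE', 'B', 100 from dual   -- hypothetical second container
),
per_container as (
   -- one row per (country, container): volume is summed, capacity is taken once
   select country
         ,container_id
         ,sum(transport_volume) as volume
         ,max(container_max)    as capacity  -- safe because container_id -> container_max
     from sample_data
    group
       by country, container_id
)
-- second layer: collapse the per-container rows into one row per country
select country
      ,sum(volume) / sum(capacity) as fill_rate
  from per_container
 group
    by country;
```

With these three rows the inner query yields two rows for SE (30/80 and 30/100), and the outer query combines them into (10+20+30) / (80+100).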

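This approach uses analytic functions, although other approaches are simpler. I am only editing it in because, although jonearles' answer gives the correct output, you replied that you wanted an approach that uses analytic functions.

However, you cannot combine an aggregate function or a GROUP BY clause with analytic functions here without adding a second layer to the query (the idea itself does not make sense). Depending on the other, similar queries you want to run, this may be easier to use as a template query, but without knowing what those queries are it is hard to tell.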

with sample_data as(
    select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
)
, sub as(
select x.*, sum(x.cont_mx_n) over (partition by country order by country, container_id, trp_no) as cont_mx
from(
select country
      , container_id
      , trp_no
      , sum(transport_volume) over (partition by country order by country, container_id, trp_no) as transp_vol
      , case when lead(container_id,1) over (partition by country order by country, container_id, trp_no) = container_id
             then null
             else container_max end as cont_mx_n
      , row_number() over (partition by country order by country, container_id, trp_no) as maxchk
  from sample_data
order by country, container_id, trp_no) x)
select country, transp_vol / cont_mx as rate
from sub y
where y.maxchk = (select max(x.maxchk) from sub x where x.country = y.country);

The result of the above is:

AU  0.9
SE  0.666666666666667
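For comparison, the same idea can be written more compactly: flag one row per container with row_number() and sum container_max only on the flagged rows. This is a sketch under the assumption that a container's capacity should count once per country it appears in; the names rn and rate are illustrative and do not come from the answer above.

```sql
with sample_data as (
    select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 2, 20, 'SE', 'A', 80  from dual union all
    select 3, 30, 'SE', 'A', 80  from dual union all
    select 4, 10, 'SE', 'B', 100 from dual union all
    select 5, 20, 'SE', 'B', 100 from dual union all
    select 6, 30, 'SE', 'B', 100 from dual union all
    select 7, 10, 'AU', 'C', 50  from dual union all
    select 8, 15, 'AU', 'C', 50  from dual union all
    select 9, 20, 'AU', 'C', 50  from dual
)
select country
       -- capacity is only summed on the first row of each container (rn = 1)
      ,sum(transport_volume) / sum(case when rn = 1 then container_max end) as rate
  from (select s.*
              ,row_number() over (partition by country, container_id
                                      order by trp_no) as rn
          from sample_data s)
 group
    by country;
```

On the nine sample rows this works out to 120/180 for SE and 45/50 for AU, matching the result above. The SUM(...) / SUM(...) expression stays the same for other groupings, but the partition by list has to be kept in step with the GROUP BY.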

I added more sample data to illustrate a small fix to the query that solves the problem:

with sample_data as(
   select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
   select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
   select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
   select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
   select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
   select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
   select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
)  
select country
      ,sum(transport_volume / container_max) -- Note the change here
  from sample_data
 group 
    by country;

Output:

COUNTRY SUM(TRANSPORT_VOLUME/CONTAINER_MAX)
------- -----------------------------------
SE                                     1.35
AU                                       .9
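The SE figure of 1.35 is a sum of per-row ratios, which is not the same thing as a ratio of sums. A quick check of the arithmetic, using only the sample values for SE (the column aliases are illustrative):

```sql
select (10+20+30)/80 + (10+20+30)/100      as sum_of_ratios  -- 0.75 + 0.6 = 1.35
      ,(10+20+30 + 10+20+30) / (80 + 100)  as ratio_of_sums  -- 120 / 180
  from dual;
```

For AU, which has a single container, both expressions reduce to 45/50 = 0.9, so the difference only shows up for SE.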

Edit:

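Looking at the sample data, I think you need some normalization in your database. The container columns and the transport trip columns should live in separate tables, like this: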

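-- Column sizes are illustrative; container_id could equally be an INTEGER.
CREATE TABLE container (
    container_id        VARCHAR2(10) PRIMARY KEY,
    container_max       INTEGER,
    country             VARCHAR2(2)
);

CREATE TABLE trip (
    trp_no              INTEGER PRIMARY KEY,
    transport_volume    INTEGER,
    container_id        VARCHAR2(10) REFERENCES container (container_id)
);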
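Even with a layout like this, the trips have to be reduced to one row per container before the capacities are summed, otherwise the join repeats container_max for every trip. A hedged sketch, with the two tables simulated as CTEs so it can be run on its own: the names container_tab, trip_tab, volume and rate are placeholders, the rows are taken from the sample data, and country is placed on the container as in the layout above.

```sql
with container_tab as (
    select 'A' as container_id, 80 as container_max, 'SE' as country from dual union all
    select 'B', 100, 'SE' from dual union all
    select 'C', 50,  'AU' from dual
),
trip_tab as (
    select 1 as trp_no, 10 as transport_volume, 'A' as container_id from dual union all
    select 2, 20, 'A' from dual union all
    select 3, 30, 'A' from dual union all
    select 4, 10, 'B' from dual union all
    select 5, 20, 'B' from dual union all
    select 6, 30, 'B' from dual union all
    select 7, 10, 'C' from dual union all
    select 8, 15, 'C' from dual union all
    select 9, 20, 'C' from dual
)
select c.country
      ,sum(t.volume) / sum(c.container_max) as rate
  from container_tab c
  join (-- pre-aggregate trips so each container joins to exactly one row
        select container_id
              ,sum(transport_volume) as volume
          from trip_tab
         group
            by container_id) t
    on t.container_id = c.container_id
 group
    by c.country;
```

Here each container contributes its capacity exactly once, giving 120/180 for SE and 45/50 for AU.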

Edit 2:

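If you want to aggregate the transport volume specifically against the capacity of the containers, you can use something like the following query (using the same sample data table, sample_data):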

Output:

COUNTRY        COL1
------- -----------
SE      0.666666667
AU              0.9

This uses Rachcha's larger sample set, which I think is necessary to really test the problem.

with sample_data as(
    select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
    select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
    select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
    select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
),
country_container_sum as
(
    select country, sum(container_max) sum_container_max
    from
    (
        select distinct country, container_id, container_max
        from sample_data
    )
    group by country
),
country_transport_volume_sum as
(
    select country, sum(transport_volume) sum_transport_volume
    from sample_data
    group by country
)
select country, sum_transport_volume / sum_container_max rate
from country_container_sum
join country_transport_volume_sum using (country);

Result:

COUNTRY   RATE
-------   ----
SE        0.666666666666667
AU        0.9

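Thanks! I would have to wrap this statement in another SELECT / GROUP BY. Try changing one of the 'A's to 'B' and you will see what I mean.

@Ronnis If you change just one of the 3 rows to B, you are adding another container, so you would want to show the container in the select list, that's all. That way you can see which row reflects which container. Is that what you mean? I don't see why the query would need a second layer. If I am misunderstanding you, please edit in another set of sample rows to illustrate. Thanks

I updated the query, if that is what you meant. I thought you were saying that you would need a select statement on top of the select statement, which I don't believe is true.

@ShWiVeL, the query above repeats the country once per container. So, to get a country -> fill rate result set, I need to wrap this query in another query.

@Ronnis The above computes the fill rate per country using analytic functions.

(Edited) I would expect 0.667 for SE: (10+20+30+10+20+30) / (80+100). Good that you added more sample data; it helps clarify the requirements.

@jonearles: Edited my answer, please check.

@Rachcha, the first output for country = SE is wrong. I will check your second edit as soon as I can. Thanks for the comment about normalization: the real model does have separate entities, but since you have to join them, the double-counting problem is the same regardless of normalization :)

@Ronnis - Well, I think you can derive the code from my Edit 2 section by simply replacing the subqueries with table names. I would still say it would be easier and more accurate if you posted the sample data as two separate tables. Anyway, you know what to do; let us know if there are any problems.

I think this is the best solution to the given problem, given the denormalized nature of the sample data provided by the OP.

Thanks, this gives the desired result. I was hoping to find an analytic function solution that would let me keep the same SUM() construct and simply change the GROUP BY. With the construct you suggest, I have to rewrite several parts of the query depending on which grouping I choose. I still think it is a good answer in terms of the simplicity/performance trade-off.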