
PostgreSQL: performance difference between aggregating before the join and aggregating after the join


I have three tables:

create table cart (
  id       bigserial primary key,
  buyer_id bigint unique not null
);


create table contact_person (
  id           bigserial primary key,
  cart_id      bigint references cart (id) not null unique,
  phone_number jsonb,
  first_name   VARCHAR,
  middle_name  VARCHAR,
  last_name    VARCHAR
);

create table cart_items (
  id      bigserial primary key,
  item_id bigint                      not null,
  cart_id bigint references cart (id) not null,
  count   int                         not null,
  unique (item_id, cart_id)
);
cart : contact_person is 1:1, and cart : cart_items is 1:N.

I want to aggregate all cart_items fields by cart id. There are two options:

1) Aggregate before the join:

select c.id       as id,
               c.buyer_id as buyer_id,
               cp.id      as contact_id,
               cp.phone_number,
               cp.first_name,
               cp.middle_name,
               cp.last_name,
               ci.ids, ci.item_ids, ci.counts
        from cart c
               inner join contact_person cp on c.id = cp.cart_id
               left join (select cart_id, array_agg(id) as ids, array_agg(item_id) as item_ids, array_agg(count) as counts
                          from cart_items ci
                          group by cart_id) ci on ci.cart_id = c.id
        where c.buyer_id = :buyerId;
2) Aggregate after the join:

select c.id       as id,
               c.buyer_id as buyer_id,
               cp.id      as contact_id,
               cp.phone_number,
               cp.first_name,
               cp.middle_name,
               cp.last_name,
               array_agg(ci.id) as ids,
               array_agg(ci.item_id) as item_ids,
               array_agg(ci.count) as counts
        from cart c
               inner join contact_person cp on c.id = cp.cart_id
               left join cart_items ci on ci.cart_id = c.id
        where c.buyer_id = :buyerId
group by c.id, cp.id;
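As EXPLAIN shows, the query that aggregates after the join is much faster. The query plans really are different, but I can't explain why the plan for aggregating before the join has such a high cost.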

1) Aggregation before the join:

Nested Loop  (cost=108.97..141.16 rows=1 width=248)
  ->  Merge Left Join  (cost=108.82..132.96 rows=1 width=112)
        Merge Cond: (c.id = ci.cart_id)
        ->  Sort  (cost=8.18..8.19 rows=1 width=16)
              Sort Key: c.id
              ->  Index Scan using cart_buyer_id_key on cart c  (cost=0.15..8.17 rows=1 width=16)
                    Index Cond: (buyer_id = 1)
        ->  GroupAggregate  (cost=100.64..122.26 rows=200 width=104)
              Group Key: ci.cart_id
              ->  Sort  (cost=100.64..104.26 rows=1450 width=28)
                    Sort Key: ci.cart_id
                    ->  Seq Scan on cart_items ci  (cost=0.00..24.50 rows=1450 width=28)
  ->  Index Scan using contact_person_cart_id_key on contact_person cp  (cost=0.15..8.17 rows=1 width=144)
        Index Cond: (cart_id = c.id)
2) Aggregation after the join:

GroupAggregate  (cost=41.62..41.66 rows=1 width=248)
  Group Key: c.id, cp.id
  ->  Sort  (cost=41.62..41.63 rows=1 width=172)
        Sort Key: c.id, cp.id
        ->  Nested Loop Left Join  (cost=15.33..41.61 rows=1 width=172)
              ->  Nested Loop  (cost=0.30..16.37 rows=1 width=152)
                    ->  Index Scan using cart_buyer_id_key on cart c  (cost=0.15..8.17 rows=1 width=16)
                          Index Cond: (buyer_id = 1)
                    ->  Index Scan using contact_person_cart_id_key on contact_person cp  (cost=0.15..8.17 rows=1 width=144)
                          Index Cond: (cart_id = c.id)
              ->  Bitmap Heap Scan on cart_items ci  (cost=15.03..25.17 rows=7 width=28)
                    Recheck Cond: (cart_id = c.id)
                    ->  Bitmap Index Scan on cart_items_item_id_cart_id_key  (cost=0.00..15.03 rows=7 width=0)
                          Index Cond: (cart_id = c.id)
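I tried adding an index on the cart_id column of cart_items. That does speed up the query, but it speeds up the first variant just as much as the second.
How do you explain this difference?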
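
For reference, the index mentioned above could be created as in the sketch below. The index name cart_items_cart_id_idx is hypothetical; only the column list matters. The unique constraint on (item_id, cart_id) has cart_id as its second column, so a lookup by cart_id alone cannot use the leading column of that index, which is consistent with the relatively expensive Bitmap Index Scan on cart_items_item_id_cart_id_key in the second plan.

```sql
-- Hypothetical index name; it supports the foreign key cart_items.cart_id -> cart.id
-- and lookups by cart_id alone.
create index if not exists cart_items_cart_id_idx on cart_items (cart_id);

-- Refresh planner statistics so the row estimates reflect the real data.
analyze cart_items;
```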

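Think of it this way: in your "before" example you are joining a table to an "on-the-fly" view, which has to be generated before it can be joined.

In the "after" example you join the tables first and then aggregate. The join itself is faster because nothing has to be created, sorted, and so on beforehand. Once all the data has been collected, and provided no rows are being dropped, aggregating it should be fast. Either way, the join is much simpler.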
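
One way to keep the aggregate in a subquery without building it for every cart is a LATERAL join, which lets the subquery refer to the current row of cart. This is only a sketch of an alternative, not something the question or the plans above cover; the alias i is introduced here for illustration, and how it performs depends on your data and indexes.

```sql
select c.id       as id,
       c.buyer_id as buyer_id,
       cp.id      as contact_id,
       cp.phone_number,
       cp.first_name,
       cp.middle_name,
       cp.last_name,
       ci.ids, ci.item_ids, ci.counts
from cart c
       inner join contact_person cp on c.id = cp.cart_id
       -- The subquery is evaluated per cart row, so only that cart's items are aggregated.
       left join lateral (select array_agg(i.id)      as ids,
                                 array_agg(i.item_id) as item_ids,
                                 array_agg(i.count)   as counts
                          from cart_items i
                          where i.cart_id = c.id) ci on true
where c.buyer_id = :buyerId;
```

For a cart with no items this returns NULL arrays, like option 1, whereas option 2 returns arrays containing a single NULL element.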

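As you found out yourself, there is no supporting index for the FK cart_items.cart_id --> cart.id (this can lead to a sort step being needed). Note: both queries are relatively small, and cost-based planning does not work well for small numbers.

Sounds logical. But when the number of joins grows, for some reason the variant that aggregates before the join is much faster. For example, in another question of mine, the answer gave a variant with aggregation before the join, and it came out faster. As the number of joins increased, the gap in speed grew.

I'll see if I can reproduce it. Planners can do some strange magic, which is why I have always liked Oracle's "hint" system for telling the optimizer what you want done in certain scenarios. Postgres makes that hard to do, so usually you just have to tweak the syntax until the optimizer does what you want. Then again, some of what is involved gets very complicated, because once you have several tables there are a lot of statistics to consider.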
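
Because the plans above show only estimated costs, a comparison of actual run times may be more telling on realistic data volumes. A minimal sketch, assuming buyer_id = 1 as in the plans above: EXPLAIN with the ANALYZE option executes the statement and reports real timings and row counts next to the estimates.

```sql
-- Runs the "aggregate after the join" variant and reports actual times and buffer usage.
explain (analyze, buffers)
select c.id       as id,
       c.buyer_id as buyer_id,
       cp.id      as contact_id,
       array_agg(ci.id)      as ids,
       array_agg(ci.item_id) as item_ids,
       array_agg(ci.count)   as counts
from cart c
       inner join contact_person cp on c.id = cp.cart_id
       left join cart_items ci on ci.cart_id = c.id
where c.buyer_id = 1
group by c.id, cp.id;
```

Running the same command on the "aggregate before the join" variant gives a like-for-like comparison of the two plans.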