使用函数扫描和嵌套循环的缓慢和错误估计优化PostgreSQL查询

使用函数扫描和嵌套循环的缓慢和错误估计优化PostgreSQL查询,sql,postgresql,Sql,Postgresql,我有一个包含4个表的数据库 CREATE TABLE definition ( id bigserial PRIMARY KEY, public_id uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(), name varchar(128) NOT NULL, type definition_type NOT NULL, cr

我有一个包含4个表的数据库

CREATE TABLE definition (
   id                bigserial PRIMARY KEY,
   public_id         uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(),
   name              varchar(128) NOT NULL,
   type              definition_type NOT NULL,
   created_at        timestamptz NOT NULL DEFAULT current_timestamp,
   updated_at        timestamptz NOT NULL DEFAULT current_timestamp,
   created_by_token  varchar(128) NOT NULL CHECK (created_by_token <> ''),
   created_by_user   varchar(128)
);

CREATE INDEX ON definition (type);
CREATE INDEX ON definition (created_at);
CREATE INDEX ON definition (updated_at);
CREATE UNIQUE INDEX unique_definition_name ON definition (name);
CREATE INDEX definition_lower_token_idx ON definition (lower(created_by_token));
CREATE INDEX definition_lower_user_idx ON definition (lower(created_by_user));

CREATE TABLE definition_translation (
   id             bigserial PRIMARY KEY,
   language       varchar(35) NOT NULL,
   definition_id  uuid NOT NULL REFERENCES definition(public_id) ON DELETE CASCADE,
   created_at     timestamptz NOT NULL
   updated_at     timestamptz NOT NULL
   title          varchar(64) NOT NULL CHECK (title <> ''),
   template       text NOT NULL CHECK (template <> ''),
   redirect_to    text
);

ALTER TABLE definition_translation
   ADD CONSTRAINT unique_translation_by_definition_per_language UNIQUE (definition_id, language);

CREATE INDEX ON definition_translation (language);
CREATE INDEX ON definition_translation (definition_id);
CREATE INDEX ON definition_translation (created_at);
CREATE INDEX ON definition_translation (updated_at);

CREATE INDEX ON definition_translation USING gin (title gin_trgm_ops);

CREATE OR REPLACE FUNCTION get_definition_translation(did uuid, lang varchar, default_lang varchar)
   RETURNS TABLE (
      title       varchar(64),
      template    text,
      redirect_to text
   ) AS $$
BEGIN
   RETURN QUERY
      SELECT dt.title, dt.template, dt.redirect_to
      FROM definition_translation dt JOIN
         (values(1,lang),(2,default_lang)) AS lng(ord,code) ON (dt.language = lng.code)
      WHERE dt.definition_id = did
      ORDER BY lng.ord
      LIMIT 1;
END
$$ LANGUAGE plpgsql;

CREATE TABLE broadcast (
   id                bigserial PRIMARY KEY,
   public_id         uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(),
   definition_id     uuid NOT NULL REFERENCES definition(public_id) ON DELETE CASCADE,
   created_at        timestamptz NOT NULL DEFAULT current_timestamp,
   begin_at          timestamptz NOT NULL,
   expire_at         timestamptz,
   created_by_token  varchar(128) NOT NULL CHECK (created_by_token <> ''),
   created_by_user   varchar(128),
   replace_values    jsonb
);

ALTER TABLE broadcast
   ADD CONSTRAINT unique_broadcast_by_definition UNIQUE (definition_id);

CREATE INDEX ON broadcast (created_at);
CREATE INDEX ON broadcast (begin_at);
CREATE INDEX ON broadcast (expire_at);
CREATE INDEX broadcast_lower_token_idx ON broadcast (lower(created_by_token));
CREATE INDEX broadcast_lower_user_idx ON broadcast (lower(created_by_user));

CREATE TABLE broadcast_acknowledgement (
   id             bigserial PRIMARY KEY,
   public_id      uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(),
   broadcast_id   uuid NOT NULL REFERENCES broadcast(public_id) ON DELETE CASCADE,
   account        citext NOT NULL,
   created_at     timestamptz NOT NULL DEFAULT current_timestamp
);

ALTER TABLE broadcast_acknowledgement
   ADD CONSTRAINT unique_ack_by_broadcast_per_account UNIQUE (broadcast_id, account);

CREATE INDEX ON broadcast_acknowledgement (account);
CREATE INDEX ON broadcast_acknowledgement (created_at);
查询使用视图获取帐户确认的所有广播,并使用行筛选主查询中选定的广播。 在JOIN子句中,
get\u definition\u translation
负责返回广播引用的定义的翻译。它接受语言的主值和回退值

WITH acknowledged AS (
    SELECT ba.broadcast_id 
    FROM broadcast_acknowledgement AS ba
    WHERE ba.account = 'toto'
) SELECT
    d.type,
    dt.title,
    dt.template,
    COALESCE(dt.redirect_to, '') AS redirect_to,
    src.replace_values,
    src.begin_at,
    'broadcast:' || src.public_id AS id
  FROM broadcast AS src
  JOIN definition AS d ON src.definition_id = d.public_id, get_definition_translation(d.public_id , 'fr_FR', 'en_US') AS dt
  WHERE (src.public_id NOT IN (SELECT broadcast_id FROM acknowledged)
    AND src.begin_at <= current_timestamp
    AND src.expire_at > current_timestamp);
查询计划如下所示:

 Nested Loop  (cost=851.39..7815.32 rows=250000 width=149) (actual time=1.400..12.740 rows=288 loops=1)
   Output: d.type, dt.title, dt.template, COALESCE(dt.redirect_to, ''::text), src.replace_values, src.begin_at, ('broadcast:'::text || (src.public_id)::text)
   CTE acknowledged
     ->  Bitmap Heap Scan on public.broadcast_acknowledgement ba  (cost=6.18..823.06 rows=209 width=16) (actual time=0.159..0.495 rows=212 loops=1)
           Output: ba.broadcast_id
           Recheck Cond: (ba.account = 'toto'::citext)
           Heap Blocks: exact=210
           ->  Bitmap Index Scan on broadcast_acknowledgement_account_idx  (cost=0.00..6.13 rows=209 width=0) (actual time=0.136..0.136 rows=212 loops=1)
                 Index Cond: (ba.account = 'toto'::citext)
   ->  Hash Join  (cost=28.08..117.01 rows=250 width=53) (actual time=1.284..3.521 rows=288 loops=1)
         Output: src.replace_values, src.begin_at, src.public_id, d.type, d.public_id
         Hash Cond: (d.public_id = src.definition_id)
         ->  Seq Scan on public.definition d  (cost=0.00..77.04 rows=2504 width=20) (actual time=0.010..1.054 rows=2504 loops=1)
               Output: d.id, d.public_id, d.name, d.type, d.created_at, d.updated_at, d.created_by_token, d.created_by_user
         ->  Hash  (cost=24.95..24.95 rows=250 width=49) (actual time=1.253..1.253 rows=288 loops=1)
               Output: src.replace_values, src.begin_at, src.public_id, src.definition_id
               Buckets: 1024  Batches: 1  Memory Usage: 33kB
               ->  Seq Scan on public.broadcast src  (cost=4.70..24.95 rows=250 width=49) (actual time=0.834..1.098 rows=288 loops=1)
                     Output: src.replace_values, src.begin_at, src.public_id, src.definition_id
                     Filter: ((NOT (hashed SubPlan 2)) AND (src.begin_at <= now()) AND (src.expire_at > now()))
                     Rows Removed by Filter: 212
                     SubPlan 2
                       ->  CTE Scan on acknowledged  (cost=0.00..4.18 rows=209 width=16) (actual time=0.162..0.702 rows=212 loops=1)
                             Output: acknowledged.broadcast_id
   ->  Function Scan on public.get_definition_translation dt  (cost=0.25..10.25 rows=1000 width=96) (actual time=0.029..0.030 rows=1 loops=288)
         Output: dt.title, dt.template, dt.redirect_to
         Function Call: get_definition_translation(d.public_id, 'fr_FR'::character varying, 'en_US'::character varying)
 Planning time: 0.621 ms
 Execution time: 12.946 ms
(29 rows)
我想优化查询以减少嵌套循环和函数扫描所花费的时间。我无法理解为什么查询计划器对这两个部分的行数估计也不正确,我想知道这是否会影响性能


感谢

错误估计是因为函数是PostgreSQL优化器的“黑匣子”(从PostgreSQL 12开始,新功能“支持函数”可以缓解这一问题,但这需要编写C代码)

为了获得更好的估计,请告诉优化器函数通常返回多少行,这样它就不必使用猜测的1000行:

ALTER FUNCTION get_definition_translation(did uuid, lang varchar, default_lang varchar)
ROWS 1;

如果您不使用函数,而是替换实际查询,您的情况会好得多。

只有一点:CTE是postgres中的优化围栏:。当您内联CTE时,您是否获得了相同的性能?

是的,我尝试过,没有发现明显的性能差异。谢谢您的回答。我不知道这种可能性。它解决了查询计划中的错误估计。关于函数本身,在找到我所适应的函数之前,我一直在努力在定义翻译语言上实现连接回退。你能提供一些关于函数内部查询替换到主查询的指导吗,或者提供另一种在连接上进行回退的方法吗?对不起,我不明白。我需要的是根据翻译的
language
列将dt.title和dt.template加入到行中。如果此定义/语言不存在行,则应尝试与其他语言谓词联接。我不知道在哪里可以使用你提到的
EXISTS(…)子句。
Scratch,我误读了这个查询。这应该是
交叉连接横向(SELECT…
)。这句话从第12页开始就不再正确了:另一种方法是将函数转换为标记为
稳定的
语言sql
函数,然后Postgres可以内联它。不要使用
语言plpgsql
,但是,如果将它转换为
语言sql stable
,那么该函数就不再是优化器的黑盒了
ALTER FUNCTION get_definition_translation(did uuid, lang varchar, default_lang varchar)
ROWS 1;