Postgres query is slow despite using an index

sql postgresql query-performance

I have the following tables.

The main lead table has nearly 500M rows:

create table lead
(
    id                  integer,
    client_id           integer,
    insert_date         integer  -- a transformed date that looks like 20201231
);

create index lead_id_index
    on lead (id);

create index lead_insert_date_index
    on lead (insert_date) include (id, client_id);

create index lead_client_id_index
    on lead (client_id) include (id, insert_date);

Then there are the other tables:

create table last_activity_with_client
(
    lead_id       integer,
    last_activity timestamp,
    last_modified timestamp,
    client_id     integer
);

create index last_activity_with_client_client_id_index
    on last_activity_with_client (client_id) include (lead_id, last_activity);

create index last_activity_with_client_last_activity_index
    on last_activity_with_client (last_activity desc);

create index last_activity_with_client_lead_id_client_id_index
    on last_activity_with_client (lead_id, client_id);


create table lead_last_response_time
(
    lead_id            integer,
    last_response_time timestamp,
    last_modified      timestamp
);

create index lead_last_response_time_last_response_time_index
    on lead_last_response_time (last_response_time desc);

create index lead_last_response_time_lead_id_index
    on lead_last_response_time (lead_id);






create table date_dimensions
(
    key                      integer,  -- a transformed date that looks like 20201231
    date                     date,
    description              varchar(256),
    day                      smallint,
    month                    smallint,
    quarter                  char(2),
    year                     smallint,
    past_30                  boolean
);

create index date_dimensions_key_index
    on date_dimensions (key);

I tried running the following query for different client_id values, but the Bitmap Index Scan on client_id in the lead table always slows the query down:

EXPLAIN ANALYZE
with TempResult AS (
    select DISTINCT lead.id AS lead_id,
                    last_activity_join.last_activity,
                    lead_last_response_time.last_response_time
    from lead
             left join (select * from last_activity_with_client where client_id = 13189) last_activity_join on
        lead.id = last_activity_join.lead_id

             left join lead_last_response_time lead_last_response_time on
        lead.id = lead_last_response_time.lead_id

             join date_dimensions date_dimensions on
        lead.insert_date = date_dimensions.key

    where (date_dimensions.past_30 = true)
      and (lead.client_id in (13189))
),
     TempCount AS (
         select COUNT(*) as total_rows
         from TempResult
     )
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;

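Some results:

As you can see, it is using the index, but it is quite slow: always over 50 seconds. What can I do to speed up the query? I am also free to change the query and the tables.
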
Try this:

EXPLAIN ANALYZE
with TempResult AS (
    select DISTINCT lead.id AS lead_id,
                    last_activity,
                    last_response_time
    from (select key
          from date_dimensions
          where past_30 = true) date_dimensions
             join (select id,
                          insert_date
                   from lead
                   where client_id = 13189) lead on
        lead.insert_date = date_dimensions.key
             left join (select lead_id,
                               last_activity
                        from last_activity_with_client
                        where client_id = 13189) last_activity_join on
        lead.id = last_activity_join.lead_id
             left join lead_last_response_time lead_last_response_time on
        lead.id = lead_last_response_time.lead_id
),
     TempCount AS (
         select COUNT(*) as total_rows
         from TempResult
     )
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;

Or this:

EXPLAIN ANALYZE
with TempResult AS (
    select DISTINCT lead.id AS lead_id,
                    last_activity,
                    last_response_time
    from date_dimensions date_dimensions
             join (select id,
                          insert_date
                   from lead
                   where client_id = 13189) lead on
        lead.insert_date = date_dimensions.key
             left join (select lead_id,
                               last_activity
                        from last_activity_with_client
                        where client_id = 13189) last_activity_join on
        lead.id = last_activity_join.lead_id
             left join lead_last_response_time lead_last_response_time on
        lead.id = lead_last_response_time.lead_id
    where date_dimensions.past_30 = true
),
     TempCount AS (
         select COUNT(*) as total_rows
         from TempResult
     )
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;

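To be used efficiently in this query, the index should instead be on lead (client_id, insert_date, id). Using INCLUDE just makes the index less useful, for no apparent gain. I think the only reasons to use INCLUDE are if the index is unique on a subset of the columns, or if the column to be included is of a type that does not support btree operations.

But even the existing index seems surprisingly slow. I wonder if there is something wrong with it, for example that it is bloated, or that it sits on a damaged part of the disk where reads have to be retried several times before they succeed.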
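
The composite index suggested above could be created roughly as follows. This is only a sketch: the index name lead_client_id_insert_date_id_index is made up for illustration, and using CONCURRENTLY is an assumption about wanting to avoid blocking writes on a table of this size, not something the answer asks for.

create index concurrently lead_client_id_insert_date_id_index
    on lead (client_id, insert_date, id);  -- plain key columns, no INCLUDE

-- refresh planner statistics so the new index is costed sensibly
analyze lead;

Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block. With client_id first and insert_date second, rows for one client within a date range sit next to each other in the index, and id is available from the index itself.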
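You don't use TempCount, so you could start by eliminating it.

Do your results have to be for client id in (13189) (or some other specific client id), or are you doing this just for testing?

@GordonLinoff Edited the query.

@StefanDzalev All queries will filter on client id. It is not just for testing.

Could you edit the question and qualify all columns with the table name? Otherwise the query is hard to read.

Thanks! I tried this approach, but a simple SELECT * FROM lead WHERE client_id = 12345 by itself takes a long time even with the index. Sometimes it is a Bitmap Index Scan, sometimes a Parallel Seq Scan, depending on how much data the client holds. But in all cases it takes a long time.

I suggest you check whether the client id index is fragmented. If it is, you will have to reorganize it, which will make your query faster.

Thanks. Following your post I ran VACUUM FULL, but I still see the same response times.

I created this index and ran VACUUM. For date ranges under 30 days it returns in less than 1s. But for larger time ranges it degrades quickly, reaching 15+ seconds.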
create index lead_client_id_index
    on lead (client_id) include (id, insert_date);
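
For the fragmentation question raised in the comments, one way to look at the existing index is sketched below. It assumes the pgstattuple extension is available and may be installed on the server, and that the server is PostgreSQL 12 or later for REINDEX ... CONCURRENTLY; neither is stated in the thread. The client id 12345 is the placeholder value used in the comments.

-- show how many pages come from cache versus disk for the simple lookup
explain (analyze, buffers)
select * from lead where client_id = 12345;

-- on-disk size of the existing index
select pg_size_pretty(pg_relation_size('lead_client_id_index'));

-- leaf density and fragmentation (needs the pgstattuple extension)
create extension if not exists pgstattuple;
select avg_leaf_density, leaf_fragmentation
from pgstatindex('lead_client_id_index');

-- if the index does look bloated, rebuild it without blocking writes
reindex index concurrently lead_client_id_index;

VACUUM FULL already rewrites the table and rebuilds its indexes, so if the numbers above look healthy after it, bloat is probably not the cause, and the buffers output (many pages read from disk rather than hit in cache) is the more useful thing to look at.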