Postgres query is slow despite using an index
I have the following tables. The main lead table has nearly 500M rows:
create table lead
(
id integer,
client_id integer,
insert_date integer -- a transformed date that looks like 20201231
);
create index lead_id_index
on lead (id);
create index lead_insert_date_index
on lead (insert_date) include (id, client_id);
create index lead_client_id_index
on lead (client_id) include (id, insert_date);
Then there are the other tables:
create table last_activity_with_client
(
lead_id integer,
last_activity timestamp,
last_modified timestamp,
client_id integer
);
create index last_activity_with_client_client_id_index
on last_activity_with_client (client_id) include (lead_id, last_activity);
create index last_activity_with_client_last_activity_index
on last_activity_with_client (last_activity desc);
create index last_activity_with_client_lead_id_client_id_index
on last_activity_with_client (lead_id, client_id);
create table lead_last_response_time
(
lead_id integer,
last_response_time timestamp,
last_modified timestamp
);
create index lead_last_response_time_last_response_time_index
on lead_last_response_time (last_response_time desc);
create index lead_last_response_time_lead_id_index
on lead_last_response_time (lead_id);
create table date_dimensions
(
key integer, -- a transformed date that looks like 20201231
date date,
description varchar(256),
day smallint,
month smallint,
quarter char(2),
year smallint,
past_30 boolean
);
create index date_dimensions_key_index
on date_dimensions (key);
I tried running the following query for different client_id values, but the Bitmap Index Scan on client_id in the lead table always slows the query down:
EXPLAIN ANALYZE
with TempResult AS (
select DISTINCT lead.id AS lead_id,
last_activity_join.last_activity,
lead_last_response_time.last_response_time
from lead
left join (select * from last_activity_with_client where client_id = 13189) last_activity_join on
lead.id = last_activity_join.lead_id
left join lead_last_response_time lead_last_response_time on
lead.id = lead_last_response_time.lead_id
join date_dimensions date_dimensions on
lead.insert_date = date_dimensions.key
where (date_dimensions.past_30 = true)
and (lead.client_id in (13189))
),
TempCount AS (
select COUNT(*) as total_rows
from TempResult
)
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;
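Some results:
As you can see, it is using the index, but it is quite slow: always more than 50 seconds. What can I do to speed up the query? I am also free to change the query and the tables.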
Try this:
EXPLAIN ANALYZE
with TempResult AS (
select DISTINCT lead.id AS lead_id,
last_activity,
last_response_time
from (
select key
from date_dimensions
where past_30 = true
) date_dimensions
join (select id,
insert_date
from lead
where client_id = 13189
) lead on lead.insert_date = date_dimensions.key
left join (
select lead_id,
last_activity
from last_activity_with_client
where client_id = 13189
) last_activity_join on lead.id = last_activity_join.lead_id
left join lead_last_response_time lead_last_response_time on lead.id = lead_last_response_time.lead_id
),
TempCount AS (
select COUNT(*) as total_rows
from TempResult
)
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;
Or this:
EXPLAIN ANALYZE
with TempResult AS (
select DISTINCT lead.id AS lead_id,
last_activity,
last_response_time
from date_dimensions date_dimensions
join (select id,
insert_date
from lead
where client_id = 13189
) lead on lead.insert_date = date_dimensions.key
left join (
select lead_id,
last_activity
from last_activity_with_client
where client_id = 13189
) last_activity_join on lead.id = last_activity_join.lead_id
left join lead_last_response_time lead_last_response_time on lead.id = lead_last_response_time.lead_id
where date_dimensions.past_30 = true
),
TempCount AS (
select COUNT(*) as total_rows
from TempResult
)
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;
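Both rewrites above keep the separate TempCount CTE and cross join it to every row. One possible alternative, which is not part of the original answer, is to compute the total with a window function in the outer query. The sketch below assumes total_rows should count the de-duplicated rows before LIMIT is applied; the aliases dd, la and lr are made up for readability. The DISTINCT stays inside the CTE because window functions are evaluated before DISTINCT at the same query level.

```sql
-- Sketch only: same result columns as the queries above, with the total
-- taken from a window count instead of a second CTE.
WITH TempResult AS (
    SELECT DISTINCT lead.id AS lead_id,
           la.last_activity,
           lr.last_response_time
    FROM lead
    JOIN date_dimensions dd
      ON lead.insert_date = dd.key
    LEFT JOIN last_activity_with_client la
      ON la.lead_id = lead.id
     AND la.client_id = 13189
    LEFT JOIN lead_last_response_time lr
      ON lr.lead_id = lead.id
    WHERE dd.past_30 = true
      AND lead.client_id = 13189
)
SELECT lead_id,
       last_activity,
       last_response_time,
       count(*) OVER () AS total_rows  -- counted before LIMIT/OFFSET
FROM TempResult
ORDER BY last_response_time DESC NULLS LAST
LIMIT 25 OFFSET 1;
```

Whether this is actually faster depends on the plan, so it is worth comparing the EXPLAIN ANALYZE output of both forms. The scan of lead for a single client_id is still the dominant cost in the plans described in the question.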
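To be used efficiently in this query, the index should instead be on lead (client_id, insert_date, id). Using INCLUDE only makes the index less useful, for no gain. I think the only reasons to use INCLUDE are if the index is unique on a subset of the columns, or if the column to be included is of a type that does not support btree operations.
But even the existing index seems surprisingly slow. I wonder if something is wrong with it, such as fragmentation, or whether it sits on a damaged part of the disk that has to be retried repeatedly before a read succeeds.
You don't use TempCount, so you could start by eliminating it.
Do your results have to be for client_id in (13189) (or some other specific client_id), or are you doing this just for testing?
@GordonLinoff I edited the query.
@StefanDzalev All queries will filter on client_id. It is not just for testing.
Can you edit the question and qualify all columns with their table names? Otherwise the query is hard to read.
Thanks! I tried this approach, but even a simple SELECT * FROM lead WHERE client_id = 12345 takes a long time by itself, even with the index. Sometimes it is a Bitmap Index Scan, sometimes a Parallel Seq Scan, depending on how much data the client has. But in all cases it takes a long time.
I suggest you check whether the client_id index is fragmented. If it is, you will have to reorganize it, which will make your query faster.
Thanks. I followed your post and ran VACUUM FULL, but I still see the same response times.
I created this index and ran VACUUM. For date ranges of less than 30 days it returns in under 1s, but for larger time ranges it deteriorates quickly, reaching 15+ seconds.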
create index lead_client_id_index
on lead (client_id) include (id, insert_date);
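The index-related suggestions in this thread (check the existing index for fragmentation, then switch to a plain composite index) could be tried roughly as follows. This is a sketch under several assumptions rather than a tested migration: the name lead_client_id_insert_date_id_index is made up, the pgstattuple extension is assumed to be installable, CONCURRENTLY is chosen only to avoid blocking writes on a 500M-row table, and the date predicate assumes that past_30 means "within the last 30 days".

```sql
-- 1. Inspect the existing index for bloat (requires the pgstattuple extension).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('lead_client_id_index');

-- 2. Create the composite index with all three columns as key columns,
--    as the answer suggests, instead of using INCLUDE.
--    CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY lead_client_id_insert_date_id_index
    ON lead (client_id, insert_date, id);

-- 3. Refresh statistics and the visibility map so that
--    index-only scans become possible.
VACUUM (ANALYZE) lead;

-- 4. Check the plan with a direct range filter on insert_date.
--    The to_char(...)::integer expression turns a date into the
--    20201231-style integer used by the table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, insert_date
FROM lead
WHERE client_id = 13189
  AND insert_date >= to_char(current_date - 30, 'YYYYMMDD')::integer;
```

Note that VACUUM FULL rewrites the table and rebuilds its indexes, so after the VACUUM FULL mentioned in the comments the first step would probably report little fragmentation. With client_id and insert_date both as leading key columns, the range condition can narrow the index scan itself, which an INCLUDE column cannot do.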