SQL/Postgres: trigram search and sort are very slow
I am trying to build a fuzzy search across multiple columns, where the distance between each column and its corresponding search term is weighted. I have the following query:
select sf_id
from (
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(mailingcity, '') <->> 'san ant' as float)) * 3.0 as score
        from contacts
        order by score desc
        limit 1000
    ) as mailingcity
    union
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(lastname, '') <->> 'anders' as float)) * 5.0 as score
        from contacts
        order by score desc
        limit 1000
    ) as lastname
) as agg
group by sf_id
order by sum(score) desc
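The scoring can be read as a weighted sum. This reading relies on the pg_trgm documentation, where `<->>` is the commutator of the word-similarity distance `<<->`, so `x <->> q` equals `1 - word_similarity(q, x)`. The symbols $w_i$ (weight), $q_i$ (search term), $\mathrm{col}_i$ (the coalesced column) and $B(c)$ (the set of branches whose top 1000 contains contact $c$) are notation introduced here:

```latex
\begin{aligned}
\mathrm{score}(c) &= \sum_{i \in B(c)} w_i \bigl(1 - (\mathrm{col}_i(c) \;\texttt{<->>}\; q_i)\bigr)
                   = \sum_{i \in B(c)} w_i \,\mathrm{word\_similarity}\bigl(q_i, \mathrm{col}_i(c)\bigr), \\
w_{\mathrm{mailingcity}} &= 3,\quad q_{\mathrm{mailingcity}} = \texttt{'san ant'}, \qquad
w_{\mathrm{lastname}} = 5,\quad q_{\mathrm{lastname}} = \texttt{'anders'}.
\end{aligned}
```

A contact that falls outside one branch's top 1000 gets no term from that column. The final ranking therefore depends on the LIMIT as well as on the weights.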
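I have trigram indexes on the columns used for matching.
We have about 500,000 records in the table, and the query takes three seconds.
I also have expression indexes on the coalesce(...) expressions.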
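For reference, trigram indexes like the ones described above are usually created along the lines shown below. This is a sketch: the index names are hypothetical and the choice of GiST over GIN is an assumption. If the expression index on `coalesce(...)` is a plain B-tree, it cannot serve `<->>` or `%` at all. Even a trigram expression index is only used when the query repeats the indexed expression exactly, so an index on `coalesce(mailingcity, '')` would not be used for `mailingcity % 'san ant'`.

```sql
-- pg_trgm ships with the PostgreSQL contrib modules.
create extension if not exists pg_trgm;

-- Hypothetical index names. gist_trgm_ops supports the % and <% filters and
-- ORDER BY <distance operator> LIMIT n scans; gin_trgm_ops supports the
-- filters (and is often faster to search) but not distance ordering.
create index contacts_mailingcity_trgm_idx
    on contacts using gist (mailingcity gist_trgm_ops);
create index contacts_lastname_trgm_idx
    on contacts using gist (lastname gist_trgm_ops);
```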
Is there any way to speed this up?
EXPLAIN output:
Sort (cost=212791.05..212791.55 rows=200 width=154) (actual time=3165.154..3165.247 rows=2000 loops=1)
  Sort Key: (sum(((('1'::double precision - (((COALESCE(contacts.mailingcity, ''::character varying))::text <->> 'san ant'::text))::double precision) * '3'::double precision)))) DESC
  Sort Method: quicksort Memory: 205kB
  -> GroupAggregate (cost=212766.41..212783.41 rows=200 width=154) (actual time=3163.855..3164.621 rows=2000 loops=1)
        Group Key: contacts.sf_id
        -> Sort (cost=212766.41..212771.41 rows=2000 width=154) (actual time=3163.847..3163.966 rows=2000 loops=1)
              Sort Key: contacts.sf_id
              Sort Method: quicksort Memory: 205kB
              -> HashAggregate (cost=212616.75..212636.75 rows=2000 width=154) (actual time=3155.719..3156.055 rows=2000 loops=1)
                    Group Key: contacts.sf_id, ((('1'::double precision - (((COALESCE(contacts.mailingcity, ''::character varying))::text <->> 'san ant'::text))::double precision) * '3'::double precision))
                    -> Append (cost=106166.70..212606.75 rows=2000 width=154) (actual time=1629.241..3154.841 rows=2000 loops=1)
                          -> Limit (cost=106166.70..106283.37 rows=1000 width=27) (actual time=1629.241..1629.798 rows=1000 loops=1)
                                -> Gather Merge (cost=106166.70..154807.03 rows=416888 width=27) (actual time=1629.239..1629.730 rows=1000 loops=1)
                                      Workers Planned: 2
                                      Workers Launched: 2
                                      -> Sort (cost=105166.68..105687.79 rows=208444 width=27) (actual time=1589.059..1589.232 rows=1021 loops=3)
                                            Sort Key: ((('1'::double precision - (((COALESCE(contacts.mailingcity, ''::character varying))::text <->> 'san ant'::text))::double precision) * '3'::double precision)) DESC
                                            Sort Method: external merge Disk: 7256kB
                                            -> Parallel Seq Scan on contacts (cost=0.00..81763.88 rows=208444 width=27) (actual time=0.145..1405.681 rows=166755 loops=3)
                          -> Limit (cost=106166.70..106283.37 rows=1000 width=27) (actual time=1524.305..1524.912 rows=1000 loops=1)
                                -> Gather Merge (cost=106166.70..154807.03 rows=416888 width=27) (actual time=1524.304..1524.842 rows=1000 loops=1)
                                      Workers Planned: 2
                                      Workers Launched: 2
                                      -> Sort (cost=105166.68..105687.79 rows=208444 width=27) (actual time=1455.159..1455.386 rows=1016 loops=3)
                                            Sort Key: ((('1'::double precision - (((COALESCE(contacts_1.lastname, ''::character varying))::text <->> 'anders'::text))::double precision) * '5'::double precision)) DESC
                                            Sort Method: external merge Disk: 7280kB
                                            -> Parallel Seq Scan on contacts contacts_1 (cost=0.00..81763.88 rows=208444 width=27) (actual time=0.373..1290.368 rows=166755 loops=3)
Planning time: 0.855 ms
Execution time: 3218.589 ms
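Here is the plan read using only the numbers it reports. Each Parallel Seq Scan runs 3 loops (the leader plus 2 workers), averaging 166,755 rows per loop, so each branch computes the trigram distance for essentially the whole table. The two branches' times also add up to the Append node's total, meaning they run one after the other. Most of each branch is spent in the scan itself (about 1.4 s and 1.3 s), not in the sort that spills to disk:

```latex
\begin{aligned}
3 \times 166\,755 &= 500\,265 \ \text{rows scored per branch} \\
1629.798\ \text{ms} + 1524.912\ \text{ms} &= 3154.710\ \text{ms} \approx 3154.841\ \text{ms (Append total)}
\end{aligned}
```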
排序(成本=212791.05..212791.55行=200宽度=154)(实际时间=3165.154..3165.247行=2000循环=1)
排序键:(sum(((('1'::双精度-((COALESCE(contacts.mailingcity,::字符变化))::text>'sanant'::text))::双精度)*'3'::双精度)))描述
排序方法:快速排序内存:205kB
->GroupAggregate(成本=212766.41..212783.41行=200宽=154)(实际时间=3163.855..3164.621行=2000圈=1)
组密钥:contacts.sf\u id
->排序(成本=212766.41..212771.41行=2000宽度=154)(实际时间=3163.847..3163.966行=2000循环=1)
排序键:contacts.sf\u id
排序方法:快速排序内存:205kB
->HashAggregate(成本=212616.75..212636.75行=2000宽度=154)(实际时间=3155.719..3156.055行=2000循环=1)
组键:contacts.sf_id,((('1'::双精度-((COALESCE(contacts.mailingcity,,::字符变化))::text>'sanant'::text))::双精度)*'3'::双精度))
->追加(成本=106166.70..212606.75行=2000宽度=154)(实际时间=1629.241..3154.841行=2000循环=1)
->限制(成本=106166.70..106283.37行=1000宽=27)(实际时间=1629.241..1629.798行=1000圈=1)
->聚集合并(成本=106166.70..154807.03行=416888宽度=27)(实际时间=1629.239..1629.730行=1000循环=1)
计划人数:2人
劳工处推出:2
->排序(成本=105166.68..105687.79行=208444宽度=27)(实际时间=1589.059..1589.232行=1021循环=3)
排序键:((('1'::双精度-((COALESCE(contacts.mailingcity,::字符变化))::text>'sanant'::text))::双精度)*'3'::双精度)描述
排序方法:外部合并磁盘:7256kB
->触点上的并行顺序扫描(成本=0.00..81763.88行=208444宽度=27)(实际时间=0.145..1405.681行=166755圈=3)
->限制(成本=106166.70..106283.37行=1000宽=27)(实际时间=1524.305..1524.912行=1000圈=1)
->聚集合并(成本=106166.70..154807.03行=416888宽度=27)(实际时间=1524.304..1524.842行=1000循环=1)
计划人数:2人
劳工处推出:2
->排序(成本=105166.68..105687.79行=208444宽度=27)(实际时间=1455.159..1455.386行=1016圈=3)
排序键:(('1'::双精度-((合并(contacts_1.lastname,::字符变化))::文本>anders::文本):双精度)*'5'::双精度)描述
排序方法:外部合并磁盘:7280kB
->触点1上的平行顺序扫描(成本=0.00..81763.88行=208444宽度=27)(实际时间=0.373..1290.368行=166755圈=3)
计划时间:0.855毫秒
执行时间:3218.589毫秒
Based on @a_horse_with_no_name's suggestion below, the following query now runs in about 250 ms:
select sf_id
from (
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(mailingcity, '') <->> 'san ant' as float)) * 3.0 as score
        from contacts
        where mailingcity % 'san ant'
        order by score desc
        limit 1000
    ) as mailingcity
    union all
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(lastname, '') <->> 'anders' as float)) * 5.0 as score
        from contacts
        where lastname % 'anders'
        order by score desc
        limit 1000
    ) as lastname
) as agg
group by sf_id
order by sum(score) desc
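The speed-up matches the point made in the comments. A `%` condition in the WHERE clause can be answered from a trigram index on the bare column, so only candidate rows are scored and sorted instead of all ~500,000.

One possible refinement follows, offered as an untested sketch. `%` compares whole-string `similarity()` against `pg_trgm.similarity_threshold` (default 0.3), while the score uses word similarity. As a result, a row whose value contains the search term inside a longer string can fail the filter even though it would score well. The word-similarity operator `<%` (pg_trgm 1.2, PostgreSQL 9.6 and later) filters on the same measure as the score, using `pg_trgm.word_similarity_threshold` (default 0.6). The threshold value below is an arbitrary assumption. Because `<%` rejects NULLs, the `coalesce` is no longer needed.

```sql
-- Untested sketch. Assumes pg_trgm 1.2+ and trigram indexes on
-- contacts.mailingcity and contacts.lastname.
-- 0.5 is an arbitrary example value; the default threshold is 0.6.
set pg_trgm.word_similarity_threshold = 0.5;

select sf_id
from (
    select *
    from (
        select sf_id,
               cast(word_similarity('san ant', mailingcity) as float) * 3.0 as score
        from contacts
        where 'san ant' <% mailingcity   -- filters on word similarity, the measure the score uses
        order by score desc
        limit 1000
    ) as mailingcity
    union all
    select *
    from (
        select sf_id,
               cast(word_similarity('anders', lastname) as float) * 5.0 as score
        from contacts
        where 'anders' <% lastname
        order by score desc
        limit 1000
    ) as lastname
) as agg
group by sf_id
order by sum(score) desc;
```

Whether the planner actually uses the trigram indexes for `<%` should be confirmed with `explain (analyze)`.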
select sf\u id from(select*from(
选择sf_id,
(1.0-cast(联合(mailingcity)、>“san ant”作为浮动))*3.0
作为分数
来自联系人
哪里有mailingcity%‘san ant’
按分数排序(限制1000)为mailingcity
联合所有
从中选择*(
选择sf_id,
(1.0-演员阵容(coalesce(lastname)、>“anders”为浮动演员阵容)*5.0为得分
来自联系人
其中lastname%'anders'
按分数排序(限制1000)作为姓氏)
按sf_id顺序按总和(分数)描述作为agg组
Please add the explain (analyze) output and format the query so that it is readable. The first optimization is to use union all instead of union. Indexes are used to quickly find the rows to process, which are usually specified by the where clause. Your individual queries request all rows, which need to be sorted, and then discard most of them. An index on the full expression and sf_id might help.
I have updated the explain text as @a_horse_with_no_name requested.
@a_horse_with_no_name Thanks! Following your suggestions, the query is now down to 250 ms!
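The union versus union all point can be illustrated with a toy example. The IDs and scores are hypothetical values, not data from the question. Plain union adds a de-duplication step (the HashAggregate in the plan). It also collapses identical (sf_id, score) rows, so a contact that happens to get the same score from both branches would be counted only once in the sum:

```sql
-- Hypothetical rows: contact '003A' gets the same score, 1.5, from both branches.
select sf_id, sum(score) as total
from (
    select * from (values ('003A', 1.5), ('003B', 2.0)) as city_branch(sf_id, score)
    union all   -- keeps both ('003A', 1.5) rows; plain union would collapse them
    select * from (values ('003A', 1.5), ('003C', 0.5)) as name_branch(sf_id, score)
) as agg
group by sf_id
order by total desc;
-- union all: 003A = 3.0, 003B = 2.0, 003C = 0.5
-- union:     003B = 2.0, 003A = 1.5, 003C = 0.5
```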
select sf_id from (select * from (
select sf_id ,
(1.0 - cast(coalesce(mailingcity, '') <->> 'san ant' as float)) * 3.0
as score
from contacts
where mailingcity % 'san ant'
order by score desc limit 1000) as mailingcity
union all
select * from (
select sf_id,
(1.0 - cast(coalesce(lastname, '') <->> 'anders' as float)) * 5.0 as score
from contacts
where lastname % 'anders'
order by score desc limit 1000) as lastname)
as agg group by sf_id order by sum(score) desc
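On the suggestion of an index on the full expression: an ordinary index cannot return rows in `(1.0 - cast(... <->> ... as float)) * 3.0` order. A related technique, swapped in here and not necessarily what the commenter meant, is a GiST nearest-neighbour scan. With a `gist_trgm_ops` index on the bare column, PostgreSQL can return rows already ordered by a trigram distance operator, provided the ORDER BY is exactly `constant <<-> column` and there is a LIMIT. This is an untested sketch of one branch. It assumes the installed pg_trgm's GiST operator class supports ordering by `<<->` (check the pg_trgm documentation for your PostgreSQL version), and the index name is hypothetical.

```sql
-- Untested sketch of the mailingcity branch only; lastname is analogous with weight 5.0.
-- Assumes a GiST trigram index, e.g. (hypothetical name):
--   create index contacts_mailingcity_gist on contacts using gist (mailingcity gist_trgm_ops);
select sf_id,
       (1.0 - cast('san ant' <<-> mailingcity as float)) * 3.0 as score  -- 'san ant' <<-> x = 1 - word_similarity('san ant', x)
from contacts
where mailingcity is not null            -- NULL cities would otherwise yield a NULL score
order by 'san ant' <<-> mailingcity      -- bare distance expression: eligible for a KNN index scan
limit 1000;
```

Ascending distance is the same order as descending score. The original query gives NULL cities a score of 0 through `coalesce`, so excluding them only changes the result when fewer than 1000 rows have a non-NULL city.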