How to index SQL with multiple AND conditions nested inside OR
I want to speed up the SQL below (its cost is 19685.75). Can I index it? Its WHERE clause contains several complex nested AND conditions that are combined with OR.
SELECT DISTINCT
ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE (
(
U0."publication_date" BETWEEN '2018-01-01' AND '2018-12-31'
AND EXTRACT('month' FROM U0."publication_date") = 10
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" BETWEEN '2018-01-01' AND '2018-12-31'
AND EXTRACT('month' FROM U0."lastmod") = 10
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
AND EXTRACT('month' FROM U0."created_at" AT TIME ZONE 'Asia/Hong_Kong') = 10
)
OR
(
U0."publication_date" >= '2018-08-01'
AND U0."publication_date" < '2018-10-31'
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" >='2018-08-01'
AND U0."lastmod" < '2018-10-31'
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" >= '2018-07-31 16:00:00+00:00'
AND U0."created_at" < '2018-10-30 16:00:00+00:00'
)
)
ORDER BY "crawler_url"."url" ASC, U0."created_at" DESC
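I added three indexes, one each on created_at, lastmod and publication_date, plus one multi-column index on those fields.
But in the Postgres EXPLAIN output, this WHERE clause still uses a Seq Scan instead of an Index Scan: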
-> Seq Scan on characteristics_text u0 (cost=0.00..19685.75 rows=14535 width=12)
Filter: (
(
(publication_date >= '2018-01-01'::date) AND
(publication_date <= '2018-12-31'::date) AND
(
date_part(
'month'::text, (publication_date)::timestamp without time zone
) = 10::double precision)
) OR
((publication_date IS NULL) AND (lastmod >= '2018-01-01'::date) AND (lastmod <= '2018-12-31'::date) AND (date_part('month'::text, (lastmod)::timestamp without time zone) = 10::double precision)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2017-12-31 16:00:00+00'::timestamp with time zone) AND (created_at <= '2018-12-31 15:59:59.999999+00'::timestamp with time zone) AND (date_part('month'::text, timezone('Asia/Hong_Kong'::text, created_at)) = 10::double precision)) OR ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone))
)
On their own, conditions on publication_date and on lastmod each trigger an index scan, but created_at does not:
Is it because created_at is a timestamp? But why wouldn't an index work on a timestamp?
explain SELECT DISTINCT
ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE (
(
U0."created_at" BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
AND EXTRACT('month' FROM U0."created_at" AT TIME ZONE 'Asia/Hong_Kong') = 10
)
)
ORDER BY "crawler_url"."url" ASC, U0."created_at" DESC
Unique (cost=18004.05..18006.01 rows=393 width=86)
-> Sort (cost=18004.05..18005.03 rows=393 width=86)
Sort Key: crawler_url.url, u0.created_at
-> Nested Loop Left Join (cost=0.71..17987.11 rows=393 width=86)
-> Nested Loop Left Join (cost=0.42..17842.25 rows=393 width=16)
-> Seq Scan on characteristics_text u0 (cost=0.00..15467.37 rows=393 width=12)
Filter: ((created_at >= '2017-12-31 16:00:00+00'::timestamp with time zone) AND (created_at <= '2018-12-31 15:59:59.999999+00'::timestamp with time zone) AND (date_part('month'::text, timezone('Asia/Hong_Kong'::text, created_at)) = 10::double precision))
-> Index Scan using characteristics_text_urls_65eb77fe on characteristics_text_urls (cost=0.42..6.03 rows=1 width=8)
Index Cond: (u0.id = text_id)
-> Index Scan using crawler_url_pkey on crawler_url (cost=0.29..0.36 rows=1 width=78)
Index Cond: (characteristics_text_urls.url_id = id)
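I doubt you can get an index that is useful here. You might consider breaking this query into 4 or 5 parts and then gluing the results together with UNION. (UNION removes duplicates, while UNION ALL returns all rows.)
UNION is a fairly expensive operation, so consider how many rows it will return. If the individual parts cut the row count down enough, the efficiency gained from using indexes can outweigh what is lost to the UNION. If many rows are returned, the current form is probably about as fast as it will get.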
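A minimal sketch of that UNION rewrite, using the table and column names from the question and assuming `id` is the primary key of `characteristics_text` (the CTE name `matched` is made up for illustration). Each OR branch becomes its own SELECT, so the planner can choose an index per branch; UNION drops texts matched by more than one branch, and the joins, DISTINCT ON and ORDER BY run once on the combined result, as in the original query.

```sql
-- Sketch: one SELECT per OR branch of the original WHERE clause.
-- UNION (not UNION ALL) removes texts matched by several branches.
-- Assumes id is the primary key of characteristics_text.
WITH matched AS (
    SELECT id, created_at FROM characteristics_text
    WHERE publication_date BETWEEN '2018-01-01' AND '2018-12-31'
      AND EXTRACT('month' FROM publication_date) = 10
    UNION
    SELECT id, created_at FROM characteristics_text
    WHERE publication_date IS NULL
      AND lastmod BETWEEN '2018-01-01' AND '2018-12-31'
      AND EXTRACT('month' FROM lastmod) = 10
    UNION
    SELECT id, created_at FROM characteristics_text
    WHERE publication_date IS NULL AND lastmod IS NULL
      AND created_at BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
      AND EXTRACT('month' FROM created_at AT TIME ZONE 'Asia/Hong_Kong') = 10
    UNION
    SELECT id, created_at FROM characteristics_text
    WHERE publication_date >= '2018-08-01' AND publication_date < '2018-10-31'
    UNION
    SELECT id, created_at FROM characteristics_text
    WHERE publication_date IS NULL
      AND lastmod >= '2018-08-01' AND lastmod < '2018-10-31'
    UNION
    SELECT id, created_at FROM characteristics_text
    WHERE publication_date IS NULL AND lastmod IS NULL
      AND created_at >= '2018-07-31 16:00:00+00:00' AND created_at < '2018-10-30 16:00:00+00:00'
)
-- Joins, DISTINCT ON and ORDER BY are applied once, after the UNION
SELECT DISTINCT ON (crawler_url.url) m.id
FROM matched AS m
LEFT OUTER JOIN characteristics_text_urls ON m.id = characteristics_text_urls.text_id
LEFT OUTER JOIN crawler_url ON characteristics_text_urls.url_id = crawler_url.id
ORDER BY crawler_url.url ASC, m.created_at DESC;
```

Compare its EXPLAIN cost with the original before adopting it; whether it wins depends on how selective each branch is.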
Well, a full table scan (Seq Scan) can actually be faster than several index scans. It depends on the specific "selectivity" of the filter conditions.
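First, your WHERE clause has six filter conditions that are ORed together. This means that if you want to use indexes, PostgreSQL has to use them six times and then perform an "index OR" to merge the results. That may not be cheap.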
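So first you need to know the expected selectivity of each of the six filter conditions, that is, the number of rows it selects relative to the total number of rows in the table. Do it; a few simple SQL queries will give you the answer. Post the answers here.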
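For example, one pass over the table can count the rows matched by each of the six OR branches (copied from the question) and turn the counts into selectivities. This is a sketch: `count(*) FILTER (WHERE ...)` needs PostgreSQL 9.4 or later, and `sel_sum` simply adds the six fractions, so a row matched by overlapping branches is counted more than once, which is what summing the six selectivities means.

```sql
-- Sketch: selectivity of each OR branch = matching rows / total rows.
-- NULLIF avoids a division by zero on an empty table.
SELECT
    c.total_rows,
    round(c.b1::numeric / NULLIF(c.total_rows, 0), 4) AS sel_1,
    round(c.b2::numeric / NULLIF(c.total_rows, 0), 4) AS sel_2,
    round(c.b3::numeric / NULLIF(c.total_rows, 0), 4) AS sel_3,
    round(c.b4::numeric / NULLIF(c.total_rows, 0), 4) AS sel_4,
    round(c.b5::numeric / NULLIF(c.total_rows, 0), 4) AS sel_5,
    round(c.b6::numeric / NULLIF(c.total_rows, 0), 4) AS sel_6,
    round((c.b1 + c.b2 + c.b3 + c.b4 + c.b5 + c.b6)::numeric
          / NULLIF(c.total_rows, 0), 4) AS sel_sum
FROM (
    SELECT
        count(*) AS total_rows,
        count(*) FILTER (WHERE publication_date BETWEEN '2018-01-01' AND '2018-12-31'
                           AND EXTRACT('month' FROM publication_date) = 10) AS b1,
        count(*) FILTER (WHERE publication_date IS NULL
                           AND lastmod BETWEEN '2018-01-01' AND '2018-12-31'
                           AND EXTRACT('month' FROM lastmod) = 10) AS b2,
        count(*) FILTER (WHERE publication_date IS NULL AND lastmod IS NULL
                           AND created_at BETWEEN '2018-01-01 00:00:00+08:00'
                                              AND '2018-12-31 23:59:59.999999+08:00'
                           AND EXTRACT('month' FROM created_at AT TIME ZONE 'Asia/Hong_Kong') = 10) AS b3,
        count(*) FILTER (WHERE publication_date >= '2018-08-01'
                           AND publication_date < '2018-10-31') AS b4,
        count(*) FILTER (WHERE publication_date IS NULL
                           AND lastmod >= '2018-08-01' AND lastmod < '2018-10-31') AS b5,
        count(*) FILTER (WHERE publication_date IS NULL AND lastmod IS NULL
                           AND created_at >= '2018-07-31 16:00:00+00:00'
                           AND created_at < '2018-10-30 16:00:00+00:00') AS b6
    FROM characteristics_text
) AS c;
```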
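Now, if the six selectivities add up to more than 5%, a full table scan (your current plan) will be faster. Don't bother with indexes.
Otherwise, the following index may help: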
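-- PostgreSQL B-tree indexes already store NULLs and can serve IS NULL conditions,
-- so the trailing constant 1 (an Oracle idiom for indexing NULLs) is not needed;
-- PostgreSQL rejects a bare constant in the column list.
create index ix1 on characteristics_text (
publication_date,
lastmod,
created_at);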
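The whole year 2018 accounts for 60% of the 100,000 records, which makes the database choose a Seq Scan. Narrowing the interval from a whole year to a single month allows an index scan: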
AND U0."created_at" >= '2018-10-01 00:00:00+00:00'
AND U0."created_at" <= '2018-10-31 23:59:59.999999+00:00')
SELECT DISTINCT
ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE (
(U0."publication_date" >= '2018-10-01'
AND U0."publication_date" <= '2018-11-01')
OR (U0."publication_date" IS NULL
AND U0."lastmod" >= '2018-10-01'
AND U0."lastmod" <= '2018-11-01'
)
OR
(U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" >= '2018-10-01 00:00:00+00:00'
AND U0."created_at" <= '2018-10-31 23:59:59.999999+00:00')
OR
(U0."publication_date" >= '2018-08-01'
AND U0."publication_date" < '2018-10-31')
OR
(U0."publication_date" IS NULL
AND U0."lastmod" >= '2018-08-01'
AND U0."lastmod" < '2018-10-31')
OR
(U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" >= '2018-07-31 16:00:00+00:00'
AND U0."created_at" < '2018-10-30 16:00:00+00:00')
)
ORDER BY "crawler_url"."url" ASC
The EXPLAIN output now shows an index scan for each AND group, six index scans in total:
Unique (cost=22885.16..22962.39 rows=15446 width=88)
-> Sort (cost=22885.16..22923.77 rows=15446 width=88)
Sort Key: crawler_url.url
-> Hash Right Join (cost=18669.29..21068.51 rows=15446 width=88)
Hash Cond: (crawler_url.id = characteristics_text_urls.url_id)
-> Seq Scan on crawler_url (cost=0.00..1691.88 rows=55288 width=88)
-> Hash (cost=18476.21..18476.21 rows=15446 width=8)
-> Hash Right Join (cost=14982.09..18476.21 rows=15446 width=8)
Hash Cond: (characteristics_text_urls.text_id = u0.id)
-> Seq Scan on characteristics_text_urls (cost=0.00..1907.25 rows=115525 width=8)
-> Hash (cost=14789.01..14789.01 rows=15446 width=4)
-> Bitmap Heap Scan on characteristics_text u0 (cost=516.57..14789.01 rows=15446 width=4)
Recheck Cond: (((publication_date >= '2018-10-01'::date) AND (publication_date <= '2018-11-01'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-10-01'::date) AND (lastmod <= '2018-11-01'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2018-10-31 23:59:59.999999+00'::timestamp with time zone)) OR ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone)))
-> BitmapOr (cost=516.57..516.57 rows=16081 width=0)
-> Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq (cost=0.00..4.53 rows=11 width=0)
Index Cond: ((publication_date >= '2018-10-01'::date) AND (publication_date <= '2018-11-01'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..6.49 rows=166 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-10-01'::date) AND (lastmod <= '2018-11-01'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..14.61 rows=413 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2018-10-31 23:59:59.999999+00'::timestamp with time zone))
-> Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq (cost=0.00..74.61 rows=3419 width=0)
Index Cond: ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..108.20 rows=3503 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..284.95 rows=8569 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone))
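The same narrowing can be applied without changing what the original query selects. In the question, the only indexable part of each EXTRACT branch is the year-wide BETWEEN, which matches about 60% of the rows; the `EXTRACT('month' ...) = 10` test wraps the column in a function, so a plain B-tree index cannot use it. Within 2018, that pair selects exactly October, which can be written as a half-open range on the bare column. The sketch below is not from the thread: it uses `< '2018-11-01'` rather than the `<= '2018-11-01'` seen in the query above (which also matches November 1), computes the `created_at` bounds in Asia/Hong_Kong local time as the original EXTRACT did, and keeps `created_at DESC` in the ORDER BY so DISTINCT ON still returns the newest text per URL. Whether the planner picks the index still depends on selectivity.

```sql
-- Sketch: the question's query with each "year BETWEEN + EXTRACT month" pair
-- rewritten as a half-open October 2018 range on the bare column.
SELECT DISTINCT ON (crawler_url.url) u0.id
FROM characteristics_text u0
LEFT OUTER JOIN characteristics_text_urls ON u0.id = characteristics_text_urls.text_id
LEFT OUTER JOIN crawler_url ON characteristics_text_urls.url_id = crawler_url.id
WHERE (u0.publication_date >= DATE '2018-10-01'
       AND u0.publication_date < DATE '2018-11-01')
   OR (u0.publication_date IS NULL
       AND u0.lastmod >= DATE '2018-10-01'
       AND u0.lastmod < DATE '2018-11-01')
   OR (u0.publication_date IS NULL
       AND u0.lastmod IS NULL
       -- October 2018 in Hong Kong local time, converted to timestamptz bounds
       AND u0.created_at >= (TIMESTAMP '2018-10-01 00:00:00' AT TIME ZONE 'Asia/Hong_Kong')
       AND u0.created_at < (TIMESTAMP '2018-11-01 00:00:00' AT TIME ZONE 'Asia/Hong_Kong'))
   -- the remaining three branches were already plain ranges and are unchanged
   OR (u0.publication_date >= DATE '2018-08-01'
       AND u0.publication_date < DATE '2018-10-31')
   OR (u0.publication_date IS NULL
       AND u0.lastmod >= DATE '2018-08-01'
       AND u0.lastmod < DATE '2018-10-31')
   OR (u0.publication_date IS NULL
       AND u0.lastmod IS NULL
       AND u0.created_at >= TIMESTAMPTZ '2018-07-31 16:00:00+00'
       AND u0.created_at < TIMESTAMPTZ '2018-10-30 16:00:00+00')
-- created_at DESC makes DISTINCT ON keep the newest text for each URL
ORDER BY crawler_url.url ASC, u0.created_at DESC;
```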