How to index SQL with multiple AND conditions nested in OR


I want to speed up the SQL below (its cost is 19685.75). Can I index this SQL? Its WHERE clause contains several complex nested AND conditions combined with OR.

SELECT DISTINCT
    ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
    ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
    ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE ( 
    (
        U0."publication_date" BETWEEN '2018-01-01' AND '2018-12-31'
        AND EXTRACT('month' FROM U0."publication_date") = 10
    )
        OR 
    (
        U0."publication_date" IS NULL
        AND U0."lastmod" BETWEEN '2018-01-01' AND '2018-12-31'
        AND EXTRACT('month' FROM U0."lastmod") = 10
    )
        OR 
    (
        U0."publication_date" IS NULL
        AND U0."lastmod" IS NULL
        AND U0."created_at" BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
        AND EXTRACT('month' FROM U0."created_at" AT TIME ZONE 'Asia/Hong_Kong') = 10
    )
        OR 
    (
        U0."publication_date" >= '2018-08-01'
        AND U0."publication_date" < '2018-10-31'
    )
        OR 
    (
        U0."publication_date" IS NULL
        AND U0."lastmod" >='2018-08-01'
        AND U0."lastmod" < '2018-10-31'
    )
        OR 
    (
        U0."publication_date" IS NULL
        AND U0."lastmod" IS NULL
        AND U0."created_at" >= '2018-07-31 16:00:00+00:00'
        AND U0."created_at" < '2018-10-30 16:00:00+00:00'
    ) 
)
ORDER BY  "crawler_url"."url" ASC, U0."created_at" DESC
I added three indexes, on created_at, lastmod and publication_date, plus one multi-column index on these fields.

But in the Postgres EXPLAIN output, this WHERE clause still uses a Seq Scan instead of an Index Scan:

->  Seq Scan on characteristics_text u0  (cost=0.00..19685.75 rows=14535 width=12)
    Filter: (
            (
                (publication_date >= '2018-01-01'::date) AND 
                (publication_date <= '2018-12-31'::date) AND 
                (
                        date_part(
                            'month'::text, (publication_date)::timestamp without time zone
                ) = 10::double precision)
            ) OR 

                ((publication_date IS NULL) AND (lastmod >= '2018-01-01'::date) AND (lastmod <= '2018-12-31'::date) AND (date_part('month'::text, (lastmod)::timestamp without time zone) = 10::double precision)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2017-12-31 16:00:00+00'::timestamp with time zone) AND (created_at <= '2018-12-31 15:59:59.999999+00'::timestamp with time zone) AND (date_part('month'::text, timezone('Asia/Hong_Kong'::text, created_at)) = 10::double precision)) OR ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone))
)
  • Does an index still work when searching for NULL? Do fields that are searched with IS NULL need an index?
  • Update (2018-11-04)

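    When I try to narrow the query down by testing the fields one at a time, publication_date and lastmod each trigger an index scan, while created_at does not:

    Is it because created_at is a timestamp? But why wouldn't an index work on a timestamp?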

    explain SELECT DISTINCT
        ON ("crawler_url"."url") U0."id"
    FROM "characteristics_text" U0 LEFT OUTER
    JOIN "characteristics_text_urls"
        ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
    JOIN "crawler_url"
        ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
    WHERE ( 
    (
            U0."created_at" BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
            AND EXTRACT('month' FROM U0."created_at" AT TIME ZONE 'Asia/Hong_Kong') = 10
        )   
    )
    ORDER BY  "crawler_url"."url" ASC, U0."created_at" DESC
    
    
    
    Unique  (cost=18004.05..18006.01 rows=393 width=86)
    ->  Sort  (cost=18004.05..18005.03 rows=393 width=86)
            Sort Key: crawler_url.url, u0.created_at
            ->  Nested Loop Left Join  (cost=0.71..17987.11 rows=393 width=86)
                ->  Nested Loop Left Join  (cost=0.42..17842.25 rows=393 width=16)
                        ->  Seq Scan on characteristics_text u0  (cost=0.00..15467.37 rows=393 width=12)
                            Filter: ((created_at >= '2017-12-31 16:00:00+00'::timestamp with time zone) AND (created_at <= '2018-12-31 15:59:59.999999+00'::timestamp with time zone) AND (date_part('month'::text, timezone('Asia/Hong_Kong'::text, created_at)) = 10::double precision))
                        ->  Index Scan using characteristics_text_urls_65eb77fe on characteristics_text_urls  (cost=0.42..6.03 rows=1 width=8)
                            Index Cond: (u0.id = text_id)
                ->  Index Scan using crawler_url_pkey on crawler_url  (cost=0.29..0.36 rows=1 width=78)
                        Index Cond: (characteristics_text_urls.url_id = id)
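created_at is a `timestamp with time zone`, so PostgreSQL compares absolute instants. The `+08:00` literals in the query are simply Hong Kong local midnights. That is why the plan above prints the lower bound as `'2017-12-31 16:00:00+00'` and the inclusive upper bound as `'2018-12-31 15:59:59.999999+00'`. The column type itself should not stop a B-tree index on created_at from serving the range comparison. The `date_part('month', ...)` test on top of it, however, can only be applied as a row-by-row filter.

Below is a minimal sketch of the local-to-UTC conversion. It assumes Asia/Hong_Kong can be modeled as a fixed UTC+08:00 offset, which holds for 2018 because Hong Kong does not use daylight saving time. The helper name is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: Asia/Hong_Kong is modeled as a fixed UTC+08:00 offset, which holds
# for 2018 because Hong Kong does not use daylight saving time.
HONG_KONG = timezone(timedelta(hours=8))

def local_day_bounds_utc(first_day, end_day_exclusive):
    """UTC instants for the half-open Hong Kong range
    [first_day 00:00 HKT, end_day_exclusive 00:00 HKT)."""
    start = datetime.combine(first_day, time(0), tzinfo=HONG_KONG)
    end = datetime.combine(end_day_exclusive, time(0), tzinfo=HONG_KONG)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

# The whole of 2018 in Hong Kong time, as in the update's query
start, end = local_day_bounds_utc(date(2018, 1, 1), date(2019, 1, 1))
print(start.isoformat(), end.isoformat())
# 2017-12-31T16:00:00+00:00 2018-12-31T16:00:00+00:00

# The question's last branch: 1 Aug (inclusive) to 31 Oct (exclusive), Hong Kong time
start, end = local_day_bounds_utc(date(2018, 8, 1), date(2018, 10, 31))
print(start.isoformat(), end.isoformat())
# 2018-07-31T16:00:00+00:00 2018-10-30T16:00:00+00:00
```

The second call reproduces the `'2018-07-31 16:00:00+00:00'` and `'2018-10-30 16:00:00+00:00'` literals used in the original WHERE clause.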
    

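    I doubt you can get a useful index here. You might consider breaking this query into 4 or 5 parts and then gluing the results together with UNION. (UNION removes duplicates, while UNION ALL returns all rows.)

    UNION is a fairly expensive operation, so you need to consider how many rows each part returns. If each part filters out enough rows, the indexes can save more than the UNION costs. If many rows are returned, the current form is about as fast as it is going to get.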
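Below is a minimal sketch of the UNION rewrite. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table `t`, its column names and its rows are hypothetical. Dates are ISO strings, time zones are ignored, and the month branches are simplified to half-open ranges.

The sketch checks that ORing the six branches and UNIONing them return the same ids. It also shows UNION ALL returning a row once for every branch it matches.

```python
import sqlite3

# Hypothetical stand-in for characteristics_text: (id, pub, lastmod, created).
ROWS = [
    (1, "2018-10-15", None, "2018-01-01"),          # matches two publication_date branches
    (2, "2018-08-10", None, "2018-01-01"),          # matches only the Aug..Oct 30 branch
    (3, None, "2018-10-02", "2018-01-01"),          # falls back to lastmod, two branches
    (4, None, None, "2018-09-05"),                  # falls back to created
    (5, "2017-10-15", "2018-10-02", "2018-10-02"),  # publication_date set but out of range
]

# The question's six AND groups, simplified to half-open ranges.
BRANCHES = [
    "pub >= '2018-10-01' AND pub < '2018-11-01'",
    "pub IS NULL AND lastmod >= '2018-10-01' AND lastmod < '2018-11-01'",
    "pub IS NULL AND lastmod IS NULL AND created >= '2018-10-01' AND created < '2018-11-01'",
    "pub >= '2018-08-01' AND pub < '2018-10-31'",
    "pub IS NULL AND lastmod >= '2018-08-01' AND lastmod < '2018-10-31'",
    "pub IS NULL AND lastmod IS NULL AND created >= '2018-08-01' AND created < '2018-10-31'",
]

OR_SQL = "SELECT id FROM t WHERE " + " OR ".join(f"({b})" for b in BRANCHES)
UNION_SQL = " UNION ".join(f"SELECT id FROM t WHERE {b}" for b in BRANCHES)
UNION_ALL_SQL = " UNION ALL ".join(f"SELECT id FROM t WHERE {b}" for b in BRANCHES)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, pub TEXT, lastmod TEXT, created TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

def ids(sql):
    """Run a query and return the matching ids, sorted for easy comparison."""
    return sorted(row[0] for row in conn.execute(sql))

print(ids(OR_SQL))         # [1, 2, 3, 4]
print(ids(UNION_SQL))      # [1, 2, 3, 4]  (duplicates removed)
print(ids(UNION_ALL_SQL))  # [1, 1, 2, 3, 3, 4]
```

Each UNION part is a plain AND of simple comparisons, which is the shape a per-branch index lookup can serve. The price is the extra duplicate-removal step.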


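    Well, a full table scan (Seq Scan) can actually be faster than several index scans. It depends on the specific "selectivity" of the filter conditions.

    First, your WHERE clause has six filter conditions that are ORed together. This means that to use indexes, PostgreSQL has to scan an index six times and then perform an "index OR" to merge the results. That may not be cheap.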
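PostgreSQL typically handles this kind of OR in three steps:

1. It runs one Bitmap Index Scan per branch.
2. It ORs the resulting bitmaps together in a BitmapOr node.
3. It reads the heap once, rechecking the full condition.

Below is a toy model of the OR step only. Python integers act as bitmaps and each bit stands for a row position. Real PostgreSQL bitmaps track pages and tuples and can become lossy, so this is a simplification, and the helper names are hypothetical.

```python
def bitmap_from_rows(row_positions):
    """Set one bit per matching row position (toy stand-in for a Bitmap Index Scan)."""
    bitmap = 0
    for pos in row_positions:
        bitmap |= 1 << pos
    return bitmap

def bitmap_or(bitmaps):
    """Combine the per-branch bitmaps, like a BitmapOr node."""
    combined = 0
    for bm in bitmaps:
        combined |= bm
    return combined

def rows_to_visit(bitmap):
    """Row positions to fetch, in physical order, each exactly once."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]

# Three hypothetical branches; row 7 matches two of them but is visited once.
branch_hits = [[3, 7], [7, 9], [1]]
print(rows_to_visit(bitmap_or(bitmap_from_rows(h) for h in branch_hits)))  # [1, 3, 7, 9]
```

The merge itself is cheap. The cost the answer warns about comes from running six index scans and then visiting every matching heap page.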

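    So first, you need to know the expected selectivity of each of the six filter conditions, that is, the number of rows each one selects relative to the total number of rows in the table. A few simple SQL queries will give you the answer. Post the results here.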
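Selectivity is just matching rows divided by total rows. Below is a minimal sketch that applies the answer's 5% rule of thumb to per-branch counts. The counts are hypothetical; in practice they would come from one `SELECT count(*) ... WHERE <branch>` per branch plus one `SELECT count(*)` for the whole table.

```python
def selectivities(branch_counts, total_rows):
    """Fraction of the table selected by each OR branch."""
    return [count / total_rows for count in branch_counts]

def suggest_plan(branch_counts, total_rows, threshold=0.05):
    """Rule of thumb from the answer: above `threshold` combined selectivity,
    expect a sequential scan to win. Summing the counts double-counts rows that
    match several branches, so this errs toward the seq scan."""
    combined = sum(branch_counts) / total_rows
    return combined, ("seq scan" if combined > threshold else "try indexes")

# Hypothetical counts, one per branch, e.g. from
#   SELECT count(*) FROM characteristics_text WHERE <branch>;
#   SELECT count(*) FROM characteristics_text;
few = [120, 80, 40, 60, 30, 20]
many = [2000, 1500, 1000, 800, 500, 300]

for counts in (few, many):
    combined, choice = suggest_plan(counts, 100_000)
    print(f"{combined:.2%} -> {choice}")
# 0.35% -> try indexes
# 6.10% -> seq scan
```

The 5% figure is a heuristic, not a planner constant. PostgreSQL makes the actual choice from its cost estimates.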

    Now, if the six selectivities add up to more than 5%, a full table scan (your current plan) will be faster. Don't bother with indexes.

    Otherwise, the following index may help:

    create index ix1 on characteristics_text (
      publication_date,
      lastmod,
      created_at);
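A B-tree on (publication_date, lastmod, created_at) keeps its entries sorted by the three columns in that order. PostgreSQL B-trees can also match `IS NULL` as an index condition; the update's EXPLAIN output shows `publication_date IS NULL` inside an Index Cond. So a branch such as "publication_date IS NULL AND lastmod IS NULL AND created_at in a range" maps to one contiguous run of index entries.

Below is a toy model with a sorted Python list and `bisect`. It assumes NULLs sort last, as in a default ascending index, and uses plain dates for created_at. The helper names and rows are hypothetical.

```python
from bisect import bisect_left
from datetime import date

def sort_key(value):
    """NULL (None) sorts after every non-NULL value, as in a default ascending index."""
    return (1, date.min) if value is None else (0, value)

def build_index(rows):
    """Entries sorted by (publication_date, lastmod, created_at), carrying the row id."""
    return sorted(
        (sort_key(pub), sort_key(lastmod), sort_key(created), row_id)
        for row_id, pub, lastmod, created in rows
    )

def scan_both_null(index, created_from, created_to):
    """publication_date IS NULL AND lastmod IS NULL AND
    created_from <= created_at < created_to: one contiguous slice of the
    index, located with two binary searches."""
    prefix = (sort_key(None), sort_key(None))
    start = bisect_left(index, prefix + (sort_key(created_from),))
    end = bisect_left(index, prefix + (sort_key(created_to),))
    return [entry[3] for entry in index[start:end]]

rows = [
    (1, date(2018, 10, 5), None, date(2018, 1, 1)),
    (2, None, date(2018, 9, 1), date(2018, 2, 1)),
    (3, None, None, date(2018, 9, 15)),
    (4, None, None, date(2018, 11, 2)),
    (5, None, None, date(2018, 8, 1)),
]
index = build_index(rows)
print(scan_both_null(index, date(2018, 8, 1), date(2018, 10, 31)))  # [5, 3]
```

The ids come back in index order (by created_at), not by id. The same idea covers the other branches: a range on the leading column, or `IS NULL` on the first column followed by a range on the second.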
    

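    About 60% of the 100,000 records fall within the whole of 2018, which makes the database choose a Seq Scan. Narrowing the interval from a whole year to one month makes an index scan possible: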

      AND U0."created_at" >= '2018-10-01 00:00:00+00:00'
        AND U0."created_at" <= '2018-10-31 23:59:59.999999+00:00')
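The narrowing works because a plain range on the column is something a B-tree can search. The original year-wide `BETWEEN` plus `EXTRACT('month' ...) = 10` leaves the month test as a row-by-row filter, which is the `date_part(...)` in the Seq Scan's Filter.

Below is a minimal sketch with hypothetical helper names. It turns a year and month into the equivalent half-open range and brute-force checks that the range selects the same days as the year-plus-month filter. The half-open `< first day of next month` form also avoids an edge in the narrowed query, where `publication_date <= '2018-11-01'` additionally matches 1 November.

```python
from datetime import date, timedelta

def month_bounds(year, month):
    """Half-open [first day of month, first day of next month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def same_days(year, month):
    """Check every day of the year: the year-wide BETWEEN plus month test
    and the half-open month range must agree."""
    start, end = month_bounds(year, month)
    day = date(year, 1, 1)
    while day.year == year:
        year_and_month = date(year, 1, 1) <= day <= date(year, 12, 31) and day.month == month
        in_range = start <= day < end
        if year_and_month != in_range:
            return False
        day += timedelta(days=1)
    return True

print(month_bounds(2018, 10))                        # (datetime.date(2018, 10, 1), datetime.date(2018, 11, 1))
print(all(same_days(2018, m) for m in range(1, 13)))  # True
```

In SQL terms this corresponds to `publication_date >= '2018-10-01' AND publication_date < '2018-11-01'`.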
    
    With the full month-narrowed query, EXPLAIN shows a (bitmap) index scan for each AND condition, six index scans in total:

    SELECT DISTINCT
        ON ("crawler_url"."url") U0."id"
    FROM "characteristics_text" U0 LEFT OUTER
    JOIN "characteristics_text_urls"
        ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
    JOIN "crawler_url"
        ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
    WHERE (
            (U0."publication_date" >= '2018-10-01'
            AND U0."publication_date" <= '2018-11-01')
    
            OR (U0."publication_date" IS NULL
            AND U0."lastmod" >= '2018-10-01'
            AND U0."lastmod" <= '2018-11-01'
            )
    
            OR 
    
            (U0."publication_date" IS NULL
            AND U0."lastmod" IS NULL
            AND U0."created_at" >= '2018-10-01 00:00:00+00:00'
            AND U0."created_at" <= '2018-10-31 23:59:59.999999+00:00')
    
            OR 
    
            (U0."publication_date" >= '2018-08-01'
            AND U0."publication_date" < '2018-10-31')
    
            OR 
    
            (U0."publication_date" IS NULL
            AND U0."lastmod" >= '2018-08-01'
            AND U0."lastmod" < '2018-10-31')
    
            OR 
    
            (U0."publication_date" IS NULL
            AND U0."lastmod" IS NULL
            AND U0."created_at" >= '2018-07-31 16:00:00+00:00'
            AND U0."created_at" < '2018-10-30 16:00:00+00:00')
        )
    ORDER BY  "crawler_url"."url" ASC
    
    Unique  (cost=22885.16..22962.39 rows=15446 width=88)
    ->  Sort  (cost=22885.16..22923.77 rows=15446 width=88)
            Sort Key: crawler_url.url
            ->  Hash Right Join  (cost=18669.29..21068.51 rows=15446 width=88)
                Hash Cond: (crawler_url.id = characteristics_text_urls.url_id)
                ->  Seq Scan on crawler_url  (cost=0.00..1691.88 rows=55288 width=88)
                ->  Hash  (cost=18476.21..18476.21 rows=15446 width=8)
                        ->  Hash Right Join  (cost=14982.09..18476.21 rows=15446 width=8)
                            Hash Cond: (characteristics_text_urls.text_id = u0.id)
                            ->  Seq Scan on characteristics_text_urls  (cost=0.00..1907.25 rows=115525 width=8)
                            ->  Hash  (cost=14789.01..14789.01 rows=15446 width=4)
                                    ->  Bitmap Heap Scan on characteristics_text u0  (cost=516.57..14789.01 rows=15446 width=4)
                                        Recheck Cond: (((publication_date >= '2018-10-01'::date) AND (publication_date <= '2018-11-01'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-10-01'::date) AND (lastmod <= '2018-11-01'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2018-10-31 23:59:59.999999+00'::timestamp with time zone)) OR ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone)))
                                        ->  BitmapOr  (cost=516.57..516.57 rows=16081 width=0)
                                                ->  Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq  (cost=0.00..4.53 rows=11 width=0)
                                                    Index Cond: ((publication_date >= '2018-10-01'::date) AND (publication_date <= '2018-11-01'::date))
                                                ->  Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..6.49 rows=166 width=0)
                                                    Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-10-01'::date) AND (lastmod <= '2018-11-01'::date))
                                                ->  Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..14.61 rows=413 width=0)
                                                    Index Cond: ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2018-10-31 23:59:59.999999+00'::timestamp with time zone))
                                                ->  Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq  (cost=0.00..74.61 rows=3419 width=0)
                                                    Index Cond: ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date))
                                                ->  Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..108.20 rows=3503 width=0)
                                                    Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date))
                                                ->  Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..284.95 rows=8569 width=0)
                                                    Index Cond: ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone))
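In the EXPLAIN output above, the BitmapOr's estimate (rows=16081) is exactly the sum of its six Bitmap Index Scan estimates. PostgreSQL estimates a BitmapOr on the assumption that its inputs do not overlap, so rows matched by several branches are counted more than once. The Bitmap Heap Scan above it estimates from the whole OR condition and expects fewer rows (rows=15446).

Below is a small Python check of that arithmetic. The plan lines are copied from the output above, and the helper name is hypothetical.

```python
import re

# The six child lines and the BitmapOr line, copied from the EXPLAIN output above.
CHILD_SCANS = [
    "Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq  (cost=0.00..4.53 rows=11 width=0)",
    "Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..6.49 rows=166 width=0)",
    "Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..14.61 rows=413 width=0)",
    "Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq  (cost=0.00..74.61 rows=3419 width=0)",
    "Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..108.20 rows=3503 width=0)",
    "Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx  (cost=0.00..284.95 rows=8569 width=0)",
]
BITMAP_OR = "BitmapOr  (cost=516.57..516.57 rows=16081 width=0)"

ROWS_RE = re.compile(r"\brows=(\d+)")

def estimated_rows(plan_line):
    """Extract the planner's row estimate from one EXPLAIN line."""
    match = ROWS_RE.search(plan_line)
    if match is None:
        raise ValueError(f"no rows= estimate in {plan_line!r}")
    return int(match.group(1))

children_total = sum(estimated_rows(line) for line in CHILD_SCANS)
print(children_total, estimated_rows(BITMAP_OR))  # 16081 16081
```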