Why does this query not use the index?

sql postgresql

I am seeing strange behavior from the Postgres optimizer on the following query:

select count(product0_.id) as col_0_0_ from Product product0_ 
 where product0_.active=true 
 and (product0_.aggregatorId is null 
 or product0_.aggregatorId in ($1 , $2 , $3))

Product has about 54 columns; active is a boolean with a btree index, and aggregatorId is a varchar(15) with a btree index.

For this query, the index on aggregatorId is not used:

Aggregate  (cost=169995.75..169995.76 rows=1 width=32) (actual time=3904.726..3904.727 rows=1 loops=1)
  ->  Seq Scan on product product0_  (cost=0.00..165510.39 rows=1794146 width=32) (actual time=0.055..2407.195 rows=1851827 loops=1)
        Filter: (active AND ((aggregatorid IS NULL) OR ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))))
        Rows Removed by Filter: 542146
Total runtime: 3904.925 ms

However, if we reduce the query by omitting the NULL check on this column, the index is used:

Aggregate  (cost=17600.93..17600.94 rows=1 width=32) (actual time=614.933..614.935 rows=1 loops=1)
  ->  Index Scan using idx_prod_aggr on product product0_  (cost=0.43..17487.56 rows=45347 width=32) (actual time=19.284..594.509 rows=12099 loops=1)
        Index Cond: ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))
        Filter: active
        Rows Removed by Filter: 49130
Total runtime: 150.255 ms

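As far as I know, btree indexes can handle NULL checks, so I do not understand why the index is not used for the full query. The product table contains about 2.3 million rows, so the sequential scan is not very fast.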

Edit: the index is quite standard:

CREATE INDEX idx_prod_aggr
  ON product
  USING btree
  (aggregatorid COLLATE pg_catalog."default");

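Your problem looked interesting, so I reproduced your scenario: postgres 9.1, a table with 1M rows, one boolean column and one varchar column, both indexed, with NULL names in half of the table.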

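When the varchar column is not indexed, I get the same EXPLAIN ANALYZE output as you. With the index, however, postgres uses a bitmap index scan for the NULL condition and another for the IN condition, then merges them with a BitmapOr.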

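It then applies the boolean condition as a filter on the fetched rows rather than through its own index (because the two indexes are separate).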

See the output:

"Bitmap Heap Scan on a  (cost=17.34..21.35 rows=1 width=18) (actual time=0.048..0.048 rows=0 loops=1)"
"  Recheck Cond: ((name IS NULL) OR ((name)::text = ANY ('{1,2,3}'::text[])))"
"  Filter: (active IS TRUE)"
"  ->  BitmapOr  (cost=17.34..17.34 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..4.41 rows=1 width=0) (actual time=0.010..0.010 rows=0 loops=1)"
"              Index Cond: (name IS NULL)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..12.93 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)"
"              Index Cond: ((name)::text = ANY ('{1,2,3}'::text[]))"
"Total runtime: 0.077 ms"

This makes me think you have left out some details. If so, please add them to your question.
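The answer does not include its setup script. The following is a minimal sketch of such a reproduction, reusing the names visible in the plan above (table a, columns name and active, index idx_prod_aggr). The column types, the second index name idx_a_active, and the data distribution are assumptions made for illustration.

```sql
-- Hypothetical reproduction; the data distribution is assumed, not taken from the answer.
CREATE TABLE a (
    id     serial PRIMARY KEY,
    name   varchar(15),
    active boolean NOT NULL
);

-- 1M rows: every second row gets a NULL name, the others a value that
-- never equals the literals '1', '2', '3' used in the query below.
INSERT INTO a (name, active)
SELECT CASE WHEN g % 2 = 0 THEN NULL ELSE 'n' || g::text END,
       g % 3 = 0
FROM generate_series(1, 1000000) AS g;

CREATE INDEX idx_prod_aggr ON a USING btree (name);
CREATE INDEX idx_a_active  ON a USING btree (active);  -- index name is an assumption

-- Refresh planner statistics before comparing plans.
ANALYZE a;

EXPLAIN ANALYZE
SELECT count(id)
FROM a
WHERE active = true
  AND (name IS NULL OR name IN ('1', '2', '3'));
```

Which plan comes out depends on the statistics: the more rows the NULL branch matches, the more likely the planner is to prefer a sequential scan over the BitmapOr shown above.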

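Because the column used in the WHERE clause has many identical values (according to your numbers, 78% of all table rows), the database concludes that a full table scan is cheaper than spending extra time reading the index.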
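The share of matching rows can be checked directly. For reference, the first plan in the question returns 1,851,827 rows and filters out 542,146, so the predicate matches roughly 77% of the 2,393,973 rows scanned. The queries below are a sketch that assumes the table and column names from the question (product, aggregatorid); they are not part of the original answer.

```sql
-- Sketch only: table and column names are taken from the question.
-- Share of rows with a NULL aggregatorid, counted directly.
SELECT count(*) AS total_rows,
       sum(CASE WHEN aggregatorid IS NULL THEN 1 ELSE 0 END) AS null_rows,
       round(100.0 * sum(CASE WHEN aggregatorid IS NULL THEN 1 ELSE 0 END)
             / count(*), 1) AS null_pct
FROM product;

-- The planner's own estimate, as recorded by the last ANALYZE.
SELECT null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'product'
  AND attname = 'aggregatorid';
```

If null_frac is far from the counted percentage, the statistics may be stale and running ANALYZE on the table is worth trying before drawing conclusions about the plan.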

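A rule of thumb across most database vendors is that an index will probably not be used unless it narrows the search down to about 5% of all table rows.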
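Whether the planner's choice is actually the faster one can be tested by discouraging sequential scans for a single transaction and comparing timings. This is a sketch under assumptions: the literals are the parameter values visible in the question's plan, and count(id) stands in for the generated column alias.

```sql
-- Sketch only: literals are the parameter values shown in the question's plan.
BEGIN;

-- Affects this transaction only and reverts at ROLLBACK or COMMIT.
-- The setting penalizes sequential scans heavily; it does not forbid them.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(id)
FROM product
WHERE active = true
  AND (aggregatorid IS NULL
       OR aggregatorid IN ('5109037', '5001015', '70601'));

ROLLBACK;
```

Comparing the reported runtime with the sequential scan plan shows whether the index path would really have been cheaper. This setting is meant for diagnosis, not as a permanent configuration.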

Can you show us the output of EXPLAIN ANALYZE?

@a_horse_with_no_name I have added both EXPLAIN ANALYZE results.

Is it possible that too many rows have a NULL aggregator?

@DraganBozanovic What does "too many" mean? NULL values are the majority (about 1.8 million).

Please see my answer below.

Is there a way to tell Postgres to behave differently in specific cases? Here, using the index would also be faster.

@uWealner: Before running the query you can turn off sequential scans with set enable_seqscan=off, but I would be surprised if index lookups for that many rows were actually cheaper or faster.

@a_horse_with_no_name Yes, you are right: with seqscan disabled it takes 10 seconds...