PostgreSQL 10: inexplicable performance behaviour of IN and ANY


I am selecting from a large table where the id is in an array/list. I checked several variants, and the results surprised me.

1. Using ANY and ARRAY

EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM cca_data_hours
    WHERE
    datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
    id_web_page = ANY (ARRAY[1, 2, 8, 3 /* ~50k ids */])
Result

"Index Scan using cca_data_hours_pri on cca_data_hours  (cost=0.28..576.79 rows=15 width=188) (actual time=0.035..0.998 rows=6 loops=1)"
"  Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
"  Filter: (id_web_page = ANY ('{1,2,8,3, (...)"
"  Rows Removed by Filter: 5"
"  Buffers: shared hit=3"
"Planning time: 57.625 ms"
"Execution time: 1.065 ms"

2. Using IN and VALUES

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM cca_data_hours
    WHERE
    datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
    id_web_page IN (VALUES (1),(2),(8),(3) /* ~50k ids */)
Result

"Hash Join  (cost=439.77..472.66 rows=8 width=188) (actual time=90.806..90.858 rows=6 loops=1)"
"  Hash Cond: (cca_data_hours.id_web_page = "*VALUES*".column1)"
"  Buffers: shared hit=3"
"  ->  Index Scan using cca_data_hours_pri on cca_data_hours  (cost=0.28..33.06 rows=15 width=188) (actual time=0.035..0.060 rows=11 loops=1)"
"        Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
"        Buffers: shared hit=3"
"  ->  Hash  (cost=436.99..436.99 rows=200 width=4) (actual time=90.742..90.742 rows=4 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        ->  HashAggregate  (cost=434.99..436.99 rows=200 width=4) (actual time=90.709..90.717 rows=4 loops=1)"
"              Group Key: "*VALUES*".column1"
"              ->  Values Scan on "*VALUES*"  (cost=0.00..362.49 rows=28999 width=4) (actual time=0.008..47.056 rows=28999 loops=1)"
"Planning time: 53.607 ms"
"Execution time: 91.681 ms"

I expected case 2 to be faster, but it is not.
Why is IN with VALUES slower?

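Comparing the EXPLAIN ANALYZE results, it looks like the old version did not use the available index in the given examples. The reason ANY(ARRAY[]) became faster is this change in version 9.2:

Allow indexed_col op ANY(ARRAY[...]) conditions to be used in plain index scans and index-only scans (Tom Lane)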
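
As a rough illustration of that release note, the sketch below builds a small throwaway table and asks for the plan of an ANY(ARRAY[...]) query. The table demo_hours and its two-column primary key are assumptions made for this example, not the asker's actual schema. On 9.2 or later the planner is able to put the array condition under Index Cond next to the datetime equality; whether it actually does so depends on the index definition, statistics and settings.

```sql
-- Hypothetical demo table; names and the (datetime, id_web_page) key are
-- illustrative assumptions, not taken from the question.
CREATE TEMP TABLE demo_hours (
    datetime    timestamp without time zone NOT NULL,
    id_web_page integer NOT NULL,
    payload     text,
    PRIMARY KEY (datetime, id_web_page)
);

-- One row per hour and page for a single week (168 hours x 1000 pages).
INSERT INTO demo_hours (datetime, id_web_page, payload)
SELECT g.ts, p.page_id, 'x'
FROM generate_series('2018-01-01 00:00:00'::timestamp,
                     '2018-01-07 23:00:00'::timestamp,
                     interval '1 hour') AS g(ts)
CROSS JOIN generate_series(1, 1000) AS p(page_id);

ANALYZE demo_hours;

-- Check whether the ANY(ARRAY[...]) predicate appears under "Index Cond"
-- or only under "Filter" in the resulting plan.
EXPLAIN
SELECT * FROM demo_hours
WHERE datetime = '2018-01-07 19:00:00'::timestamp without time zone
  AND id_web_page = ANY (ARRAY[1, 2, 8, 3]);
```

In the plan from the question the array condition shows up as a Filter rather than an Index Cond, so comparing against a sketch like this one may help to see how the index definition affects where the condition is applied.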


The site you got the suggestion from is about version 9.0.

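I have never seen anyone use an INSERT-style list like column IN (VALUES (1),(2),(3)). Usually people use column IN (1,2,3), which Postgres converts internally to column = ANY(ARRAY[1,2,3]).

I read this article; it is not a silver bullet for every case. If it were, Postgres would do it behind the scenes.

Your first query executes in 1 ms. What is your goal?

I am preparing the query for optimization now; later my table may have about 100m rows.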
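
The rewrite of a plain IN list into = ANY (array) mentioned in the comments can be checked directly with EXPLAIN. The following is a minimal sketch on a made-up table (demo_ids is a hypothetical name used only here); the exact plan shape will vary with version and statistics.

```sql
-- Hypothetical demo table; the name and contents are illustrative only.
CREATE TEMP TABLE demo_ids (id integer PRIMARY KEY);
INSERT INTO demo_ids SELECT generate_series(1, 10000);
ANALYZE demo_ids;

-- Plain IN list: the plan prints the condition in array form, e.g.
--   id = ANY ('{1,2,3}'::integer[])
EXPLAIN SELECT * FROM demo_ids WHERE id IN (1, 2, 3);

-- Explicit array form, for comparison with the plan above.
EXPLAIN SELECT * FROM demo_ids WHERE id = ANY (ARRAY[1, 2, 3]);

-- IN (VALUES ...) is a subquery; on PostgreSQL 10 it is planned as a join
-- against a Values Scan, as in the Hash Join plan shown in the question.
EXPLAIN SELECT * FROM demo_ids WHERE id IN (VALUES (1), (2), (3));
```

With tens of thousands of ids, as in the question, the third form has to scan and de-duplicate the whole VALUES list before joining, which is where most of the 91 ms in the posted plan is spent.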