PostgreSQL 10: inexplicable performance behaviour of IN and ANY

sql postgresql

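I am selecting from a large table where the id is in an array/list. I checked a few variants, and the results surprised me.

1. Using ANY and ARRAY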

EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM cca_data_hours
    WHERE
    datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
    id_web_page = ANY (ARRAY[1, 2, 8, 3 /* ~50k ids */])

Result

"Index Scan using cca_data_hours_pri on cca_data_hours  (cost=0.28..576.79 rows=15 width=188) (actual time=0.035..0.998 rows=6 loops=1)"
"  Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
"  Filter: (id_web_page = ANY ('{1,2,8,3, (...)"
"  Rows Removed by Filter: 5"
"  Buffers: shared hit=3"
"Planning time: 57.625 ms"
"Execution time: 1.065 ms"

2. Using IN and VALUES

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM cca_data_hours
    WHERE
    datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
    id_web_page IN (VALUES (1),(2),(8),(3) /* ~50k ids */)

Result

"Hash Join  (cost=439.77..472.66 rows=8 width=188) (actual time=90.806..90.858 rows=6 loops=1)"
"  Hash Cond: (cca_data_hours.id_web_page = "*VALUES*".column1)"
"  Buffers: shared hit=3"
"  ->  Index Scan using cca_data_hours_pri on cca_data_hours  (cost=0.28..33.06 rows=15 width=188) (actual time=0.035..0.060 rows=11 loops=1)"
"        Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
"        Buffers: shared hit=3"
"  ->  Hash  (cost=436.99..436.99 rows=200 width=4) (actual time=90.742..90.742 rows=4 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        ->  HashAggregate  (cost=434.99..436.99 rows=200 width=4) (actual time=90.709..90.717 rows=4 loops=1)"
"              Group Key: "*VALUES*".column1"
"              ->  Values Scan on "*VALUES*"  (cost=0.00..362.49 rows=28999 width=4) (actual time=0.008..47.056 rows=28999 loops=1)"
"Planning time: 53.607 ms"
"Execution time: 91.681 ms"

I expected case 2 to be faster, but it is not.

Why is IN with VALUES slower?

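Comparing the EXPLAIN ANALYZE results, it looks like the old version was not using the available index on key in the given examples.

The reason ANY (ARRAY[]) became faster is this change in version 9.2:

Allow indexed_col op ANY(ARRAY[...]) conditions to be used in plain index scans and index-only scans (Tom Lane)

The site you got the suggestion from was about version 9.0.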
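To see how the planner treats each form, the three query shapes can be compared on a small throwaway table. This is only a sketch: the table demo_hours and the index demo_hours_idx are hypothetical names invented for illustration, not the cca_data_hours schema from the question, and the plans you get will depend on your PostgreSQL version, statistics, and the size of the id list.

```sql
-- Hypothetical demo table; all names here are illustrative assumptions.
CREATE TEMP TABLE demo_hours (
    datetime    timestamp without time zone NOT NULL,
    id_web_page integer NOT NULL,
    hits        integer NOT NULL DEFAULT 0
);

-- 24 hourly slots x 1000 page ids.
INSERT INTO demo_hours (datetime, id_web_page)
SELECT timestamp '2018-01-07 00:00:00' + h * interval '1 hour', p
FROM generate_series(0, 23) AS h,
     generate_series(1, 1000) AS p;

CREATE INDEX demo_hours_idx ON demo_hours (datetime, id_web_page);
ANALYZE demo_hours;

-- Plain IN list: the plan displays the condition as
-- id_web_page = ANY ('{1,2,8,3}'::integer[]).
EXPLAIN
SELECT * FROM demo_hours
WHERE datetime = '2018-01-07 19:00:00'::timestamp without time zone
  AND id_web_page IN (1, 2, 8, 3);

-- Explicit array form: the same kind of condition as the IN list above.
EXPLAIN
SELECT * FROM demo_hours
WHERE datetime = '2018-01-07 19:00:00'::timestamp without time zone
  AND id_web_page = ANY (ARRAY[1, 2, 8, 3]);

-- VALUES form: the list is a subquery, so it is planned as a join
-- against a "*VALUES*" scan rather than as a simple array condition.
EXPLAIN
SELECT * FROM demo_hours
WHERE datetime = '2018-01-07 19:00:00'::timestamp without time zone
  AND id_web_page IN (VALUES (1), (2), (8), (3));
```

Swapping EXPLAIN for EXPLAIN (ANALYZE, BUFFERS), as in the question, adds actual timings, which makes it easier to see where the time goes as the id list grows.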

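I have never seen anyone use the INSERT-like form column IN (VALUES (1), (2), (3)). Usually people write column IN (1, 2, 3), which Postgres internally converts to column = ANY (ARRAY[1,2,3]).

I read the article; it is not a one-size-fits-all solution for every case. If it were, Postgres would do it behind the scenes. Your first query executes in 1 ms, so what is your goal?

I am preparing optimized queries now; later my table may have around 100M rows.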