PostgreSQL: inconsistent statistics on a partial index expression


[PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.2.0-10) 6.2.0 20161027, 64-bit]

I have a table with timestamp ranges:

create table testing.test as 
select tsrange(d, null) ts from 
generate_series(timestamp '2000-01-01', timestamp '2018-01-01', interval '1 minute') s(d);

I need to run the following query:

select * 
from testing.test 
where lower(ts)> '2017-06-17 20:00:00'::timestamp and upper_inf(ts)

EXPLAIN ANALYZE result for the table without indexes:

Seq Scan on test  (cost=0.00..72482.26 rows=1052013 width=14) (actual time=2165.477..2239.781 rows=283920 loops=1)
  Filter: (upper_inf(ts) AND (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone))
  Rows Removed by Filter: 9184081
Planning time: 0.046 ms
Execution time: 2250.221 ms

Next, I add the following partial index:

create index lower_rt_inf ON testing.test using btree(lower(ts)) where upper_inf(ts);    
analyze testing.test;

EXPLAIN ANALYZE result for the table with the partial index:

Index Scan using lower_rt_inf on test  (cost=0.04..10939.03 rows=1051995 width=14) (actual time=0.037..52.083 rows=283920 loops=1)
  Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 0.156 ms
Execution time: 62.900 ms

Then I created an index similar to the previous one, but without the partial condition:

create index lower_rt_full ON testing.test using btree(lower(ts));
analyze testing.test;

Now the same index is used, but the cost and row estimates are different:

Index Scan using lower_rt_inf on test  (cost=0.04..1053.87 rows=101256 width=14) (actual time=0.029..58.613 rows=283920 loops=1)
  Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 0.280 ms
Execution time: 71.794 ms

One more observation, for the query without the upper_inf(ts) condition:

select * from testing.test where lower(ts)> '2017-06-17 20:00:00'::timestamp;

Index Scan using lower_rt_full on test  (cost=0.04..3159.52 rows=303767 width=14) (actual time=0.036..64.208 rows=283920 loops=1)
  Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 0.099 ms
Execution time: 78.759 ms

How can I use partial indexes on expressions effectively?

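What happens here is that the statistics on the index lower_rt_full are used to estimate the row count, but the statistics on lower_rt_inf, which is a partial index, are not.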

Since functions are a black box to PostgreSQL, it has no idea about the distribution of lower(ts) and uses a bad estimate.
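The row estimates in the plans from the question are consistent with this. Here is a rough check, assuming the planner falls back to a default selectivity of 1/3 for the inequality on lower(ts) and another 1/3 for the boolean function call upper_inf(ts) when it has no statistics. The table size is taken from the first plan (283920 + 9184081 = 9468001 rows).

```sql
-- Back-of-the-envelope check of the planner's row estimates.
-- The 1/3 factors are assumed default selectivities, not values read from the plans.
select round(9468001 / 9.0) as no_stats_estimate,      -- 1052000, close to rows=1052013
       round(303767 / 3.0)  as with_lower_stats_only;  -- 101256, matches rows=101256
```

The small gap between 1052000 and 1052013 would be explained by the planner working from its own estimate of the table size rather than the exact row count. The second figure takes the estimate for lower(ts) alone (rows=303767, available once lower_rt_full exists) and applies the assumed 1/3 for upper_inf(ts).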

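After lower_rt_full has been created and the table analyzed, PostgreSQL has a good idea of this distribution and can estimate much better. Even if the index is not used to execute the query, it is used for query planning.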
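One way to see which expression statistics ANALYZE has stored is the pg_stats view, where statistics for index expressions are listed under the index name. A minimal sketch, using the schema and index names from the question:

```sql
-- List stored statistics for the expression indexes from the question.
-- For index expressions, pg_stats.tablename holds the index name.
select tablename,
       attname,
       null_frac,
       n_distinct,
       histogram_bounds is not null as has_histogram
from pg_stats
where schemaname = 'testing'
  and tablename in ('lower_rt_full', 'lower_rt_inf');
```

A row in this output only shows that statistics exist for an index expression. Whether the planner consults them is a separate matter.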

Since upper_inf is also a function (a black box), you would get an even better estimate if you had an index ON test (upper_inf(ts), lower(ts)).
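A sketch of that suggestion follows. The index name upper_inf_lower_rt is made up for illustration; everything else uses the table and query from the question.

```sql
-- Non-partial expression index covering both function calls, so that
-- ANALYZE can gather statistics for upper_inf(ts) as well as lower(ts).
-- The index name is hypothetical.
create index upper_inf_lower_rt
    on testing.test using btree (upper_inf(ts), lower(ts));
analyze testing.test;

-- Re-check the row estimate for the original query.
explain
select *
from testing.test
where lower(ts) > '2017-06-17 20:00:00'::timestamp and upper_inf(ts);
```

Whether the estimate actually improves is worth verifying with EXPLAIN on the PostgreSQL version in use, since how a bare boolean function call is estimated may differ between releases.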

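For an explanation of why partial indexes are not considered when estimating the number of result rows, see this comment in examine_variable (in backend/utils/adt/selfuncs.c), the function that tries to find statistical data about an expression:

* Has it got stats?  We only consider stats for
* non-partial indexes, since partial indexes probably
* don't reflect whole-relation statistics; the above
* check for uniqueness is the only info we take from
* a partial index.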

Thanks for the answer. Is the problem the use of a function in the index (lower(ts))? Or the use of a function in the partial index condition?

Because if I add a separate column "latest" and run the following queries:

select * 
from testing.test 
where lower(ts)> '2017-06-17 20:00:00'::timestamp and upper_inf(ts)

select * 
from testing.test 
where lower(ts)> '2017-06-17 20:00:00'::timestamp and latest = true

My result is:

Index Scan using lower_latest_rt on test  (cost=0.04..11406.44 rows=285833 width=23) (actual time=0.027..178.054 rows=283920 loops=1)
  Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 1.788 ms
Execution time: 188.481 ms
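
The plan above uses an index named lower_latest_rt whose definition is not shown. Here is a minimal sketch of one setup that would fit, assuming latest is a boolean column filled from upper_inf(ts) and the index is partial on that column. These statements are assumptions for illustration, not taken from the post.

```sql
-- Hypothetical reconstruction: only the names latest and lower_latest_rt
-- come from the post; the column type and index predicate are assumed.
alter table testing.test add column latest boolean;
update testing.test set latest = upper_inf(ts);

create index lower_latest_rt
    on testing.test using btree (lower(ts)) where latest = true;
analyze testing.test;
```

Under these assumptions, ANALYZE stores ordinary column statistics for latest, so the planner would not have to guess the selectivity of latest = true the way it does for a function call. That would be consistent with the much closer estimate (rows=285833 against 283920 actual).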