PostgreSQL: Is this the correct way to create a partial index in Postgres?

postgresql indexing postgresql-9.3 where-in partial-index

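We have a table with 4 million records. For one specific, frequently used use case we are only interested in records whose Salesforce user type is 'Standard', which is only about 10,000 of the 4 million records. The other possible user types are 'PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess' and 'CsnOnly'.

So for this use case I think a partial index would be the better choice.

I therefore plan to create the following partial index to speed up queries for records with user type 'Standard' and to keep requests from the web from timing out: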

CREATE INDEX user_type_idx ON user_table(userType)
WHERE userType NOT IN
   ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');

The lookup query will be:

select * from user_table where userType='Standard';

Could you please confirm whether this is the correct way to create a partial index? That would be very helpful.

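For the index to be used, the query should contain the WHERE condition as it is written in the index definition.

PostgreSQL has some ability to make deductions, but it may not be able to infer that userType = 'Standard' implies the condition in the index.

Use EXPLAIN to check whether the index can be used.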
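
As a minimal sketch of that advice, the two statements below show how the check could look for the table and index from the question. The table and column names are taken from the question; the second query is only an illustration of repeating the index predicate and is not part of the original answer.

```sql
-- Option 1: check whether the planner picks the partial index for the
-- lookup query exactly as posted in the question.
EXPLAIN
SELECT * FROM user_table WHERE userType = 'Standard';

-- Option 2 (illustrative): repeat the index predicate verbatim so that the
-- query matches the partial index trivially, then narrow to the wanted type.
EXPLAIN
SELECT *
FROM user_table
WHERE userType NOT IN
      ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly')
  AND userType = 'Standard';
```

If user_type_idx appears in the plan of the first statement, the planner was able to prove the implication on its own and the longer form is unnecessary.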


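Postgres can use that index, but it does so in a way that is (slightly) less efficient than with an index defined with where user_type = 'Standard'.

I created a small test table with 4 million rows, 10,000 of which have the user_type 'Standard'. The other values were randomly distributed using the following script: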

create table user_table
(
  id serial primary key,
  some_date date not null,
  user_type text not null,
  some_ts timestamp not null, 
  some_number integer not null, 
  some_data text, 
  some_flag boolean
);

insert into user_table (some_date, user_type, some_ts, some_number, some_data, some_flag)
select current_date, 
       case (random() * 4 + 1)::int
         when 1 then 'PowerPartner'
         when 2 then 'CSPLitePortal'
         when 3 then 'CustomerSuccess'
         when 4 then 'PowerCustomerSuccess'
         when 5 then 'CsnOnly'
       end, 
       clock_timestamp(),
       42, 
       rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)), 
       (random() + 1)::int = 1
from generate_series(1,4e6 - 10000) as t(i)
union all 
select current_date, 
       'Standard',
       clock_timestamp(),
       42, 
       rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)), 
       (random() + 1)::int = 1
from generate_series(1,10000) as t(i);

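(I created the table with more than just a few columns because the planner's choices are also influenced by the size and width of the table.)

First test, using the index with NOT IN: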

create index ix_not_in on user_table(user_type) 
where user_type not in ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');
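
The answer does not show the statement that produced the plans. A plausible reconstruction, assuming the options are inferred from the "Output" and "Buffers" lines in the plan output, would be the following (the preceding analyze call is likewise an assumption, added so the planner has current statistics):

```sql
-- Assumption: refresh planner statistics after the bulk insert.
analyze user_table;

-- Assumed reconstruction of the test query: ANALYZE executes the query and
-- reports actual timings, VERBOSE adds the "Output" lines, and BUFFERS adds
-- the shared buffer counts.
explain (analyze, verbose, buffers)
select *
from user_table
where user_type = 'Standard';
```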

Result:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on stuff.user_table  (cost=139.68..14631.83 rows=11598 width=139) (actual time=1.035..2.171 rows=10000 loops=1)
  Output: id, some_date, user_type, some_number, some_data, some_flag
  Recheck Cond: (user_table.user_type = 'Standard'::text)
  Buffers: shared hit=262
  ->  Bitmap Index Scan on ix_not_in  (cost=0.00..136.79 rows=11598 width=0) (actual time=1.007..1.007 rows=10000 loops=1)
        Index Cond: (user_table.user_type = 'Standard'::text)
        Buffers: shared hit=40
Total runtime: 2.506 ms

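(The above is a typical execution time after I had run the statement about 10 times to eliminate caching effects.)

As you can see, the planner uses a Bitmap Index Scan, which can be a "lossy" scan that needs an extra step (the Recheck Cond) to filter out false positives.

When using the following index: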

create index ix_standard on user_table(id) 
where user_type = 'Standard';

This results in the following plan:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_standard on stuff.user_table  (cost=0.29..443.16 rows=10267 width=139) (actual time=0.011..1.498 rows=10000 loops=1)
  Output: id, some_date, user_type, some_number, some_data, some_flag
  Buffers: shared hit=313
Total runtime: 1.815 ms

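Conclusion:

Your index is used, but an index that covers only the type you are interested in is more efficient.

The run times do not differ much. I ran each EXPLAIN about 10 times; the average for the ix_standard index was slightly below 2 ms and the average for the ix_not_in index was slightly above 2 ms, so there is no real performance difference.

In general, however, an Index Scan will scale better than a Bitmap Index Scan as the table grows. This is mainly due to the "Recheck Cond", especially if there is not enough memory to hold the bitmap (for larger tables).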
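
Applied to the table from the question, the conclusion could be sketched as follows. The index name user_type_standard_idx is hypothetical, and the sketch assumes the question's table has an id column like the test table; the size query is only a suggested way to compare the two test indexes and was not part of the original measurements.

```sql
-- Hypothetical index name; assumes user_table has an id column.
CREATE INDEX user_type_standard_idx ON user_table (id)
WHERE userType = 'Standard';

-- The lookup query from the question repeats the index predicate as written.
EXPLAIN
SELECT * FROM user_table WHERE userType = 'Standard';

-- Optional: compare the on-disk size of the two partial indexes built on
-- the test table.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('ix_not_in', 'ix_standard');
```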
