Are PostgreSQL JSONB indexes slower than native indexes?

Tags: postgresql, postgresql-9.6

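I have a large table (30 million rows) with ~10 jsonb B-tree indexes.

When I build a query with only a few conditions, it is relatively fast.

When I add more conditions, especially ones that use a sparse jsonb index (for example, an integer between 0 and 1000000), query speed drops significantly.

Are jsonb indexes slower than native indexes? Should I expect better performance if I switch to native columns instead of JSON?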

Table definition:

id  integer 
type    text    
data    jsonb   
company_index   ARRAY   
exchange_index  ARRAY   
eligible boolean

Example query:

SELECT id, data, type
FROM collection.bundles
WHERE ARRAY['.X'] && bundles.exchange_index
  AND type IN ('discussion')
  AND (data->>'sentiment_score')::bigint > 0
  AND (data->'display_tweet'->'stocktwit'->'id') IS NOT NULL
  AND eligible = true
  AND ((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL
ORDER BY id DESC
LIMIT 50

Output:

Limit  (cost=0.56..16197.56 rows=50 width=212) (actual time=31900.874..31900.874 rows=0 loops=1)
  Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
  I/O Timings: read=7644.206 write=7.294
  ->  Index Scan using bundles2_id_desc_idx on bundles  (cost=0.56..2401044.17 rows=7412 width=212) (actual time=31900.871..31900.871 rows=0 loops=1)
        Filter: (eligible AND ('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text) AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
        Rows Removed by Filter: 16093269
        Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
        I/O Timings: read=7644.206 write=7.294
Planning time: 0.366 ms
Execution time: 31900.909 ms

Note: every jsonb condition used in this query has a jsonb B-tree index. exchange_index and company_index both have GIN indexes.
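The index definitions themselves were never posted, so the following is only a sketch of what B-tree expression indexes on jsonb fields and a GIN index on an array column usually look like. The index names are hypothetical; the table and column names come from the question.

```sql
-- Hypothetical names; the real definitions were not shown in the question.

-- B-tree expression index on a jsonb field cast to bigint.
-- The extra pair of parentheses around the expression is required.
CREATE INDEX bundles_sentiment_score_example_idx
    ON collection.bundles (((data->>'sentiment_score')::bigint));

-- GIN index on a text[] column; the default operator class supports &&.
CREATE INDEX bundles_exchange_index_example_idx
    ON collection.bundles USING gin (exchange_index);
```

An expression index is only considered when the query uses the same expression, including the cast, so (data->>'sentiment_score')::bigint > 0 can match the first index while a differently written condition would not.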

Update, after changing the query as Laurenz suggested:

Limit  (cost=150634.15..150634.27 rows=50 width=211) (actual time=15925.828..15925.828 rows=0 loops=1)
  Buffers: shared hit=1137490 read=680349 written=2
  I/O Timings: read=2896.702 write=0.038
  ->  Sort  (cost=150634.15..150652.53 rows=7352 width=211) (actual time=15925.827..15925.827 rows=0 loops=1)
        Sort Key: bundles.id DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=1137490 read=680349 written=2
        I/O Timings: read=2896.702 write=0.038
        ->  Bitmap Heap Scan on bundles  (cost=56666.15..150316.40 rows=7352 width=211) (actual time=15925.816..15925.816 rows=0 loops=1)
              Recheck Cond: (('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text))
              Filter: (eligible AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
              Rows Removed by Filter: 273230
              Heap Blocks: exact=175975
              Buffers: shared hit=1137490 read=680349 written=2
              I/O Timings: read=2896.702 write=0.038
              ->  BitmapAnd  (cost=56666.15..56666.15 rows=23817 width=0) (actual time=1895.890..1895.890 rows=0 loops=1)
                    Buffers: shared hit=37488 read=85559
                    I/O Timings: read=325.535
                    ->  Bitmap Index Scan on bundles2_exchange_index_ops_idx  (cost=0.00..6515.57 rows=863703 width=0) (actual time=218.690..218.690 rows=892669 loops=1)
                          Index Cond: ('{.X}'::text[] && exchange_index)
                          Buffers: shared hit=7 read=313
                          I/O Timings: read=1.458
                    ->  Bitmap Index Scan on bundles_eligible_idx  (cost=0.00..23561.74 rows=2476877 width=0) (actual time=436.719..436.719 rows=2569331 loops=1)
                          Index Cond: (eligible = true)
                          Buffers: shared hit=37473
                    ->  Bitmap Index Scan on bundles2_type_idx  (cost=0.00..26582.83 rows=2706276 width=0) (actual time=1052.267..1052.267 rows=2794517 loops=1)
                          Index Cond: (type = 'discussion'::text)
                          Buffers: shared hit=8 read=85246
                          I/O Timings: read=324.077
Planning time: 0.433 ms
Execution time: 15928.959 ms

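None of the indexes you are so fond of are being used, so the problem is not whether they are fast or not.

Several factors are at work here:

Seeing dirtied and written pages during an index scan, I suspect that your table contains quite a lot of "dead tuples". When the index scan visits them and finds that they are dead, it "kills" those index entries so that subsequent index scans don't have to repeat that work.

If you repeat the query, you will probably notice that the number of blocks and the execution time go down.

You can reduce that problem by running VACUUM on the table or by making sure that autovacuum processes the table often enough.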
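As a rough sketch of how one might check for dead tuples and act on them, assuming the table is collection.bundles as in the question (the autovacuum setting is an illustrative value, not a recommendation):

```sql
-- Dead-tuple estimate and the time of the last manual/automatic vacuum.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'collection'
  AND relname = 'bundles';

-- Remove dead tuples now and refresh planner statistics.
VACUUM (VERBOSE, ANALYZE) collection.bundles;

-- Assumption: 0.02 is only an example; it makes autovacuum trigger after
-- roughly 2% of the table has changed instead of the default 20%.
ALTER TABLE collection.bundles
    SET (autovacuum_vacuum_scale_factor = 0.02);
```

If n_dead_tup is large relative to n_live_tup and last_autovacuum is old or NULL, that would support the dead-tuple suspicion.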

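Your main problem, however, is that the LIMIT clause seduces PostgreSQL into using the following strategy:

Since you only want 50 result rows, in an order for which there is an index, just examine the table rows in index order and discard every row that doesn't match the complicated condition until there are 50 results.

Unfortunately, the scan had to wade through more than 16 million rows (the plan shows 16093269 removed by the filter) in its search for 50 hits. The rows at the "high id" end of the table don't match the condition, and PostgreSQL doesn't know about that correlation.

The solution is to keep PostgreSQL from going down that road. The simplest way would be to drop all indexes on id, but given the column's name that is probably not feasible.

The other way is to keep PostgreSQL from "seeing" the LIMIT clause when it plans the scan: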

SELECT id, data, type
FROM (SELECT id, data, type
      FROM collection.bundles
      WHERE /* all your complicated conditions */
      OFFSET 0) subquery
ORDER BY id DESC
LIMIT 50;
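For concreteness, here is a sketch of that rewrite with the conditions from the question substituted in and wrapped in EXPLAIN (ANALYZE, BUFFERS) so the resulting plan can be inspected. It assumes the conditions are exactly those of the example query; nothing else is changed.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, data, type
FROM (SELECT id, data, type
      FROM collection.bundles
      WHERE ARRAY['.X'] && exchange_index
        AND type IN ('discussion')
        AND eligible = true
        AND (data->>'sentiment_score')::bigint > 0
        AND (data->'display_tweet'->'stocktwit'->'id') IS NOT NULL
        AND ((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL
      -- OFFSET 0 keeps the planner from flattening the subquery,
      -- so the inner scan is planned without knowledge of the outer LIMIT.
      OFFSET 0) subquery
ORDER BY id DESC
LIMIT 50;
```

With this form, the inner query should be planned on the selectivity of its own conditions, and the sort plus LIMIT are applied to whatever rows it returns.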

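Remark: You didn't show your index definitions, but it sounds like you have quite a lot of them, possibly too many. Indexes are expensive, so make sure you only define those that give you a clear benefit.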
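One way to judge which indexes earn their keep is to look at how often each one has been scanned. The following is a sketch using the standard statistics view, assuming the table is collection.bundles; the counters only cover the period since statistics were last reset.

```sql
-- Indexes on the table, least-used first, with their on-disk size.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'collection'
  AND relname = 'bundles'
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```

An index with idx_scan at or near zero over a representative period is a candidate for removal, unless it backs a constraint.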

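Comments:

This is too broad; a question like this cannot be answered in general. Please add EXPLAIN (ANALYZE, BUFFERS) output for some of your queries, then we can tell what is going on.

@LaurenzAlbe added

Text please, no images. Could you also add the table and index definitions? Thanks.

What is the definition of index bundles2_id_desc_idx?

It is a B-tree index defined on "id desc".

Wow, awesome, thanks. I'm going to try this now. Yes, we have a lot of indexes. The application is a SaaS dashboard for filtering tweets, so I think it will be hard to drop them; unfortunately, we have a lot of filtering options on the frontend.

Query time was cut in half, thanks again! If you're interested, the EXPLAIN output is posted above. There is still work to do, but I think it will more likely be a database restructuring…

One more thing: if I changed the table so that all the complicated conditions were individual columns in a separate table (i.e. using a join on id), would you expect to see a performance gain? Or is that impossible to say?

It looks like some of your conditions are not properly indexed. Hard to say; you still haven't shown me the CREATE INDEX statements.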