Postgres select performance with LIMIT 1

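I have a table with around 50 million records in PostgreSQL. I am trying to select the posts with the most "likes", filtered by a "tag". Both fields have b-tree indexes. For the "love" tag I get: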

EXPLAIN analyse select user_id from posts where tags @> array['love'] order by likes desc nulls last limit 12

Limit  (cost=0.57..218.52 rows=12 width=12) (actual time=2.658..14.243 rows=12 loops=1)
  ->  Index Scan using idx_likes on posts  (cost=0.57..55759782.55 rows=3070010 width=12) (actual time=2.657..14.239 rows=12 loops=1)
        Filter: (tags @> '{love}'::text[])
        Rows Removed by Filter: 10584
Planning time: 0.297 ms
Execution time: 14.276 ms
14 ms is fine, but if I try the same query for "tamir" it suddenly takes more than 22 seconds! Clearly the query planner is doing something wrong.

EXPLAIN analyse select user_id from posts where tags @> array['tamir'] order by likes desc nulls last limit 12

Limit  (cost=0.57..25747.73 rows=12 width=12) (actual time=17552.406..22839.503 rows=12 loops=1)
  ->  Index Scan using idx_likes on posts  (cost=0.57..55759782.55 rows=25988 width=12) (actual time=17552.405..22839.484 rows=12 loops=1)
        Filter: (tags @> '{tamir}'::text[])
        Rows Removed by Filter: 11785083
Planning time: 0.253 ms
Execution time: 22839.569 ms
After some reading, I added "user_id" to the ORDER BY, and "tamir" became very fast at 0.2 ms! It now does a Sort over a Bitmap Heap Scan instead of an Index Scan.

EXPLAIN analyse select user_id from posts where tags @> array['tamir'] order by likes desc nulls last, user_id limit 12

Limit  (cost=101566.17..101566.20 rows=12 width=12) (actual time=0.237..0.238 rows=12 loops=1)
  ->  Sort  (cost=101566.17..101631.14 rows=25988 width=12) (actual time=0.237..0.237 rows=12 loops=1)
        Sort Key: likes DESC NULLS LAST, user_id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on posts  (cost=265.40..100970.40 rows=25988 width=12) (actual time=0.074..0.214 rows=126 loops=1)
              Recheck Cond: (tags @> '{tamir}'::text[])
              Heap Blocks: exact=44
              ->  Bitmap Index Scan on idx_tags  (cost=0.00..258.91 rows=25988 width=0) (actual time=0.056..0.056 rows=126 loops=1)
                    Index Cond: (tags @> '{tamir}'::text[])
Planning time: 0.287 ms
Execution time: 0.277 ms
But what about "love"? It now goes from 14 ms to 2.3 seconds.

EXPLAIN analyse select user_id from posts where tags @> array['love'] order by likes desc nulls last, user_id limit 12

Limit  (cost=7347142.18..7347142.21 rows=12 width=12) (actual time=2360.784..2360.786 rows=12 loops=1)
  ->  Sort  (cost=7347142.18..7354817.20 rows=3070010 width=12) (actual time=2360.783..2360.784 rows=12 loops=1)
        Sort Key: likes DESC NULLS LAST, user_id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on posts  (cost=28316.58..7276762.77 rows=3070010 width=12) (actual time=595.274..2171.571 rows=1517679 loops=1)
              Recheck Cond: (tags @> '{love}'::text[])
              Heap Blocks: exact=642705
              ->  Bitmap Index Scan on idx_tags  (cost=0.00..27549.08 rows=3070010 width=0) (actual time=367.080..367.080 rows=1517679 loops=1)
                    Index Cond: (tags @> '{love}'::text[])
Planning time: 0.226 ms
Execution time: 2360.863 ms
Can someone explain why this happens and how to fix it?

Update

The "tags" field has a GIN index, not a b-tree; that was just a typo.

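A b-tree index is not very useful for searching for elements inside an array field. You should drop the b-tree index on the tags field and use a GIN index instead: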

drop index idx_tags;
create index idx_tags on posts using gin (tags);
Also, do not add user_id to the ORDER BY: it removes the possibility of using your idx_likes index for ordering when many rows carry the tag you search for.



In addition, the likes field should probably be NOT NULL DEFAULT 0.

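The "tags" field actually has a GIN index, not a b-tree, sorry for the confusion. I have updated the question. If I remove user_id, performance is poor for some tags. Would changing the default value of likes affect performance?

Then you can try
alter table posts alter column tags set statistics 1000;
analyze posts;
Your table may be too large for the default statistics target of 100.

It looks like "statistics" solved the problem! Thanks.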
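
The plans in the question suggest why a better row estimate changes the outcome. The sketch below assumes the planner prices a LIMIT over an ordered index scan by linear interpolation, as if matching rows were spread evenly through the index. That assumption reproduces the Limit costs shown above to within rounding of the displayed values. All numbers are copied from the EXPLAIN ANALYZE output; the function name limit_cost and the dictionary layout are hypothetical, and this illustrates the arithmetic rather than the planner's actual code.

```python
# Illustrative arithmetic only; limit_cost is a hypothetical helper,
# not a PostgreSQL API. Numbers come from the plans in the question.

def limit_cost(startup, total, est_rows, limit):
    """Estimated cost of fetching `limit` rows from a scan expected to
    return `est_rows` rows, assuming matches are evenly distributed."""
    return startup + (total - startup) * limit / est_rows

INDEX_SCAN_STARTUP = 0.57        # startup cost of the idx_likes scan
INDEX_SCAN_TOTAL = 55759782.55   # cost of scanning all of idx_likes
LIMIT = 12

# tag -> (estimated matching rows, actual matching rows,
#         estimated cost of the bitmap scan + sort plan)
plans = {
    "love": (3070010, 1517679, 7347142.21),
    "tamir": (25988, 126, 101566.20),
}

for tag, (est_rows, actual_rows, sort_plan_cost) in plans.items():
    # Cost the planner sees with its (possibly wrong) row estimate.
    with_estimate = limit_cost(INDEX_SCAN_STARTUP, INDEX_SCAN_TOTAL,
                               est_rows, LIMIT)
    # Same formula, but fed the row count that was actually observed.
    with_actual = limit_cost(INDEX_SCAN_STARTUP, INDEX_SCAN_TOTAL,
                             actual_rows, LIMIT)
    print(f"{tag}: index-scan cost with estimated rows = {with_estimate:.2f}, "
          f"with actual rows = {with_actual:.2f}, "
          f"sort plan = {sort_plan_cost:.2f}")
```

Under this assumption, the "tamir" index scan looks cheap (about 25,747) only because 25,988 matching rows were expected. With the 126 rows actually found, the same formula gives a cost in the millions, well above the 101,566 of the bitmap scan plus sort. For "love" the index scan stays far cheaper either way. This is consistent with the comment above that raising the statistics target fixed the problem: a more accurate row estimate for rare tags would let the planner choose the bitmap plan on its own, without the ORDER BY workaround.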