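This is a toy dataset. I can't use an IndexOnlyScan, because in reality I need 8-10 columns from main and as many from secondary, and the index would be larger than 500 GB (unfortunately, I have already tested that). (About loading only the id column: correct, but I wanted to check whether the cost of going back to the index later to fetch main.main_column is worth the speed-up I get in the sort. In your example, main.main_column is not retrieved. But I get the general idea and agree with it.)

See my edit. main.main_column is actually retrieved by the query I proposed, just not by the subquery.

@Neamar The point of the index on (main_column, id) is not that it provides an index-only scan, but that it allows the sort to be avoided. You can easily add an extra column to the table and the query to defeat the IOS and see this. The planner is willing to exploit an index that already returns rows in sorted order, while it is not willing to push a sort down through a nested loop. You could ask for the planner to be improved, but that request would be better sent directly to the pgsql-hackers mailing list, not here.

@O.Jones I think the only optimization that kicks in only above a certain scale is JIT compilation. Other optimizations may only pay off at a certain scale, but that is already included in the cost estimates, so they are considered and rejected based on cost rather than never considered at all. The planner does not consider every possible way to run a query, but if it does not consider some possibility, it will not start considering it just because the data got bigger.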
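The extra-column experiment mentioned in the comment above can be sketched as a minimal PostgreSQL script. It uses temporary tables, so nothing permanent is created. Only the names main, secondary, main_column, secondary_column and secondary_main_id come from the plans; the 100 000-row data set, the id % 300 distribution, extra_column, and the index name main_column_id are hypothetical. Because extra_column is not in any index, an index-only scan is impossible, so if the Sort node disappears, the credit goes to the id ordering inside the (main_column, id) index.

```sql
-- Hypothetical toy data: only the table/column names come from the question.
CREATE TEMP TABLE main (
    id           int PRIMARY KEY,
    main_column  int NOT NULL,
    extra_column text            -- assumed column, deliberately left out of every index
);
CREATE TEMP TABLE secondary (
    id               int PRIMARY KEY,
    main_id          int NOT NULL,
    secondary_column int
);
-- 100 000 rows; main_column = id % 300, so each value matches roughly 333 rows.
INSERT INTO main
SELECT g, g % 300, 'payload ' || g
FROM generate_series(1, 100000) AS g;
INSERT INTO secondary
SELECT g, g, g * 2
FROM generate_series(1, 100000) AS g;
CREATE UNIQUE INDEX secondary_main_id ON secondary (main_id);
-- Within main_column = 5 this index yields entries already ordered by id.
CREATE INDEX main_column_id ON main (main_column, id);
ANALYZE main;
ANALYZE secondary;

-- extra_column rules out an index-only scan; check whether a Sort node remains.
EXPLAIN (ANALYZE, VERBOSE)
SELECT main.id, main.main_column, main.extra_column, secondary.secondary_column
FROM main
LEFT JOIN secondary ON main.id = secondary.main_id
WHERE main.main_column = 5
ORDER BY main.id
LIMIT 50;
```

If the planner takes the ordered path, the plan shows an Index Scan on main_column_id feeding the Nested Loop, with no Sort node; whether it does so on a given data set still depends on its cost estimates. Dropping main_column_id and rerunning the EXPLAIN shows what the planner chooses without the ordered path.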
Limit  (cost=3742.93..3743.05 rows=50 width=12) (actual time=5.010..5.322 rows=50 loops=1)
  Output: main.id, main.main_column, secondary.secondary_column
  ->  Sort  (cost=3742.93..3743.76 rows=332 width=12) (actual time=5.006..5.094 rows=50 loops=1)
        Output: main.id, main.main_column, secondary.secondary_column
        Sort Key: main.id
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop Left Join  (cost=11.42..3731.90 rows=332 width=12) (actual time=0.123..4.446 rows=334 loops=1)
              Output: main.id, main.main_column, secondary.secondary_column
              Inner Unique: true
              ->  Bitmap Heap Scan on public.main  (cost=11.00..1036.99 rows=332 width=8) (actual time=0.106..1.021 rows=334 loops=1)
                    Output: main.id, main.main_column
                    Recheck Cond: (main.main_column = 5)
                    Heap Blocks: exact=334
                    ->  Bitmap Index Scan on main_column  (cost=0.00..10.92 rows=332 width=0) (actual time=0.056..0.057 rows=334 loops=1)
                          Index Cond: (main.main_column = 5)
              ->  Index Scan using secondary_main_id on public.secondary  (cost=0.42..8.12 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=334)
                    Output: secondary.id, secondary.main_id, secondary.secondary_column
                    Index Cond: (secondary.main_id = main.id)
Planning Time: 0.761 ms
Execution Time: 5.423 ms
Limit  (cost=1048.44..1057.21 rows=1 width=12) (actual time=1.219..2.027 rows=50 loops=1)
  Output: m.id, m.main_column, secondary.secondary_column
  ->  Nested Loop Left Join  (cost=1048.44..1057.21 rows=1 width=12) (actual time=1.216..1.900 rows=50 loops=1)
        Output: m.id, m.main_column, secondary.secondary_column
        Inner Unique: true
        ->  Subquery Scan on m  (cost=1048.02..1048.77 rows=1 width=8) (actual time=1.201..1.515 rows=50 loops=1)
              Output: m.id, m.main_column
              Filter: (m.main_column = 5)
              ->  Limit  (cost=1048.02..1048.14 rows=50 width=8) (actual time=1.196..1.384 rows=50 loops=1)
                    Output: main.id, main.main_column
                    ->  Sort  (cost=1048.02..1048.85 rows=332 width=8) (actual time=1.194..1.260 rows=50 loops=1)
                          Output: main.id, main.main_column
                          Sort Key: main.id
                          Sort Method: top-N heapsort  Memory: 27kB
                          ->  Bitmap Heap Scan on public.main  (cost=11.00..1036.99 rows=332 width=8) (actual time=0.054..0.753 rows=334 loops=1)
                                Output: main.id, main.main_column
                                Recheck Cond: (main.main_column = 5)
                                Heap Blocks: exact=334
                                ->  Bitmap Index Scan on main_column  (cost=0.00..10.92 rows=332 width=0) (actual time=0.029..0.030 rows=334 loops=1)
                                      Index Cond: (main.main_column = 5)
        ->  Index Scan using secondary_main_id on public.secondary  (cost=0.42..8.44 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=50)
              Output: secondary.id, secondary.main_id, secondary.secondary_column
              Index Cond: (secondary.main_id = m.id)
Planning Time: 0.161 ms
Execution Time: 2.115 ms
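The two plans above differ in where the LIMIT sits: in the first, all 334 matching main rows are joined to secondary before the top-N sort; in the second, the 50 ids are chosen first and only those are joined (loops=50 instead of loops=334). The sketch below reproduces both shapes. The query texts are reconstructed from the plans rather than copied from the question (the second plan also has an extra Filter on m.main_column that this version omits), and the toy data, the unique index on secondary.main_id, and the view names join_then_limit and limit_then_join are assumptions.

```sql
-- Hypothetical toy data; table, column and index names follow the plans above.
CREATE TEMP TABLE main (
    id          int PRIMARY KEY,
    main_column int NOT NULL
);
CREATE TEMP TABLE secondary (
    id               int PRIMARY KEY,
    main_id          int NOT NULL,
    secondary_column int
);
INSERT INTO main
SELECT g, g % 300 FROM generate_series(1, 100000) AS g;
INSERT INTO secondary
SELECT g, g, g * 2 FROM generate_series(1, 100000) AS g;
CREATE INDEX main_column ON main (main_column);
-- Assumed unique, which would explain the "Inner Unique: true" lines in both plans.
CREATE UNIQUE INDEX secondary_main_id ON secondary (main_id);
ANALYZE main;
ANALYZE secondary;

-- First plan's shape: join every matching row, then sort and keep 50.
CREATE TEMP VIEW join_then_limit AS
SELECT main.id, main.main_column, secondary.secondary_column
FROM main
LEFT JOIN secondary ON main.id = secondary.main_id
WHERE main.main_column = 5
ORDER BY main.id
LIMIT 50;

-- Second plan's shape: pick the 50 lowest ids first, then join only those.
CREATE TEMP VIEW limit_then_join AS
SELECT m.id, m.main_column, secondary.secondary_column
FROM (
    SELECT id, main_column
    FROM main
    WHERE main_column = 5
    ORDER BY id
    LIMIT 50
) m
LEFT JOIN secondary ON m.id = secondary.main_id
ORDER BY m.id;

EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM join_then_limit;
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM limit_then_join;
```

The rewrite is only equivalent because each main row has at most one secondary row here. If secondary.main_id were not unique, the join-first query would return 50 joined rows, while the limit-first query would return every secondary match for 50 main rows.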
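-- Composite index: for a given main_column value, entries are ordered by id,
-- so ORDER BY id can be read straight from the index.
create index main_column on main(main_column, id);
-- prepare q3 AS
select m.id, main_column, secondary_column
from (
    -- number the matching rows in id order
    select id, main_column
        , row_number() OVER (ORDER BY id, main_column) AS rn
    from main
    where main_column = 5
) m
left join secondary on m.id = secondary.main_id
WHERE m.rn <= 50   -- keep only the first 50 numbered rows of m
ORDER BY m.id
LIMIT 50
;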
PREPARE q6 AS
WITH
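-- On PostgreSQL 12+, write "xxx AS MATERIALIZED (" to keep this CTE materialized;
-- not needed before version 12, where CTEs are always materialized.
xxx AS (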
        SELECT DISTINCT x.id
        FROM main x
        WHERE x.main_column = 5
        ORDER BY x.id
        LIMIT 50
        )
select m.id, m.main_column, s.secondary_column
from main m
left join secondary s on m.id = s.main_id
WHERE EXISTS (
        SELECT *
        FROM xxx x WHERE x.id = m.id
        )
order by m.id
-- limit 50
        ;
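On PostgreSQL 12 and later, the MATERIALIZED keyword for q6 goes between AS and the opening parenthesis. The script below is a self-contained sketch of that variant on hypothetical toy data (temporary tables reusing the question's names); the statement name q6_materialized is made up for the example, and the (main_column, id) index mirrors the one proposed above.

```sql
-- Hypothetical toy data; names follow the question, data is invented.
CREATE TEMP TABLE main (
    id          int PRIMARY KEY,
    main_column int NOT NULL
);
CREATE TEMP TABLE secondary (
    id               int PRIMARY KEY,
    main_id          int NOT NULL,
    secondary_column int
);
INSERT INTO main
SELECT g, g % 300 FROM generate_series(1, 100000) AS g;
INSERT INTO secondary
SELECT g, g, g * 2 FROM generate_series(1, 100000) AS g;
CREATE INDEX main_column ON main (main_column, id);
CREATE UNIQUE INDEX secondary_main_id ON secondary (main_id);
ANALYZE main;
ANALYZE secondary;

-- PostgreSQL 12+: MATERIALIZED makes the planner evaluate the CTE on its own
-- (as versions before 12 always did) instead of inlining it into the outer query.
PREPARE q6_materialized AS
WITH xxx AS MATERIALIZED (
    SELECT DISTINCT x.id
    FROM main x
    WHERE x.main_column = 5
    ORDER BY x.id
    LIMIT 50
)
SELECT m.id, m.main_column, s.secondary_column
FROM main m
LEFT JOIN secondary s ON m.id = s.main_id
WHERE EXISTS (SELECT 1 FROM xxx x WHERE x.id = m.id)
ORDER BY m.id;

EXPLAIN (ANALYZE, VERBOSE) EXECUTE q6_materialized;
DEALLOCATE q6_materialized;
```

Running the same EXPLAIN with MATERIALIZED removed shows whether inlining the CTE changes the plan on a given PostgreSQL version; before version 12 the keyword is a syntax error and the CTE is always materialized anyway.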