PostgreSQL: BRIN index not working properly (too many lossy blocks)


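I have a large table that I load in batches (each batch has about 1M rows, and the table currently holds 96 batches). Each batch inserts rows and then updates a certain number of them (always rows with the batch id that was just inserted), and the whole batch runs in one transaction. I inserted the latest batch and ran a query with a WHERE condition on that batch id (id_transformace = 1333). It took a long time to finish and produced far too many lossy blocks. I don't understand why, because the only lossy blocks should be at the "edge" between the last batch and the one before it. I see the same problem in other tables that I load in batches (those tables don't even have the update step). Can anyone explain what causes this? I have a vague theory that autovacuum interferes with the physical layout of the table, but I don't fully understand it, so I would be glad to hear from someone more experienced.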

First attempt:

"Limit  (cost=313.52..2349.39 rows=1000 width=207) (actual time=10390.834..10516.425 rows=1000 loops=1)"
"  Output: id, date_key_trainjr,..."
"  Buffers: shared hit=247024 read=46542"
"  I/O Timings: read=1554.586"
"  ->  Bitmap Heap Scan on reports.cdc_s5_gpps  (cost=313.52..2121240.53 rows=1041780 width=207) (actual time=10390.832..10516.180 rows=1000 loops=1)"
"        Output: id, date_key_trainjr, ..."
"        Recheck Cond: (cdc_s5_gpps.id_transformace = 1333)"
"        Rows Removed by Index Recheck: 5190343"
"        Heap Blocks: lossy=293516"
"        Buffers: shared hit=247024 read=46542"
"        I/O Timings: read=1554.586"
"        ->  Bitmap Index Scan on index_cdc_s5_gpps_tran_br  (cost=0.00..53.08 rows=1185418 width=0) (actual time=17.484..17.484 rows=3512320 loops=1)"
"              Index Cond: (cdc_s5_gpps.id_transformace = 1333)"
"              Buffers: shared hit=50"
"Planning Time: 0.430 ms"
"Execution Time: 10516.683 ms"

Second attempt:

"Limit  (cost=313.52..2349.39 rows=1000 width=207) (actual time=40308.886..40459.645 rows=1000 loops=1)"
"  Output: id, date_key_trainjr,..."
"  Buffers: shared hit=11 read=293555"
"  I/O Timings: read=13316.262"
"  ->  Bitmap Heap Scan on reports.cdc_s5_gpps  (cost=313.52..2121240.53 rows=1041780 width=207) (actual time=40308.867..40459.386 rows=1000 loops=1)"
"        Output: id, date_key_trainjr,..."
"        Recheck Cond: (cdc_s5_gpps.id_transformace = 1333)"
"        Rows Removed by Index Recheck: 5190343"
"        Heap Blocks: lossy=293516"
"        Buffers: shared hit=11 read=293555"
"        I/O Timings: read=13316.262"
"        ->  Bitmap Index Scan on index_cdc_s5_gpps_tran_br  (cost=0.00..53.08 rows=1185418 width=0) (actual time=23.991..23.991 rows=3512320 loops=1)"
"              Index Cond: (cdc_s5_gpps.id_transformace = 1333)"
"              Buffers: shared hit=11 read=39"
"              I/O Timings: read=5.167"
"Planning Time: 1.521 ms"
"Execution Time: 40460.087 ms"

Index definition:

CREATE INDEX index_cdc_s5_gpps_tran_br ON cdc_s5_gpps USING brin (id_transformace) WITH (pages_per_range='256')
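The block (page) size is 8192 B and the average row size is 215 B, so each BRIN range holds about 9754 rows. That should be the maximum (worst-case) number of rows removed by the index recheck, not 5.1M.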

I also tried increasing work_mem:

set work_mem = '2 GB'
but it had no effect on the number of lossy blocks or on the rows removed by the recheck:

Rows Removed by Index Recheck: 5190343
Heap Blocks: lossy=293516
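If I set work_mem to the smallest possible value (64kB), I get the same numbers, so this is not a case of the bitmap not fitting into work_mem (I read about that in other Stack Overflow discussions, but it does not seem to apply here).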

PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
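
One way to check whether the rows of the newest batch really sit in one contiguous stretch of the heap is to group them by the BRIN range they fall into. The query below is only a sketch: it assumes the 256-page range size from the index definition, uses the `ctid::text::point` cast to extract the heap block number of each row, and will itself be slow on a table of this size.

```sql
-- Count the rows of batch 1333 per BRIN range (256 heap pages each).
-- (ctid::text::point)[0] is the heap block number that stores the row.
SELECT ((ctid::text::point)[0]::bigint / 256) AS brin_range_no,
       count(*)                               AS rows_in_range
FROM   reports.cdc_s5_gpps
WHERE  id_transformace = 1333
GROUP  BY 1
ORDER  BY 1;
```

A contiguously stored batch should show up as a short run of consecutive range numbers. Many scattered ranges holding only a few rows each would suggest that rows of this batch were written into free space left in older parts of the table.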


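Even when it runs in the same transaction as the INSERT, an UPDATE leaves holes in the table. Once the table has been vacuumed, some future inserts will fill those holes with out-of-order tuples, which degrades the BRIN index. Could you do the updates before the bulk load, perhaps in a temporary or staging table?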
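
The staging idea could be sketched roughly as follows. The names `stage_cdc_s5_gpps` and `incoming_batch` are hypothetical and stand in for whatever the real load process uses. The sketch also assumes the source has the same columns, in the same order, as the target table.

```sql
BEGIN;

-- Hypothetical staging table with the same columns as the target.
CREATE TEMP TABLE stage_cdc_s5_gpps
    (LIKE reports.cdc_s5_gpps INCLUDING DEFAULTS)
    ON COMMIT DROP;

-- Load the raw batch (incoming_batch is a placeholder for the real source).
INSERT INTO stage_cdc_s5_gpps
SELECT * FROM incoming_batch;

-- Run the batch's UPDATE statements against stage_cdc_s5_gpps here, so the
-- dead row versions they create stay in the staging table.

-- Append the finished rows to the big table in a single pass.
INSERT INTO reports.cdc_s5_gpps
SELECT * FROM stage_cdc_s5_gpps;

COMMIT;
```

This only prevents new holes from being created by the batch itself. Free space that already exists in the big table can still be reused by the final INSERT.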

I don't understand why, because the only lossy blocks should be at the "edge" between the last batch and the one before it.


BRIN indexes only ever return lossy blocks. That is all they can do.
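
Because every page of a matching range has to be visited and rechecked, it helps to compare page counts rather than rows per range. The query below is a rough back-of-the-envelope calculation using only figures from the question (8192-byte pages, 215-byte average rows, 256 pages per range, about 1M rows per batch, 293516 lossy blocks). It ignores page and tuple overhead, so treat the results as approximations.

```sql
SELECT round(8192.0 * 256 / 215)           AS rows_per_brin_range,
       ceil(1000000 * 215 / 8192.0)        AS pages_for_one_batch,
       ceil(1000000 * 215 / 8192.0 / 256)  AS ranges_for_one_batch,
       round(293516 / 256.0)               AS ranges_behind_lossy_blocks;
```

Under these assumptions a tightly packed batch needs roughly 26,000 pages, or about 103 ranges, while 293516 lossy blocks correspond to about 1,147 ranges. The figure of roughly 9,754 rows is a bound per range, not for the whole scan, so millions of rechecked rows are possible once many ranges match.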

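I can do the updates in a temporary table, ... but I suppose that deleting a batch would cause the same problem. In the delete case, would running VACUUM FULL after the delete and before the next insert help?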
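
For the delete case, a plain VACUUM only marks the space of the deleted rows as reusable, so later inserts can still land in it. VACUUM FULL rewrites the table into a new file without that free space and rebuilds its indexes. A minimal sketch, assuming the exclusive lock and the rewrite time are acceptable, might look like this:

```sql
-- Remove one batch (1333 is simply the id used in the question).
DELETE FROM reports.cdc_s5_gpps WHERE id_transformace = 1333;

-- Rewrite the table without the dead rows and rebuild its indexes.
-- Takes an ACCESS EXCLUSIVE lock and cannot run inside a transaction block.
VACUUM FULL reports.cdc_s5_gpps;
```

Whether this is practical depends on how long the table can stay locked, since the whole table is copied each time.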