PostgreSQL: BRIN index not working properly (too many lossy blocks)
I have a large table that I load in batches. Each batch has about 1M rows, and the table currently holds 96 batches. For each batch I insert the rows and then update a certain number of rows (always rows with the same batch id as the ones just inserted). The whole batch is done in one transaction. I inserted the latest batch and ran a query with a WHERE condition on that batch id (id_transformace = 1333). It took a very long time to finish and produced far too many lossy blocks. I don't understand why, because the only lossy blocks should be at the "edge" between the latest batch and the previous one. I have the same problem in other tables that I also load in batches (and those tables don't even have the update step). Can someone explain what causes this?

I have a vague theory that autovacuum interferes with the physical ordering of the table, but I don't fully understand it, so I would be glad to hear from someone more experienced.

First try:
"Limit (cost=313.52..2349.39 rows=1000 width=207) (actual time=10390.834..10516.425 rows=1000 loops=1)"
" Output: id, date_key_trainjr,..."
" Buffers: shared hit=247024 read=46542"
" I/O Timings: read=1554.586"
" -> Bitmap Heap Scan on reports.cdc_s5_gpps (cost=313.52..2121240.53 rows=1041780 width=207) (actual time=10390.832..10516.180 rows=1000 loops=1)"
" Output: id, date_key_trainjr, ..."
" Recheck Cond: (cdc_s5_gpps.id_transformace = 1333)"
" Rows Removed by Index Recheck: 5190343"
" Heap Blocks: lossy=293516"
" Buffers: shared hit=247024 read=46542"
" I/O Timings: read=1554.586"
" -> Bitmap Index Scan on index_cdc_s5_gpps_tran_br (cost=0.00..53.08 rows=1185418 width=0) (actual time=17.484..17.484 rows=3512320 loops=1)"
" Index Cond: (cdc_s5_gpps.id_transformace = 1333)"
" Buffers: shared hit=50"
"Planning Time: 0.430 ms"
"Execution Time: 10516.683 ms"
Second try:
"Limit (cost=313.52..2349.39 rows=1000 width=207) (actual time=40308.886..40459.645 rows=1000 loops=1)"
" Output: id, date_key_trainjr,..."
" Buffers: shared hit=11 read=293555"
" I/O Timings: read=13316.262"
" -> Bitmap Heap Scan on reports.cdc_s5_gpps (cost=313.52..2121240.53 rows=1041780 width=207) (actual time=40308.867..40459.386 rows=1000 loops=1)"
" Output: id, date_key_trainjr,..."
" Recheck Cond: (cdc_s5_gpps.id_transformace = 1333)"
" Rows Removed by Index Recheck: 5190343"
" Heap Blocks: lossy=293516"
" Buffers: shared hit=11 read=293555"
" I/O Timings: read=13316.262"
" -> Bitmap Index Scan on index_cdc_s5_gpps_tran_br (cost=0.00..53.08 rows=1185418 width=0) (actual time=23.991..23.991 rows=3512320 loops=1)"
" Index Cond: (cdc_s5_gpps.id_transformace = 1333)"
" Buffers: shared hit=11 read=39"
" I/O Timings: read=5.167"
"Planning Time: 1.521 ms"
"Execution Time: 40460.087 ms"
Index definition:
CREATE INDEX index_cdc_s5_gpps_tran_br ON cdc_s5_gpps USING brin (id_transformace) WITH (pages_per_range='256')
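Block (page) size: 8192 B
Average row size: 215 B
That gives about 9754 rows per BRIN range, which I expected to be the worst-case maximum for rows removed by the index recheck (not 5.1M).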
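The arithmetic above can be checked, and extended a little, with the numbers already quoted in the question. The following is a rough sketch, not a diagnosis. It ignores page headers and per-tuple overhead, and it relies on one assumption that is not stated in the question: that a BRIN bitmap index scan reports its row count as 10 times the number of matched heap pages, which is how I understand the PostgreSQL source to behave. Note also that the 9754-row bound applies to each matched range separately, so the total removed by recheck is the sum over all matched ranges that contain other batches.

```python
# Back-of-the-envelope check using only figures quoted in the question.
BLOCK_SIZE = 8192        # bytes per heap page
AVG_ROW_SIZE = 215       # bytes, average row size
PAGES_PER_RANGE = 256    # from the index definition
BATCH_ROWS = 1_000_000   # approximate rows per batch

# Ignores page header and tuple overhead, as the question does.
rows_per_page = BLOCK_SIZE / AVG_ROW_SIZE
rows_per_range = rows_per_page * PAGES_PER_RANGE
ranges_per_batch = BATCH_ROWS / rows_per_range

# Figures taken from the EXPLAIN output.
index_scan_rows = 3_512_320   # "rows" on the Bitmap Index Scan node
rows_removed = 5_190_343      # "Rows Removed by Index Recheck"

# ASSUMPTION: the BRIN scan reports 10 rows per matched heap page.
matched_pages = index_scan_rows // 10
matched_ranges = matched_pages // PAGES_PER_RANGE

# Lower bound on how many matched ranges must hold rows of other batches.
min_foreign_ranges = rows_removed / rows_per_range

print(f"rows per range:            {rows_per_range:.0f}")
print(f"ranges one batch needs:    {ranges_per_batch:.1f}")
print(f"ranges the index matched:  {matched_ranges}")
print(f"ranges with foreign rows:  at least {min_foreign_ranges:.0f}")
```

Under these assumptions a tightly packed batch would need roughly a hundred ranges, while the index matched well over a thousand, which suggests that rows with id_transformace = 1333 are scattered across the table rather than stored contiguously.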
I also tried increasing work_mem:
set work_mem = '2 GB'
but it had no effect on the number of lossy blocks or on the rows removed by recheck:
Rows Removed by Index Recheck: 5190343
Heap Blocks: lossy=293516
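If I set work_mem to the smallest possible value (64kB), I get the same numbers, so this is not a case of the bitmap not fitting into work_mem (I read about that in other Stack Overflow discussions, but it does not seem to apply here).
PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit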
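Even when it runs in the same transaction as the INSERT, an UPDATE leaves holes in the table. Once the table is vacuumed, some future inserts will fill those holes with out-of-order tuples, which degrades the BRIN index. Could you do the updates before the bulk load, perhaps in a temporary or staging table?

"I don't understand why, because the only lossy blocks should be at the 'edge' between the latest batch and the previous one"

BRIN indexes only ever return lossy blocks. That is all they can do.

I can do the updates in a temporary table, but I think deleting a batch would cause the same problem. If I run it after the delete and before the next insert, would VACUUM FULL help me in the delete case?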
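The hole-filling effect described in the answer can be illustrated with a toy model. This is a simplified sketch and not how PostgreSQL is implemented: the heap is a list of pages with a fixed number of slots, `punch_holes` stands in for dead tuples left by UPDATE and later reclaimed by vacuum, and `brin_summaries` keeps only a (min, max) pair per block range, as a minmax BRIN index does. All names and the tiny sizes are hypothetical.

```python
from typing import List, Optional, Tuple

PAGES_PER_RANGE = 4   # tiny values so the effect is easy to see
ROWS_PER_PAGE = 5

Page = List[Optional[int]]   # batch id per slot; None marks a free slot


def load_batch(heap: List[Page], batch_id: int, n_rows: int) -> None:
    """Insert rows: reuse free slots first, then append new pages."""
    remaining = n_rows
    for page in heap:
        for i, slot in enumerate(page):
            if slot is None and remaining:
                page[i] = batch_id
                remaining -= 1
    while remaining:
        take = min(ROWS_PER_PAGE, remaining)
        heap.append([batch_id] * take + [None] * (ROWS_PER_PAGE - take))
        remaining -= take


def punch_holes(heap: List[Page], every: int) -> None:
    """Free one slot on every n-th page (model of vacuumed dead tuples)."""
    for page_no in range(0, len(heap), every):
        heap[page_no][0] = None


def brin_summaries(heap: List[Page]) -> List[Tuple[int, int]]:
    """(min, max) of the batch id for each block range."""
    out = []
    for start in range(0, len(heap), PAGES_PER_RANGE):
        pages = heap[start:start + PAGES_PER_RANGE]
        values = [v for page in pages for v in page if v is not None]
        out.append((min(values), max(values)))
    return out


def matching_ranges(heap: List[Page], batch_id: int) -> int:
    """Number of block ranges a query for batch_id would have to scan."""
    return sum(lo <= batch_id <= hi for lo, hi in brin_summaries(heap))


# Scenario 1: append-only loads, no holes.
clean: List[Page] = []
for b in (1, 2, 3, 4):
    load_batch(clean, b, 40)
append_only = matching_ranges(clean, 4)

# Scenario 2: holes appear in old batches before batch 4 is loaded.
holey: List[Page] = []
for b in (1, 2, 3):
    load_batch(holey, b, 40)
punch_holes(holey, every=2)
load_batch(holey, 4, 40)
with_holes = matching_ranges(holey, 4)

print("ranges scanned for batch 4, append-only:", append_only)
print("ranges scanned for batch 4, with holes: ", with_holes)
```

In this model a handful of reused slots is enough to make every block range match the newest batch id, because a single out-of-order row widens the range's (min, max) summary. Whether the same thing happens in the real table depends on where PostgreSQL actually places the new rows, which the model does not capture.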