PostgreSQL: suggestions to reduce memory usage with table partitioning (PostgreSQL 11)

I have a couple of tables that will have 20-40 million rows, so my queries usually take a long time to execute. Before I go ahead with partitioning, are there any recommendations for troubleshooting or analyzing the queries in detail, such as finding out where most of the memory is consumed, or any other suggestions?

In addition, I have some analytics queries that run over the whole date range (they have to traverse all of the data).

So I need an overall solution that keeps the basic queries fast while making sure the analytics queries do not fail with out-of-memory errors or crash the database.

One table is close to 120 GB in size; the others just have a huge number of rows. I tried partitioning the tables on a weekly and on a monthly date basis, but the queries ran out of memory, and the number of locks increased by a large factor once partitions were in place: a query on the normal table took 13 locks, while the same query on the partitioned table took 250 locks (monthly partitions) and 1000 locks (weekly partitions). I have read that there is always some overhead when partitioning is involved.

Analytics query:

SELECT id
from TABLE1
where id NOT IN (
   SELECT DISTINCT id
   FROM TABLE2
);
TABLE1 and TABLE2 are partitioned, the first by event_data_timestamp and the second by event_timestamp.

The analytics query runs out of memory and consumes a huge number of locks, but the date-based queries are quite fast.

Query:

EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM Table1_monthly WHERE event_timestamp > '2019-01-01' and id NOT IN (SELECT DISTINCT id FROM Table2_monthly where event_data_timestamp > '2019-01-01');

 Append  (cost=32731.14..653650.98 rows=4656735 width=16) (actual time=2497.747..15405.447 rows=10121827 loops=1)
   Buffers: shared hit=3 read=169100
   ->  Seq Scan on TABLE1_monthly_2019_01_26  (cost=32731.14..77010.63 rows=683809 width=16) (actual time=2497.746..3489.767 rows=1156382 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
         Rows Removed by Filter: 462851
         Buffers: shared read=44559
         SubPlan 1
           ->  HashAggregate  (cost=32728.64..32730.64 rows=200 width=16) (actual time=248.084..791.054 rows=1314570 loops=6)
                 Group Key: TABLE2_monthly_2019_01_26.cid
                 Buffers: shared read=24568
                 ->  Append  (cost=0.00..32277.49 rows=180458 width=16) (actual time=22.969..766.903 rows=1314570 loops=1)
                       Buffers: shared read=24568
                       ->  Seq Scan on TABLE2_monthly_2019_01_26  (cost=0.00..5587.05 rows=32135 width=16) (actual time=22.965..123.734 rows=211977 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                             Rows Removed by Filter: 40282
                             Buffers: shared read=4382
                       ->  Seq Scan on TABLE2_monthly_2019_02_25  (cost=0.00..5573.02 rows=32054 width=16) (actual time=0.700..121.657 rows=241977 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                             Buffers: shared read=4371
                       ->  Seq Scan on TABLE2_monthly_2019_03_27  (cost=0.00..5997.60 rows=34496 width=16) (actual time=0.884..123.043 rows=253901 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                             Buffers: shared read=4704
                       ->  Seq Scan on TABLE2_monthly_2019_04_26  (cost=0.00..6581.55 rows=37855 width=16) (actual time=0.690..129.537 rows=282282 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                             Buffers: shared read=5162
                       ->  Seq Scan on TABLE2_monthly_2019_05_26  (cost=0.00..6585.38 rows=37877 width=16) (actual time=1.248..122.794 rows=281553 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                             Buffers: shared read=5165
                       ->  Seq Scan on TABLE2_monthly_2019_06_25  (cost=0.00..999.60 rows=5749 width=16) (actual time=0.750..23.020 rows=42880 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                             Buffers: shared read=784
                       ->  Seq Scan on TABLE2_monthly_2019_07_25  (cost=0.00..12.75 rows=73 width=16) (actual time=0.007..0.007 rows=0 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                       ->  Seq Scan on TABLE2_monthly_2019_08_24  (cost=0.00..12.75 rows=73 width=16) (actual time=0.003..0.004 rows=0 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                       ->  Seq Scan on TABLE2_monthly_2019_09_23  (cost=0.00..12.75 rows=73 width=16) (actual time=0.003..0.004 rows=0 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
                       ->  Seq Scan on TABLE2_monthly_2019_10_23  (cost=0.00..12.75 rows=73 width=16) (actual time=0.007..0.007 rows=0 loops=1)
                             Filter: (event_data_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone)
   ->  Seq Scan on TABLE1_monthly_2019_02_25  (cost=32731.14..88679.16 rows=1022968 width=16) (actual time=1008.738..2341.807 rows=1803957 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
         Rows Removed by Filter: 241978
         Buffers: shared hit=1 read=25258
   ->  Seq Scan on TABLE1_monthly_2019_03_27  (cost=32731.14..97503.58 rows=1184315 width=16) (actual time=1000.795..2474.769 rows=2114729 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
         Rows Removed by Filter: 253901
         Buffers: shared hit=1 read=29242
   ->  Seq Scan on TABLE1_monthly_2019_04_26  (cost=32731.14..105933.54 rows=1338447 width=16) (actual time=892.820..2405.941 rows=2394619 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
         Rows Removed by Filter: 282282
         Buffers: shared hit=1 read=33048
   ->  Seq Scan on TABLE1_monthly_2019_05_26  (cost=32731.14..87789.65 rows=249772 width=16) (actual time=918.397..2614.059 rows=2340789 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
         Rows Removed by Filter: 281553
         Buffers: shared read=32579
   ->  Seq Scan on TABLE1_monthly_2019_06_25  (cost=32731.14..42458.60 rows=177116 width=16) (actual time=923.367..1141.672 rows=311351 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
         Rows Removed by Filter: 42880
         Buffers: shared read=4414
   ->  Seq Scan on TABLE1_monthly_2019_07_25  (cost=32731.14..32748.04 rows=77 width=16) (actual time=0.008..0.008 rows=0 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
   ->  Seq Scan on TABLE1_monthly_2019_08_24  (cost=32731.14..32748.04 rows=77 width=16) (actual time=0.003..0.003 rows=0 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
   ->  Seq Scan on TABLE1_monthly_2019_09_23  (cost=32731.14..32748.04 rows=77 width=16) (actual time=0.003..0.003 rows=0 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
   ->  Seq Scan on TABLE1_monthly_2019_10_23  (cost=32731.14..32748.04 rows=77 width=16) (actual time=0.003..0.003 rows=0 loops=1)
         Filter: ((event_timestamp > '2019-01-01 00:00:00+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
 Planning Time: 244.669 ms
 Execution Time: 15959.111 ms
(69 rows)

A query that joins two large partitioned tables to produce 10 million rows is going to consume resources; there is no way around that.

You can trade memory consumption for speed by reducing work_mem: smaller values will make your queries slower, but they will use less memory.
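For example, work_mem can be lowered for just the session that runs the analytics query (a minimal sketch; the value is only an illustration):

-- Lower work_mem for this session only. Each sort or hash node may use
-- up to this much memory, so a smaller value reduces peak memory usage
-- at the cost of slower execution.
SET work_mem = '32MB';

-- ... run the analytics query here ...

RESET work_mem;  -- revert to the server default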

I would say the best thing to do is to keep work_mem high, but reduce max_connections so that you don't run out of memory as quickly. Besides, putting more RAM into the machine is one of the cheapest hardware tuning techniques there is.
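A minimal sketch of that configuration change (the values are placeholders to be sized against your available RAM):

-- Keep work_mem generous, but cap concurrent connections so that the
-- worst-case total memory use stays within the machine's RAM.
ALTER SYSTEM SET work_mem = '256MB';     -- applies to new sessions after reload
ALTER SYSTEM SET max_connections = 50;   -- requires a server restart
SELECT pg_reload_conf();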

You can improve the query slightly:

  • Remove the DISTINCT; it is useless, consumes CPU resources, and throws your
    estimates off (see the sketch after this list).

  • ANALYZE TABLE2 so that you get better estimates.
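A sketch of both suggestions applied to the query from the question (table names as in the plan above):

-- Refresh planner statistics so the row estimates for the subquery improve.
ANALYZE Table2_monthly;

-- The same query without DISTINCT: NOT IN already treats the subquery as
-- a set, so deduplicating it first only burns CPU and skews the estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM Table1_monthly
WHERE event_timestamp > '2019-01-01'
  AND id NOT IN (SELECT id
                 FROM Table2_monthly
                 WHERE event_data_timestamp > '2019-01-01');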

About partitioning: if these queries scan all partitions, the queries will become slower with partitioned tables.

Whether or not partitioning is a good idea for you depends on whether you have other queries that benefit from it:

  • First and foremost, mass deletion, which is painless when done by dropping
    partitions (see the sketch after this list).

  • Sequential scans where the partitioning key is part of the scan filter.
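A sketch of both cases, assuming (as the query above suggests) that event_timestamp is the partitioning key of Table1_monthly, and using the partition names from the plan:

-- Mass deletion: dropping a whole month's partition is almost instant
-- compared with a DELETE, and it returns the disk space immediately.
DROP TABLE Table1_monthly_2019_01_26;

-- Sequential scan with the partitioning key in the filter: the planner
-- can prune every partition outside the requested month.
SELECT count(*)
FROM Table1_monthly
WHERE event_timestamp >= '2019-02-01'
  AND event_timestamp < '2019-03-01';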

Contrary to popular belief, partitioning is not something that always helps when you have large tables: many queries become slower with partitioning.


Locks are the least of your worries: just increase max_locks_per_transaction.
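For example (the value is illustrative; size it to the number of partitions your queries touch, since every partition a query touches needs an entry in the lock table):

-- Takes effect only after a server restart.
ALTER SYSTEM SET max_locks_per_transaction = 256;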

Comments: What is the name of the partitioning column in both cases? — Thank you so much for this well-explained answer. I have been discussing it with my team; based on your suggestions, partitioning helped us with half of the queries, but the other half still fail, and we will try again. Thanks a lot for such a nice explanation.