Partitioning with UUIDs in Postgres 12 or 13

I've been asked to copy part of a dataset into a new table in Postgres. The data contains parts lists, simplified in the table definition below:

CREATE TABLE IF NOT EXISTS assembly_item (
    id               uuid       NOT NULL DEFAULT  NULL,
    assembly_id      uuid       NOT NULL DEFAULT  NULL,
    done_dts         timestamp  NOT NULL DEFAULT 'epoch', 

CONSTRAINT assembly_item_pk
    PRIMARY KEY (id) 
);

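The original has dozens of attributes and currently only a few hundred million rows. The records are spread across multiple installations and aren't stored in a local Postgres. The best guess is that inserts on this table will add up fast, growing to a billion rows within a year. The data is rarely updated and never deleted. (That may happen eventually, but not often.) The same id is never repeated with a different assembly_id value, so unique on id at the partition level is safe. The goal here is to offload this data to Postgres and keep only the most recent data in the cache on the local servers.

This looks like a natural candidate for partitioning, and I'm looking for some guidance on a sensible strategy. You can see from the simplified structure that we have a unique row id, a parent assembly_id, and a timestamp. I looked through the existing queries in the original database, and the main search field is the parent record identifier, assembly_id. The cardinality between assembly and assembly_item is about 1:200.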
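The growth estimate and the 1:200 cardinality are enough for a rough sizing of both candidate schemes. Here is a minimal Python sketch of that arithmetic. It takes the question's own guesses (about 1 billion rows per year, about 200 items per assembly) as inputs; the function name `sizing` and the default of 256 hash partitions are illustrative choices only.

```python
# Back-of-the-envelope sizing, assuming the question's own estimates:
# ~1 billion assembly_item rows per year and ~200 items per assembly.
ROWS_PER_YEAR = 1_000_000_000
ITEMS_PER_ASSEMBLY = 200


def sizing(rows_per_year: int = ROWS_PER_YEAR,
           items_per_assembly: int = ITEMS_PER_ASSEMBLY,
           hash_partitions: int = 256) -> dict:
    """Return rough row counts for the two candidate partitioning schemes."""
    return {
        # Distinct parent assemblies implied by the 1:200 cardinality.
        "assemblies_per_year": rows_per_year // items_per_assembly,
        # Monthly (LIST or RANGE) partitions: one year spread over 12 partitions.
        "rows_per_month_partition": rows_per_year // 12,
        # Hash partitions: one year of rows spread evenly over all partitions.
        "rows_per_hash_partition_per_year": rows_per_year // hash_partitions,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name:>34}: {value:,}")
```

With monthly partitions, a partition stops growing once its month is over. With hash partitions on assembly_id, every partition keeps growing year after year, but all ~200 items of one assembly end up in the same partition.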

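For partitioning to be most useful, it seems the data needs to be split on a value that lets the query planner prune partitions intelligently. I've come up with a few ideas, but I don't have 200M rows to test against yet. In the meantime, here is what I'm considering: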

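- Partition by month, using RANGE or LIST on the YYYY-MM of done_dts, and rewrite all queries to scope them to a date range.
- HASH partition on the first two characters of assembly_id::text, so that I have 256 partitions of roughly equal size. I think this would let us search on assembly_id and prune the many partitions that have no matches, but it looks odd when I set it up.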
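The second idea derives a bucket from the text form of assembly_id. A minimal Python sketch of the same derivation (first two characters of the UUID text, upper-cased, as the trigger in the setup code computes after a cast to text) shows that those two characters are simply the first byte of the UUID in hex, so there are exactly 256 possible values. The even spread assumes random (version 4) UUIDs; a time-ordered UUID format that puts a timestamp in the leading bytes would crowd into a few buckets. Note also that with PARTITION BY HASH on that text column, the 256 prefixes are hashed again, so several prefixes can share one partition while other partitions stay empty.

```python
import uuid
from collections import Counter


def prefix_bucket(assembly_id: uuid.UUID) -> str:
    """Python equivalent of UPPER(LEFT(assembly_id::text, 2)).

    str(uuid) is lowercase hex such as '3e4a...', so the first two
    characters are the first byte of the UUID written in hex.
    """
    return str(assembly_id)[:2].upper()


def prefix_bucket_number(assembly_id: uuid.UUID) -> int:
    """The same two hex digits read as a number in the range 0..255."""
    return int(prefix_bucket(assembly_id), 16)


if __name__ == "__main__":
    # With random (version 4) UUIDs the first byte is random,
    # so the 256 buckets fill roughly evenly.
    counts = Counter(prefix_bucket_number(uuid.uuid4()) for _ in range(25_600))
    print("buckets used:", len(counts))
    print("smallest / largest bucket:", min(counts.values()), max(counts.values()))
```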


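I realize this is a somewhat speculative question; I'm hoping for some advice that will make my first attempt more successful. Once I have a dataset to work with, I can experiment more directly.

I've included the experimental setup code, listing only a sample of the partitions for brevity:

- Example setup using LIST partitioning
- Example setup using HASH partitioning

Any suggestions?

I can't tell whether the date or a UUID hash is the better partition key. But I can say this: either of your solutions could be more efficient.

Based on UUID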
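Your plan to add a partition key column and populate it with a trigger function is very inefficient. And unnecessary. (Besides issues with the trigger function itself.)

There seems to be a misunderstanding. You have this comment:

-- Note: You *must* include the partition key in the primary key. It's a rule.

Not quite. The manual says:

Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

Partition key columns, that is, whatever columns the table is partitioned by. Not an extra "partition key" column.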
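The manual's rule can be illustrated with a toy model in Python. It routes each row by hashing assembly_id modulo 256 and lets each partition enforce PRIMARY KEY (assembly_id, id) on its own rows only. Two simplifications are assumptions of the sketch, not PostgreSQL behavior: the stand-in hash is just the UUID's integer value modulo the modulus (PostgreSQL applies its own hash function), and partitions are plain dicts. Because any duplicate key has the same assembly_id, it is routed to the same partition, so the per-partition check is also a global one; and a lookup by assembly_id only has to inspect one partition, which is what partition pruning exploits.

```python
import uuid
from typing import Dict, List, Tuple

MODULUS = 256  # same modulus as the hash-partitioned example

Key = Tuple[uuid.UUID, uuid.UUID]  # (assembly_id, id)


def route(assembly_id: uuid.UUID, modulus: int = MODULUS) -> int:
    """Choose a partition (the remainder) for a row.

    Stand-in hash for illustration only; PostgreSQL applies its own
    internal hash function to the partition key before taking the modulus.
    """
    return assembly_id.int % modulus


class HashPartitionedTable:
    """Toy model: each partition enforces uniqueness only within itself."""

    def __init__(self, modulus: int = MODULUS) -> None:
        self.modulus = modulus
        self.partitions: List[Dict[Key, dict]] = [{} for _ in range(modulus)]

    def insert(self, assembly_id: uuid.UUID, item_id: uuid.UUID, row: dict) -> int:
        """Insert a row and return the partition number it was routed to."""
        n = route(assembly_id, self.modulus)
        key = (assembly_id, item_id)  # PRIMARY KEY (assembly_id, id)
        if key in self.partitions[n]:
            # A duplicate always has the same assembly_id, hence the same
            # partition, so this local check is enough.
            raise ValueError(f"duplicate key {key}")
        self.partitions[n][key] = row
        return n

    def items_for_assembly(self, assembly_id: uuid.UUID) -> List[dict]:
        """Equality on the partition key: only one partition is searched."""
        part = self.partitions[route(assembly_id, self.modulus)]
        return [row for (a_id, _), row in part.items() if a_id == assembly_id]
```

The model only guarantees uniqueness of the (assembly_id, id) pair; that id alone never repeats is still a property of the source data, as stated in the question.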

A setup with hash partitioning on (assembly_id) works with a PK that includes that same column. Like this:
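CREATE TABLE IF NOT EXISTS assembly_item_hash (
  assembly_id       uuid NOT NULL
, id                uuid NOT NULL
, assembly_done_dts timestamp NOT NULL DEFAULT 'epoch'
, PRIMARY KEY (assembly_id, id)
) PARTITION BY HASH (assembly_id);

CREATE TABLE assembly_item_hash_000 PARTITION OF assembly_item_hash FOR VALUES WITH (MODULUS 256, REMAINDER 0);
CREATE TABLE assembly_item_hash_001 PARTITION OF assembly_item_hash FOR VALUES WITH (MODULUS 256, REMAINDER 1);
-- etc.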
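Writing out all 256 partitions by hand is tedious. One option is to generate the DDL. Here is a minimal Python sketch that returns one CREATE TABLE ... PARTITION OF statement per remainder, following the naming pattern of the example (assembly_item_hash_000 through assembly_item_hash_255). The function name is hypothetical; the same loop could just as well be written as a PL/pgSQL DO block using EXECUTE format(...).

```python
from typing import List


def hash_partition_ddl(parent: str = "assembly_item_hash", modulus: int = 256) -> List[str]:
    """Return one CREATE TABLE ... PARTITION OF statement per remainder."""
    width = len(str(modulus - 1))  # 255 -> 3 digits, so _000 .. _255 sort correctly
    return [
        f"CREATE TABLE {parent}_{r:0{width}d} PARTITION OF {parent} "
        f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {r});"
        for r in range(modulus)
    ]


if __name__ == "__main__":
    print("\n".join(hash_partition_ddl()))
```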

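Much simpler.

The only downside: the PK index is bigger, since a uuid occupies 16 bytes.

If that's a concern, you could fall back to the generated partition_key you had in mind, with one trigger per partition. (Ugh, overhead!) But make the column integer instead of text, and use the more efficient built-in hash function uuid_hash(). That is the UUID hashing that hash partitioning relies on internally (strictly speaking, hash partitioning calls the 64-bit variant uuid_hash_extended() with a seed). But now we use it explicitly, with LIST partitioning: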
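CREATE TABLE IF NOT EXISTS assembly_item_hash (
  id                uuid NOT NULL
, assembly_id       uuid NOT NULL
, partition_key     int4 NOT NULL
, assembly_done_dts timestamp NOT NULL DEFAULT 'epoch'
, PRIMARY KEY (partition_key, id)
) PARTITION BY LIST (partition_key);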

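In theory, this adds 4 bytes to every table row and saves 12 bytes in every index entry. But due to alignment padding, you lose another 4 bytes in both the table and the index, ending up with the same total space on disk as before (roughly; table and index bloat may differ).
Unless "column tetris" lets you fit the column in more efficiently, winning up to 8 bytes per row... See: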
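The byte counting in the two paragraphs above can be checked with a small Python model. It assumes the type sizes and alignments listed in the code comment (uuid: 16 bytes with no alignment requirement; int4: 4-byte aligned; timestamp: 8-byte aligned; an 8-byte MAXALIGN and an 8-byte index tuple header, as on common 64-bit builds) and leaves out everything that is identical in both layouts. It is a sketch of the arithmetic, not a substitute for measuring real data.

```python
from typing import List, Tuple

# Simplified model of PostgreSQL's on-disk layout on a 64-bit build.
# Assumed (size, alignment): uuid (16, 1), int4 (4, 4), timestamp (8, 8);
# MAXALIGN = 8; index tuple header = 8 bytes. Heap tuple headers and line
# pointers are left out because they are the same in both layouts.
Column = Tuple[int, int]
UUID: Column = (16, 1)
INT4: Column = (4, 4)
TIMESTAMP: Column = (8, 8)
MAXALIGN = 8
INDEX_TUPLE_HEADER = 8


def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def packed_width(columns: List[Column]) -> int:
    """Width of the column data in order, with padding before each column."""
    offset = 0
    for size, alignment in columns:
        offset = align(offset, alignment) + size
    return offset


def heap_data_width(columns: List[Column]) -> int:
    """Row data as stored in the table, rounded up to MAXALIGN."""
    return align(packed_width(columns), MAXALIGN)


def index_entry_width(key_columns: List[Column]) -> int:
    """B-tree entry: header plus key data, rounded up to MAXALIGN."""
    return align(INDEX_TUPLE_HEADER + packed_width(key_columns), MAXALIGN)


if __name__ == "__main__":
    # PK (assembly_id, id) vs. PK (partition_key, id), columns in DDL order.
    before = heap_data_width([UUID, UUID, TIMESTAMP]) + index_entry_width([UUID, UUID])
    after = heap_data_width([UUID, UUID, INT4, TIMESTAMP]) + index_entry_width([INT4, UUID])
    print("bytes per row, table + PK entry:", before, "->", after)
```

In this model the table row grows by 8 bytes (4 for the value, 4 of padding before the timestamp) and the index entry shrinks by 8 (12 saved, 4 lost to padding), so the total stays the same. If the real table, with its dozens of other attributes, already has a 4-byte hole (for example a hypothetical existing int4 column followed by an 8-byte-aligned one), partition_key can fill it: `heap_data_width([UUID, UUID, INT4, INT4, TIMESTAMP])` is still 48, so the row does not grow and the 8 bytes saved per index entry are a net win.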



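LIST partitioning based on timestamp

Don't use citext. Needless complication.

Use an integer representing YYYY-MM instead. Smaller, faster. I suggest this basic function: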
CREATE FUNCTION f_yyyymm(timestamp)
  RETURNS int
  LANGUAGE sql PARALLEL SAFE IMMUTABLE AS
'SELECT (EXTRACT(year FROM $1) * 100 + EXTRACT(month FROM $1))::int';

See:
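A Python mirror of f_yyyymm() makes the key format concrete and shows how the matching monthly LIST partitions could be generated. It assumes that the integer produced by f_yyyymm(assembly_done_dts) serves as the LIST key of a parent table named assembly_item_list; the partition naming is illustrative only.

```python
from datetime import datetime
from typing import List


def yyyymm(ts: datetime) -> int:
    """Mirror of f_yyyymm(): year * 100 + month, e.g. 2021-03-15 -> 202103."""
    return ts.year * 100 + ts.month


def month_partitions_ddl(year: int, parent: str = "assembly_item_list") -> List[str]:
    """One LIST partition per month of `year`, keyed on the integer YYYYMM value."""
    statements = []
    for month in range(1, 13):
        key = yyyymm(datetime(year, month, 1))
        statements.append(
            f"CREATE TABLE {parent}_{key} PARTITION OF {parent} FOR VALUES IN ({key});"
        )
    return statements


if __name__ == "__main__":
    print(yyyymm(datetime(2021, 3, 15, 8, 30)))
    print("\n".join(month_partitions_ddl(2021)))
```

Because the keys are plain integers, they also compare in calendar order (202112 < 202201). Either way, queries need a condition on that same key (the column or expression the table is partitioned by) for the planner to prune partitions.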


------------------------------------
-- Define table partitioned by list
------------------------------------
-- Could alternatively use RANGE here to partition by month.

BEGIN;

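-- Drop the parent table, if it exists.
-- This destroys ALL partitions automatically, even without a CASCADE clause.
DROP TABLE IF EXISTS assembly_item_list CASCADE;

-- citext is provided by an extension; it must be installed before the column type can be used.
CREATE EXTENSION IF NOT EXISTS citext;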

CREATE TABLE IF NOT EXISTS assembly_item_list (
    id                              uuid          NOT NULL DEFAULT NULL,
    assembly_id                     uuid          NOT NULL DEFAULT NULL,
    assembly_done_dts               timestamp     NOT NULL DEFAULT 'epoch', -- Copied in from assembly.done_dts when rows are pushed to Postgres.
    year_and_month                  citext        NOT NULL DEFAULT NULL,    -- YYYY-MM from assembly_done_dts, calculated in insert function. Can't use a generated column as a partition key.

-- Reminder: id values come from the various source tables in IB. The upsert writes over matches ON CONFLICT with this ID.
-- Note: You *must* include the partition key in the primary key. It's a rule.
CONSTRAINT assembly_item_list_pk
    PRIMARY KEY (year_and_month, id) 
) PARTITION BY LIST (year_and_month);

-- Previous year partitions built here...

-- Build out 2021 completely.
CREATE TABLE assembly_item_list_2021_01 partition of assembly_item_list FOR VALUES IN ('2021-01');
CREATE TABLE assembly_item_list_2021_02 partition of assembly_item_list FOR VALUES IN ('2021-02');
-- etc.

-- In case I screw up at the end of the year....
CREATE TABLE assembly_item_list_default partition of assembly_item_list default; 

COMMIT; 

------------------------------------
-- Define table partitioned by hash
------------------------------------

BEGIN;

-- Drop the parent table, if it exists.
-- This destroys ALL partitions automatically, even without a CASCADE clause.
DROP TABLE IF EXISTS assembly_item_hash CASCADE;

CREATE TABLE IF NOT EXISTS assembly_item_hash (
    id                              uuid          NOT NULL DEFAULT NULL,
    assembly_id                     uuid          NOT NULL DEFAULT NULL,
    assembly_done_dts               timestamp     NOT NULL DEFAULT 'epoch', -- Copied in from assembly.done_dts when rows are pushed to Postgres.
    partition_key                   text          NOT NULL DEFAULT NULL,    -- '00', '0A', etc. Populated in a BEFORE INSERT trigger on the partition. Can't use a generated column as a partition key, can't use a column reference in DEFAULT. 

-- Reminder: id values come from the various source tables in IB. The upsert writes over matches ON CONFLICT with this ID.
-- Note: You *must* include the partition key in the primary key. It's a rule.
CONSTRAINT assembly_item_hash_pk
    PRIMARY KEY (partition_key, id) 
) PARTITION BY HASH (partition_key);

-----------------------------------------------------
-- Create trigger function to populate partition_key
-----------------------------------------------------
-- The partition key is a two-character hex string, like '00', '3E', and so on.
CREATE OR REPLACE FUNCTION set_partition_key()
    RETURNS TRIGGER AS $$
    BEGIN
        -- uuid has no implicit cast to text, so cast explicitly before LEFT().
        NEW.partition_key = UPPER(LEFT(NEW.assembly_id::text, 2));
        RETURN NEW;
END;
$$ language plpgsql IMMUTABLE; -- I don't think that I need to worry about IMMUTABLE here. 0123456789ABCDEF shouldn't break. 

-----------------------------------------------------
-- Build partitions
-----------------------------------------------------
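-- Note: Have to assign triggers to partitions individually.
-- Seems that it would be easier to add the logic to my central insert function.
-- Caveat: a BEFORE ROW trigger on a partition fires only after the row has already
-- been routed by its partition_key, so it cannot supply the value used for routing;
-- that value has to be set before the row reaches the partitioned table.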

CREATE TABLE assembly_item_hash_00 partition of assembly_item_hash FOR VALUES WITH (modulus 256, remainder 0);
CREATE TRIGGER set_partition_key_trigger_00
    BEFORE INSERT OR UPDATE ON assembly_item_hash_00
    FOR EACH ROW
    EXECUTE PROCEDURE set_partition_key();

CREATE TABLE assembly_item_hash_01 partition of assembly_item_hash FOR VALUES WITH (modulus 256, remainder 1);
CREATE TRIGGER set_partition_key_trigger_01
    BEFORE INSERT OR UPDATE ON assembly_item_hash_01
    FOR EACH ROW
    EXECUTE PROCEDURE set_partition_key();
    
-- And so on for all 256 partitions.

COMMIT;