PostgreSQL: estimating the index size for a timestamp column

postgresql indexing

I have a Postgres table, ENTRIES, with a 'made_at' column of type timestamp without time zone.

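The table has a btree index on that column together with another column (USER_ID, a foreign key):

As you can see, the date is truncated to 'day'. The total size of an index built this way is 130 MB, and the ENTRIES table has 4,000,000 rows.

Question: if I wanted to keep the time down to the second, how would I estimate the size of the index? Basically, truncate the timestamp at the second rather than at the day (which should be easy to do, I hope).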

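Interesting question! According to my investigation, the two indexes are the same size.

My intuition told me there should be no difference in size between the two indexes, because the timestamp type in PostgreSQL has a fixed size (8 bytes), and I assumed the truncate function simply zeroes out the appropriate number of least significant time bits. Still, I figured I had better back up my guess with some facts.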
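The fixed-size part of that guess can be checked directly, without building any table. The query below is a minimal sketch: it uses `localtimestamp` as an arbitrary sample value and asks `pg_column_size` how many bytes each truncated value occupies. Both columns should report 8, whatever the truncation granularity.

```sql
-- Storage size of a timestamp after truncation to day and to second.
-- localtimestamp is just a convenient sample value of type
-- "timestamp without time zone".
SELECT pg_column_size(date_trunc('day',    localtimestamp)) AS day_bytes,
       pg_column_size(date_trunc('second', localtimestamp)) AS second_bytes;
```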

I set up a free dev database on Heroku PostgreSQL and generated a table with 4M random timestamps, truncated to both day and second values, as follows:

test_db=> SELECT * INTO ts_test FROM 
                        (SELECT id, 
                                ts, 
                                date_trunc('day', ts) AS trunc_day, 
                                date_trunc('second', ts) AS trunc_s 
                         FROM (select generate_series(1, 4000000) AS id, 
                               now() - '1 year'::interval * round(random() * 1000) AS ts) AS sub) 
                         AS subq;
SELECT 4000000

test_db=> create index ix_day_trunc on ts_test (id, trunc_day);
CREATE INDEX
test_db=> create index ix_second_trunc on ts_test (id, trunc_s);
CREATE INDEX
test_db=> \d ts_test
           Table "public.ts_test"
  Column   |           Type           | Modifiers 
-----------+--------------------------+-----------
 id        | integer                  | 
 ts        | timestamp with time zone | 
 trunc_day | timestamp with time zone | 
 trunc_s   | timestamp with time zone | 
Indexes:
    "ix_day_trunc" btree (id, trunc_day)
    "ix_second_trunc" btree (id, trunc_s)

test_db=> SELECT pg_size_pretty(pg_relation_size('ix_day_trunc'));
 pg_size_pretty 
----------------
 120 MB
(1 row)

test_db=> SELECT pg_size_pretty(pg_relation_size('ix_second_trunc'));
 pg_size_pretty 
----------------
 120 MB
(1 row)
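The measured 120 MB can also be approximated by hand, which gives a way to estimate the size for other row counts. The sketch below assumes a 64-bit build with the default 8 kB pages: an 8-byte index tuple header, a 4-byte integer padded to 8-byte alignment, an 8-byte timestamp, a 4-byte line pointer per entry, and the default btree leaf fillfactor of 90%. It ignores page headers and the upper tree levels, so treat the result as a lower-bound estimate rather than an exact figure.

```sql
-- Back-of-envelope leaf-level size of a btree index on (integer, timestamp).
-- All byte counts below are assumptions, as described above.
WITH params AS (
    SELECT 4000000 AS n_rows,        -- rows to index
           8       AS tuple_header,  -- index tuple header
           4 + 4   AS int_key,       -- integer key plus alignment padding
           8       AS ts_key,        -- timestamp key, any truncation
           4       AS line_pointer,  -- per-entry pointer inside the page
           0.90    AS leaf_fill      -- default btree leaf fillfactor
)
SELECT pg_size_pretty(
           ceil(n_rows * (tuple_header + int_key + ts_key + line_pointer)
                / leaf_fill)::bigint
       ) AS estimated_leaf_size
FROM params;
```

The estimate comes out just under 120 MB, close to the measurement. Note that nothing in the formula depends on how the timestamp was truncated; only the row count and the key widths matter.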

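Thanks, I appreciate the answer and the example. This is interesting. I obviously know very little about how database indexes are built; I assumed that since the leaf level of the tree would have more, er, "buckets" or "nodes", the total size of the tree would be larger as well. Can you point out what is wrong with my thinking? Thanks!

It's hard to figure out what you're thinking :). Why assume there would be more leaf nodes in the tree? There are the same number of rows to index regardless of the column contents.

Fair enough =) I'll try to explain what I mean. My intuition goes like this: if there are 1000 messages and they are all on the same day, then the index would be useless, because all records obviously have the same timestamp when truncated to the day, so the index can't help us narrow down to an individual record. They are all in the same "bucket"; they are all leaves on the same tree node, aren't they? If we rounded to the hour instead, for example, we would have 24 nodes (assuming a reasonably normal distribution), with the actual rows hanging off them in smaller groups =)

Alex, you make a very good point. I'm afraid I can't answer authoritatively. The correct answer probably depends on the details of the particular btree implementation.

You should worry if your column is, for example, a varchar; in that case the index size depends on the column size.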
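The point that both indexes hold one entry per row can be checked against the catalog. This is a minimal sketch, assuming the `ts_test` table and the two indexes from the session above exist; `relpages` and `reltuples` in `pg_class` are planner statistics, so they are refreshed with `ANALYZE` first and should be read as estimates.

```sql
-- Refresh the statistics stored in pg_class for the table and its indexes.
ANALYZE ts_test;

-- Compare the entry count and page count of the two indexes.
SELECT relname,
       reltuples::bigint AS index_entries,
       relpages          AS pages
FROM pg_class
WHERE relname IN ('ix_day_trunc', 'ix_second_trunc');
```

If the two rows show matching entry and page counts, the granularity of the truncated value does not change how many entries the tree has to store.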