Postgres: using an index with `split_part`

sql postgresql indexing split jsonb

Context: I have a `test` table:

=> \d+ test
                                          Table "public.test"
    Column     |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
---------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 id            | character varying(255) |           |          |         | extended |              |
 configuration | jsonb                  |           |          |         | extended |              |

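The `configuration` column contains "well-defined" json, which has a key named `source_url` (other, unrelated keys are skipped here).

An example value of the `configuration` column: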

{
"source_url": "https://<resource-address>?Signature=R1UzTGphWEhrTTFFZnc0Q4qkGRxkA5%2BHFZSfx3vNEvRsrlDcHdntArfHwkWiT7Qxi%2BWVJ4DbHJeFp3GpbS%2Bcb1H3r1PXPkfKB7Fjr6tFRCetDWAOtwrDrVOkR9G1m7iOePdi1RW%2Fn1LKE7MzQUImpkcZXkpHTUgzXpE3TPgoeVtVOXXt3qQBARpdSixzDU8dW%2FcftEkMDVuj4B%2Bwiecf6st21MjBPjzD4GNVA%2F6bgvKA6ExrdYmM5S6TYm1lz2e6juk81%2Fk4eDecUtjfOj9ekZiGJVMyrD5Tyw%2FTWOrfUB2VM1uw1PFT2Gqet87jNRDAtiIrJiw1lfB7Od1AwNxIk0Rqkrju8jWxmQhvb1BJLV%2BoRH56OHdm5nHXFmQdldVpyagQ8bQXoKmYmZPuxQb6t9FAyovGMav3aMsxWqIuKTxLzjB89XmgwBTxZSv5E9bkWUbom2%2BWq4O3%2BCrVxYwsqg%3D%3D&Expires-At=1569340020&Issued-At=1568293200"
    .
    .
}

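Explanation:

The query first splits `source_url` at `Expires-At=` and picks the right-hand part, then splits the resulting string on `&` and picks its left-hand part. That yields the exact epoch time needed, as `text`, where `Expires-At` is a query parameter of `source_url`.

Once the epoch time has been extracted as `text`, the query casts it to `bigint`, converts that to a Postgres timestamp, and checks whether the timestamp is less than or equal to 24 hours from `now()`.

All rows that pass this condition are selected.

So, at the end of each run, the scheduler refreshes every URL that will expire within the next 24 hours (including URLs that have already expired).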
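The query itself is not shown at this point. Based on the description above and on the expression that appears in the answers further down, it would look roughly like the following sketch (the formatting is mine, not the asker's):

```sql
-- Select rows whose signed URL expires within the next 24 hours
-- (already expired URLs match as well).
SELECT *
FROM test
WHERE to_timestamp(
          split_part(
              -- everything to the right of 'Expires-At='
              split_part(configuration->>'source_url', 'Expires-At=', 2),
              '&',
              1          -- everything to the left of the next '&'
          )::bigint      -- epoch seconds as a number
      ) <= now() + interval '24 hours';
```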

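Questions:

1. Although this solves my problem, I don't really like this solution. It involves a lot of string manipulation, which I find somewhat unclean. Is there a cleaner way?
2. If we "have to" go with the above solution, can we even use an index for this kind of query? I know that functions such as `lower()`, `upper()`, etc. can be indexed, but I really can't think of a way to index this query.

Alternative:

Unless there is a really clean solution, I am going to do the following:

1. Introduce a new key named `expires_at` inside the `configuration` json and make sure it is populated with the correct value every time a row is inserted.
2. Query this newly added field directly (with an index on the `configuration` column).

I admit that this way I am duplicating the `Expires-At` information, but of all the solutions I could think of, this is the cleanest one.

Is there a better way than this?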
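A minimal sketch of that alternative, assuming the new `expires_at` key stores the expiry as epoch seconds (the question does not say which format it would use). A GIN index on the whole `jsonb` column supports containment-style operators rather than `<=`, so this sketch uses an expression index on the key instead; the index name is hypothetical.

```sql
-- Expression index on the assumed expires_at key (epoch seconds).
-- The extra parentheses are required around a non-function expression.
CREATE INDEX test_expires_at_idx
    ON test (((configuration->>'expires_at')::bigint));

-- The right-hand side is cast to bigint so that both sides of the
-- comparison have the indexed expression's type.
SELECT *
FROM test
WHERE (configuration->>'expires_at')::bigint
      <= extract(epoch FROM now() + interval '24 hours')::bigint;
```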


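Edit:

Updated the query to use `substring()` with a regex instead of the inner `split_part()`:

select * from test where to_timestamp(split_part(substring(configuration->>'source_url' from 'Expires-At=\d+'), '=', 2)::bigint) <= now() + interval '24 hours';

Given your current data model, I don't find your `where` condition that bad.

You can index it with: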
CREATE INDEX ON test ( 
   to_timestamp(
      split_part(
         split_part(
            configuration->>'source_url',
            'Expires-At=',
            2
         ),
         '&',
         1
      )::bigint
   )
);

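Basically, you have to index the whole expression to the left of `<=`. That is only possible if all functions and operators involved are immutable, which I think is the case in your example.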
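One way to check the immutability claim is to look up the volatility that the system catalog records for the functions involved. This is only a sketch; it lists every overload of the two function names, so the one to look at for this index is `to_timestamp(double precision)`.

```sql
-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile.
-- Functions used in an index expression must be marked immutable.
SELECT p.proname,
       pg_get_function_arguments(p.oid) AS arguments,
       p.provolatile
FROM pg_proc AS p
WHERE p.proname IN ('to_timestamp', 'split_part')
ORDER BY p.proname, arguments;
```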
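I would change the data model, though. First, I don't see the value in having a `jsonb` column that holds a single value. Why not make the URL a `text` column instead?

You could go even further and split the URL into its parts, each stored in its own column.

Whether all this is a good idea depends on how you use the values in the database. It is usually a good idea to split out the parts of the data that you use in `WHERE` conditions and the like, and to keep the rest "in one piece". To some extent this is a matter of taste.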
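As a rough sketch of that data model change, the part used in `WHERE` conditions could live in its own indexed column. The column and index names below are hypothetical, and the one-off `UPDATE` assumes every existing row has a `source_url` containing `Expires-At=`; new rows would have to fill both columns on insert.

```sql
-- Hypothetical columns: the URL as text, the expiry as a timestamp.
ALTER TABLE test
    ADD COLUMN source_url text,
    ADD COLUMN source_url_expires_at timestamptz;

-- Backfill from the existing jsonb value.
UPDATE test
SET source_url = configuration->>'source_url',
    source_url_expires_at = to_timestamp(
        split_part(
            split_part(configuration->>'source_url', 'Expires-At=', 2),
            '&',
            1
        )::bigint
    );

-- A plain b-tree index now serves the range condition.
CREATE INDEX test_source_url_expires_at_idx
    ON test (source_url_expires_at);

SELECT *
FROM test
WHERE source_url_expires_at <= now() + interval '24 hours';
```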
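If you find this unclean, you could use a URI parsing module: you can use plperl or plpythonu together with your favorite URI parser library. But if your json really is "well defined", I don't see the point. Unless you are already using plperl or plpythonu, adding those dependencies will probably add more "dirt" than it removes.

You can build the index: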
create index on test (to_timestamp(split_part(split_part(configuration->>'source_url', 'Expires-At=', 2), '&', 1)::bigint));
set enable_seqscan TO off;
explain select * from test where to_timestamp(split_part(split_part(configuration->>'source_url', 'Expires-At=', 2), '&', 1)::bigint) <= now() + interval '24 hours';
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_to_timestamp_idx1 on test  (cost=0.13..8.15 rows=1 width=36)
   Index Cond: (to_timestamp(((split_part(split_part((configuration ->> 'source_url'::text), 'Expires-At='::text, 2), '&'::text, 1))::bigint)::double precision) <= (now() + '24:00:00'::interval))

For brevity, I skipped the other keys in the `jsonb` column. It contains a couple more keys as well, which is why its type is `jsonb`.

Don't. `split_part` is much cheaper.