
PostgreSQL overlap performance on a large data set


I am struggling to build a performant overlap query.

Table definition:

CREATE TABLE date_range_table
(
  id int NOT NULL,
  item_id int NOT NULL,
  item1_id int NOT NULL,
  item2_id int NOT NULL,
  item3_id int NOT NULL,
  item4_id int NULL,
  item5_id int NULL,
  date_from date NOT NULL,
  date_to date NOT NULL,
  CONSTRAINT pk_date_range_table PRIMARY KEY (id)
  -- Some other constraints
);

-- Unique constraints (partial)

CREATE UNIQUE INDEX ix_date_range_table_items_null
ON date_range_table USING btree
(item_id, item1_id, item2_id, item3_id, date_from, date_to)
WHERE item4_id IS NULL AND item5_id IS NULL;

CREATE UNIQUE INDEX ix_date_range_table_items_not_null
ON date_range_table USING btree
(item_id, item1_id, item2_id, item3_id, item4_id, item5_id, date_from, date_to)
WHERE item4_id IS NOT NULL AND item5_id IS NOT NULL;
I also have an identical table used as a staging table (ETL).

Now I need to check whether there is any overlap between staging_date_range_table and date_range_table.

What I have so far:

SELECT count(t.*)
FROM staging_date_range_table t
JOIN date_range_table td
  ON  t.item_id  = td.item_id
  AND t.item1_id = td.item1_id
  AND t.item2_id = td.item2_id
  AND t.item3_id = td.item3_id
  AND COALESCE(t.item4_id, 0) = COALESCE(td.item4_id, 0)
  AND COALESCE(t.item5_id, 0) = COALESCE(td.item5_id, 0)
WHERE (t.date_from, t.date_to) OVERLAPS (td.date_from, td.date_to)
Working set:

staging_date_range_table: 100k rows

date_range_table: 20mil rows

This runs for 10+ hours.

Is there any way to speed this up?

========== UPDATE ==========

EXPLAIN from a smaller staging set (column names replaced):

"Update on staging_date_range_table t  (cost=95785.89..1923251.02 rows=1 width=3560)"
"  ->  Merge Join  (cost=95785.89..1923251.02 rows=1 width=3560)"
"        Merge Cond: ((td.item_id = t._fcd195352fb4bd386b496c42e58904bb) AND (td.item1_id = t._2c7d5721c3def81d253271f0c2065421))"
"        Join Filter: ((COALESCE(t._842c0c2670edb9fe4ede4cc9e4bac082, '0'::bigint) = COALESCE(td.item4_id, '0'::bigint)) AND (COALESCE(t._522ffbc4b23b13d84bad22e151f4c9df, '0'::bigint) = COALESCE(td.item5_id, '0'::bigint)) AND (t._9afea17b49bfb167b72276a824712179 = td.item2_id) AND (t._452088c89804e1b5d34a6d266ca6c51a = td.item3_id) AND (((t._3027783fd3d10afad84a9a15552b3445 <> td.date_from) OR (t._0dc00dd4dfdbf2864a0cdf57034916c2 <> td.date_to)) AND ""overlaps""((t._3027783fd3d10afad84a9a15552b3445)::timestamp with time zone, (t._0dc00dd4dfdbf2864a0cdf57034916c2)::timestamp with time zone, (td.date_from)::timestamp with time zone, (td.date_to)::timestamp with time zone)))"
"        ->  Index Scan using (index on dates) on date_range_table td  (cost=0.56..1594851.20 rows=8202145 width=62)"
"        ->  Materialize  (cost=95785.34..95889.13 rows=20759 width=3550)"
"              ->  Sort  (cost=95785.34..95837.23 rows=20759 width=3550)"
"                    Sort Key: t._fcd195352fb4bd386b496c42e58904bb, t._2c7d5721c3def81d253271f0c2065421"
"                    ->  Bitmap Heap Scan on t  (cost=1353.30..30862.77 rows=20759 width=3550)"
"                          Recheck Cond: (_rs < 100)"
"                          ->  Bitmap Index Scan on idx_2f01fb51327eb8ab144f717aad1c80487b711093f6efd7af3a  (cost=0.00..1348.11 rows=20759 width=0)"
"                                Index Cond: (_rs < 100)"
Actual EXPLAIN (column names replaced):

"Update on _2103301527_2fd34e_staging_date_range_table t  (cost=2329696.92..2475637.46 rows=188 width=2742) (actual time=167057.275..167057.277 rows=0 loops=1)"
"  Buffers: shared hit=417 read=370671, temp read=269876 written=270422"
"  ->  Merge Join  (cost=2329696.92..2475637.46 rows=188 width=2742) (actual time=167057.274..167057.275 rows=0 loops=1)"
"        Merge Cond: ((td.item1_id = t._9afea17b49bfb167b72276a824712179) AND (td.item_id = t._fcd195352fb4bd386b496c42e58904bb) AND ((COALESCE(td.item4_id, '0'::bigint)) = (COALESCE(t._842c0c2670edb9fe4ede4cc9e4bac082, '0'::bigint))) AND ((COALESCE(td.item5_id, '0'::bigint)) = (COALESCE(t._522ffbc4b23b13d84bad22e151f4c9df, '0'::bigint))) AND (td.item2_id = t._2c7d5721c3def81d253271f0c2065421) AND (td.item3_id = t._452088c89804e1b5d34a6d266ca6c51a))"
"        Join Filter: (((t._3027783fd3d10afad84a9a15552b3445 <> td.date_from) OR (t._0dc00dd4dfdbf2864a0cdf57034916c2 <> td.date_to)) AND ""overlaps""((t._3027783fd3d10afad84a9a15552b3445)::timestamp with time zone, (t._0dc00dd4dfdbf2864a0cdf57034916c2)::timestamp with time zone, (td.date_from)::timestamp with time zone, (td.date_to)::timestamp with time zone))"
"        Rows Removed by Join Filter: 187204665"
"        Buffers: shared hit=417 read=370671, temp read=269876 written=270422"
"        ->  Sort  (cost=1980647.14..2001152.50 rows=8202145 width=62) (actual time=19793.734..24438.091 rows=8176251 loops=1)"
"              Sort Key: td.item1_id, td.item_id, (COALESCE(td.item4_id, '0'::bigint)), (COALESCE(td.item5_id, '0'::bigint)), td.item2_id, td.item3_id"
"              Sort Method: external merge  Disk: 658152kB"
"              Buffers: shared hit=382 read=339568, temp read=264238 written=264767"
"              ->  Seq Scan on date_range_table td  (cost=0.00..421967.45 rows=8202145 width=62) (actual time=0.021..4518.051 rows=8202145 loops=1)"
"                    Buffers: shared hit=378 read=339568"
"        ->  Materialize  (cost=348994.45..349641.59 rows=129428 width=2732) (actual time=614.072..9890.352 rows=187202992 loops=1)"
"              Buffers: shared hit=35 read=31103, temp read=5638 written=5655"
"              ->  Sort  (cost=348994.45..349318.02 rows=129428 width=2732) (actual time=614.069..920.578 rows=129777 loops=1)"
"                    Sort Key: t._9afea17b49bfb167b72276a824712179, t._fcd195352fb4bd386b496c42e58904bb, (COALESCE(t._842c0c2670edb9fe4ede4cc9e4bac082, '0'::bigint)), (COALESCE(t._522ffbc4b23b13d84bad22e151f4c9df, '0'::bigint)), t._2c7d5721c3def81d253271f0c2065421, t._452088c89804e1b5d34a6d266ca6c51a"
"                    Sort Method: external merge  Disk: 36632kB"
"                    Buffers: shared hit=35 read=31103, temp read=5638 written=5655"
"                    ->  Seq Scan on _2103301527_2fd34e_staging_date_range_table t  (cost=0.00..32755.85 rows=129428 width=2732) (actual time=0.574..168.827 rows=129777 loops=1)"
"                          Filter: (_rs < 100)"
"                          Buffers: shared hit=35 read=31103"
"Planning Time: 4.976 ms"
"Execution Time: 167135.877 ms"
========== UPDATE ==========

Added generate_series


You should try daterange instead of separate "from" and "to" dates, together with the overlap operator &&. That allows you to use a GiST index, and you might get a fast nested loop join. The reasoning behind this is that most of the rows coming out of the merge join are filtered out by the overlaps condition:

WHERE daterange(t.date_from,  t.date_to,  '[]')
   && daterange(td.date_from, td.date_to, '[]')
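
As a quick illustration of the '[]' bounds flag (a standalone sketch with made-up dates, not part of the original query), two ranges that merely touch at an endpoint count as overlapping when both ends are inclusive, but not with the default '[)' bounds:

-- '[]' includes both endpoints, so ranges sharing only a boundary day overlap:
SELECT daterange('2024-01-01', '2024-01-10', '[]')
    && daterange('2024-01-10', '2024-01-20', '[]');  -- true

-- with the default '[)' bounds the upper endpoint is excluded:
SELECT daterange('2024-01-01', '2024-01-10')
    && daterange('2024-01-10', '2024-01-20');        -- false

Since the table stores date_from/date_to as presumably inclusive bounds, '[]' is the flag that matches the intended semantics.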
Suggested index:

CREATE INDEX ON date_range_table USING gist
   (daterange(date_from, date_to, '[]'));
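
Putting it together, the original count query could be rewritten roughly as follows (a sketch, assuming date_to is inclusive, hence the '[]' bounds; IS NOT DISTINCT FROM is used here as a NULL-safe alternative to the COALESCE trick, but either works):

SELECT count(*)
FROM staging_date_range_table t
JOIN date_range_table td
  ON  t.item_id  = td.item_id
  AND t.item1_id = td.item1_id
  AND t.item2_id = td.item2_id
  AND t.item3_id = td.item3_id
  -- NULL-safe equality for the nullable columns:
  AND t.item4_id IS NOT DISTINCT FROM td.item4_id
  AND t.item5_id IS NOT DISTINCT FROM td.item5_id
-- this expression matches the GiST index expression exactly,
-- so the planner can probe the index on the inner side of a nested loop:
WHERE daterange(t.date_from,  t.date_to,  '[]')
   && daterange(td.date_from, td.date_to, '[]');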

Without EXPLAIN (ANALYZE, BUFFERS) output it is hard to say more; I suggest setting up a …