Understanding/optimizing an SQL query in PostgreSQL


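I have a query that takes 10-12 minutes when I run it in the worst-case scenario. If I remove the time check from the WHERE clause, it drops to 20-30 seconds, so I would like to know how to optimize it.

I tried adding an index on the timestamp-to-time cast, but it did not really help. The register_status table (rs) has over 70 million rows, register_date has about 280k rows, and the charge_point table (cp) has fewer than 1k.

The idea of the query is to get all the results for the charge points, grouped by status, over a range of dates and within a given time-of-day window. This is the worst case: the date range starts at the first date in the database and the user selects the whole day as the time window. The query is as follows: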

explain analyze SELECT
    COUNT(rs.status) filter (where rs.status = 'Occ') as total_occ,
    COUNT(rs.status) filter (where rs.status = 'Part') as total_part,
    COUNT(rs.status) filter (where rs.status = 'OOS') as total_oos,
    COUNT(rs.status) filter (where rs.status = 'OOC') as total_ooc,
    cp.id as charge_point_id,
    cp.address,
    cp.type as charge_point_type,
    cp.latitude,
    cp.longitude 
FROM register_date rd 
    inner join register_status rs on rs.fk_register_date = rd.id
    inner join charge_point cp on cp.id = rs.fk_charge_point
WHERE 
 rd.date::date >= '2016-11-01' and rd.date::date <= '2019-08-01'
 AND
 rd.date::time >= time '00:00' AND rd.date::time <= time '23:59'
group by cp.id
The EXPLAIN ANALYZE output is below; I can see that the sort uses a lot of disk space:

"Finalize GroupAggregate  (cost=34412.78..34536.10 rows=780 width=124) (actual time=689440.380..699740.172 rows=813 loops=1)"
"  Group Key: cp.id"
"  ->  Gather Merge  (cost=34412.78..34519.27 rows=722 width=124) (actual time=689421.445..699736.996 rows=1579 loops=1)"
"        Workers Planned: 1"
"        Workers Launched: 1"
"        ->  Partial GroupAggregate  (cost=33412.77..33438.04 rows=722 width=124) (actual time=649515.576..659674.461 rows=790 loops=2)"
"              Group Key: cp.id"
"              ->  Sort  (cost=33412.77..33414.57 rows=722 width=96) (actual time=649496.720..654001.697 rows=24509314 loops=2)"
"                    Sort Key: cp.id"
"                    Sort Method: external merge  Disk: 2649104kB"
"                    Worker 0:  Sort Method: external merge  Disk: 2652840kB"
"                    ->  Nested Loop  (cost=0.56..33378.49 rows=722 width=96) (actual time=1.343..504948.423 rows=24509314 loops=2)"
"                          ->  Parallel Seq Scan on register_date rd  (cost=0.00..6443.69 rows=4 width=4) (actual time=0.021..294.724 rows=139760 loops=2)"
"                                Filter: (((date)::date >= '2016-11-01'::date) AND ((date)::date <= '2019-08-01'::date) AND ((date)::time without time zone >= '00:00:00'::time without time zone) AND ((date)::time without time zone <= '23:59:00'::time without time zone))"
"                          ->  Nested Loop  (cost=0.56..6725.90 rows=780 width=100) (actual time=0.077..3.574 rows=175 loops=279519)"
"                                ->  Seq Scan on charge_point cp  (cost=0.00..21.80 rows=780 width=92) (actual time=0.002..0.077 rows=813 loops=279519)"
"                                ->  Index Only Scan using register_status_fk_charge_point_fk_register_date_status_key on register_status rs  (cost=0.56..8.58 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=227248947)"
"                                      Index Cond: ((fk_charge_point = cp.id) AND (fk_register_date = rd.id))"
"                                      Heap Fetches: 49018627"
"Planning Time: 0.506 ms"
"Execution Time: 700065.010 ms"

A lateral join might be faster:

SELECT cp.*, rd.*
FROM charge_point cp CROSS JOIN LATERAL
     (SELECT COUNT(*) filter (where rs.status = 'Occ') as total_occ,
             COUNT(*) filter (where rs.status = 'Part') as total_part,
             COUNT(*) filter (where rs.status = 'OOS') as total_oos,
             COUNT(*) filter (where rs.status = 'OOC') as total_ooc
      FROM register_date rd JOIN
           register_status rs 
           ON rs.fk_register_date = rd.id
      WHERE cp.id = rs.fk_charge_point AND
            rd.date >= '2016-11-01' and
            rd.date < date '2019-08-01' + interval '1 day'
     ) rd;
I would suggest indexes on register_date(fk_charge_point, date) and register_status(id, status).
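
Going by the column names in the question's query, fk_charge_point and fk_register_date live on register_status rather than on register_date, so the indexes may need adapting. A rough sketch under that assumption follows; the index names are made up for illustration, and the plan in the question suggests the second index already exists as a unique key.

```sql
-- Hypothetical index names; column names are taken from the question's query.

-- Lets the planner locate register_date rows by a range on the raw timestamp.
CREATE INDEX IF NOT EXISTS register_date_date_idx
    ON register_date (date);

-- Covers the per-charge-point lookup plus the status counts.
-- The plan in the question shows an index named
-- register_status_fk_charge_point_fk_register_date_status_key,
-- which appears to cover the same columns; if so, skip this one.
CREATE INDEX IF NOT EXISTS register_status_cp_date_status_idx
    ON register_status (fk_charge_point, fk_register_date, status);
```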


Note that I changed the date comparisons so that they are more index-friendly. I see no reason to filter by time, so I removed those conditions.
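
To illustrate what "index-friendly" means here, a minimal comparison, assuming date is a timestamp without time zone column and that a plain b-tree index on register_date (date) exists (neither is confirmed by the question):

```sql
-- Cast form from the question: the cast is evaluated for every row,
-- so a plain index on (date) cannot serve this predicate.
EXPLAIN
SELECT rd.id
FROM register_date rd
WHERE rd.date::date >= '2016-11-01'
  AND rd.date::date <= '2019-08-01';

-- Half-open range on the raw column: the same days are selected,
-- and the comparison can be answered from an index on (date).
EXPLAIN
SELECT rd.id
FROM register_date rd
WHERE rd.date >= timestamp '2016-11-01'
  AND rd.date <  timestamp '2019-08-02';
```

Whether the planner actually picks the index depends on selectivity; for a range covering almost the whole table, a sequential scan may still win. The range form should at least give the planner better row estimates than the casts, which may be why the original plan estimated rows=4 where about 140k were returned.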


Using Gordon's approach I developed a new query, and it turned out to be much faster, going from 10-12 minutes to 20-40 seconds:

SELECT cp.*, rd.* from charge_point cp cross join lateral
    (select
        COUNT(rs.status) filter (where rs.status = 'Occ') as total_occ,
        COUNT(rs.status) filter (where rs.status = 'Part') as total_part,
        COUNT(rs.status) filter (where rs.status = 'OOS') as total_oos,
        COUNT(rs.status) filter (where rs.status = 'OOC') as total_ooc,
        rs.fk_charge_point as cpid
    FROM register_date rd 
        inner join register_status rs on rs.fk_register_date = rd.id
    WHERE 
        rd.date::date >= '2019-02-01' and rd.date::date <= '2019-08-01'
        AND
        rd.date::time >= time '00:00' AND rd.date::time <= time '23:59'
    group by rs.fk_charge_point) rd
where cp.id = rd.cpid

I still need to check whether adding any indexes makes the query faster, but so far it looks good.


Note: cp; [Also: I don't think LATERAL is needed here; simply not splitting the timestamp should be enough, IMO]

I need the time, because the query may only want results whose datetime falls between 5 AM and 6 PM, for example. As I said, I ran the query in the worst case, where the user selects 00:00 to 23:59, which makes little sense but is still possible.

@user3107720. Then add the conditions back in. The lateral join should still use the indexes on charge point and date.

So you have a timestamp column named date.
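
Putting the comments together, the lateral query with the time-of-day conditions added back might look like the sketch below. The 05:00 to 18:00 window is only an example value, the subquery alias s is made up, and date is assumed to be a timestamp without time zone column.

```sql
-- Sketch only: date range on the raw timestamp (index-friendly),
-- time-of-day window applied as an extra filter on the cast.
SELECT cp.id   AS charge_point_id,
       cp.address,
       cp.type AS charge_point_type,
       cp.latitude,
       cp.longitude,
       s.*
FROM charge_point cp
CROSS JOIN LATERAL (
    SELECT COUNT(*) FILTER (WHERE rs.status = 'Occ')  AS total_occ,
           COUNT(*) FILTER (WHERE rs.status = 'Part') AS total_part,
           COUNT(*) FILTER (WHERE rs.status = 'OOS')  AS total_oos,
           COUNT(*) FILTER (WHERE rs.status = 'OOC')  AS total_ooc
    FROM register_date rd
    JOIN register_status rs ON rs.fk_register_date = rd.id
    WHERE rs.fk_charge_point = cp.id          -- correlation with the outer row
      AND rd.date >= timestamp '2016-11-01'
      AND rd.date <  timestamp '2019-08-02'   -- half-open upper bound
      AND rd.date::time >= time '05:00'       -- example window start
      AND rd.date::time <= time '18:00'       -- example window end
) s;
```

Because the subquery aggregates without a GROUP BY, it returns exactly one row per charge point, so charge points with no matching rows appear with all counts at zero rather than being dropped.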