Optimizing a PostgreSQL query with an index on timestamp columns and the GREATEST function

Tags: postgresql, query-optimization, query-performance, postgresql-9.5

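I need to speed up a query I'm running, and I'm not sure which index to add. Any ideas would be much appreciated. Note that the output below comes from a small subset of my data. The actual table is much larger, and the actual query takes about 15 minutes to run.

Query: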

SELECT  last_known_position_timestamp,
        mmsi,
        name,
        row_number()  OVER (PARTITION BY mmsi, date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp) ) + INTERVAL '1 hours'
                        ORDER BY GREATEST(last_known_position_timestamp, predicted_position_timestamp)  DESC) AS row_number
FROM    test
WHERE   date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp) ) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours'

EXPLAIN:

"WindowAgg  (cost=137178.97..149678.96 rows=333333 width=263)"
"  ->  Sort  (cost=137178.97..138012.31 rows=333333 width=248)"
"        Sort Key: mmsi, ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval)), (GREATEST(last_known_position_timestamp, predicted_position_timestamp)) DESC"
"        ->  Seq Scan on test  (cost=0.00..78931.33 rows=333333 width=248)"
"              Filter: ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"

Thanks, everyone!

Edit 1: I was asked for the EXPLAIN (ANALYZE, BUFFERS, VERBOSE) output, so here it is.

"WindowAgg  (cost=109508.97..120342.30 rows=333333 width=48) (actual time=561.804..561.804 rows=0 loops=1)"
"  Output: last_known_position_timestamp, mmsi, name, row_number() OVER (?), (GREATEST(last_known_position_timestamp, predicted_position_timestamp)), ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval))"
"  Buffers: shared hit=48098"
"  ->  Sort  (cost=109508.97..110342.31 rows=333333 width=40) (actual time=558.182..558.182 rows=0 loops=1)"
"        Output: mmsi, (GREATEST(last_known_position_timestamp, predicted_position_timestamp)), ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval)), last_known_position_timestamp, name"
"        Sort Key: test.mmsi, ((date_trunc('hour'::text, GREATEST(test.last_known_position_timestamp, test.predicted_position_timestamp)) + '01:00:00'::interval)), (GREATEST(test.last_known_position_timestamp, test.predicted_position_timestamp)) DESC"
"        Sort Method: quicksort  Memory: 25kB"
"        Buffers: shared hit=48098"
"        ->  Seq Scan on vessel.test  (cost=0.00..78931.33 rows=333333 width=40) (actual time=558.174..558.175 rows=0 loops=1)"
"              Output: mmsi, GREATEST(last_known_position_timestamp, predicted_position_timestamp), (date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval), last_known_position_timestamp, name"
"              Filter: ((date_trunc('hour'::text, GREATEST(test.last_known_position_timestamp, test.predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
"              Rows Removed by Filter: 1000000"
"              Buffers: shared hit=48098"
"Planning Time: 0.098 ms"
"Execution Time: 561.865 ms"

Edit 2:

I was asked about the dates. For the output I need, I want my data rounded up to the next hour, so 16:04 becomes 17:00, 17:45 becomes 18:00, and so on.
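
The rounding described above can be checked in isolation. This is a minimal sketch: the sample timestamps and the names sample, ts and rounded_up are made up for illustration and are not part of the real table.

```sql
-- Illustrative only: the timestamps below are invented sample values.
SELECT ts,
       -- Same rounding as in the query: truncate to the hour, then add one hour.
       date_trunc('hour', ts) + INTERVAL '1 hours' AS rounded_up
FROM (VALUES (TIMESTAMP '2020-01-01 16:04:00'),
             (TIMESTAMP '2020-01-01 17:45:00'),
             (TIMESTAMP '2020-01-01 17:00:00')) AS sample(ts);
```

The first two rows map to 17:00 and 18:00. Note that a value already on the hour (17:00:00) is also pushed forward to 18:00, because the expression always adds a full hour after truncating.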

The table I'm selecting from holds several months of data, but I only want the last 4 weeks (taking the rounding into account). So I do this:

date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp) ) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours'

This is complicated by the fact that the vessels I'm tracking sometimes have an actual position and sometimes a predicted one. I have to pick the more recent of the two, hence the GREATEST part of the query.

Edit 3:

I added an index as follows:

CREATE INDEX test_index ON test ((date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'))

I then ran EXPLAIN again, and the index seems to have reduced the cost slightly:

"WindowAgg  (cost=97521.39..108354.72 rows=333333 width=48)"
"  ->  Sort  (cost=97521.39..98354.73 rows=333333 width=40)"
"        Sort Key: mmsi, ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval)), (GREATEST(last_known_position_timestamp, predicted_position_timestamp)) DESC"
"        ->  Bitmap Heap Scan on test  (cost=4413.76..66943.75 rows=333333 width=40)"
"              Recheck Cond: ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
"              ->  Bitmap Index Scan on test_index  (cost=0.00..4330.43 rows=333333 width=0)"
"                    Index Cond: ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
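
One thing that may be worth checking at this point: the estimate rows=333333 is exactly one third of the 1,000,000 rows in the table, which is what the planner assumes for an inequality when it has no statistics for the expression. PostgreSQL only collects statistics for an index expression when the table is analyzed after the index exists. A minimal sketch, assuming test_index was created as shown above:

```sql
-- Refresh planner statistics, including those for the expression in test_index.
ANALYZE test;

-- Re-check the plan with actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT  last_known_position_timestamp,
        mmsi,
        name,
        row_number() OVER (PARTITION BY mmsi, date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'
                           ORDER BY GREATEST(last_known_position_timestamp, predicted_position_timestamp) DESC) AS row_number
FROM    test
WHERE   date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours';
```

Whether the plan actually changes depends on how many rows fall inside the 4-week window. If most of the table qualifies, a sequential scan may still be the cheaper choice.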

All of your time is spent in the sequential scan, and probably much of it goes into evaluating the same functions over and over.


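You should create an index on the expression on the left side of the > comparison in the WHERE condition. That may not be possible as written if your timestamps are of type timestamp with time zone, because date_trunc on that type depends on the session time zone and so is not immutable. In that case, you first have to convert the timestamps to timestamp without time zone using the AT TIME ZONE construct.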
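
For the timestamp with time zone case, the index could look like the sketch below. This is hypothetical: it assumes both columns are of type timestamp with time zone (which is not the case in the question) and that UTC is the intended zone, and the index name test_index_utc is made up for illustration.

```sql
-- Hypothetical variant for timestamp with time zone columns.
-- AT TIME ZONE 'UTC' yields a timestamp without time zone, so the whole
-- expression is immutable and can be indexed.
CREATE INDEX test_index_utc ON test (
    (date_trunc('hour',
                GREATEST(last_known_position_timestamp,
                         predicted_position_timestamp) AT TIME ZONE 'UTC')
     + INTERVAL '1 hours')
);
```

The WHERE condition then has to use exactly the same expression, including the AT TIME ZONE 'UTC' conversion, for the planner to consider this index.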

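Adding the dates and intervals doesn't seem to make sense. How do they affect the result? Can you provide the EXPLAIN (ANALYZE, BUFFERS, VERBOSE) output?

@LaurenzAlbe -- I added the EXPLAIN (ANALYZE, BUFFERS, VERBOSE) output and the information about the dates. My timestamps have no time zone. Well, they are UTC, but they are stored as "timestamp". So should I try creating an index on the following? date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'

Yes, exactly. You can also simplify the expression in the OVER clause.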
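
One way to read the suggestion about the OVER clause: adding a constant hour to every value does not change which rows share an hour, so the partition can use the truncated value directly. The sketch below also computes GREATEST once in a subquery. The alias latest_ts and the subquery alias t are made up for illustration, and this is only one possible simplification, not necessarily the one the commenter had in mind.

```sql
SELECT  last_known_position_timestamp,
        mmsi,
        name,
        -- Partitioning by the truncated hour yields the same groups as
        -- partitioning by the truncated hour plus one hour.
        row_number() OVER (PARTITION BY mmsi, date_trunc('hour', latest_ts)
                           ORDER BY latest_ts DESC) AS row_number
FROM (
    SELECT  last_known_position_timestamp,
            mmsi,
            name,
            GREATEST(last_known_position_timestamp, predicted_position_timestamp) AS latest_ts
    FROM    test
    -- Kept identical to the indexed expression so that test_index can be used.
    WHERE   date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'
            > timezone('UTC', now()) - INTERVAL '672 hours'
) AS t;
```

The WHERE condition is deliberately left unchanged, since an expression index is only considered when the query uses the same expression the index was built on.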