Optimizing a PostgreSQL query with an index on timestamp columns and the GREATEST function
I need to speed up a query I'm running, and I'm not sure which index would be appropriate. Any ideas would be much appreciated. Note that the output below comes from a small subset of my data. The actual table is much larger, and the actual query takes about 15 minutes to run. Query:
SELECT last_known_position_timestamp,
mmsi,
name,
row_number() OVER (PARTITION BY mmsi, date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp) ) + INTERVAL '1 hours'
ORDER BY GREATEST(last_known_position_timestamp, predicted_position_timestamp) DESC) AS row_number
FROM test
WHERE date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp) ) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours'
EXPLAIN:
"WindowAgg (cost=137178.97..149678.96 rows=333333 width=263)"
" -> Sort (cost=137178.97..138012.31 rows=333333 width=248)"
" Sort Key: mmsi, ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval)), (GREATEST(last_known_position_timestamp, predicted_position_timestamp)) DESC"
" -> Seq Scan on test (cost=0.00..78931.33 rows=333333 width=248)"
" Filter: ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
Thanks, everyone!
Edit 1: I was asked for the EXPLAIN (ANALYZE, BUFFERS, VERBOSE) output, so here it is.
"WindowAgg (cost=109508.97..120342.30 rows=333333 width=48) (actual time=561.804..561.804 rows=0 loops=1)"
" Output: last_known_position_timestamp, mmsi, name, row_number() OVER (?), (GREATEST(last_known_position_timestamp, predicted_position_timestamp)), ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval))"
" Buffers: shared hit=48098"
" -> Sort (cost=109508.97..110342.31 rows=333333 width=40) (actual time=558.182..558.182 rows=0 loops=1)"
" Output: mmsi, (GREATEST(last_known_position_timestamp, predicted_position_timestamp)), ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval)), last_known_position_timestamp, name"
" Sort Key: test.mmsi, ((date_trunc('hour'::text, GREATEST(test.last_known_position_timestamp, test.predicted_position_timestamp)) + '01:00:00'::interval)), (GREATEST(test.last_known_position_timestamp, test.predicted_position_timestamp)) DESC"
" Sort Method: quicksort Memory: 25kB"
" Buffers: shared hit=48098"
" -> Seq Scan on vessel.test (cost=0.00..78931.33 rows=333333 width=40) (actual time=558.174..558.175 rows=0 loops=1)"
" Output: mmsi, GREATEST(last_known_position_timestamp, predicted_position_timestamp), (date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval), last_known_position_timestamp, name"
" Filter: ((date_trunc('hour'::text, GREATEST(test.last_known_position_timestamp, test.predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
" Rows Removed by Filter: 1000000"
" Buffers: shared hit=48098"
"Planning Time: 0.098 ms"
"Execution Time: 561.865 ms"
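Edit 2:
I was asked about the dates. For the output I need, I want my data rounded up to the next hour, so 16:04 becomes 17:00, 17:45 becomes 18:00, and so on.
The table I'm selecting from holds several months of data, but I only want the last 4 weeks (taking the rounding into account). So I do this: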
date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp) ) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours'
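This is complicated by the fact that the vessels I track sometimes have an actual position and sometimes a predicted one. I have to pick the more recent of the two, hence the GREATEST part of the query.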
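To make the rounding concrete, here is a minimal sketch of what the expression does for individual values. The literal timestamps are made up for illustration and are not taken from the table. Note that a value already exactly on the hour is also pushed to the next hour, and that GREATEST ignores NULL arguments, so a row with only one of the two timestamps populated still yields a value.

```sql
-- Literal timestamps are used so the results do not depend on now().
SELECT date_trunc('hour', TIMESTAMP '2020-01-01 16:04:00') + INTERVAL '1 hour' AS from_1604,  -- 2020-01-01 17:00:00
       date_trunc('hour', TIMESTAMP '2020-01-01 17:45:00') + INTERVAL '1 hour' AS from_1745,  -- 2020-01-01 18:00:00
       date_trunc('hour', TIMESTAMP '2020-01-01 17:00:00') + INTERVAL '1 hour' AS from_1700;  -- 2020-01-01 18:00:00

-- GREATEST skips NULLs; it returns NULL only if every argument is NULL.
SELECT GREATEST(TIMESTAMP '2020-01-01 16:04:00', NULL::timestamp) AS latest;  -- 2020-01-01 16:04:00
```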
Edit 3:
I added an index as follows:
CREATE INDEX test_index ON test ((date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'))
Then I ran EXPLAIN again, and the cost seems to have dropped slightly:
"WindowAgg (cost=97521.39..108354.72 rows=333333 width=48)"
" -> Sort (cost=97521.39..98354.73 rows=333333 width=40)"
" Sort Key: mmsi, ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval)), (GREATEST(last_known_position_timestamp, predicted_position_timestamp)) DESC"
" -> Bitmap Heap Scan on test (cost=4413.76..66943.75 rows=333333 width=40)"
" Recheck Cond: ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
" -> Bitmap Index Scan on test_index (cost=0.00..4330.43 rows=333333 width=0)"
" Index Cond: ((date_trunc('hour'::text, GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + '01:00:00'::interval) > (timezone('UTC'::text, now()) - '672:00:00'::interval))"
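One thing worth checking in this plan: the row estimate is still 333333, exactly one third of the 1,000,000 rows reported earlier, which looks like the planner's default guess for an inequality it has no statistics for. PostgreSQL collects statistics on an indexed expression only when the table is analyzed after the index exists. A minimal sketch, assuming the table and index names from the question, would be to analyze the table and then compare estimated against actual row counts:

```sql
-- Gather statistics, including those for the expression indexed by test_index.
ANALYZE test;

-- Re-run the query with timing and buffer information to compare
-- the estimated row counts with the actual ones.
EXPLAIN (ANALYZE, BUFFERS)
SELECT last_known_position_timestamp,
       mmsi,
       name,
       row_number() OVER (PARTITION BY mmsi, date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'
                          ORDER BY GREATEST(last_known_position_timestamp, predicted_position_timestamp) DESC) AS row_number
FROM test
WHERE date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours';
```

If the estimate drops to something close to the real number of matching rows, the planner can make a better-informed choice between the index and a sequential scan.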
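All your time is spent in the sequential scan, probably much of it executing the same functions over and over.
You should create an index on the expression on the left side of the comparison in the WHERE condition. That may not be possible if your timestamps are of type timestamp with time zone, because date_trunc on that type is not immutable. In that case, you would first have to convert the timestamp to timestamp without time zone using the AT TIME ZONE construct.
The additions of date and interval don't seem to make sense. How would they affect the result? Can you provide EXPLAIN (ANALYZE, BUFFERS, VERBOSE) output?
@LaurenzAlbe: I added the EXPLAIN (ANALYZE, BUFFERS, VERBOSE) output and the information about the dates. My timestamps have no time zone. Well, they are UTC, but they are stored as "timestamp". So should I try creating an index on the following? date_trunc('hour', GREATEST(last_known_position_timestamp, predicted_position_timestamp)) + INTERVAL '1 hours'
Yes, exactly. You can also simplify the expression in the OVER clause.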
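The last comment does not spell out the simplification, so the following is only one possible reading of it. Adding a constant one-hour interval to every value does not change which rows fall into the same partition, so the window can partition on the truncated hour alone. The subquery and the column alias `latest` are introduced here for illustration, so that GREATEST is written only once. The WHERE expression is deliberately left in its original form so that it should still match the expression indexed by test_index.

```sql
SELECT last_known_position_timestamp,
       mmsi,
       name,
       -- Partitioning by the truncated hour yields the same groups as
       -- partitioning by the truncated hour plus one hour.
       row_number() OVER (PARTITION BY mmsi, date_trunc('hour', latest)
                          ORDER BY latest DESC) AS row_number
FROM (
    SELECT last_known_position_timestamp,
           mmsi,
           name,
           -- "latest" is a hypothetical alias, not a column of the original table.
           GREATEST(last_known_position_timestamp, predicted_position_timestamp) AS latest
    FROM test
) AS t
WHERE date_trunc('hour', latest) + INTERVAL '1 hours' > timezone('UTC', now()) - INTERVAL '672 hours';
```

Running EXPLAIN on this form and comparing it with the plan in Edit 3 would confirm whether the index is still used.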