Improving the run time of an SQL query
I have the following table structure:
AdPerformance
    id
    ad_id
    impressions

Targeting
    value

AdActions
    app_starts

Ad
    id
    name
    parent_id

AdTargeting
    id
    targeting_
    ad_id

Targeting
    id
    name
    value

AdProduct
    id
    ad_id
    name
I need to aggregate the data while restricting it by product name, so I wrote the following query:
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
sum(impressions) AS impressions,
sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad on ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN (
SELECT ad_id, value from targeting, ad_targeting
WHERE targeting.id = ad_targeting.id and targeting.name = 'gender'
) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
(SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value
However, according to ANALYZE, the query above takes about 5 seconds on 1M records.
Is there any way to improve it?
I do have indexes on the foreign keys.
Update: here is the output of EXPLAIN ANALYZE.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=5787.28..5789.87 rows=259 width=254) (actual time=3283.763..3286.015 rows=5998 loops=1)
  Group Key: adobject_performance.ad_id, targeting.value
  Buffers: shared hit=3400223
  ->  Nested Loop Left Join (cost=241.63..5603.63 rows=8162 width=254) (actual time=10.438..2774.664 rows=839720 loops=1)
        Buffers: shared hit=3400223
        ->  Nested Loop (cost=241.21..1590.52 rows=8162 width=250) (actual time=10.412..703.818 rows=839720 loops=1)
              Join Filter: (adobject.id = adobject_performance.ad_id)
              Buffers: shared hit=36755
              ->  Hash Join (cost=240.78..323.35 rows=9 width=226) (actual time=10.380..20.332 rows=5998 loops=1)
                    Hash Cond: (ad_product.ad_id = ad.id)
                    Buffers: shared hit=190
                    ->  HashAggregate (cost=128.98..188.96 rows=5998 width=4) (actual time=3.788..6.821 rows=5998 loops=1)
                          Group Key: ad_product.ad_id
                          Buffers: shared hit=39
                          ->  Seq Scan on ad_product (cost=0.00..113.99 rows=5998 width=4) (actual time=0.011..1.726 rows=5998 loops=1)
                                Filter: ((product)::text = 'ft2_iPhone'::text)
                                Rows Removed by Filter: 1
                                Buffers: shared hit=39
                    ->  Hash (cost=111.69..111.69 rows=9 width=222) (actual time=6.578..6.578 rows=5998 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 241kB
                          Buffers: shared hit=151
                          ->  Hash Join (cost=30.26..111.69 rows=9 width=222) (actual time=0.154..4.660 rows=5998 loops=1)
                                Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
                                Buffers: shared hit=151
                                ->  Seq Scan on adobject (cost=0.00..77.97 rows=897 width=8) (actual time=0.009..1.449 rows=6001 loops=1)
                                      Buffers: shared hit=69
                                ->  Hash (cost=30.24..30.24 rows=2 width=222) (actual time=0.132..0.132 rows=2 loops=1)
                                      Buckets: 1024 Batches: 1 Memory Usage: 1kB
                                      Buffers: shared hit=82
                                      ->  Nested Loop (cost=0.15..30.24 rows=2 width=222) (actual time=0.101..0.129 rows=2 loops=1)
                                            Buffers: shared hit=82
                                            ->  Seq Scan on targeting (cost=0.00..13.88 rows=2 width=222) (actual time=0.015..0.042 rows=79 loops=1)
                                                  Filter: (name = 'age group'::targeting_name)
                                                  Rows Removed by Filter: 82
                                                  Buffers: shared hit=1
                                            ->  Index Scan using advertising_targeting_pkey on adobject_targeting (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=79)
                                                  Index Cond: (id = targeting.id)
                                                  Buffers: shared hit=81
              ->  Index Scan using "fki_advertising_peformance_advertising_entity_id -> advertising" on adobject_performance (cost=0.42..89.78 rows=4081 width=32) (actual time=0.007..0.046 rows=140 loops=5998)
                    Index Cond: (ad_id = ad_product.ad_id)
                    Buffers: shared hit=36565
        ->  Index Scan using facebook_advertising_actions_pkey on facebook_adobject_actions (cost=0.42..0.48 rows=1 width=12) (actual time=0.001..0.002 rows=1 loops=839720)
              Index Cond: (ad_performance.id = ad_performance_id)
              Buffers: shared hit=3363468
Planning time: 1.525 ms
Execution time: 3287.324 ms
(46 rows)
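This is a shot in the dark, since we have not been given the EXPLAIN output yet, but Postgres should handle this query better if you pull your targeting subquery out into a CTE: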
WITH targeting AS
(
SELECT ad_id, value from targeting, ad_targeting
WHERE targeting.id = ad_targeting.id and targeting.name = 'gender'
)
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
sum(impressions) AS impressions,
sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad on ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
(SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value
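Excerpt from the PostgreSQL documentation:
A useful property of WITH queries is that they are evaluated only once
per execution of the parent query, even if they are referred to more than once
by the parent query or sibling WITH queries. Thus, expensive
calculations that are needed in multiple places can be placed within a
WITH query to avoid redundant work. Another possible application is to
prevent unwanted multiple evaluations of functions with side-effects.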
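One caveat worth checking against your server version: starting with PostgreSQL 12, a non-recursive, side-effect-free CTE that is referenced only once can be folded into the parent query, so the "evaluated only once" behavior quoted above is no longer automatic. The sketch below forces it with MATERIALIZED. The CTE name gender_targeting is made up for illustration, and the join condition is copied unchanged from the question.

```sql
-- PostgreSQL 12 or later only: older versions reject the MATERIALIZED keyword
-- (they always materialize CTEs anyway).
WITH gender_targeting AS MATERIALIZED (
    SELECT ad_targeting.ad_id, targeting.value
    FROM targeting
    JOIN ad_targeting ON targeting.id = ad_targeting.id  -- condition as written in the question
    WHERE targeting.name = 'gender'
)
SELECT ad.id AS ad_id,
       gender_targeting.value AS targeting_value
FROM ad
JOIN gender_targeting ON gender_targeting.ad_id = ad.parent_id;
```

Whether materializing helps is something only EXPLAIN ANALYZE on the real data can tell; it acts as an optimization fence, which can make a plan better or worse.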
I don't know whether this query will solve your problem, but give it a try:
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
sum(impressions) AS impressions,
sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad on ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN ad_targeting on ad_targeting.ad_id = ad.parent_id
INNER JOIN targeting on targeting.id = ad_targeting.id and targeting.name = 'gender'
INNER JOIN ad_product on ad_product.ad_id = ad_performance.ad_id
WHERE ad_product.product = 'iphone'
GROUP BY ad_performance.ad_id, targeting_value
You could also create indexes on every column that you join on or use in a WHERE condition.
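As a rough illustration of that index suggestion, the statements below cover the columns the question's query filters or joins on. The index names are hypothetical, the table and column names are taken from the query in the question, and some of these may duplicate the foreign-key indexes the asker already has.

```sql
-- Hypothetical index names; IF NOT EXISTS requires PostgreSQL 9.5 or later.
CREATE INDEX IF NOT EXISTS ad_product_product_idx
    ON ad_product (product);             -- WHERE product = 'iphone'
CREATE INDEX IF NOT EXISTS targeting_name_idx
    ON targeting (name);                 -- WHERE targeting.name = 'gender'
CREATE INDEX IF NOT EXISTS ad_parent_id_idx
    ON ad (parent_id);                   -- join on ad.parent_id
CREATE INDEX IF NOT EXISTS ad_actions_ad_performance_id_idx
    ON ad_actions (ad_performance_id);   -- join on ad_actions.ad_performance_id

-- Refresh planner statistics so the new indexes are costed properly.
ANALYZE ad_product;
ANALYZE targeting;
ANALYZE ad;
ANALYZE ad_actions;
```

The planner will still prefer a sequential scan when a filter keeps most of the table, so an index on a low-selectivity column may never be used.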
The execution plan no longer seems to match the query (perhaps you could update the query). However, the problem now is here:
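->  Hash Join (cost=30.26..111.69 rows=9 width=222)
              (actual time=0.154..4.660 rows=5998 loops=1)
      Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
      Buffers: shared hit=151
      ->  Seq Scan on adobject (cost=0.00..77.97 rows=897 width=8)
                               (actual time=0.009..1.449 rows=6001 loops=1)
            Buffers: shared hit=69
      ->  Hash (cost=30.24..30.24 rows=2 width=222)
               (actual time=0.132..0.132 rows=2 loops=1)
            Buckets: 1024 Batches: 1 Memory Usage: 1kB
            Buffers: shared hit=82
            ->  Nested Loop (cost=0.15..30.24 rows=2 width=222)
                            (actual time=0.101..0.129 rows=2 loops=1)
                  Buffers: shared hit=82
                  ->  Seq Scan on targeting (cost=0.00..13.88 rows=2 width=222)
                                            (actual time=0.015..0.042 rows=79 loops=1)
                        Filter: (name = 'age group'::targeting_name)
                        Rows Removed by Filter: 82
                        Buffers: shared hit=1
                  ->  Index Scan using adobject_targeting_pkey on adobject_targeting
                                       (cost=0.15..8.17 rows=1 width=8)
                                       (actual time=0.001..0.001 rows=0 loops=79)
                        Index Cond: (id = targeting.id)
                        Buffers: shared hit=81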
This is the join between adobject and the result of
    targeting JOIN adobject_targeting
       USING (id)
    WHERE targeting.name = 'age group'
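The latter subquery is correctly estimated at 2 rows, but PostgreSQL fails to notice that almost every row found in adobject will match one of those two rows, so the join produces about 6000 rows rather than the 9 it estimated.
This causes the optimizer to wrongly choose a nested loop join later on, and that is where more than half of the query time is spent.
Unfortunately, since PostgreSQL has no cross-table statistics, it has no way of knowing better.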
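To look at that misestimate in isolation, one option is to run just this join under EXPLAIN and compare the estimated "rows=" figure of the hash join with the actual one. This is only a sketch: it assumes the real table names are the ones shown in the plan (adobject, adobject_targeting, targeting) and that value is the targeting column being selected.

```sql
-- Compare the planner's row estimate with the actual row count for this join alone.
EXPLAIN (ANALYZE, BUFFERS)
SELECT adobject.id, t.value
FROM adobject
JOIN (
    SELECT adobject_targeting.ad_id, targeting.value
    FROM targeting
    JOIN adobject_targeting USING (id)
    WHERE targeting.name = 'age group'
) AS t ON adobject.parent_id = t.ad_id;
```

If the estimate is still a handful of rows while the actual count is in the thousands, the misestimate comes from this join and not from anything further up the plan.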
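A crude measure is to SET enable_nestloop = off, but that will also hurt the other (correctly chosen) nested loop join, so I don't know whether it would be a net win.
If it does help, you could consider changing the parameter only for the duration of the query (using a transaction and SET LOCAL).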
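A minimal sketch of that transaction-scoped approach, using the query from the question unchanged. Note that enable_nestloop = off does not forbid nested loops outright; it only makes the planner avoid them when another join method is available.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of the current transaction.
SET LOCAL enable_nestloop = off;

SELECT ad_performance.ad_id, targeting.value AS targeting_value,
       sum(impressions) AS impressions,
       sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad ON ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN (
    SELECT ad_id, value FROM targeting, ad_targeting
    WHERE targeting.id = ad_targeting.id AND targeting.name = 'gender'
) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
    (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value;

COMMIT;

-- Back to the session's previous value once the transaction has ended.
SHOW enable_nestloop;
```

Prefixing the SELECT with EXPLAIN (ANALYZE, BUFFERS) inside the same transaction shows whether the plan and timing actually improve before you rely on the setting.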
There may be a way to rewrite the query so that a better plan is found, but that is hard to say without knowing the exact query.