Improving the run time of a SQL query

sql postgresql

I have the following table structure:

AdPerformance
   id
   ad_id
   impressions

Targeting
  value


AdActions
   app_starts

Ad
  id
  name
  parent_id

AdTargeting
  id
  targeting_
  ad_id

Targeting
  id
  name
  value

AdProduct
  id
  ad_id
  name

I need to aggregate the data while restricting by product name, so I wrote the following query:

 SELECT ad_performance.ad_id, targeting.value AS targeting_value, 
     sum(impressions) AS impressions, 
     sum(app_starts) AS app_starts
 FROM ad_performance
     LEFT JOIN ad on ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN (
        SELECT ad_id, value from targeting, ad_targeting 
        WHERE targeting.id = ad_targeting.id and targeting.name = 'gender' 
     ) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN 
       (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value

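However, according to EXPLAIN ANALYZE, the query above takes about 5 seconds on roughly 1M records.

Is there any way to improve it?

I do have indexes on the foreign keys.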

Update

Here is the EXPLAIN ANALYZE output:

                                                                                                                                                                                                          QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=5787.28..5789.87 rows=259 width=254) (actual time=3283.763..3286.015 rows=5998 loops=1)
   Group Key: adobject_performance.ad_id, targeting.value
   Buffers: shared hit=3400223
   ->  Nested Loop Left Join  (cost=241.63..5603.63 rows=8162 width=254) (actual time=10.438..2774.664 rows=839720 loops=1)
         Buffers: shared hit=3400223
         ->  Nested Loop  (cost=241.21..1590.52 rows=8162 width=250) (actual time=10.412..703.818 rows=839720 loops=1)
               Join Filter: (adobject.id = adobject_performance.ad_id)
               Buffers: shared hit=36755
               ->  Hash Join  (cost=240.78..323.35 rows=9 width=226) (actual time=10.380..20.332 rows=5998 loops=1)
                     Hash Cond: (ad_product.ad_id = ad.id)
                     Buffers: shared hit=190
                     ->  HashAggregate  (cost=128.98..188.96 rows=5998 width=4) (actual time=3.788..6.821 rows=5998 loops=1)
                           Group Key: ad_product.ad_id
                           Buffers: shared hit=39
                           ->  Seq Scan on ad_product  (cost=0.00..113.99 rows=5998 width=4) (actual time=0.011..1.726 rows=5998 loops=1)
                                 Filter: ((product)::text = 'ft2_iPhone'::text)
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=39
                     ->  Hash  (cost=111.69..111.69 rows=9 width=222) (actual time=6.578..6.578 rows=5998 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 241kB
                           Buffers: shared hit=151
                           ->  Hash Join  (cost=30.26..111.69 rows=9 width=222) (actual time=0.154..4.660 rows=5998 loops=1)
                                 Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
                                 Buffers: shared hit=151
                                 ->  Seq Scan on adobject  (cost=0.00..77.97 rows=897 width=8) (actual time=0.009..1.449 rows=6001 loops=1)
                                       Buffers: shared hit=69
                                 ->  Hash  (cost=30.24..30.24 rows=2 width=222) (actual time=0.132..0.132 rows=2 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                       Buffers: shared hit=82
                                       ->  Nested Loop  (cost=0.15..30.24 rows=2 width=222) (actual time=0.101..0.129 rows=2 loops=1)
                                             Buffers: shared hit=82
                                             ->  Seq Scan on targeting  (cost=0.00..13.88 rows=2 width=222) (actual time=0.015..0.042 rows=79 loops=1)
                                                   Filter: (name = 'age group'::targeting_name)
                                                   Rows Removed by Filter: 82
                                                   Buffers: shared hit=1
                                             ->  Index Scan using advertising_targeting_pkey on adobject_targeting  (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=79)
                                                   Index Cond: (id = targeting.id)
                                                   Buffers: shared hit=81
               ->  Index Scan using "fki_advertising_peformance_advertising_entity_id -> advertising" on adobject_performance  (cost=0.42..89.78 rows=4081 width=32) (actual time=0.007..0.046 rows=140 loops=5998)
                     Index Cond: (ad_id = ad_product.ad_id)
                     Buffers: shared hit=36565
         ->  Index Scan using facebook_advertising_actions_pkey on facebook_adobject_actions  (cost=0.42..0.48 rows=1 width=12) (actual time=0.001..0.002 rows=1 loops=839720)
               Index Cond: (ad_performance.id = ad_performance_id)
               Buffers: shared hit=3363468
 Planning time: 1.525 ms
 Execution time: 3287.324 ms
(46 rows)

Shooting blind here, since we don't have the EXPLAIN results yet, but Postgres should handle this query better if you pull your targeting subquery out into a CTE:

WITH targeting AS 
(
        SELECT ad_id, value from targeting, ad_targeting 
        WHERE targeting.id = ad_targeting.id and targeting.name = 'gender' 
)
SELECT ad_performance.ad_id, targeting.value AS targeting_value, 
     sum(impressions) AS impressions, 
     sum(app_starts) AS app_starts
FROM ad_performance
     LEFT JOIN ad on ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN  targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN 
       (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value

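From the PostgreSQL documentation:

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects.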
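One caveat on the evaluate-once behaviour: it holds unconditionally only before PostgreSQL 12. From version 12 on, a non-recursive, side-effect-free CTE that is referenced just once may be folded into the parent query, and the MATERIALIZED keyword is needed to keep it as a separate step. The sketch below assumes PostgreSQL 12 or later and the table names from the question; the CTE name gender_targeting is made up for illustration. Whether it actually produces a better plan can only be checked with EXPLAIN ANALYZE.

```sql
-- Assumes PostgreSQL 12+; on older versions drop the MATERIALIZED keyword,
-- a plain WITH is always evaluated separately there.
WITH gender_targeting AS MATERIALIZED (
    SELECT ad_targeting.ad_id, targeting.value
    FROM targeting
    JOIN ad_targeting ON targeting.id = ad_targeting.id
    WHERE targeting.name = 'gender'
)
SELECT ad_performance.ad_id,
       gender_targeting.value AS targeting_value,
       sum(impressions) AS impressions,
       sum(app_starts)  AS app_starts
FROM ad_performance
     LEFT JOIN ad ON ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN gender_targeting ON gender_targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
      (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, gender_targeting.value;
```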

I don't know whether this query will solve your problem, but try it:

 SELECT ad_performance.ad_id, targeting.value AS targeting_value, 
     sum(impressions) AS impressions, 
     sum(app_starts) AS app_starts
 FROM ad_performance
     LEFT JOIN ad on ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN ad_targeting on ad_targeting.ad_id = ad.parent_id
     INNER JOIN targeting on  targeting.id = ad_targeting.id and targeting.name = 'gender'   
     INNER JOIN ad_product on ad_product.ad_id = ad_performance.ad_id
WHERE ad_product.product = 'iphone'
GROUP BY ad_performance.ad_id, targeting_value

You could also create indexes on all the columns you join on or use in WHERE conditions.
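As a rough illustration of the indexing suggestion, the statements below index the join and filter columns used by the query. The index names are hypothetical, the table and column names follow the question rather than the posted plan, and some of these indexes may already exist, since the asker reports having indexes on the foreign keys. Note also that in the posted plan the product filter discards only one row, so an index on product is unlikely to change that part of the plan.

```sql
-- Hypothetical index names; tables and columns as written in the question.
CREATE INDEX ad_product_product_ad_id_idx ON ad_product (product, ad_id);
CREATE INDEX ad_parent_id_idx ON ad (parent_id);
CREATE INDEX ad_targeting_ad_id_idx ON ad_targeting (ad_id);

-- Refresh planner statistics so the new indexes are costed with current data.
ANALYZE ad_product;
ANALYZE ad;
ANALYZE ad_targeting;
```

Rerunning the query under EXPLAIN (ANALYZE, BUFFERS) afterwards shows whether the planner picks up any of them.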

The execution plan no longer seems to match the query (maybe you can update the query).

However, the problem now is this:

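->  Hash Join  (cost=30.26..111.69 rows=9 width=222)
               (actual time=0.154..4.660 rows=5998 loops=1)
      Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
      Buffers: shared hit=151
      ->  Seq Scan on adobject  (cost=0.00..77.97 rows=897 width=8)
                                (actual time=0.009..1.449 rows=6001 loops=1)
            Buffers: shared hit=69
      ->  Hash  (cost=30.24..30.24 rows=2 width=222)
                (actual time=0.132..0.132 rows=2 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 1kB
            Buffers: shared hit=82
            ->  Nested Loop  (cost=0.15..30.24 rows=2 width=222)
                             (actual time=0.101..0.129 rows=2 loops=1)
                  Buffers: shared hit=82
                  ->  Seq Scan on targeting  (cost=0.00..13.88 rows=2 width=222)
                                             (actual time=0.015..0.042 rows=79 loops=1)
                        Filter: (name = 'age group'::targeting_name)
                        Rows Removed by Filter: 82
                        Buffers: shared hit=1
                  ->  Index Scan using adobject_targeting_pkey on adobject_targeting
                                       (cost=0.15..8.17 rows=1 width=8)
                                       (actual time=0.001..0.001 rows=0 loops=79)
                        Index Cond: (id = targeting.id)
                        Buffers: shared hit=81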

This is the join between adobject and the result of

targeting JOIN adobject_targeting
   USING (id)
WHERE targeting.name = 'age group'

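The latter subquery is correctly estimated at 2 rows, but PostgreSQL fails to notice that almost all rows found in adobject will match one of those two rows, so the join produces about 6000 rows (5998 in the plan) rather than the 9 it estimates.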
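To see this skew directly, one can count how the rows are distributed over parent_id and how many rows the join really returns. This is only a diagnostic sketch: it uses the table names from the question (ad, ad_targeting, targeting) where the plan shows adobject and adobject_targeting, and the filter value 'gender' from the question where the plan shows 'age group', so adjust both to the real schema.

```sql
-- How many ad rows share each parent_id? A few large groups explain
-- why the join returns far more rows than the planner expects.
SELECT parent_id, count(*) AS ads
FROM ad
GROUP BY parent_id
ORDER BY ads DESC
LIMIT 10;

-- Actual size of the join that the planner estimated at 9 rows.
SELECT count(*) AS matching_ads
FROM ad
JOIN ad_targeting ON ad_targeting.ad_id = ad.parent_id
JOIN targeting ON targeting.id = ad_targeting.id
WHERE targeting.name = 'gender';
```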

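This misestimate later leads the optimizer to choose a nested loop join where a different join strategy would be better, and that nested loop is where more than half of the query time is spent.

Unfortunately, PostgreSQL has no cross-table statistics, so it has no way of knowing better.

One coarse measure is to SET enable_nestloop=off, but that will also degrade the performance of the other (correctly chosen) nested loop join, so I don't know whether it would be a net win. If it does help, you could consider changing the parameter only for the duration of the query (using a transaction and SET LOCAL).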
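A minimal sketch of the SET LOCAL approach, using the query as posted in the question. SET LOCAL only takes effect inside a transaction block and reverts at COMMIT or ROLLBACK, so other queries on the same connection keep the default setting. Wrapping the statement in EXPLAIN (ANALYZE, BUFFERS) is just one way to compare the timing against the original plan; drop it to get the actual result rows.

```sql
BEGIN;

-- Discourage nested loop joins for this transaction only;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_nestloop = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
       sum(impressions) AS impressions,
       sum(app_starts)  AS app_starts
FROM ad_performance
     LEFT JOIN ad ON ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN (
        SELECT ad_id, value FROM targeting, ad_targeting
        WHERE targeting.id = ad_targeting.id AND targeting.name = 'gender'
     ) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
      (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value;

COMMIT;
```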

Maybe there is a way to rewrite the query so that a better plan is found, but that is hard to say without knowing the exact query.
