SQL: join on a subquery causes an index scan over a large number of rows


sql postgresql


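I am writing a query that uses aggregate functions to reduce data duplication, because the query joins 12 tables.

Consider this simplified query, which shows the bottleneck: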

SELECT r.source_uri       AS su_on_r,
       tag.voted_tag      AS vt_on_tag,
       tag.votes          AS v_on_tag
FROM   release r
       INNER JOIN release_barcode barcode
          ON r.source_uri = barcode.source_uri
         AND barcode.barcode IN ( '75992731324', '075992731324', '0075992731324')
       LEFT JOIN (
              SELECT source_uri,
                     string_agg(voted_tag, '|')   AS voted_tag,
                     string_agg(votes::text, '|') AS votes
              FROM release_voted_tag
              GROUP BY source_uri
              ) tag
              ON r.source_uri = tag.source_uri;

The filter on release_barcode limits the number of rows from r to 21 (out of roughly 8 million).

The plan for the LEFT JOIN looks like this:

->  Merge Left Join  (cost=1461.05..157205.05 rows=125 width=242) (actual time=23.322..1994.827 rows=21 loops=1)    
      Merge Cond: ((r.source_uri)::text = (release_voted_tag.source_uri)::text)                                     
      ->  Sort  (cost=1460.50..1460.81 rows=125 width=178) (actual time=0.974..0.991 rows=21 loops=1)               
            Sort Key: r.source_uri             
            Sort Method: quicksort  Memory: 30kB                                                                    
            ->  Nested Loop  (cost=0.99..1456.15 rows=125 width=178) (actual time=0.071..0.870 rows=21 loops=1)     
                  ->  Index Scan using release_barcode_barcode_idx on release_barcode barcode  (cost=0.43..382.71 rows=125 width=62) (actual time=0.029..0.061 rows=21 loops=1)          
                        Index Cond: ((barcode)::text = ANY ('{75992731324,075992731324,0075992731324}'::text[]))    
                  ->  Index Scan using release_source_uri_idx on release r  (cost=0.56..8.58 rows=1 width=169) (actual time=0.037..0.037 rows=1 loops=21)                                
                        Index Cond: ((source_uri)::text = (barcode.source_uri)::text)                               
      ->  Materialize  (cost=0.55..155340.82 rows=161233 width=132) (actual time=0.026..1625.598 rows=321318 loops=1)                                                                    
            ->  GroupAggregate  (cost=0.55..153325.41 rows=161233 width=132) (actual time=0.024..1446.457 rows=321318 loops=1)                                                           
                  Group Key: release_voted_tag.source_uri                                                           
                  ->  Index Scan using release_voted_tag_source_uri_idx on release_voted_tag  (cost=0.55..136510.34 rows=1151726 width=82) (actual time=0.007..647.964 rows=1151726 loops=1)
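
For readers who want to reproduce a plan with actual timings like the fragment above, the usual route in PostgreSQL is EXPLAIN with the ANALYZE option. The following is a minimal sketch that wraps the simplified query from the question; the BUFFERS option is an optional addition that was not used in the original post, and the statement really executes the query, so it takes as long as the query itself.

```sql
-- ANALYZE runs the query and reports actual row counts and timings.
-- BUFFERS (optional, not part of the original post) adds page hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.source_uri  AS su_on_r,
       tag.voted_tag AS vt_on_tag,
       tag.votes     AS v_on_tag
FROM   release r
       INNER JOIN release_barcode barcode
          ON r.source_uri = barcode.source_uri
         AND barcode.barcode IN ('75992731324', '075992731324', '0075992731324')
       LEFT JOIN (
              SELECT source_uri,
                     string_agg(voted_tag, '|')   AS voted_tag,
                     string_agg(votes::text, '|') AS votes
              FROM release_voted_tag
              GROUP BY source_uri
              ) tag
              ON r.source_uri = tag.source_uri;
```

In the output, comparing the estimated rows with the actual rows on each node (for example 161233 estimated versus 321318 actual on the GroupAggregate) shows where the work goes.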

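The full plan, which covers the complete query including the filter clauses, was linked from the original post.

It seems to me that the problem is the number of rows read for the left join.

That number is over 1 million, far more than I would expect if the filter on r were applied. I would expect 84 rows, which is what this equivalent query returns: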

select release_barcode.source_uri,voted_tag 
from release_barcode,release_voted_tag 
where release_voted_tag.source_uri=release_barcode.source_uri and barcode IN ( '75992731324', '075992731324', '0075992731324');

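I assumed the join would limit the number of records selected from release_voted_tag, because the ON filter is applied outside the subquery.

Original attempt

As mentioned, other 1:M joins are involved as well. I originally wrote the query like this: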

SELECT r.source_uri                        AS su_on_r, 
       string_agg(tag.voted_tag, '|')      AS vt_on_tag,
       string_agg(tag.votes::text, '|')    AS v_on_tag,  
       t.title,
       string_agg(distinct tComposer.composer, '|') AS c_on_tComposer 
FROM release r 
JOIN release_barcode barcode 
  ON r.source_uri = barcode.source_uri 
 AND barcode.barcode IN ( '75992731324', '075992731324', '0075992731324')
LEFT JOIN release_voted_tag tag
  ON r.source_uri = tag.source_uri 
LEFT JOIN medium m 
  ON r.source_uri = m.source_uri 
LEFT JOIN track t 
  ON m.id = t.medium 
LEFT JOIN track_composer tComposer 
  ON t.id = tComposer.track
GROUP BY r.source_uri, t.title;

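However, because of the joins through medium and track to track_composer, we end up with multiple release_voted_tag rows being aggregated whenever there is more than one track_composer. For example, if there are two track_composer rows, string_agg(tag.voted_tag, '|') repeats each tag.

Note that we have to be careful with DISTINCT, because tag.voted_tag and tag.votes must be correlated with each other later.

I found that I could solve this with a correlated subquery on track_composer that performs the aggregation, but that does not perform very well, does it? It runs once per row.

That is why I moved to a subquery in the join: I can put the aggregation inside the join and make sure only one row is returned, which keeps the joins to the other 1:M tables... sane.

So the question is: why is an expensive Merge Left Join being performed, and how can I make it more efficient?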
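
The fan-out described above can be reproduced in isolation. This is a small self-contained sketch; the CTE names demo_tag and demo_composer and their values are made up for illustration and are not tables from the question. Joining two independent 1:M sets on the same key multiplies the rows before the aggregate sees them.

```sql
-- Illustrative data only: one release with two tags and two composers.
WITH demo_tag(source_uri, voted_tag) AS (
    VALUES ('r1', 'rock'), ('r1', 'pop')
),
demo_composer(source_uri, composer) AS (
    VALUES ('r1', 'A'), ('r1', 'B')
)
SELECT t.source_uri,
       count(*)                     AS joined_rows,  -- 2 tags x 2 composers = 4
       string_agg(t.voted_tag, '|') AS tags          -- every tag is listed twice
FROM demo_tag t
JOIN demo_composer c USING (source_uri)
GROUP BY t.source_uri;
```

Aggregating each 1:M set on its own before joining, as the subquery in the simplified query does, avoids this multiplication.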

I would rewrite it as:

SELECT r.source_uri                        AS su_on_r, 
       string_agg(tag.voted_tag, '|')      AS vt_on_tag,
       string_agg(tag.votes::text, '|')    AS v_on_tag
FROM release r 
JOIN release_barcode barcode 
  ON r.source_uri = barcode.source_uri 
 AND barcode.barcode IN ( '75992731324', '075992731324', '0075992731324')
LEFT JOIN release_voted_tag tag
  ON r.source_uri = tag.source_uri 
GROUP BY r.source_uri;

Or even:

SELECT r.source_uri                        AS su_on_r, 
       string_agg(tag.voted_tag, '|')      AS vt_on_tag,
       string_agg(tag.votes::text, '|')    AS v_on_tag
FROM release r 
LEFT JOIN release_voted_tag tag
  ON r.source_uri = tag.source_uri 
WHERE r.source_uri IN (SELECT source_uri FROM release_barcode WHERE
                       barcode IN ('75992731324', '075992731324', '0075992731324'))
GROUP BY r.source_uri;

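The reason you get a million rows is that the inner query runs first and the filter is applied only afterwards. If you want the filter applied first, you need to add it to the inner query, which may not be a good idea because it creates a correlated query. Rewrite the join instead, as described in one of the other answers.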
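
For completeness, here is a hedged sketch of what moving the filter into the inner query could look like, using a LATERAL subquery (available in PostgreSQL 9.3 and later). This is not the rewrite proposed in the answers, only an illustration of the correlated form mentioned above. It assumes the release_voted_tag_source_uri_idx index seen in the plan can serve one lookup per outer row; the inner alias t is illustrative.

```sql
SELECT r.source_uri  AS su_on_r,
       tag.voted_tag AS vt_on_tag,
       tag.votes     AS v_on_tag
FROM release r
JOIN release_barcode barcode
  ON r.source_uri = barcode.source_uri
 AND barcode.barcode IN ('75992731324', '075992731324', '0075992731324')
LEFT JOIN LATERAL (
    -- Evaluated once per outer row, so only that release's tags are read.
    -- An aggregate without GROUP BY always returns exactly one row;
    -- both columns are NULL when the release has no tags.
    SELECT string_agg(t.voted_tag, '|')   AS voted_tag,
           string_agg(t.votes::text, '|') AS votes
    FROM release_voted_tag t
    WHERE t.source_uri = r.source_uri
) tag ON true;
```

With only 21 outer rows this amounts to 21 small index lookups rather than one pass over the whole table, but the subquery does run once per row, so it is worth comparing both forms with EXPLAIN ANALYZE on real data before choosing one.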

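Could you clarify what your question is?

Thanks. That was my initial attempt, which I thought was the obvious place to start, but I found that the release_voted_tag values were duplicated when other 1:M tables also matched. Please see my edited post, under "Original attempt".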