How to make this SQL query reasonably fast

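I would like to optimize an SQL query (read: make it feasible at all).

The following PostgreSQL query retrieves the records I need. I (believe I) can confirm this by running the query on a small subset of the actual DB.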

SELECT B.*, A1.foo, A1.bar, A2.foo, A2.bar
FROM B
LEFT JOIN A as A1 on B.n1_id = A1.n_id
LEFT JOIN A as A2 on B.n2_id = A2.n_id
WHERE B.l_id IN (
    SELECT l_id FROM C 
        WHERE l_id IN (
            SELECT l_id FROM B 
                WHERE n1_id IN (SELECT n_id FROM A WHERE foo BETWEEN foo_min AND foo_max AND bar BETWEEN bar_min AND bar_max)
            UNION
            SELECT l_id FROM B 
                WHERE n2_id IN (SELECT n_id FROM A WHERE foo BETWEEN foo_min AND foo_max AND bar BETWEEN bar_min AND bar_max)
            ) 
            AND (property1 = 'Y' OR property2 = 'Y')
    )

The relevant part of the DB looks as follows:

table A:
n_id (PK);
foo, int (indexed);
bar, int (indexed);

table B:
l_id (PK);
n1_id (FK, indexed);
n2_id (FK, indexed);

table C:
l_id (PK, FK);
property1, char (indexed);
property2, char (indexed);

EXPLAIN tells me:

"Merge Join  (cost=6590667.27..10067376.97 rows=453419 width=136)"
"  Merge Cond: (A2.n_id = B.n2_id)"
"  ->  Index Scan using pk_A on A A2  (cost=0.57..3220265.29 rows=99883648 width=38)"
"  ->  Sort  (cost=6590613.72..6591747.27 rows=453419 width=98)"
"        Sort Key: B.n2_id"
"        ->  Merge Join  (cost=3071304.25..6548013.91 rows=453419 width=98)"
"              Merge Cond: (A1.n_id = B.n1_id)"
"              ->  Index Scan using pk_A on A A1  (cost=0.57..3220265.29 rows=99883648 width=38)"
"              ->  Sort  (cost=3071250.74..3072384.28 rows=453419 width=60)"
"                    Sort Key: B.n1_id"
"                    ->  Hash Semi Join  (cost=32475.31..3028650.92 rows=453419 width=60)"
"                          Hash Cond: (B.l_id = C.l_id)"
"                          ->  Seq Scan on B B  (cost=0.00..2575104.04 rows=122360504 width=60)"
"                          ->  Hash  (cost=26807.58..26807.58 rows=453419 width=16)"
"                                ->  Nested Loop  (cost=10617.22..26807.58 rows=453419 width=16)"
"                                      ->  HashAggregate  (cost=10616.65..10635.46 rows=1881 width=8)"
"                                            ->  Append  (cost=4081.76..10611.95 rows=1881 width=8)"
"                                                  ->  Nested Loop  (cost=4081.76..5383.92 rows=1078 width=8)"
"                                                        ->  Bitmap Heap Scan on A  (cost=4081.19..4304.85 rows=56 width=8)"
"                                                              Recheck Cond: ((bar >= bar_min) AND (bar <= bar_max) AND (foo >= foo_min) AND (foo <= foo_max))"
"                                                              ->  BitmapAnd  (cost=4081.19..4081.19 rows=56 width=0)"
"                                                                    ->  Bitmap Index Scan on A_bar_idx  (cost=0.00..740.99 rows=35242 width=0)"
"                                                                          Index Cond: ((bar >= bar_min) AND (bar <= bar_max))"
"                                                                    ->  Bitmap Index Scan on A_foo_idx  (cost=0.00..3339.93 rows=159136 width=0)"
"                                                                          Index Cond: ((foo >= foo_min) AND (foo <= foo_max))"
"                                                        ->  Index Scan using nx_B_n1 on B  (cost=0.57..19.08 rows=19 width=16)"
"                                                              Index Cond: (n1_id = A.n_id)"
"                                                  ->  Nested Loop  (cost=4081.76..5209.22 rows=803 width=8)"
"                                                        ->  Bitmap Heap Scan on A A_1  (cost=4081.19..4304.85 rows=56 width=8)"
"                                                              Recheck Cond: ((bar >= bar_min) AND (bar <= bar_max) AND (foo >= foo_min) AND (foo <= foo_max))"
"                                                              ->  BitmapAnd  (cost=4081.19..4081.19 rows=56 width=0)"
"                                                                    ->  Bitmap Index Scan on A_bar_idx  (cost=0.00..740.99 rows=35242 width=0)"
"                                                                          Index Cond: ((bar >= bar_min) AND (bar <= bar_max))"
"                                                                    ->  Bitmap Index Scan on A_foo_idx  (cost=0.00..3339.93 rows=159136 width=0)"
"                                                                          Index Cond: ((foo >= foo_min) AND (foo <= foo_max))"
"                                                        ->  Index Scan using nx_B_n2 on B B_1  (cost=0.57..16.01 rows=14 width=16)"
"                                                              Index Cond: (n2_id = A_1.n_id)"
"                                      ->  Index Scan using pk_C on C  (cost=0.57..8.58 rows=1 width=8)"
"                                            Index Cond: (l_id = B.l_id)"
"                                            Filter: ((property1 = 'Y'::bpchar) OR (property2 = 'Y'::bpchar))"
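
The joins with A are the part that makes it slow. The query returns fewer than 100 records, so why do the joins with A make it so slow? Do you have any idea how to make this simple join faster? Simply adding four columns of information from another table should not cost that much.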

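  • You can combine the LEFT JOINs on A (A1, A2) into one LEFT JOIN and use OR in the ON clause
  • Change the INs to EXISTS
  • Change the UNION into one query using an OR clause
  • Try this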

        SELECT B.*
        , A1.foo
        , A2.bar
    FROM B
    LEFT JOIN A AS A1
        ON (
                B.n1_id = A1.n_id
                OR B.n2_id = A1.n_id
                )
    WHERE EXISTS (
            SELECT l_id
            FROM C
            WHERE EXISTS (
                    SELECT l_id
                    FROM B
                    WHERE EXISTS (
                            SELECT n_id
                            FROM A
                            WHERE foo BETWEEN foo_min
                                    AND foo_max
                                AND bar BETWEEN bar_min
                                    AND bar_max
                                AND (
                                    A.n_id = B.n_id
                                    OR A.n_id = B.n2_id
                                    )
                            )
                        AND B.l_id = C.l_id
                    )
            )
    

    Or like this:

    SELECT B.*, A1.foo, A2.bar 
    FROM B 
         LEFT JOIN A as A1 on B.n1_id = A1.n_id 
         LEFT JOIN A as A2 on B.n2_id = A2.n_id 
         INNER JOIN C on (C.l_id = B.l_id)
    where 
         A1.foo between A1.foo_min AND A1.foo_max AND 
         A2.bar BETWEEN A2.bar_min AND A2.bar_max and
         b.foo between b.foo_min AND b.foo_max AND 
         b.bar BETWEEN b.bar_min AND bar_max   AND
         (C.property1 = 'Y' OR C.property2 = 'Y')
    

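    It seems to me that you select the Bs for which at least one of the two associated As lies within the given ranges. In addition, you require that a C exists for the B. You then show foo and bar of the two As associated with that B.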

    SELECT B.*, A1.foo, A2.bar
    FROM B 
    LEFT JOIN A A1 ON A1.n_id = B.n1_id
    LEFT JOIN A A2 ON A2.n_id = B.n2_id
    WHERE 
    (
      (A1.foo BETWEEN foo_min AND foo_max AND A1.bar BETWEEN bar_min AND bar_max)
      OR
      (A2.foo BETWEEN foo_min AND foo_max AND A2.bar BETWEEN bar_min AND bar_max)
    )
    AND EXISTS
    (
      SELECT *
      FROM C
      WHERE C.l_id = B.l_id
      AND (property1 = 'Y' OR property2 = 'Y')
    );
    
    Can B.n1_id and B.n2_id be null? Then the left outer joins are needed. Otherwise, you can replace them with inner joins.

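    EDIT: Oops, I missed the C criteria. I have amended the statement accordingly.

    EDIT: Based on your comment, here is the same select with inner joins and an IN clause: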

    SELECT B.*, A1.foo, A2.bar
    FROM B 
    INNER JOIN A A1 ON A1.n_id = B.n1_id
    INNER JOIN A A2 ON A2.n_id = B.n2_id
    WHERE 
    (
      (A1.foo BETWEEN foo_min AND foo_max AND A1.bar BETWEEN bar_min AND bar_max)
      OR
      (A2.foo BETWEEN foo_min AND foo_max AND A2.bar BETWEEN bar_min AND bar_max)
    )
    AND B.l_id IN
    (
      SELECT l_id
      FROM C
      WHERE property1 = 'Y' OR property2 = 'Y'
    );
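
    One way to gain some confidence that the join-plus-EXISTS rewrite selects the same rows as the original IN/UNION query is to run both on a tiny dataset. The following is only a sketch under stated assumptions: Python's built-in sqlite3 stands in for PostgreSQL, all rows are invented, the bounds 0..10 are hypothetical values for foo_min, foo_max, bar_min and bar_max, and the rewrite selects all four A columns so the two results can be compared directly. It says nothing about performance.

```python
import sqlite3

# Toy data only: SQLite stands in for PostgreSQL, and every row is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (n_id INTEGER PRIMARY KEY, foo INT, bar INT);
CREATE TABLE B (l_id INTEGER PRIMARY KEY, n1_id INT, n2_id INT);
CREATE TABLE C (l_id INTEGER PRIMARY KEY, property1 CHAR(1), property2 CHAR(1));
INSERT INTO A VALUES (1, 5, 5), (2, 50, 50), (3, 7, 99), (4, 6, 4);
INSERT INTO B VALUES (10, 1, 2), (11, 2, 3), (12, 2, 4), (13, 4, 1), (14, 2, 2);
INSERT INTO C VALUES (10, 'Y', 'N'), (11, 'Y', 'Y'), (12, 'N', 'Y'), (13, 'N', 'N');
""")

# Hypothetical bounds standing in for foo_min, foo_max, bar_min, bar_max.
bounds = {"foo_min": 0, "foo_max": 10, "bar_min": 0, "bar_max": 10}
in_range = "foo BETWEEN :foo_min AND :foo_max AND bar BETWEEN :bar_min AND :bar_max"

# The query from the question, with named parameters for the bounds.
original = f"""
SELECT B.*, A1.foo, A1.bar, A2.foo, A2.bar
FROM B
LEFT JOIN A AS A1 ON B.n1_id = A1.n_id
LEFT JOIN A AS A2 ON B.n2_id = A2.n_id
WHERE B.l_id IN (
    SELECT l_id FROM C
    WHERE l_id IN (
        SELECT l_id FROM B WHERE n1_id IN (SELECT n_id FROM A WHERE {in_range})
        UNION
        SELECT l_id FROM B WHERE n2_id IN (SELECT n_id FROM A WHERE {in_range})
    )
    AND (property1 = 'Y' OR property2 = 'Y')
)
ORDER BY B.l_id
"""

# The rewrite: filter on the joined A rows directly and test C with EXISTS.
rewrite = """
SELECT B.*, A1.foo, A1.bar, A2.foo, A2.bar
FROM B
LEFT JOIN A AS A1 ON A1.n_id = B.n1_id
LEFT JOIN A AS A2 ON A2.n_id = B.n2_id
WHERE (
    (A1.foo BETWEEN :foo_min AND :foo_max AND A1.bar BETWEEN :bar_min AND :bar_max)
    OR
    (A2.foo BETWEEN :foo_min AND :foo_max AND A2.bar BETWEEN :bar_min AND :bar_max)
)
AND EXISTS (
    SELECT 1 FROM C
    WHERE C.l_id = B.l_id AND (C.property1 = 'Y' OR C.property2 = 'Y')
)
ORDER BY B.l_id
"""

rows_original = con.execute(original, bounds).fetchall()
rows_rewrite = con.execute(rewrite, bounds).fetchall()
print(rows_original == rows_rewrite)
```

    Agreement on a handful of rows is a sanity check, not a proof; the equivalence rests on l_id being the primary key of B, so each l_id picked by the subquery corresponds to exactly one B row.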
    

    I was able to get it working with the following query:

    WITH relevant_a AS (
        SELECT * FROM A 
            WHERE
                foo BETWEEN foo_min AND foo_max 
                AND
                bar BETWEEN bar_min AND bar_max
    ),
    relevant_c AS (
        SELECT * FROM C
            WHERE l_id IN (
                SELECT l_id FROM B
                    WHERE n1_id IN (
                        SELECT n_id FROM relevant_a
                    )
                UNION
                SELECT l_id FROM B
                    WHERE n2_id IN (
                        SELECT n_id FROM relevant_a
                    )
            )
            AND
            (property1 = 'Y' OR property2= 'Y')
    ),
    relevant_b AS (
        SELECT * FROM B WHERE l_id IN (
            SELECT l_id FROM relevant_c
        )
    )
    
    SELECT * FROM relevant_b
    
    Extended to also fetch foo and bar from A for both n1_id and n2_id, it becomes:
    
    WITH relevant_a AS (
        SELECT * FROM A 
            WHERE
                foo BETWEEN foo_min AND foo_max 
                AND
                bar BETWEEN bar_min AND bar_max
    ),
    relevant_c AS (
        SELECT * FROM C
            WHERE l_id IN (
                SELECT l_id FROM B
                    WHERE n1_id IN (
                        SELECT n_id FROM relevant_a
                    )
                UNION
                SELECT l_id FROM B
                    WHERE n2_id IN (
                        SELECT n_id FROM relevant_a
                    )
            )
            AND
            (property1 = 'Y' OR property2= 'Y')
    ),
    relevant_b AS (
        SELECT * FROM B WHERE l_id IN (
            SELECT l_id FROM relevant_c
        )
    ),
    a1_data AS (
        SELECT A.n_id, A.foo, A.bar
        FROM A
        WHERE A.n_id IN (
            SELECT n1_id FROM relevant_b
        )
    ),
    a2_data AS (
        SELECT A.n_id, A.foo, A.bar
        FROM A
        WHERE A.n_id IN (
            SELECT n2_id FROM relevant_b
        )
    )
    
    SELECT relevant_b.*, a1_data.foo,  a1_data.bar,  a2_data.foo,  a2_data.bar 
    FROM relevant_b
    LEFT JOIN a1_data ON relevant_b.n1_id = a1_data.n_id
    LEFT JOIN a2_data ON relevant_b.n2_id = a2_data.n_id
    
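    I don't like this solution, because it seems forced and redundant. However, it gets the job done in < 0.1 s.

    I still can't believe that an SQL noob like me can (fairly easily, in retrospect) come up with a statement that forces the optimizer to use a strategy much better than the one it came up with on its own. There must be a better solution, but I won't look for it any longer.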
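
    One likely reason the staged query steers the planner: in PostgreSQL versions before 12, every CTE is computed on its own and acts as an optimization fence, so the small relevant_b set is built first and only then joined to A. From PostgreSQL 12 on, CTEs may be inlined, and writing `relevant_b AS MATERIALIZED (...)` requests the older behaviour. The sketch below only checks the logic of the staged query on invented rows, with Python's built-in sqlite3 standing in for PostgreSQL and hypothetical bounds 0..10; it does not reproduce the planner behaviour.

```python
import sqlite3

# Toy data only: SQLite stands in for PostgreSQL, and every row is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (n_id INTEGER PRIMARY KEY, foo INT, bar INT);
CREATE TABLE B (l_id INTEGER PRIMARY KEY, n1_id INT, n2_id INT);
CREATE TABLE C (l_id INTEGER PRIMARY KEY, property1 CHAR(1), property2 CHAR(1));
INSERT INTO A VALUES (1, 5, 5), (2, 50, 50), (3, 7, 99), (4, 6, 4);
INSERT INTO B VALUES (10, 1, 2), (11, 2, 3), (12, 2, 4), (13, 4, 1), (14, 2, 2);
INSERT INTO C VALUES (10, 'Y', 'N'), (11, 'Y', 'Y'), (12, 'N', 'Y'), (13, 'N', 'N');
""")

# Hypothetical bounds standing in for foo_min, foo_max, bar_min, bar_max.
bounds = {"foo_min": 0, "foo_max": 10, "bar_min": 0, "bar_max": 10}

# Each CTE narrows the data before the next stage touches it:
# matching A rows -> matching C rows -> matching B rows -> A rows for display.
staged = """
WITH relevant_a AS (
    SELECT * FROM A
    WHERE foo BETWEEN :foo_min AND :foo_max
      AND bar BETWEEN :bar_min AND :bar_max
),
relevant_c AS (
    SELECT * FROM C
    WHERE l_id IN (
        SELECT l_id FROM B WHERE n1_id IN (SELECT n_id FROM relevant_a)
        UNION
        SELECT l_id FROM B WHERE n2_id IN (SELECT n_id FROM relevant_a)
    )
    AND (property1 = 'Y' OR property2 = 'Y')
),
relevant_b AS (
    SELECT * FROM B WHERE l_id IN (SELECT l_id FROM relevant_c)
),
a1_data AS (
    SELECT n_id, foo, bar FROM A WHERE n_id IN (SELECT n1_id FROM relevant_b)
),
a2_data AS (
    SELECT n_id, foo, bar FROM A WHERE n_id IN (SELECT n2_id FROM relevant_b)
)
SELECT relevant_b.*, a1_data.foo, a1_data.bar, a2_data.foo, a2_data.bar
FROM relevant_b
LEFT JOIN a1_data ON relevant_b.n1_id = a1_data.n_id
LEFT JOIN a2_data ON relevant_b.n2_id = a2_data.n_id
ORDER BY relevant_b.l_id
"""

rows = con.execute(staged, bounds).fetchall()
for row in rows:
    print(row)
```

    On this toy data only the B rows 10 and 12 survive: row 13 has matching As but no 'Y' property, and row 14 has no C row at all.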


    Anyway, thank you all for your suggestions; I certainly learned a few things along the way.

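    Table a should have a single index covering both columns foo and bar, so that it serves both BETWEEN clauses (finding out which column order works better is left to you). Table B is not designed optimally for this query; you should move the values n1_id and n2_id into two separate rows that share the same key and id column (a crosstab layout).

    The following query returns the same data and should perform considerably better: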

    with b_norm(l_id, role, n_id) as (
            select l_id, unnest(Array['1','2']) as role, unnest(Array[n1_id, n2_id]) as n_id
            from b
        )
    select *
    from (
            select distinct l_id
            from a
                join b_norm using (n_id)
                join c using (l_id)
            where bar between 0 and 10000
                and foo between 10000 and 20000
                and (c.property1 = 'Y' or c.property2 = 'Y')
        ) as driver
        join b using (l_id)
        join (
            select a.n_id as n1_id, foo as foo1, bar as bar1
            from a
        )  as a1 using (n1_id)
        join (
            select a.n_id as n2_id, foo as foo2, bar as bar2
            from a
        ) as a2 using (n2_id);
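
    The core of the query above is the b_norm step, which turns each B row into two rows (l_id, role, n_id) so that a single join against A covers both n1_id and n2_id. A minimal sketch of that idea follows, with assumptions labelled: Python's built-in sqlite3 stands in for PostgreSQL, SQLite has no unnest, so UNION ALL is swapped in for the paired unnest calls, all rows are invented, and the bounds 0..10 are hypothetical.

```python
import sqlite3

# Toy data only: SQLite stands in for PostgreSQL, and every row is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (n_id INTEGER PRIMARY KEY, foo INT, bar INT);
CREATE TABLE B (l_id INTEGER PRIMARY KEY, n1_id INT, n2_id INT);
CREATE TABLE C (l_id INTEGER PRIMARY KEY, property1 CHAR(1), property2 CHAR(1));
INSERT INTO A VALUES (1, 5, 5), (2, 50, 50), (3, 7, 99), (4, 6, 4);
INSERT INTO B VALUES (10, 1, 2), (11, 2, 3), (12, 2, 4), (13, 4, 1), (14, 2, 2);
INSERT INTO C VALUES (10, 'Y', 'N'), (11, 'Y', 'Y'), (12, 'N', 'Y'), (13, 'N', 'N');
""")

# Hypothetical bounds standing in for the literal ranges in the query above.
bounds = {"foo_min": 0, "foo_max": 10, "bar_min": 0, "bar_max": 10}

query = """
WITH b_norm AS (
    -- UNION ALL replaces the paired unnest() calls: one row per (l_id, role).
    SELECT l_id, '1' AS role, n1_id AS n_id FROM B
    UNION ALL
    SELECT l_id, '2' AS role, n2_id AS n_id FROM B
),
driver AS (
    -- The small set of l_ids that pass both the A ranges and the C properties.
    SELECT DISTINCT b_norm.l_id
    FROM A
    JOIN b_norm ON b_norm.n_id = A.n_id
    JOIN C ON C.l_id = b_norm.l_id
    WHERE A.foo BETWEEN :foo_min AND :foo_max
      AND A.bar BETWEEN :bar_min AND :bar_max
      AND (C.property1 = 'Y' OR C.property2 = 'Y')
)
SELECT B.*, a1.foo, a1.bar, a2.foo, a2.bar
FROM driver
JOIN B ON B.l_id = driver.l_id
JOIN A AS a1 ON a1.n_id = B.n1_id
JOIN A AS a2 ON a2.n_id = B.n2_id
ORDER BY B.l_id
"""

rows = con.execute(query, bounds).fetchall()
for row in rows:
    print(row)
```

    Because the final joins are inner joins, a B row whose n1_id or n2_id is NULL would be dropped; that matches the original LEFT JOIN query only if both columns are never NULL.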
    

    I think the key here is to rework those subqueries into conditional joins, as that gives the query planner more freedom to optimize things.
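
  • EXPLAIN ANALYZE would tell us more. You may want to post its output and add a link to it in the question. Actual table definitions with exact data types and all constraints would also help: what you get with \d tbl in psql.
  • Can you qualify all columns in the query with the appropriate table alias? It is not obvious what the query is doing.
  • For a start, change the INs to EXISTS.
  • A.n_id should be A2.n_id, I presume?
  • The reference to A is invalid. You have to rewrite the query to fix it.
  • Thanks for the comments, I will try your suggestions on Monday.
  • Sorry, this does not work. The lines starting with b.foo and b.bar are invalid (and not needed), since table B has no such fields. Between the first and the second line there should be an OR instead of an AND. But even with these fixes the query is too slow (cancelled after 10 seconds). EXPLAIN shows that the optimizer is doing a right join with a cost of over 9 million and then a sort, again at 9 million.
  • Thanks for your suggestion, I will try it on Monday.
  • There is no nested loop in the EXPLAIN of this query, which is further than I got. After 10 seconds I still had to cancel the query. I see a "Merge Right Join" with a cost of 12 million in the first line.
  • Some queries need time ;-) How many records are in B? How selective is the EXISTS clause? That is, with the EXISTS clause, do I get 50% of B or 0.5% of B? And again: can B.n1_id and B.n2_id be null? A merge join sounds good, at least for a lot of data: sort first, then match. BTW, as far as I know postgresql has no hash join technique like Oracle's, so the merge approach seems to be the best you can get.
  • Ah, I just noticed: in your comment on your question you say that neither B.n1_id nor B.n2_id can be NULL. Then replace the left joins with inner joins. Maybe that makes a difference. (But actually the dbms should notice this kind of thing and convert the outer joins into inner joins itself.)
  • Sorry, I did not see your comments before answering my own question. I found a query that takes < 0.1 s, so this is not a query that needs time :-). As implied in the question, A and B have about 100 million records each, and C has about 70 million. Every C is a B. About 1% of the Cs match the property clause.
  • Thanks for your suggestion, I will try it on Monday.
  • This approach is missing the property1 and property2 clauses. And one more difference: the A2 I need is gone. I still executed the query (with the property clause added), but had to cancel it after 10 seconds. EXPLAIN shows a nested loop left join with a cost of 2 trillion.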