Amazon Web Services: How should I set the distkey for a left join with a condition in Redshift?


I have a query like the following:

select
   a.col1,
   a.col2,
   b.col3
from
   a 
   left join b on (a.id=b.id and b.attribute_id=3)
   left join c on (a.id=c.id and c.attribute_id=4)

Even with the distkey set to id, I get a DS_BCAST_INNER in the query plan, and I end up with abnormally long query times for only 1 million rows.

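Setting id as the distribution key on all three tables should co-locate rows with the same id on the same slice and eliminate the need for the broadcast: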

create table a (id int distkey, attribute_id int, col1 varchar(10), col2 varchar(10));
create table b (id int distkey, attribute_id int, col3 varchar(10));
create table c (id int distkey, attribute_id int);

You should see an explain plan like this:

admin@dev=# explain select
       a.col1,
       a.col2,
       b.col3
    from
       a 
       left join b on (a.id=b.id and b.attribute_id=3)
       left join c on (a.id=c.id and c.attribute_id=4);
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     XN Hash Left Join DS_DIST_NONE  (cost=0.09..0.23 rows=3 width=99)
       Hash Cond: ("outer".id = "inner".id)
       ->  XN Hash Left Join DS_DIST_NONE  (cost=0.05..0.14 rows=3 width=103)
             Hash Cond: ("outer".id = "inner".id)
             ->  XN Seq Scan on a  (cost=0.00..0.03 rows=3 width=70)
             ->  XN Hash  (cost=0.04..0.04 rows=3 width=37)
                   ->  XN Seq Scan on b  (cost=0.00..0.04 rows=3 width=37)
                         Filter: (attribute_id = 3)
       ->  XN Hash  (cost=0.04..0.04 rows=1 width=4)
             ->  XN Seq Scan on c  (cost=0.00..0.04 rows=1 width=4)
                   Filter: (attribute_id = 4)
    (11 rows)

    Time: 123.315 ms
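
If the plan still shows DS_BCAST_INNER after recreating the tables, it may be worth confirming that the distribution key actually took effect. The following is a minimal sketch, assuming the tables are named a, b and c as above and that your user can read the SVV_TABLE_INFO system view (it is normally restricted to superusers, and empty tables do not appear in it):

```sql
-- Show the distribution style Redshift has recorded for each table.
-- A table distributed on id should report diststyle as KEY(id);
-- EVEN or ALL here would mean the distkey was not applied.
select "schema", "table", diststyle, tbl_rows
from svv_table_info
where "table" in ('a', 'b', 'c')
order by "table";
```

If all three tables report KEY(id), the join on id should be able to run as DS_DIST_NONE, as in the plan above.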
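
If a table contains 3 million rows or fewer and is written to infrequently, it should be safe to use DISTSTYLE ALL. If you do use DISTSTYLE KEY, verify that distributing the table does not cause row skew (check with the following query):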


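"skew_rows" is the ratio between the number of rows in the slice holding the most data and the slice holding the least. It should be close to 1.00.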
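
The skew query referred to above is not reproduced here. As a stand-in, and not necessarily the query the answer originally showed, skew_rows can be read from the SVV_TABLE_INFO system view. This sketch assumes the tables a, b and c from the question; the table name c_all in the second statement is hypothetical and only illustrates the DISTSTYLE ALL alternative for a small, rarely written table:

```sql
-- skew_rows: rows in the fullest slice divided by rows in the emptiest slice.
-- Values close to 1.00 indicate an even distribution across slices.
select "schema", "table", diststyle, skew_rows
from svv_table_info
where "table" in ('a', 'b', 'c')
order by skew_rows desc;

-- Alternative for a small lookup-style table: keep a full copy on every node,
-- so joins against it never need a broadcast or redistribution step.
-- (c_all is a hypothetical name used only for this example.)
create table c_all (id int, attribute_id int) diststyle all;
```

A high skew_rows value on id would suggest that a few id values dominate the table, in which case DISTSTYLE KEY on that column may hurt more than it helps.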

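How many rows are in each table? Feel free to edit your question and show the query plan. That would make it somewhat easier to give advice. However, the best advice is.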