Amazon web services: How should I set the distkey for a left join with conditions on Redshift?
I have a query like the following:
select
a.col1,
a.col2,
b.col3
from
a
left join b on (a.id=b.id and b.attribute_id=3)
left join c on (a.id=c.id and c.attribute_id=4)
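Even with the distkey set to id, I get a DS_BCAST_INNER in the query plan, and I end up with abnormally long query times for just 1 million rows.
Setting id as the distribution key should co-locate the data and remove the need for the broadcast: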
create table a (id int distkey, attribute_id int, col1 varchar(10), col2 varchar(10));
create table b (id int distkey, attribute_id int, col3 varchar(10));
create table c (id int distkey, attribute_id int);
You should see an explain plan like this:
admin@dev=# explain select
a.col1,
a.col2,
b.col3
from
a
left join b on (a.id=b.id and b.attribute_id=3)
left join c on (a.id=c.id and c.attribute_id=4);
                                QUERY PLAN
--------------------------------------------------------------------------
 XN Hash Left Join DS_DIST_NONE  (cost=0.09..0.23 rows=3 width=99)
   Hash Cond: ("outer".id = "inner".id)
   ->  XN Hash Left Join DS_DIST_NONE  (cost=0.05..0.14 rows=3 width=103)
         Hash Cond: ("outer".id = "inner".id)
         ->  XN Seq Scan on a  (cost=0.00..0.03 rows=3 width=70)
         ->  XN Hash  (cost=0.04..0.04 rows=3 width=37)
               ->  XN Seq Scan on b  (cost=0.00..0.04 rows=3 width=37)
                     Filter: (attribute_id = 3)
   ->  XN Hash  (cost=0.04..0.04 rows=1 width=4)
         ->  XN Seq Scan on c  (cost=0.00..0.04 rows=1 width=4)
               Filter: (attribute_id = 4)
(11 rows)
Time: 123.315 ms
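If the tables contain 3 million rows or fewer and are written to infrequently, it should be safe to use DISTSTYLE ALL. If you do use DISTSTYLE KEY, verify that distributing your tables on that key does not introduce row skew (check with the following query):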
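A minimal sketch of such a skew check, assuming the Redshift system view SVV_TABLE_INFO is what the answer had in mind. The filter on tables a, b and c is added here for illustration and simply reuses the table names from the example above:

```sql
-- Sketch only: SVV_TABLE_INFO is a Redshift system view. The choice of this
-- view and the filter on tables a, b, c are assumptions for illustration.
select
    "schema",
    "table",
    diststyle,   -- distribution style, including the key column for KEY tables
    tbl_rows,    -- total number of rows in the table
    skew_rows    -- rows in the fullest slice divided by rows in the emptiest slice
from svv_table_info
where "table" in ('a', 'b', 'c')
order by skew_rows desc;
```

If a table shows a skew_rows value well above 1.00, the chosen distkey is probably concentrating rows on a few slices, and a different key or DISTSTYLE ALL may be worth considering for that table.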
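"skew_rows" is the ratio of data between the slice with the most data and the slice with the least. It should be close to 1.00.
How many rows are in each table? Feel free to edit your question and show the query plan. That would make it a little easier to offer advice. However, the best advice is.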