SQL: optimizing multiple inner self-joins on a unique key
Tags: sql, query-optimization
Suppose I have a table X with transactions, where CUSTOMER_ID is the primary key.
In addition, I have hundreds of "features" (in the machine-learning sense), each of which is the text of a query against table X.
All of these queries look like this:
Query 1:
SELECT
X.CUSTOMER_ID,
CASE WHEN X.GENDER = 'F' AND X.IS_PREGNANT = TRUE THEN 1 ELSE 0 END AS WILL_BUY_FOR_KIDS
FROM X
Query xxx:
SELECT
X.CUSTOMER_ID,
CASE WHEN X.GENDER = 'M' AND X.AVG_AMOUNT > 1000 THEN 1 ELSE 0 END AS RICH_DUDE
FROM X
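The task is to produce a table containing all of the "features" computed from table X.
So I need to build the text of the output query (programmatically) from the texts of the "feature" queries.
For example: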
SELECT
*
FROM SOME_QUERY_1
INNER JOIN SOME_QUERY_X
ON SOME_QUERY_1.CUSTOMER_ID = SOME_QUERY_X.CUSTOMER_ID
...
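To make the shape of that output query concrete, here is a minimal sketch that plugs the two feature queries above in as named subqueries. The names SOME_QUERY_1 and SOME_QUERY_X come from the example; writing them as common table expressions, and a dialect with a boolean type (for IS_PREGNANT = TRUE), are assumptions.

```sql
-- Sketch only: each feature query becomes one named subquery (CTE),
-- and all of them are joined on the primary key CUSTOMER_ID.
WITH SOME_QUERY_1 AS (
    SELECT
        X.CUSTOMER_ID,
        CASE WHEN X.GENDER = 'F' AND X.IS_PREGNANT = TRUE THEN 1 ELSE 0 END AS WILL_BUY_FOR_KIDS
    FROM X
),
SOME_QUERY_X AS (
    SELECT
        X.CUSTOMER_ID,
        CASE WHEN X.GENDER = 'M' AND X.AVG_AMOUNT > 1000 THEN 1 ELSE 0 END AS RICH_DUDE
    FROM X
)
SELECT
    SOME_QUERY_1.CUSTOMER_ID,
    SOME_QUERY_1.WILL_BUY_FOR_KIDS,
    SOME_QUERY_X.RICH_DUDE
FROM SOME_QUERY_1
INNER JOIN SOME_QUERY_X
    ON SOME_QUERY_1.CUSTOMER_ID = SOME_QUERY_X.CUSTOMER_ID;
```

With hundreds of features, the generated text would contain one such subquery and one join per feature.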
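An output query like the one above can be very slow when hundreds of subqueries are inner self-joined.
Obviously, it would be cool if the SQL engine "rewrote" this query into an equivalent form that avoids the joins, for instance a single SELECT over X that computes every feature column.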
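A join-free form of the two feature queries shown earlier might look like the sketch below. It assumes every feature is a row-wise expression over X (no aggregation or filtering), so the expressions can be placed side by side in one SELECT list; a boolean-capable dialect is assumed as well.

```sql
-- Sketch only: both features computed in one pass over X, no self-join.
-- Valid only because each feature query returns exactly one row per CUSTOMER_ID.
SELECT
    X.CUSTOMER_ID,
    CASE WHEN X.GENDER = 'F' AND X.IS_PREGNANT = TRUE THEN 1 ELSE 0 END AS WILL_BUY_FOR_KIDS,
    CASE WHEN X.GENDER = 'M' AND X.AVG_AMOUNT > 1000 THEN 1 ELSE 0 END AS RICH_DUDE
FROM X;
```

Because CUSTOMER_ID is the primary key, each subquery contributes exactly one row per customer, which is why this returns the same rows as the inner-join version.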
A few questions:
- Ordinary algebra has rewrite rules such as (a+b)*a = a^2 + b*a. Does relational algebra have rules like that?
First, you should write the query as:
SELECT X1.A * 2, -- some operation on X1.A
       X2.B / 2  -- some operation on X2.B
FROM X X1 JOIN
     X X2
     ON X1.C = X2.C;
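The subqueries add no value (but I will come back to that later).
If C is declared unique (or primary key), then there is an index on that column. I'm pretty sure every database would still perform the join, but it would be very fast:
- Process a record (for X1) and read its C
- Look up C in the index
- Go back to the record, which is already in memory, to fetch it for X2
You might ask why the database doesn't optimize this away. Well, database developers focus more on optimizing well-written queries than poorly written ones. If the JOIN is not needed, the query writer should know that. (Actually, the JOIN does have one use: it filters out NULL values.)
It looks like Oracle's optimizer does this job: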
drop table x;
create table x (a int primary key,b int);
select x0.b,x1.b,x2.b,x3.b,x4.b,x5.b,x6.b,x7.b,x8.b,x9.b
from (select x.a,x.b from x) x0
join (select x.a,x.b from x) x1 on x1.a = x0.a
join (select x.a,x.b from x) x2 on x2.a = x0.a
join (select x.a,x.b from x) x3 on x3.a = x0.a
join (select x.a,x.b from x) x4 on x4.a = x0.a
join (select x.a,x.b from x) x5 on x5.a = x0.a
join (select x.a,x.b from x) x6 on x6.a = x0.a
join (select x.a,x.b from x) x7 on x7.a = x0.a
join (select x.a,x.b from x) x8 on x8.a = x0.a
join (select x.a,x.b from x) x9 on x9.a = x0.a
;
Note the 9 eliminated joins in the execution plan:
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| X | 1 | 26 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - SEL$44564B95 / X@SEL$2
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
FULL(@"SEL$44564B95" "X"@"SEL$2")
OUTLINE(@"SEL$3")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$1")
MERGE(@"SEL$3")
MERGE(@"SEL$2")
OUTLINE(@"SEL$5428C7F1")
OUTLINE(@"SEL$5")
OUTLINE(@"SEL$4")
MERGE(@"SEL$5428C7F1")
MERGE(@"SEL$5")
OUTLINE(@"SEL$730B2DEF")
OUTLINE(@"SEL$7")
OUTLINE(@"SEL$6")
MERGE(@"SEL$730B2DEF")
MERGE(@"SEL$7")
OUTLINE(@"SEL$DE510E9C")
OUTLINE(@"SEL$9")
OUTLINE(@"SEL$8")
MERGE(@"SEL$DE510E9C")
MERGE(@"SEL$9")
OUTLINE(@"SEL$6C54F645")
OUTLINE(@"SEL$11")
OUTLINE(@"SEL$10")
MERGE(@"SEL$6C54F645")
MERGE(@"SEL$11")
OUTLINE(@"SEL$5E3B1022")
OUTLINE(@"SEL$13")
OUTLINE(@"SEL$12")
MERGE(@"SEL$5E3B1022")
MERGE(@"SEL$13")
OUTLINE(@"SEL$D60B40D8")
OUTLINE(@"SEL$15")
OUTLINE(@"SEL$14")
MERGE(@"SEL$D60B40D8")
MERGE(@"SEL$15")
OUTLINE(@"SEL$B8655000")
OUTLINE(@"SEL$17")
OUTLINE(@"SEL$16")
MERGE(@"SEL$B8655000")
MERGE(@"SEL$17")
OUTLINE(@"SEL$EC740ABE")
OUTLINE(@"SEL$19")
OUTLINE(@"SEL$18")
MERGE(@"SEL$EC740ABE")
MERGE(@"SEL$19")
OUTLINE(@"SEL$7AC5A3AA")
OUTLINE(@"SEL$20")
MERGE(@"SEL$7AC5A3AA")
OUTLINE(@"SEL$F6D45FB3")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$17")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$15")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$13")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$11")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$9")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$7")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$5")
ELIMINATE_JOIN(@"SEL$F6D45FB3" "X"@"SEL$3")
OUTLINE(@"SEL$5A225B26")
ELIMINATE_JOIN(@"SEL$5A225B26" "X"@"SEL$19")
OUTLINE_LEAF(@"SEL$44564B95")
ALL_ROWS
DB_VERSION('11.2.0.2')
OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
Column Projection Information (identified by operation id):
1 - "X"."B"[NUMBER,22]
Note
-----
- dynamic sampling used for this statement (level=2)
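The answer does not say how the plan above was produced. One way to request a comparable plan in Oracle is sketched here, assuming the table x created above and a default PLAN_TABLE; the 'ADVANCED' format option is what prints the Outline Data section with its ELIMINATE_JOIN entries.

```sql
-- Sketch only (Oracle): explain a shortened version of the self-join.
EXPLAIN PLAN FOR
select x0.b, x1.b
from (select x.a, x.b from x) x0
join (select x.a, x.b from x) x1 on x1.a = x0.a;

-- Show the plan, including query block names and outline hints.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ADVANCED'));
```

If join elimination applies, the plan should show a single access to X rather than a join operation.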
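What is the purpose of those subqueries? BTW: postgres is smart enough to collapse the subqueries, but it does not detect that x1.* and x2.* refer to the same tuple. The result is a merge join over two index scans.
@wildplasser I have updated the description to add more context to the question.
Please don't update the question; it makes the existing answers look silly (which they were not). Instead, put the new material into a new question.
The main question is precisely about avoiding multiple inner self-joins on the primary key. I have updated the description to add more context to the question.
Cool. The most interesting part: is a join-elimination phase common in other SQL engines? Is there a proven relational rule that does this, or is it a "hack"?
Hi Lars :-) These questions are too broad. It is a long way from the theory to each specific vendor's implementation.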