Mysql 在sql中执行SCD的通用过程
我在mssql服务器中有2个表。我可以通过自定义insert/update/delete和Merge语句执行scdMysql 在sql中执行SCD的通用过程,mysql,sql,sql-server,tsql,scd,Mysql,Sql,Sql Server,Tsql,Scd,我在mssql服务器中有2个表。我可以通过自定义insert/update/delete和Merge语句执行scd 我想知道是否有任何通用程序可以满足这一目的。我们只要给它两张表,它就会形成SCD。SQL server 2008中有任何选项吗? 谢谢不,没有,也不可能有一个通用的适用于任何表的。有几个原因: 您如何知道哪种SCD类型?(好的,可能是另一个参数,但是…) 你怎么知道哪一列应该被历史化,哪一列应该被覆盖 如何确定哪列是业务键、代理键、过期列等等 要在update语句中指定列,
谢谢不,没有,也不可能有一个通用的适用于任何表的。有几个原因:
- 您如何知道哪种SCD类型?(好的,可能是另一个参数,但是…)
- 你怎么知道哪一列应该被历史化,哪一列应该被覆盖
- 如何确定哪列是业务键、代理键、过期列等等
- 要在update语句中指定列,必须编写动态sql,这是可能的,但上面的一点起作用
UPSERT
来说,通常使用临时表,MERGE
语句对于SCD来说很糟糕,除非在特殊情况下。这是因为您不能将MERGE
语句与INSERT/UPDATE
一起使用,您必须为此禁用外键,因为UPDATE
实现为DELETE然后INSERT
(或者类似的,我记不清了,但我在尝试时遇到了这些问题)
我更喜欢这样做(SCD类型2和SQL Server):
第1步:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
第二步:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
步骤2中的ISNULL(d.sid,0)
非常重要。如果条目已存在,则返回维度的代理项id,否则返回0
第三步:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
就是这样。不,没有,也不可能有一个通用的,无论你传递给它什么表都适用的。有几个原因:
- 您如何知道哪种SCD类型?(好的,可能是另一个参数,但是…)
- 你怎么知道哪一列应该被历史化,哪一列应该被覆盖
- 如何确定哪列是业务键、代理键、过期列等等
- 要在update语句中指定列,必须编写动态sql,这是可能的,但上面的一点起作用
UPSERT
来说,通常使用临时表,MERGE
语句对于SCD来说很糟糕,除非在特殊情况下。这是因为您不能将MERGE
语句与INSERT/UPDATE
一起使用,您必须为此禁用外键,因为UPDATE
实现为DELETE然后INSERT
(或者类似的,我记不清了,但我在尝试时遇到了这些问题)
我更喜欢这样做(SCD类型2和SQL Server):
第1步:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
第二步:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
步骤2中的ISNULL(d.sid,0)
非常重要。如果条目已存在,则返回维度的代理项id,否则返回0
第三步:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
就这样。@user1765876更新了我的答案。@user1765876有什么反馈吗?是的,先生,通过插入更新我已经知道了。。我想知道你是否可以通过merge语句做些什么。是的,你可以,就像我说的,但是你必须禁用外键(如果它是星型模式的话,通常是从facts表中),这取决于你想要实现什么类型的SCD。而且,没有一种合理的方法可以编写一个过程,无论您对它抛出什么表,都可以应用它。当然,您可以为每个维度编写一个过程。请随意。@user1765876更新了我的答案。@user1765876有任何反馈吗?是的,先生,通过插入更新我已经知道了。。我想知道你是否可以通过merge语句做些什么。是的,你可以,就像我说的,但是你必须禁用外键(如果它是星型模式的话,通常是从facts表中),这取决于你想要实现什么类型的SCD。而且,没有一种合理的方法可以编写一个过程,无论您对它抛出什么表,都可以应用它。当然,您可以为每个维度编写一个过程。请随意