T-SQL: delete rows that have not changed between two dates
Tags: tsql, sql-server-2012
I inherited a very old and very large (1 million rows) table in SQL Server. It is a daily snapshot of accounts and balances, but some of the accounts no longer change at all and are still added every day (don't ask why!). I want to:
- Identify and delete the rows that have not changed, keeping a row only when there is a change
- Create a query that returns the deleted rows when they are needed, as if they were still there
I have a date dimension table that I can make use of.
This generates the current table:
CREATE TABLE #Account_Snapshot(
[Snapshot_Id] [int] NOT NULL,
[Snapshot_Date] [date] NOT NULL,
[Account] [nvarchar](20) NOT NULL,
[Balance] [decimal](18, 2) NOT NULL,
CONSTRAINT [PK_Account_Snapshot_1] PRIMARY KEY CLUSTERED
(
[Snapshot_Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO #Account_Snapshot VALUES(1, '2019-01-01', '1', 1505.31)
INSERT INTO #Account_Snapshot VALUES(2, '2019-01-01', '2', 2337.48)
INSERT INTO #Account_Snapshot VALUES(3, '2019-01-01', '3', 1088.07)
INSERT INTO #Account_Snapshot VALUES(4, '2019-02-01', '1', 1505.31)
INSERT INTO #Account_Snapshot VALUES(5, '2019-02-01', '2', 2132.17)
INSERT INTO #Account_Snapshot VALUES(6, '2019-02-01', '3', 1088.07)
INSERT INTO #Account_Snapshot VALUES(7, '2019-03-01', '1', 1505.31)
INSERT INTO #Account_Snapshot VALUES(8, '2019-03-01', '2', 2132.17)
INSERT INTO #Account_Snapshot VALUES(9, '2019-03-01', '3', 749.23)
SELECT * FROM #Account_Snapshot
ORDER BY Account, Snapshot_Date
Snapshot_Id Snapshot_Date Account Balance
----------- ------------- -------------------- ---------------------
1 2019-01-01 1 1505.31
4 2019-02-01 1 1505.31
7 2019-03-01 1 1505.31
2 2019-01-01 2 2337.48
5 2019-02-01 2 2132.17
8 2019-03-01 2 2132.17
3 2019-01-01 3 1088.07
6 2019-02-01 3 1088.07
9 2019-03-01 3 749.23
First, I need delete logic that identifies the unchanged rows and removes them:
DELETE FROM #Account_Snapshot WHERE Snapshot_Id IN (4,6,7,8)
SELECT * FROM #Account_Snapshot
ORDER BY Account, Snapshot_Date
Snapshot_Id Snapshot_Date Account Balance
----------- ------------- -------------------- --------------------
1 2019-01-01 1 1505.31
2 2019-01-01 2 2337.48
5 2019-02-01 2 2132.17
3 2019-01-01 3 1088.07
9 2019-03-01 3 749.23
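Then, once the table is much smaller, I need a query to build a view on top of it, probably using the date dimension table, so that the original results are generated on the fly.

To identify the rows that have not changed, we can use ROW_NUMBER with a partition in a CTE, as shown below. Note that this partitions by account and balance, so it assumes a balance never returns to an earlier value for the same account: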
;with cte as (
select *
,ROW_NUMBER() over(partition by [Account],[Balance] order by [Account],Snapshot_Date,[Balance]) [Row]
from #Account_Snapshot
),DataFiltered as (select * from cte where cte.[Row]=1 --not a duplicate
),DataDuplicate as (select * from cte where cte.[Row]>1 --any number larger than one is a duplicate
)
select * from DataFiltered
ORDER BY Account, Snapshot_Date
Result:
Snapshot_Id Snapshot_Date Account Balance Row
1 2019-01-01 1 1505.31 1
2 2019-01-01 2 2337.48 1
5 2019-02-01 2 2132.17 1
3 2019-01-01 3 1088.07 1
9 2019-03-01 3 749.23 1
To get the duplicate rows instead, change the last two lines to the following:
select * from DataDuplicate
ORDER BY Account, Snapshot_Date
Result:
Snapshot_Id Snapshot_Date Account Balance Row
4 2019-02-01 1 1505.31 2
7 2019-03-01 1 1505.31 3
8 2019-03-01 2 2132.17 2
6 2019-02-01 3 1088.07 2
To delete the duplicate rows, replace the last two lines with:
delete from #Account_Snapshot where Snapshot_Id in (
select Snapshot_Id from DataDuplicate)
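I hope this helps.

Here is an option that uses LAG in a subquery to get the previous balance for each account. You can then filter for rows where the balance equals the previous balance (in other words, nothing changed) to get the records to delete: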
--Show what records to remove
SELECT *
FROM (
SELECT *
-- No default value: the first row per account gets NULL, so it never matches and is always kept
, LAG([Balance]) OVER ( PARTITION BY [Account]
ORDER BY [Snapshot_Date]
) AS [PreBalance]
FROM [#Account_Snapshot]
) AS [p]
WHERE [p].[Balance] = [p].[PreBalance];
--Then just to delete them
DELETE p
FROM (
SELECT *
-- No default value: the first row per account gets NULL, so it never matches and is always kept
, LAG([Balance]) OVER ( PARTITION BY [Account]
ORDER BY [Snapshot_Date]
) AS [PreBalance]
FROM [#Account_Snapshot]
) AS [p]
WHERE [p].[Balance] = [p].[PreBalance];
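Since this runs against a large inherited table, it may be worth checking the row count before making the delete permanent. The following is only a sketch of the same LAG-based delete wrapped in a transaction; it assumes the `#Account_Snapshot` sample table from the question and uses a CTE named `Flagged`, which is a made-up name for illustration.

```sql
-- Sketch: LAG-based delete inside a transaction, so the count can be checked first.
BEGIN TRANSACTION;

WITH [Flagged] AS (
    SELECT [Snapshot_Id]
         , [Balance]
           -- NULL for the first row of each account, so that row never matches
         , LAG([Balance]) OVER ( PARTITION BY [Account]
                                 ORDER BY [Snapshot_Date]
                               ) AS [PreBalance]
    FROM [#Account_Snapshot]
)
DELETE FROM [Flagged]
WHERE [Balance] = [PreBalance];

-- Number of rows removed by the DELETE above (4 for the sample data)
SELECT @@ROWCOUNT AS [Rows_Deleted];

-- Undo the delete; swap this for COMMIT TRANSACTION once the count looks right
ROLLBACK TRANSACTION;
```

With the sample data, the rows removed are Snapshot_Id 4, 6, 7 and 8, which matches the manual delete in the question.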
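After the delete, to present the data the way it was before: yes, you will need a calendar table. Use it to build an account/date table (every combination of account and date), then CROSS APPLY back to the original table to fill in the balance.
For example: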
--Quick sample calendar table
CREATE TABLE [#cal] ([CalDate] [date] NOT NULL);
INSERT INTO [#cal] (
[CalDate]
)
VALUES ('2019-01-01')
, ('2019-02-01')
, ('2019-03-01');
--So this would get us a full list of all accounts and dates based on your calendar table
SELECT *
FROM [#cal] [a]
INNER JOIN (
SELECT DISTINCT [Account]
FROM [#Account_Snapshot]
) [b]
ON 1 = 1;
This gives:
CalDate Account
2019-01-01 1
2019-02-01 1
2019-03-01 1
2019-01-01 2
2019-02-01 2
2019-03-01 2
2019-01-01 3
2019-02-01 3
2019-03-01 3
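A real date dimension usually covers many more dates than the snapshot history does, so the account/date list probably needs to be bounded. The sketch below is one way to do that, assuming the `#cal` and `#Account_Snapshot` tables above; the `@Through` variable is a hypothetical end date supplied by the caller. It is not derived from the table because the most recent snapshot rows may have been deleted as unchanged.

```sql
-- Sketch: all account/date combinations, limited to the period of interest.
-- @Through is a hypothetical parameter: the last date the view should cover.
DECLARE @Through date = '2019-03-01';

SELECT [c].[CalDate]
     , [a].[Account]
FROM [#cal] AS [c]
CROSS JOIN (
    SELECT DISTINCT [Account]
    FROM [#Account_Snapshot]
) AS [a]
WHERE [c].[CalDate] >= (SELECT MIN([Snapshot_Date]) FROM [#Account_Snapshot]) -- first snapshot on record
  AND [c].[CalDate] <= @Through
ORDER BY [a].[Account], [c].[CalDate];
```

`CROSS JOIN` produces the same combinations as `INNER JOIN ... ON 1 = 1` and states the intent more directly.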
Then use that in a subquery and CROSS APPLY back to the account table to fill in the balance:
SELECT *
FROM (
SELECT *
FROM [#cal] [a]
INNER JOIN (
SELECT DISTINCT [Account]
FROM [#Account_Snapshot]
) [b]
ON 1 = 1
) AS [AcctCal]
CROSS APPLY (
SELECT TOP 1 [acct].[Balance]
FROM [#Account_Snapshot] [acct]
WHERE [acct].[Account] = [AcctCal].[Account]
AND [acct].[Snapshot_Date] <= [AcctCal].[CalDate]
ORDER BY [acct].[Snapshot_Date] desc
) AS [bal];
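If the goal is to get the balances for one specific date, the same CROSS APPLY lookup can be wrapped in an inline table-valued function. This is a minimal sketch only: views and functions cannot reference temp tables, so it assumes a permanent table named `dbo.Account_Snapshot` with the same columns, and the function name `dbo.fn_Account_Balance_AsOf` is made up for illustration.

```sql
-- Sketch: balance of every account as of a given date, after the unchanged rows were deleted.
-- dbo.Account_Snapshot and dbo.fn_Account_Balance_AsOf are assumed names, not from the question.
CREATE FUNCTION dbo.fn_Account_Balance_AsOf (@AsOfDate date)
RETURNS TABLE
AS
RETURN
(
    SELECT [a].[Account]
         , @AsOfDate AS [Snapshot_Date]
         , [bal].[Balance]
    FROM (
        SELECT DISTINCT [Account]
        FROM dbo.Account_Snapshot
    ) AS [a]
    CROSS APPLY (
        -- Most recent surviving row on or before the requested date
        SELECT TOP (1) [s].[Balance]
        FROM dbo.Account_Snapshot AS [s]
        WHERE [s].[Account] = [a].[Account]
          AND [s].[Snapshot_Date] <= @AsOfDate
        ORDER BY [s].[Snapshot_Date] DESC
    ) AS [bal]
);
GO

-- Usage: balances as they stood on 2019-02-01
SELECT *
FROM dbo.fn_Account_Balance_AsOf('2019-02-01');
```

Because CROSS APPLY drops accounts with no row on or before the date, accounts that did not exist yet are simply absent from the result. An index on (Account, Snapshot_Date) that includes Balance would likely help the TOP (1) lookup on a large table, though that depends on the actual workload.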
Try this:
;with cte as (
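-- partition by Account as well, so equal balances on different accounts are not treated as duplicates
select *,ROW_NUMBER() over (partition by Account, Balance order by Snapshot_Date, Snapshot_Id) rn from #Account_Snapshot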
)
Delete from cte where rn > 1
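The ROW_NUMBER-based deletes above partition on the balance value, so a balance that comes back later (for example 100, 200, 100) would lose its second occurrence even though it is a genuine change. Before relying on them, it may be worth checking whether that ever happens in the data. The following is a sketch of such a check against the `#Account_Snapshot` sample table; the CTE name `Marked` and the column names `Starts_Run` and `Runs` are made up for illustration.

```sql
-- Sketch: find account/balance pairs that occur in more than one separate run.
WITH [Marked] AS (
    SELECT [Account]
         , [Balance]
           -- 1 when this row starts a new run of its balance, 0 when it repeats the previous row
         , CASE WHEN [Balance] = LAG([Balance]) OVER ( PARTITION BY [Account]
                                                       ORDER BY [Snapshot_Date] )
                THEN 0
                ELSE 1
           END AS [Starts_Run]
    FROM [#Account_Snapshot]
)
SELECT [Account]
     , [Balance]
     , SUM([Starts_Run]) AS [Runs]
FROM [Marked]
GROUP BY [Account], [Balance]
HAVING SUM([Starts_Run]) > 1;
```

If this returns no rows, as it does for the sample data, the ROW_NUMBER and LAG approaches should delete the same rows. If it returns rows, the LAG-based delete is the safer choice.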
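Is there a daily record for every account that existed at the time?

Tip: it helps to tag database questions with the appropriate software (MySQL, Oracle, DB2, etc.) and version (e.g. sql-server-2014). Differences in syntax and features often affect the answers. Note that tsql narrows the choices but does not specify the database.

Thanks for the tip, I have tagged the product and version. Yes, there is a snapshot of all accounts every day, even historical ones. I tried to change this, but there are too many dependencies for that to be feasible, which is why I am trying to come up with a different approach.

Maybe use the LAG function in a subquery to get the previous balance; you can then filter for rows where the balance equals the previous balance (in other words, nothing changed).

As part of your other question, where you want the view to present the data as it was before: do you include Snapshot_Id in those results?

No, that is just the table's id and I don't really care about it.

This is great! Thank you very much. I just tested it on a backup and it works well. Now I need help with a query that can generate the results for a date I have deleted. I wanted to mark this answer as useful, but it won't let me.

Wow, that is awesome! Thank you very much. I was writing a function for a specific date, but this is exactly what I was looking for. I wanted to mark this answer as useful, but it won't let me.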