Performance: Best way to process 1.4 billion records in SQL
Tags: performance, query-optimization, azure-sql
I humbly request your help with the following scenario. In my project I have to process time-series data from a table. We are using Azure SQL Server.
The table dbo.batch_events has 1.4 billion rows (the table structure and sample data were shown in a screenshot).
I have to pivot the device names in the device_name column of dbo.batch_events into column names and load the pivoted values into the dbo.time_series table.
If I pivot the device name column, I will end up with 685 columns in dbo.time_series.
A second screenshot showed the target table (dbo.time_series) structure and the expected output for the sample data from the source table.
Please advise on the best way to write this query in SQL.
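
With 685 device names, typing the PIVOT column list by hand is error-prone. A minimal sketch of building that list dynamically is below. It assumes the column called device_name in the text is the Equipment_name column used in the posted query, that STRING_AGG is available (it is on Azure SQL Database), and the @from/@to values are made-up placeholders. The DISTINCT scan over the source table is itself expensive, so in practice the name list would probably be cached in a small lookup table.

```sql
-- Build a quoted, comma-separated list of device names: [MY1102], [MY1138], ...
-- CAST to NVARCHAR(MAX) so the aggregated string is not limited to 4000 characters.
DECLARE @cols NVARCHAR(MAX);

SELECT @cols = STRING_AGG(CAST(QUOTENAME(d.Equipment_name) AS NVARCHAR(MAX)), N', ')
               WITHIN GROUP (ORDER BY d.Equipment_name)
FROM (SELECT DISTINCT Equipment_name
      FROM dbo.BATCH_EVENTS
      WHERE Equipment_name IS NOT NULL) AS d;

-- Splice the list into the pivot query; the date range stays a real parameter.
DECLARE @sql NVARCHAR(MAX) = N'
SELECT Event_name, Time_Stamp, Start_time, End_time, Duration, ' + @cols + N'
FROM (SELECT Event_name, Time_Stamp, Start_time, End_time, Duration,
             [Value] AS Sensor_Value, Equipment_name
      FROM dbo.BATCH_EVENTS
      WHERE Time_stamp >= Start_time AND Time_stamp <= End_time
        AND Time_stamp >= @from AND Time_stamp < @to) AS src
PIVOT (MAX(Sensor_Value) FOR Equipment_name IN (' + @cols + N')) AS pvt;';

-- Hypothetical one-month range, only to show how the parameters are passed.
EXEC sys.sp_executesql
     @sql,
     N'@from DATETIME, @to DATETIME',
     @from = '20200101',
     @to   = '20200201';
```

QUOTENAME guards against device names that contain brackets or other special characters, which matters because the names end up as identifiers inside dynamic SQL.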
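The query I wrote takes 25 hours to process the 1.4 billion records and load them into the target table.
I have partitioned the source table (dbo.batch_events) by day on the timestamp column and created two nonclustered indexes, one on the device name and one on the timestamp column.
I humbly request your advice on the best way to write the query for this scenario.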
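
For readers who want to reproduce the setup, the daily partitioning and the two single-column indexes described above might look roughly like the sketch below. All object names and boundary dates are hypothetical, since the post does not give them, and the column names are taken from the posted query.

```sql
-- Hypothetical names and boundary values; the post gives none of these.
-- RANGE RIGHT: each boundary value is the first instant of a new daily partition.
CREATE PARTITION FUNCTION pf_batch_events_day (DATETIME)
    AS RANGE RIGHT FOR VALUES ('20200101', '20200102', '20200103');

-- Azure SQL Database only has the PRIMARY filegroup, so every partition maps to it.
CREATE PARTITION SCHEME ps_batch_events_day
    AS PARTITION pf_batch_events_day ALL TO ([PRIMARY]);

-- The two nonclustered indexes from the description, aligned with the scheme.
CREATE NONCLUSTERED INDEX IX_batch_events_Equipment_name
    ON dbo.BATCH_EVENTS (Equipment_name)
    ON ps_batch_events_day (Time_stamp);

CREATE NONCLUSTERED INDEX IX_batch_events_Time_stamp
    ON dbo.BATCH_EVENTS (Time_stamp)
    ON ps_batch_events_day (Time_stamp);
```

Note that neither index carries the other columns the pivot query reads (Event_name, Start_time, End_time, Duration, Value), so a plan that uses them would still need to look those columns up in the base table.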
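The stored procedure I created processes one month of data at a time; each month holds roughly 120 million rows.
The first WHILE loop takes the minimum and maximum dates from dbo.Batch_events and inserts a start_date and end_date entry for each month into the dbo.Iteration_ctrl table, so that table ends up with 12 entries, one per month.
The second WHILE loop iterates over the 12 start_date and end_date entries in dbo.Iteration_ctrl and runs the pivot query inside the loop to load the data into dbo.Time_series.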
DECLARE @MIN_TIME DATETIME, @MIN_TIMESTAMP DATETIME;
DECLARE @MAX_TIME DATETIME, @MAX_TIMESTAMP DATETIME;
DECLARE @DATE DATETIME, @ROWCOUNT INT, @TOTALCOUNT INT, @INSERTED INT;

-- Overall time range of the source data
SELECT @MIN_TIME = MIN(Time_stamp), @MAX_TIME = MAX(Time_stamp)
FROM [dbo].[BATCH_EVENTS];

PRINT 'INSERT INTO TABLE [dbo].[ITERATION_CTRL] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30));

-- Step 1: write one (start, end) row per month into the control table
WHILE @MIN_TIME < @MAX_TIME
BEGIN
    SELECT @DATE = DATEADD(MM, 1, @MIN_TIME);
    SELECT @DATE = CASE WHEN @DATE > @MAX_TIME THEN @MAX_TIME ELSE @DATE END;

    INSERT INTO dbo.ITERATION_CTRL
    SELECT @MIN_TIME, @DATE;

    PRINT 'INSERTION INTO TABLE [dbo].[ITERATION_CTRL] HAS ENDED FOR ' + CAST(@MIN_TIME AS nvarchar(30)) + ' - ' + CAST(@DATE AS nvarchar(30)) + ' NO OF ROWS INSERTED: ' + CAST(@@ROWCOUNT AS nvarchar(30)) + ' ' + CAST(GETDATE() AS nvarchar(30));

    -- Next range starts one second after the previous end
    SELECT @MIN_TIME = DATEADD(SS, 1, @DATE);
END;

PRINT 'INSERT INTO TABLE [dbo].[Time_series_data] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30));

SELECT @TOTALCOUNT = COUNT(*) FROM dbo.ITERATION_CTRL;
SELECT @ROWCOUNT = 1;

-- Step 2: pivot and load one control-table range per iteration
WHILE @ROWCOUNT <= @TOTALCOUNT
BEGIN
    SELECT @MIN_TIMESTAMP = MIN_DATE,
           @MAX_TIMESTAMP = MAX_DATE
    FROM dbo.ITERATION_CTRL
    WHERE ID = @ROWCOUNT;

    BEGIN TRANSACTION;

    INSERT INTO dbo.Time_series_data
    SELECT *
    FROM
        (SELECT [Event_name], [Time_Stamp],
                [Start_time], [End_time], [Duration],
                [Value] AS [Sensor_Value],
                Equipment_name
         FROM [dbo].[BATCH_EVENTS] BE
         WHERE Time_stamp >= [Start_time] AND Time_stamp <= [End_time]
           AND Time_stamp BETWEEN @MIN_TIMESTAMP AND @MAX_TIMESTAMP) t
    PIVOT
        (MAX([Sensor_Value])
         FOR Equipment_Name IN ([MY1102], [MY1138], [MY1180],
                                [MY1164], [MY1176], [MY204],
                                [MY324], [MY64B6])) AS pvt  -- closing parenthesis and alias were missing
    ORDER BY [Time_Stamp], [Event_name];

    -- Capture the insert's row count before any other statement resets @@ROWCOUNT
    SELECT @INSERTED = @@ROWCOUNT;

    COMMIT TRANSACTION;

    SELECT @ROWCOUNT = @ROWCOUNT + 1;

    PRINT 'INSERTION INTO TABLE [Time_series_data] HAS ENDED NO OF ROWS INSERTED: ' + CAST(@INSERTED AS nvarchar(30)) + ' For duration ' + CAST(@MIN_TIMESTAMP AS nvarchar(30)) + ' ' + CAST(@MAX_TIMESTAMP AS nvarchar(30)) + ' Time ' + CAST(GETDATE() AS nvarchar(30));
END;
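
One detail of the range loop above is worth a sketch. Each new range starts one second after the previous one ends and the load filters with BETWEEN, so a row whose Time_stamp falls inside that one-second gap (for example at .500 of the boundary second) would not be picked up by any iteration. Half-open ranges avoid this. The version below is only an illustration: the shape of dbo.ITERATION_CTRL is inferred from the columns the posted loop reads (ID, MIN_DATE, MAX_DATE), and the variable names are made up.

```sql
-- Assumed shape of the control table, inferred from how the posted loop uses it.
CREATE TABLE dbo.ITERATION_CTRL
(
    ID       INT IDENTITY(1, 1) PRIMARY KEY,
    MIN_DATE DATETIME NOT NULL,   -- inclusive start of the range
    MAX_DATE DATETIME NOT NULL    -- exclusive end of the range
);

DECLARE @from DATETIME, @last DATETIME, @to DATETIME;

SELECT @from = MIN(Time_stamp), @last = MAX(Time_stamp)
FROM dbo.BATCH_EVENTS;

-- Generate contiguous monthly ranges [@from, @to) until the newest row is covered.
WHILE @from <= @last
BEGIN
    SET @to = DATEADD(MONTH, 1, @from);

    INSERT INTO dbo.ITERATION_CTRL (MIN_DATE, MAX_DATE)
    VALUES (@from, @to);

    SET @from = @to;   -- the next range starts exactly where this one ends
END;

-- In the load loop, the matching filter would then be half-open instead of BETWEEN:
--   WHERE Time_stamp >= @MIN_TIMESTAMP AND Time_stamp < @MAX_TIMESTAMP
```

Because every end value is also the next start value, each source row belongs to exactly one range, with no gaps and no rows loaded twice.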