SQL: How do I quickly update a column for the extreme values within each range?

Tags: sql, sql-server, sql-server-2008-r2

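I have 40 tables like the one below, each containing 30 million records.

Table RawData: PK (CategoryID, Time)

So far, the IsSampled column is 0 for every record. I need to update the records so that, for each CategoryID and each one-minute range, the record with the maximum Value, the record with the minimum Value, and the first record all have IsSampled = 1.

Below is the procedural query I wrote, but it takes too long to run: about 2 hours 30 minutes per table.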

DECLARE @startRange datetime 
DECLARE @endRange datetime 
DECLARE @endTime datetime 
SET @startRange = '2012-07-01 00:00:00.000'
SET @endTime = '2012-08-01 00:00:00.000'

WHILE (@startRange < @endTime)
BEGIN
    SET @endRange = DATEADD(MI, 1, @startRange)

    UPDATE r1
    SET IsSampled = 1
    FROM RawData AS r1
    JOIN 
    (
      SELECT r2.CategoryID, 
             MAX(Value) as MaxValue, 
             MIN(Value) as MinValue, 
             MIN([Time]) AS FirstTime
      FROM RawData AS r2
      WHERE @startRange <= [Time] AND [Time] < @endRange
      GROUP BY CategoryID
    ) as samples
    ON r1.CategoryID = samples.CategoryID
       AND (r1.Value = samples.MaxValue 
            OR r1.Value = samples.MinValue 
            OR r1.[Time] = samples.FirstTime)
       AND @startRange <= r1.[Time] AND r1.[Time] < @endRange

    SET @startRange = DATEADD(MI, 1, @startRange)   
END    

Is there a way to update these tables quickly in a non-procedural way? Thanks.

I'm not sure how well this approach performs, but it is more set-based than your current one:

declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)

;with BinnedValues as (
    select CategoryID,Time,IsSampled,Value,DATEADD(minute,DATEDIFF(minute,0,Time),0) as TimeBin
    from @T
), MinMax as (
    select CategoryID,Time,IsSampled,Value,TimeBin,
        ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Value) as MinPos,
        ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Value desc) as MaxPos,
        ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Time) as Earliest
    from
        BinnedValues
)
update MinMax set IsSampled = 1 where MinPos=1 or MaxPos=1 or Earliest=1

select * from @T
Result:

CategoryID  Time                   IsSampled Value
----------- ---------------------- --------- ---------------------------------------
1           2012-07-01 00:00:00.00 1         65.36347
1           2012-07-01 00:00:11.00 0         80.16729
1           2012-07-01 00:00:14.00 0         29.19716
1           2012-07-01 00:00:25.00 1         7.05847
1           2012-07-01 00:00:36.00 1         98.08257
1           2012-07-01 00:00:57.00 0         75.35524
1           2012-07-01 00:00:59.00 0         35.35524
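
The demo above runs against a table variable. Applied to the real table, the same idea might look like the sketch below. It assumes RawData has the CategoryID, [Time], Value and IsSampled columns used in the question's query, and it limits the update to the July 2012 range from the question. Treat it as a starting point to test on a copy, not a tuned implementation.

```sql
-- Sketch only: assumes RawData(CategoryID, [Time], Value, IsSampled) as used in the question.
-- DATEDIFF(minute, 0, [Time]) gives a minute number, so it bins rows per minute.
;WITH Ranked AS (
    SELECT IsSampled,
           ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                              ORDER BY Value)      AS MinPos,
           ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                              ORDER BY Value DESC) AS MaxPos,
           ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                              ORDER BY [Time])     AS Earliest
    FROM RawData
    WHERE [Time] >= '20120701' AND [Time] < '20120801'
)
UPDATE Ranked
SET IsSampled = 1
WHERE MinPos = 1 OR MaxPos = 1 OR Earliest = 1;
```

Because the whole month is handled in one statement, the per-minute loop and its repeated range scans are no longer needed.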
If you can add the TimeBin column to the table as a computed column and include it in a suitable index, that may speed things up.
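
A minimal sketch of that suggestion follows. The column name TimeBin comes from the CTE above; the index name and the INCLUDE list are assumptions. Whether PERSISTED is accepted depends on SQL Server treating the expression as deterministic for the actual data type of [Time], so it is worth verifying on a copy of the table first.

```sql
-- Sketch only: the index name and INCLUDE list are hypothetical.
-- TimeBin truncates [Time] to the start of its minute, as in the CTE above.
ALTER TABLE RawData
    ADD TimeBin AS DATEADD(minute, DATEDIFF(minute, 0, [Time]), 0) PERSISTED;

-- Keyed the same way the window functions partition the rows.
CREATE NONCLUSTERED INDEX IX_RawData_CategoryID_TimeBin
    ON RawData (CategoryID, TimeBin)
    INCLUDE (Value);
```

IsSampled is deliberately left out of the index so that the update itself does not have to maintain it. Adding a persisted column to a 30-million-row table is itself a sizeable operation.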


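Also note that this marks at most 3 rows per group as sampled. If the earliest row is also the minimum or the maximum, it is only marked once, and the next-closest minimum or maximum is not marked in its place. In addition, if several rows share the same minimum or maximum value, one of them is picked arbitrarily.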
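
If every tied row should be marked instead of an arbitrary one, one option is to swap ROW_NUMBER() for RANK() on the two Value orderings. This is a variation on the answer above, not part of it, and the sample rows are made up for illustration.

```sql
-- Hypothetical sample: two rows tie for the minimum within the same minute.
DECLARE @T TABLE (CategoryID int NOT NULL, [Time] datetime2 NOT NULL,
                  IsSampled bit NOT NULL, Value decimal(10,5) NOT NULL);
INSERT INTO @T (CategoryID, [Time], IsSampled, Value) VALUES
(1, '2012-07-01T00:00:00', 0, 50.0),
(1, '2012-07-01T00:00:10', 0, 7.0),
(1, '2012-07-01T00:00:20', 0, 7.0),
(1, '2012-07-01T00:00:30', 0, 98.0),
(1, '2012-07-01T00:00:40', 0, 60.0);

;WITH Ranked AS (
    SELECT IsSampled,
           -- RANK gives every tied row the same position, so all ties get rank 1.
           RANK() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                        ORDER BY Value)      AS MinRank,
           RANK() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                        ORDER BY Value DESC) AS MaxRank,
           ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                              ORDER BY [Time]) AS Earliest
    FROM @T
)
UPDATE Ranked
SET IsSampled = 1
WHERE MinRank = 1 OR MaxRank = 1 OR Earliest = 1;

-- Both 7.0 rows are marked; only the 60.0 row keeps IsSampled = 0.
SELECT * FROM @T;
```

With this variant a group can have more than 3 marked rows, so the expected number of updated rows per table is no longer bounded by three per minute.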

You could rewrite the update inside the loop as something like this:

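   UPDATE r1
   SET   IsSampled = 1
   FROM  RawData r1
   WHERE r1.[Time] >= @startRange AND r1.[Time] < @endRange
   AND (
        -- first record: no earlier row in the same category and range
        NOT EXISTS (SELECT * FROM RawData r2
                    WHERE r2.CategoryID = r1.CategoryID
                    AND   r2.[Time] >= @startRange AND r2.[Time] < @endRange
                    AND   r2.[Time] < r1.[Time])
        -- minimum: no row with a smaller Value
        OR NOT EXISTS (SELECT * FROM RawData r2
                    WHERE r2.CategoryID = r1.CategoryID
                    AND   r2.[Time] >= @startRange AND r2.[Time] < @endRange
                    AND   r2.Value < r1.Value)
        -- maximum: no row with a larger Value
        OR NOT EXISTS (SELECT * FROM RawData r2
                    WHERE r2.CategoryID = r1.CategoryID
                    AND   r2.[Time] >= @startRange AND r2.[Time] < @endRange
                    AND   r2.Value > r1.Value)
   )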
To get a real performance improvement, you need an index on the Time column.
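
Such an index might be created as in the sketch below. The index name and the INCLUDE list are assumptions; with a clustered key of (CategoryID, Time), CategoryID is carried in the nonclustered index automatically.

```sql
-- Sketch only: the index name and INCLUDE list are hypothetical.
-- Lets the range predicate on [Time] seek instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_RawData_Time
    ON RawData ([Time])
    INCLUDE (Value);
```

Check the actual execution plan before and after, since the optimizer may still prefer the clustered index for wide ranges.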

Hi, please try this query:

declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)


;WITH CTE as (SELECT CategoryID,CAST([Time] as Time) as time,IsSampled,Value FROM @T)

,CTE2 as (SELECT CategoryID,Max(time) mx,MIN(time) mn,'00:00:00.0000000' as start FROM CTE where time <> '00:00:00.0000000' Group by CategoryID)

update @T SET IsSampled=1
FROM CTE2 c inner join @T t on c.CategoryID = t.CategoryID and (CAST(t.[Time] as Time)=c.mx or CAST(t.[Time] as Time)=c.mn or CAST(t.[Time] as Time)=c.start)

select * from @T

Hi, here is the updated query. Please check its performance:

declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)


;WITH CTE as (SELECT CategoryID,Time,CAST([Time] as Time) as timepart,IsSampled,Value FROM @T)
--SELECT * FROM CTE

,CTE2 as (SELECT CategoryID,Max(value) mx,MIN(value) mn FROM CTE 
where timepart <> '00:00:00.0000000' and Time <=DATEADD(MM,1,Time)
 Group by CategoryID)

,CTE3 as (SELECT CategoryID,Max(value) mx,MIN(value) mn FROM CTE 
where timepart = '00:00:00.0000000' and Time <=DATEADD(MM,1,Time)
 Group by CategoryID)

update @T SET IsSampled=1
FROM @T t left join CTE2 c1
on (t.CategoryID = c1.CategoryID  and (t.Value = c1.mn or t.Value =c1.mx))
left join CTE3 c3 on(t.CategoryID = c3.CategoryID and t.Value = c3.mx)
where (c1.CategoryID is not null or c3.CategoryID is not null)


select * from @T

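How many rows are affected by an update like this?

On average, 1.5 million records should be updated per table. There is typically one record per second, so 30 million / 60 × 3 = 1.5 million.

Have you considered changing the PK to (Time, CategoryID)?

I'm not familiar with indexes, but my colleague told me it is better to set the PK to (CategoryID, Time), because records in this table are usually looked up by CategoryID first and then by Time.

Wow. Run against a table with 1 million records, this query finished in 7 seconds, whereas the original query took 57 seconds. Thanks.

The min and max the OP is looking for are the minimum and maximum Value within the time range, not the minimum and maximum Time.

In that case, I guess for the given sample table we should set IsSampled = 1 for 59 sec (max), 11 sec (min) and 00 sec, right?

No, the maximum value 98.08257 occurs at 36 seconds. The minimum value 7.05847 occurs at 25 seconds. In the table at the top of the OP's question, the rows to be changed from 0 to 1 are shown with "-> 1".

Oh, sorry, I was thinking of the max and min time.

Hi Anand, thanks for your answer. However, the query result differs from what I expected. IsSampled should be marked per one-minute range, but this query marks it across the whole range.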