SQL view to get the minimum date with complex conditions


I have a table like this in SQL Server:

+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
DateFrom: date not null -- unique for each EmployeeID
Completed: bit not null
EmployeeID: bigint not null
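  • Each row represents a sub-period, defined by its start date, which may or may not be completed
  • Each employee can have multiple sub-periods
  • A period is a sequence of ordered sub-periods, ending with the sub-period that is completed

I want to create a view that returns, for each EmployeeID, the start date of the last period, as follows: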

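  • If no row has Completed = true, return the minimum DateFrom. [The employee has a single period that is not yet completed]
  • Otherwise, return the minimum DateFrom after the last completion. [The last period is not yet completed]
  • If the maximum DateFrom has Completed = true, return the minimum DateFrom before that last completed row and, if an earlier Completed = true row exists, after that earlier row. [The last period was completed by several sub-periods]
  • If the maximum DateFrom has Completed = true and there are no other rows, or the row before it also has Completed = true, return the maximum DateFrom. [The last period was completed by a single sub-period]
  • I am looking for the most optimized solution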
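
One way to read the four rules is as a single backward scan per employee: start at the latest row and keep moving back while the preceding row is not completed. The Python sketch below implements that reading as a reference for checking any SQL solution. The function name `last_period_start` and the sample rows are illustrative assumptions, not part of the original post.

```python
from itertools import groupby
from operator import itemgetter


def last_period_start(rows):
    """Return {employee_id: start date of that employee's last period}.

    rows: iterable of (employee_id, date_from, completed) tuples, where
    date_from is an ISO 'YYYY-MM-DD' string (string order equals date order).
    """
    result = {}
    ordered = sorted(rows, key=itemgetter(0, 1))
    for employee_id, group in groupby(ordered, key=itemgetter(0)):
        sub_periods = list(group)
        # The latest sub-period always belongs to the last period.
        start = sub_periods[-1][1]
        # Walk backwards: an earlier completed row closes the previous period.
        for _, date_from, completed in reversed(sub_periods[:-1]):
            if completed:
                break
            start = date_from
        result[employee_id] = start
    return result


# Assumed sample data, one employee per rule.
sample = [
    (1, "2021-01-01", False), (1, "2021-01-02", False),  # nothing completed
    (2, "2021-01-01", True), (2, "2021-01-02", False),   # open period after a completion
    (3, "2021-01-01", True), (3, "2021-01-02", False),
    (3, "2021-01-03", True),                             # closed by several rows
    (4, "2021-01-01", True), (4, "2021-01-02", True),    # closed by a single row
]
print(last_period_start(sample))
```

A set-based SQL solution should produce the same start date for each employee as this scan.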

I tried this, but I get a NULL in the third example:

    WITH T AS (
        SELECT EmployeeID
            , MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) MaxDateFrom 
        FROM TableDates
        GROUP BY EmployeeID
    )
    SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) DateFrom
    FROM T
    LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
        AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
    GROUP BY TableDates.EmployeeID
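
A likely reason for the NULL: when an employee's latest row is completed, no row satisfies `DateFrom > MaxDateFrom`, so the LEFT JOIN returns only NULLs from `TableDates`, and grouping by `TableDates.EmployeeID` loses the employee ID as well. The sketch below reproduces this with the same query. It runs on SQLite through Python's `sqlite3` module only to stay self-contained (an assumption, since the question targets SQL Server), and the three rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableDates (
        DateFrom   TEXT    NOT NULL,
        Completed  INTEGER NOT NULL,
        EmployeeID INTEGER NOT NULL
    );
    -- Assumed sample: the employee's latest row is completed.
    INSERT INTO TableDates VALUES
        ('2021-01-01', 1, 3), ('2021-01-02', 0, 3), ('2021-01-03', 1, 3);
""")

attempt = """
    WITH T AS (
        SELECT EmployeeID,
               MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) AS MaxDateFrom
        FROM TableDates
        GROUP BY EmployeeID
    )
    SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) AS DateFrom
    FROM T
    LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
        AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
    GROUP BY TableDates.EmployeeID
"""
rows = conn.execute(attempt).fetchall()
print(rows)  # the single row carries neither an employee ID nor a date
```

Selecting and grouping by `T.EmployeeID` would keep the employee ID, but the date would still be NULL, because the rows before the last completion are never considered.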
    

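Here is a working query. It may be more complicated than necessary, but I will leave the simplification to you.

It handles the following cases, partitioned by EmployeeId as required:

  • When no row has `Completed = 1`: detected with `sum(Completed) over ()`, then uses `first_value(DateFrom)`.

  • When the last row has `Completed = 1` and the previous row has `Completed = 0`: detected with `last_value(Completed)` and `lag(Completed)`, then uses `max(case when Completed = 0 then DateFrom else null end)`.

  • The tricky case is when `Completed = 1` exists but is not on the last row. In this case, find the DateFrom of the most recent row where `Completed = 1`, then take `min(DateFrom)` over all rows more recent than that row, back to the preceding `Completed = 1`.

  • If the last row has `Completed = 1` and the second-to-last row also has `Completed = 1`, use the last row's DateFrom. The `coalesce` ensures this when all other options are null.

  • Note: this assumes there is only one row per date.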

I think you just need conditional aggregation, with a bunch of logic. Assuming you have a row for every day, I think this is what you want:

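    select employeeid,
           -- bit values cannot be passed to min()/max() in SQL Server, so cast to int first
           (case when -- case 4: every row is completed
                      min(cast(completed as int)) = max(cast(completed as int)) and
                      min(cast(completed as int)) = 1
                 then max(datefrom) 
                 when -- case 1: no row is completed
                      min(cast(completed as int)) = max(cast(completed as int)) and
                      min(cast(completed as int)) = 0
                 then min(datefrom) 
                 when -- case 3: the latest row is completed
                      max(datefrom) = max(case when completed = 'true' then datefrom end)
                 then min(case when completed_seqnum = 1 then datefrom end)
                 else -- case 2: the day after the last completed row
                      dateadd(day, 1, max(case when completed = 'true' then datefrom end))
            end) as datefrom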
    from (select t.*,
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by employeeid;
    
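Requiring one row per day is really just a convenience: for example, it lets the code add one day to the date of a particular 'true' row to get the date that follows it. The same can be done with `lead()` in a subquery.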
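
To illustrate the `lead()` idea: it returns the next row's DateFrom for the same employee, so finding "the date after a completed row" no longer depends on consecutive days. The sketch below runs on SQLite (3.25 or later, for window functions) through Python's `sqlite3` as a stand-in for SQL Server. The table name `t` follows the answer, and the rows with gaps are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (employeeid INTEGER, datefrom TEXT, completed INTEGER);
    -- Assumed rows with gaps between the dates.
    INSERT INTO t VALUES
        (1, '2021-01-01', 0), (1, '2021-01-05', 1), (1, '2021-01-09', 0);
""")
rows = conn.execute("""
    SELECT datefrom,
           completed,
           lead(datefrom) OVER (PARTITION BY employeeid ORDER BY datefrom) AS next_datefrom
    FROM t
    ORDER BY datefrom
""").fetchall()
for row in rows:
    print(row)
```

For the completed row on 2021-01-05, `next_datefrom` is 2021-01-09, the actual next sub-period. Adding one day would have produced 2021-01-06, which is not in the table.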

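Note: this does not handle all conditions (at least for a non-NULL date). For example, when the data ends with a run of 'true' rows, it returns `NULL`.

If this is a problem: this version of your question has already been answered. Ask a new question with appropriate sample data and desired results. I also think you might be able to explain the problem you are trying to solve and simplify the explanation.

EDIT:

If dates are missing, you can use: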

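    select employeeid,
           -- bit values cannot be passed to min()/max() in SQL Server, so cast to int first
           (case when -- case 4: every row is completed
                      min(cast(completed as int)) = max(cast(completed as int)) and
                      min(cast(completed as int)) = 1
                 then max(datefrom) 
                 when -- case 1: no row is completed
                      min(cast(completed as int)) = max(cast(completed as int)) and
                      min(cast(completed as int)) = 0
                 then min(datefrom) 
                 when -- case 3: the latest row is completed
                      max(datefrom) = max(case when completed = 'true' then datefrom end)
                 then min(case when completed_seqnum = 1 then datefrom end)
                 else -- case 2: the row that follows the last completed row
                      max(case when completed = 'true' then next_datefrom end)
            end) as datefrom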
    from (select t.*,
                 lead(datefrom) over (partition by employeeid order by datefrom) as next_datefrom,
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by employeeid;
    

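  • Thanks a lot! Just, if you can help, I want to add a GROUP BY.
  • @Faresdell I guess you forgot to add the employee ID to the sample data? See the edit.
  • I mentioned in the first post that I would add a GROUP BY, and I have since edited it with a full example.
  • @Faresdell But your sample data did not include it, and it should.
  • @Faresdell Your sample data should include such conditions.
  • Thanks a lot! Just note that we do not have a row for every day.
  • @FaresDelel ... just use `lead()` in the subquery and use the next date instead of `dateadd()`. Your sample data does contain a row for every day.
  • We can optimize the solution like this:

    select EmployeeID,
           (case when min(completed_seqnum) = 0
                 then min(case when completed_seqnum = 0 then datefrom end)
                 else min(case when completed_seqnum = 1 then datefrom end)
            end) FinalResult
    from (select t.*,
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by EmployeeID order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by EmployeeID;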
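
The reverse running count behind these queries (`completed_seqnum`) can be tried end to end. The sketch below wraps a condensed query of this kind in a view and runs it against five sample employees. It uses SQLite (3.25 or later) through Python's `sqlite3` rather than SQL Server, so `Completed` is stored as a 0/1 integer, and the view name `LastPeriodStart` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        EmployeeID INTEGER NOT NULL,
        DateFrom   TEXT    NOT NULL,
        Completed  INTEGER NOT NULL
    );
    INSERT INTO t VALUES
        (1, '2021-01-01', 0), (1, '2021-01-02', 0), (1, '2021-01-03', 0),
        (2, '2021-01-01', 0), (2, '2021-01-02', 1), (2, '2021-01-03', 0), (2, '2021-01-04', 0),
        (3, '2021-01-01', 0), (3, '2021-01-02', 1), (3, '2021-01-03', 0), (3, '2021-01-04', 1),
        (4, '2021-01-01', 1),
        (5, '2021-01-01', 0), (5, '2021-01-02', 0), (5, '2021-01-03', 1);

    -- Hypothetical view name. completed_seqnum counts completed rows from the
    -- latest date backwards: 0 marks rows after the last completion, 1 marks
    -- the last completed row plus the open rows leading up to it.
    CREATE VIEW LastPeriodStart AS
    SELECT EmployeeID,
           CASE WHEN MIN(completed_seqnum) = 0
                THEN MIN(CASE WHEN completed_seqnum = 0 THEN DateFrom END)
                ELSE MIN(CASE WHEN completed_seqnum = 1 THEN DateFrom END)
           END AS DateFrom
    FROM (SELECT t.*,
                 SUM(Completed) OVER (PARTITION BY EmployeeID ORDER BY DateFrom DESC) AS completed_seqnum
          FROM t) AS s
    GROUP BY EmployeeID;
""")
result = dict(conn.execute("SELECT EmployeeID, DateFrom FROM LastPeriodStart"))
print(result)
```

Employees 2 and 3 both resolve to 2021-01-03, the first row after the completion on 2021-01-02, whether or not the last period has been closed.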
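
    -- Sample data and full query for the window-function answer (first_value / last_value / lag).
    -- The table variable declaration is assumed from the column definitions in the question.
    declare @Test table (EmployeeId bigint not null, DateFrom date not null, Completed bit not null);

    insert into @Test (EmployeeId, DateFrom, Completed)
    values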
    -- Scenario 1
    (1, '2021-01-01', 0),
    (1, '2021-01-02', 0),
    (1, '2021-01-03', 0),
    -- Scenario 2
    (2, '2021-01-01', 0),
    (2, '2021-01-02', 1),
    (2, '2021-01-03', 0),
    (2, '2021-01-04', 0),
    -- Scenario 3
    (3, '2021-01-01', 0),
    (3, '2021-01-02', 1),
    (3, '2021-01-03', 0),
    (3, '2021-01-04', 1),
    -- Special case, single row
    (4, '2021-01-01', 1),
    -- Scenario 4
    (5, '2021-01-01', 0),
    (5, '2021-01-02', 0),
    (5, '2021-01-03', 1);
    
    with cte as (
      select *
        -- First value of DateFrom over all rows (not the default)
        , first_value (DateFrom) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) FirstDateFrom
        -- Last value of Completed over all rows (not the default)
        , last_value (Completed) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) LastCompleted
        -- Find the Date of the last row with Completed = 1
        , max (case when Completed = 1 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) LastCompletedNew
        -- Regular row number
        , row_number() over (partition by EmployeeId order by DateFrom desc) RowNumber
        -- Total number of rows with Completed = 1
        , sum(convert(int,Completed)) over (partition by EmployeeId) SumOfCompleted
        -- Max value of DateFrom where Completed = 0
        , max(case when Completed = 0 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) MaxDateFrom
        -- Check the lagged complete to see if the last 2 rows are completed = 1
        , lag(Completed) over (partition by EmployeeId order by DateFrom asc) LaggedComplete
        -- Borrowed from Gordon to check which rows are prior to the last Completed = 1 and before the preceding Completed = 1
        , sum(case when completed = 1 then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
      from @Test
    )
    select
      EmployeeId
      -- Use the only DateFrom if there is only one
      , coalesce(case
        -- Scenario 1
        when SumOfCompleted = 0 then FirstDateFrom
        when LastCompleted = 1 then
          case
          -- Scenario 4
          when coalesce(LaggedComplete,0) = 1 then DateFrom
          -- Scenario 3
          else Scenario3
          end
        -- Scenario 2
        else ActualResult
        end, DateFrom) FinalResult
      --, * -- Uncomment for working
    from (
      select *
        -- Find the lowest DateFrom which is greater than the DateFrom of the last row where Completed = 1
        , min(case when DateFrom > LastCompletedNew then DateFrom else null end) over (partition by EmployeeId) ActualResult
        -- Find the min DateFrom over the rows between the last Completed=1 and the Completed=1 before it (if it exists)
        , min(case when completed_seqnum = 1 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) Scenario3
      from cte
    ) x
    -- Because we have calculated the same result for every row we just take the first
    where RowNumber = 1
    order by x.EmployeeId asc, x.DateFrom asc;
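
The "not the default" comments in the CTE refer to the window frame: with an ORDER BY and no explicit frame, the window ends at the current row, so `last_value` simply echoes the current row's value. The sketch below shows the difference. It runs on SQLite (3.25 or later) through Python's `sqlite3` as a stand-in for SQL Server, and the table name `Test` and its three rows are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (EmployeeId INTEGER, DateFrom TEXT, Completed INTEGER);
    INSERT INTO Test VALUES
        (1, '2021-01-01', 0), (1, '2021-01-02', 0), (1, '2021-01-03', 1);
""")
rows = conn.execute("""
    SELECT DateFrom,
           -- Default frame: stops at the current row.
           last_value(Completed) OVER (PARTITION BY EmployeeId ORDER BY DateFrom) AS DefaultFrame,
           -- Explicit frame: spans the whole partition.
           last_value(Completed) OVER (PARTITION BY EmployeeId ORDER BY DateFrom
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FullFrame
    FROM Test
    ORDER BY DateFrom
""").fetchall()
for row in rows:
    print(row)
```

Only the explicit frame reports the last row's `Completed` value on every row, which is what the `LastCompleted` column in the query relies on.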
    