SQL: sum subsets from one row to another based on a condition

Tags: sql, sql-server, sql-server-2012

I'm not very fluent in SQL syntax and am struggling to work out how to aggregate a simple set of data.


Problem:

Based on the 'Open' and 'Closed' timestamps, I have to summarize the time each user spent at each workstation, using the login history table.


Requirements:

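  • Rows where the workstation or username column is empty should be flagged as invalid
  • For each set of rows ordered by timestamp, if it does not start with 'Open' or does not end with 'Closed', its status should also be flagged as invalid
  • Multiple 'Open' statuses before a single 'Closed' can still be considered valid, but the sum should start from the first occurrence of 'Open'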
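
The first requirement can be sketched with a CASE expression. This is only a minimal illustration, assuming that "empty" means either NULL or an empty string. The @History table variable and the Flag column are names made up for the example, and the two rows come from the thread's sample data.

```sql
-- Hypothetical stand-in for the History table; column names are assumed
-- from the query in the question.
DECLARE @History TABLE (
    Project     varchar(15),
    Workstation varchar(5)  NULL,
    username    varchar(10) NULL,
    [Status]    varchar(6),
    [TimeStamp] datetime
);

INSERT INTO @History
VALUES ('181861-0001-001', NULL,    NULL,         'Closed', '2015-07-01T18:19:48.527'),
       ('181861-0001-001', '1AHVW', 'ANDJOH0427', 'Open',   '2015-07-01T13:18:46.547');

-- NULLIF(x, '') turns an empty string into NULL, so a single IS NULL test
-- covers both the NULL case and the empty-string case.
SELECT Project, Workstation, username, [Status], [TimeStamp],
       CASE WHEN NULLIF(Workstation, '') IS NULL
              OR NULLIF(username, '') IS NULL
            THEN 'Invalid'
            ELSE 'Valid'
       END AS Flag
FROM @History;
```

This covers only the empty-column rule; the 'Open'/'Closed' ordering rules need the rows grouped into sessions first.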

My (almost) solution:

SELECT Project, username, Workstation, 
min(case when [Status] = 'Open' then [TimeStamp] end) AS [Started],
max(case when [Status] = 'Closed' then [TimeStamp] end) as [Ended],
DATEDIFF(second, min(case when [Status] = 'Open' then [TimeStamp] end), max(case when [Status] = 'Closed' then [TimeStamp] end)) AS ActualSeconds
FROM History
GROUP BY Project, username, Workstation
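
Unfortunately, this query does not handle a user who logs in, logs out, and then logs back in to the same workstation.

So I need to find the MIN for each group of rows between an 'Open' and a 'Closed' status.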
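
One way to get a MIN per login session, offered as a sketch rather than a finished solution, is to number the sessions with a running count of earlier 'Closed' rows and add that number to the GROUP BY. The @History table variable and the SessionNo column are names made up for this example; the rows are the 'PRIADA747' rows from the sample data. Windowed aggregates with ORDER BY and a ROWS frame are available from SQL Server 2012.

```sql
-- Hypothetical stand-in for History, loaded with the PRIADA747 sample rows.
DECLARE @History TABLE (
    Project     varchar(15),
    Workstation varchar(5),
    username    varchar(10),
    [Status]    varchar(6),
    [TimeStamp] datetime
);

INSERT INTO @History
VALUES ('181861-0001-001', '1ALVW', 'PRIADA747', 'Open',   '2015-07-01T14:51:02.937'),
       ('181861-0001-001', '1ALVW', 'PRIADA747', 'Open',   '2015-07-01T15:29:48.357'),
       ('181861-0001-001', '1ALVW', 'PRIADA747', 'Open',   '2015-07-01T16:13:20.953'),
       ('181861-0001-001', '1ALVW', 'PRIADA747', 'Open',   '2015-07-01T17:49:42.717'),
       ('181861-0001-001', '1ALVW', 'PRIADA747', 'Closed', '2015-07-01T17:53:28.217'),
       ('181861-0001-001', '1ALVW', 'PRIADA747', 'Open',   '2015-07-01T18:34:11.043'),
       ('181861-0001-001', '1ALVW', 'PRIADA747', 'Closed', '2015-07-01T19:20:11.540');

WITH Numbered AS (
    SELECT *,
           -- Count the 'Closed' rows strictly before the current row. Every row
           -- up to and including a 'Closed' therefore shares one SessionNo.
           COUNT(CASE WHEN [Status] = 'Closed' THEN 1 END) OVER (
               PARTITION BY Project, username, Workstation
               ORDER BY [TimeStamp]
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS SessionNo
    FROM @History
)
SELECT Project, username, Workstation,
       MIN(CASE WHEN [Status] = 'Open'   THEN [TimeStamp] END) AS [Started],
       MAX(CASE WHEN [Status] = 'Closed' THEN [TimeStamp] END) AS [Ended],
       DATEDIFF(second,
                MIN(CASE WHEN [Status] = 'Open'   THEN [TimeStamp] END),
                MAX(CASE WHEN [Status] = 'Closed' THEN [TimeStamp] END)) AS ActualSeconds
FROM Numbered
GROUP BY Project, username, Workstation, SessionNo
ORDER BY [Started];
```

For these rows the query should return two sessions: 14:51:02.937 to 17:53:28.217 and 18:34:11.043 to 19:20:11.540. Note that with this numbering, consecutive 'Closed' rows that have no 'Open' between them each form their own group, so flagging or merging those would need extra handling.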

Expected result:

+-----------------+-------------+------------+-------------------------+-------------------------+---------+
|       KEY       | WORKSTATION |  USERNAME  |        START TIME       |        END TIME         | SECONDS |
+-----------------+-------------+------------+-------------------------+-------------------------+---------+
| 181861-0001-001 |             |            | NULL                    | 2015-07-01 18:19:48.527 | NULL    |
| 181861-0001-001 | 1AHVW       | ANDJOH0427 | 2015-07-01 13:18:46.547 | 2015-07-01 14:11:41.920 | 3175    |
| 181861-0001-001 | 1ALVW       | DWYGRE0609 | NULL                    | 2015-07-01 18:29:39.127 | NULL    |
| 181861-0001-001 | 1AHVW       | HORDOU0521 | NULL                    | 2015-07-01 19:27:34.667 | NULL    |
| 181861-0001-001 | 1AQCI       | POUJON702  | 2015-07-02 00:46:37.540 | NULL                    | NULL    |
| 181861-0001-001 | 1ALVW       | PRIADA747  | 2015-07-01 14:51:02.937 | 2015-07-01 17:53:28.217 | 10945   |
| 181861-0001-001 | 1ALVW       | PRIADA747  | 2015-07-01 18:34:11.043 | 2015-07-01 19:20:11.540 | 2760    |
+-----------------+-------------+------------+-------------------------+-------------------------+---------+

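This is really guesswork, because the sample data and the expected output the OP provided don't appear to be related. It gives the correct results for a USERNAME of NULL and for 'ANDJOH0427', 'HORDOU0521' and 'DWYGRE0609'; however, it returns a result for 'PRIADA747' (which isn't in the expected result set) and gives a very different answer for 'POUJON702':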

USE Sandbox;
GO

CREATE TABLE #Sample ([KEY] varchar(15),
                      WORKSTATION varchar(5),
                      [STATUS] varchar(6),
                      USERNAME varchar(10),
                      [TIMESTAMP] datetime);

INSERT INTO #Sample
VALUES ('181861-0001-001',NULL,'Closed',NULL,'2015-07-01T18:19:48.527'),
       ('181861-0001-001',NULL,'Closed',NULL,'2015-07-01T20:20:46.383'),
       ('181861-0001-001','1AHVW','Open','ANDJOH0427','2015-07-01T13:18:46.547'),
       ('181861-0001-001','1AHVW','Closed','ANDJOH0427','2015-07-01T14:11:41.920'),
       ('181861-0001-001','1ALVW','Closed','DWYGRE0609','2015-07-01T18:29:39.127'),
       ('181861-0001-001','1ALVW','Closed','DWYGRE0609','2015-07-01T18:29:40.300'),
       ('181861-0001-001','1AHVW','Closed','HORDOU0521','2015-07-01T19:27:34.667'),
       ('181861-0001-001','1AHVW','Closed','HORDOU0521','2015-07-01T19:44:36.167'),
       ('181861-0001-001','1AQCI','Open','POUJON702','2015-07-02T00:46:37.540'),
       ('181861-0001-001','1ALVW','Open','PRIADA747','2015-07-01T14:51:02.937'),
       ('181861-0001-001','1ALVW','Open','PRIADA747','2015-07-01T15:29:48.357'),
       ('181861-0001-001','1ALVW','Open','PRIADA747','2015-07-01T16:13:20.953'),
       ('181861-0001-001','1ALVW','Open','PRIADA747','2015-07-01T17:49:42.717'),
       ('181861-0001-001','1ALVW','Closed','PRIADA747','2015-07-01T17:53:28.217'),
       ('181861-0001-001','1ALVW','Open','PRIADA747','2015-07-01T18:34:11.043'),
       ('181861-0001-001','1ALVW','Closed','PRIADA747','2015-07-01T19:20:11.540');
GO
SELECT *
FROM #Sample;
GO

WITH Starts AS(
    SELECT [KEY],
           WORKSTATION,
           USERNAME,
           [TIMESTAMP],
           NULLIF(MIN(ISNULL(CASE STATUS WHEN 'Open' THEN [TIMESTAMP] END,'20550101')) OVER (PARTITION BY [KEY], WORKSTATION, USERNAME),'20550101') AS StartTime
    FROM #Sample S)
SELECT [KEY],
       WORKSTATION,
       USERNAME,
       StartTime,
       MAX([TIMESTAMP]) AS EndTime,
       DATEDIFF(SECOND, StartTime, MAX([TIMESTAMP])) AS Seconds
FROM Starts
GROUP BY [KEY],
         WORKSTATION,
         USERNAME,
         StartTime;
GO      
DROP TABLE #Sample;

This should be close:

with data as (
    select *,
        row_number() over (partition by workstation order by timestamp) as tn,
        row_number() over (partition by workstation order by username, timestamp) as un,
        sum(case when status = 'Closed' then 1 end) over (
            partition by workstation, username order by timestamp desc) as sn
    from t
)
select workstation, username,
    min(case when status = 'Open' then timestamp end) as start_time,
    max(case when status = 'Closed' then timestamp end) as end_time,
    datediff(second,
        min(case when status = 'Open' then timestamp end),
        max(case when status = 'Closed' then timestamp end)) as diff,
    datediff(millisecond,
        min(case when status = 'Open' then timestamp end), 
        max(case when status = 'Closed' then timestamp end)) / 1000 as diff2,
    case when count(*) > 1 then 'Valid' else 'Invalid' end as flag
from data
group by workstation, username, tn - un, sn;


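I noticed that the time differences in your expected output don't quite match. That is because datediff() counts the unit boundaries crossed between the two values rather than measuring whole elapsed units (seconds, in this case). I added a second way of calculating the seconds (diff2), which does produce the result you expect.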
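
To make the difference concrete, here is a small check using the first 'PRIADA747' session from the sample data. The variable names and column aliases are made up for the example.

```sql
DECLARE @Started datetime = '2015-07-01T14:51:02.937',
        @Ended   datetime = '2015-07-01T17:53:28.217';

-- The real elapsed time is 10945.280 seconds.
SELECT DATEDIFF(second, @Started, @Ended)             AS BoundariesCrossed, -- 10946
       DATEDIFF(millisecond, @Started, @Ended) / 1000 AS WholeSeconds;      -- 10945
```

The first expression counts the second boundaries crossed between 14:51:02 and 17:53:28, which is one more than the number of whole seconds elapsed. The second uses integer division to truncate the millisecond count. Because DATEDIFF returns an int, the millisecond form overflows for spans longer than roughly 24 days, so it only suits short intervals such as login sessions.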

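(1) Tag your question with the database you are using. (2) Show the desired result set.

@GordonLinoff Thanks! I'm new here :)

I don't understand your expected output. Why does 'POUJON702' have two entries, and why does one of them have an [END TIME] of '2015-07-01 19:20:11.540'? They only have one entry, at '2015-07-02 00:46:37.540'. How were the other values generated? They don't appear to relate to the sample data we have. Also, what happened to 'PRIADA747'?

@Larnu Hey, sorry! I made a mistake in the ending table.

Thanks for the help, but unfortunately this is very similar to my initial query. It doesn't capture the second start time for PRIADA747. I know you posted this before I updated the expected results table.

@johnmarkill Yes, that's why it's important to make sure your sample and expected results relate to each other. :)

Wow, you're so close! PRIADA747 should only return three entries, not four. I found a different solution and will post it later.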