SQL join conditions and aggregate functions

I have a table that records entries and exits through doors:

DECLARE @doorStatistics TABLE
( id INT IDENTITY,
[user] VARCHAR(250),
accessDate DATETIME,
accessType VARCHAR(5)
)
Sample records:

INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:12:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:22:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:42:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:52:43.000','OUT')
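What I want is a query that, for the sample above, returns one row per visit with the user, the date, the IN time, and the matching OUT time.

The query I came up with is: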

SELECT [user], accessDate AS [in date], 
    (SELECT MIN(accessDate) 
        FROM @doorStatistics ds2 
        WHERE accessType = 'OUT' 
            AND ds2.accessDate > ds.accessDate 
            AND ds.[user] = ds2.[user]) AS [out date] 
FROM @doorStatistics ds 
WHERE accessType = 'IN'
But this is not good, because when a user forgets to register his/her entry, it produces a result like this:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | 07:02:43 | 09:26:43 |
whereas it should be:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | NULL     | 09:26:43 |
The second reason the query is bad is performance. I have over 200,000 records, and running a subselect for every row slows the query down.

A possible solution might be to join two queries over the same table, such as this one for the IN rows:

SELECT * FROM @doorStatistics WHERE accessType = 'IN'

But I don't know what join condition would give the right dates. Maybe some MAX or MIN function could go there, but I don't know how.


I don't want to create a temp table and use a cursor.

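When designing a database for temporal events that have a duration, it is best to put the "in" time and the "out" time on the same row.

That way, every query you need to run becomes much simpler.

See pages 48 and 154 of "", where he discusses temporal cohesion.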
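As a rough illustration of the same-row design, here is a minimal sketch. The table variable @doorVisits and its columns inTime and outTime are hypothetical names of my own, not part of the question's schema, and the rows are made up to mirror the question's example of a missed entry.

```sql
-- Hypothetical same-row design: one row per visit, with both timestamps.
DECLARE @doorVisits TABLE
(
    id      INT IDENTITY,
    [user]  VARCHAR(250),
    inTime  DATETIME NULL,   -- NULL when the entry was never registered
    outTime DATETIME NULL    -- NULL while the user is still inside
);

INSERT INTO @doorVisits ([user], inTime, outTime)
VALUES ('John Wayne', '2009-09-02 07:02:43.000', '2009-09-02 07:48:43.000');
INSERT INTO @doorVisits ([user], inTime, outTime)
VALUES ('John Wayne', NULL, '2009-09-02 09:26:43.000');

-- The report becomes a plain SELECT with no self-join or subquery.
SELECT  [user],
        CONVERT(CHAR(10), COALESCE(inTime, outTime), 120) AS [date],    -- yyyy-mm-dd
        CONVERT(CHAR(8), inTime, 108)                     AS [inHour],  -- hh:mi:ss
        CONVERT(CHAR(8), outTime, 108)                    AS [outHour]
FROM    @doorVisits
ORDER BY [user], COALESCE(inTime, outTime);
```

The pairing work moves to insert time: whatever writes the rows has to decide which open visit an OUT event closes, which may not be practical when each reader logs events independently.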

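To improve performance at the structural level:

  • I suggest you rename the accessDate column to accessDateTime
  • Then create a persisted computed column accessDate based on accessDateTime (as shown below). The index you need will then only have to cover the accessDate column, which you will use together with user
  • Make sure the table has appropriate indexes (judging by the code below, you probably need an index on "user" and "accessDate" that includes "accessType")

accessDate column definition: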

accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
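Put together, the structural suggestions above might look like the following sketch. It assumes a permanent table, here called dbo.doorStatistics, and an index name of my own choosing; the question itself only shows a table variable, so both names are hypothetical.

```sql
-- Hypothetical permanent table with the renamed column and the computed date column.
CREATE TABLE dbo.doorStatistics
(
    id             INT IDENTITY PRIMARY KEY,
    [user]         VARCHAR(250),
    accessDateTime DATETIME,       -- formerly accessDate
    accessType     VARCHAR(5),
    -- Date-only value derived from accessDateTime. Style 112 (yyyymmdd) is
    -- deterministic, which is what allows the column to be PERSISTED and indexed.
    accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
);

-- Index on user and date, carrying accessType so the matching subqueries
-- can filter on it without extra lookups.
CREATE INDEX IX_doorStatistics_user_accessDate
    ON dbo.doorStatistics ([user], accessDate)
    INCLUDE (accessType);
```

Whether this index is actually the best fit depends on the real workload; checking the execution plan of the final query is the only reliable way to tell.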
Now, assuming you have done that and you are on SQL Server 2005 or later, this rather long query should do the job:

WITH MatchIN (in_id, out_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                            AND z.accessDateTime <= x.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    WHERE       s.accessType = 'IN'
)
,    MatchOUT (out_id, in_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                            AND z.accessDateTime >= x.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    WHERE       s.accessType = 'OUT'
)

SELECT  COALESCE(i."user", o."user") AS "user",
        COALESCE(i.accessDate, o.accessDate) AS "date",
        CONVERT(CHAR(10), i.accessDateTime, 108) AS "inHour",
        CONVERT(CHAR(10), o.accessDateTime, 108) AS "outHour"
FROM   (SELECT in_id, out_id FROM MatchIN
        UNION -- UNION (not UNION ALL) removes pairs found by both CTEs
        SELECT in_id, out_id FROM MatchOUT
        ) x
LEFT JOIN   @doorStatistics i
        ON  i.id = x.in_id
LEFT JOIN   @doorStatistics o
        ON  o.id = x.out_id
ORDER BY    "user", "date", "inHour"
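You need to select the minimum OUT record for each IN record of a given user, after making sure there are no intervening records (which would correspond to someone entering twice without ever leaving the building). This takes some slightly complicated SQL (for example, a NOT EXISTS clause). So you will have a self-join on the table plus a NOT EXISTS subquery against the same table. Just make sure you alias every reference to the table sensibly.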
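The self-join with NOT EXISTS described above might be sketched as follows. It uses the question's original schema; the aliases i, o, and m are my own naming, and the three rows are reconstructed from the question's missed-entry example. Note that this version returns only matched IN/OUT pairs. Showing unpaired rows with NULL would need an outer join on top of it.

```sql
DECLARE @doorStatistics TABLE
(
    id         INT IDENTITY,
    [user]     VARCHAR(250),
    accessDate DATETIME,
    accessType VARCHAR(5)
);

-- One normal visit, then an OUT whose IN was never registered.
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-02 07:02:43.000', 'IN');
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-02 07:48:43.000', 'OUT');
INSERT INTO @doorStatistics ([user], accessDate, accessType) VALUES ('John Wayne', '2009-09-02 09:26:43.000', 'OUT');

SELECT  i.[user],
        i.accessDate AS [in date],
        o.accessDate AS [out date]
FROM    @doorStatistics i
JOIN    @doorStatistics o
        ON  o.[user]     = i.[user]
        AND o.accessType = 'OUT'
        AND o.accessDate > i.accessDate
WHERE   i.accessType = 'IN'
        -- Keep the pair only if the same user has no record of any type
        -- strictly between the IN and the OUT.
        AND NOT EXISTS (SELECT  1
                        FROM    @doorStatistics m
                        WHERE   m.[user]     = i.[user]
                            AND m.accessDate > i.accessDate
                            AND m.accessDate < o.accessDate);
```

With these rows the 07:02:43 entry pairs with the 07:48:43 exit only. The 09:26:43 exit is rejected as a partner because the 07:48:43 record lies in between, so it no longer shows up with a wrong IN time.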

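What result do you expect if Bruce Willis stays overnight? (Assuming Demi Moore is there too.)

If Bruce stays overnight, the value in outHour should be NULL.

The problem is that I did not design this table and I cannot change its structure.

In this case, though, that is not a good solution, because each row in the table holds data from a single finger reader (sorry for my English, but I think you know what I mean). Some doors have two readers (inside and outside), and some have only one (outside). Anyway, thanks for the link to the book.

Change: fixed the query, removed the time-only computed column, tested and it works.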