
SQL: efficiently selecting the latest answer

Tags: sql, sql-server, sql-server-2008, tsql

SQL Fiddle:

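I have a table containing the answers to the question "Will you attend this event?". Each user may respond several times, and all answers are stored in the table. Usually we are only interested in the latest answer, and I am trying to construct an efficient query for that. I am using SQL Server 2008 R2.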

Table contents for one event:

Column types: int, int, datetime, bit
Primary key: (EventId, MemberId, Timestamp)

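Note that member 18 first answered No and then Yes; member 20 first answered Yes and then No; member 11 answered No and then No again. I want to filter out the earlier answers from these members. There may also be several answers to filter out: for example, a user might answer Yes, Yes, No, Yes, No, No.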
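For anyone who wants to try the queries locally, here is a minimal sketch of a table matching the description above. The column names, types, and primary key come from the question; the constraint name, the NOT NULL on Answer, and all rows and timestamps are made up for illustration and are not the real data.

```sql
-- Hypothetical reconstruction of the Attendees table described above.
-- Column names, types and the primary key follow the question;
-- the constraint name and every row below are invented for illustration.
create table Attendees
(
  EventId   int      not null,
  MemberId  int      not null,
  Timestamp datetime not null,
  Answer    bit      not null,  -- assumption: 1 = Yes, 0 = No, never null
  constraint PK_Attendees primary key (EventId, MemberId, Timestamp)
);

insert into Attendees (EventId, MemberId, Timestamp, Answer) values
  (68, 11, '2013-01-10T08:00:00', 0),  -- member 11: No ...
  (68, 11, '2013-01-12T08:00:00', 0),  -- ... then No again
  (68, 18, '2013-01-10T09:00:00', 0),  -- member 18: No ...
  (68, 18, '2013-01-11T09:00:00', 1),  -- ... then Yes
  (68, 20, '2013-01-10T10:00:00', 1),  -- member 20: Yes ...
  (68, 20, '2013-01-13T10:00:00', 0);  -- ... then No
```

With rows like these, a correct "latest answer" query for event 68 should return exactly one row per member, carrying that member's most recent Timestamp.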

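I have tried a few different ideas and evaluated them in SQL Server Management Studio by entering all the queries, selecting "Display Estimated Execution Plan", and comparing the total cost of each query (shown as a percentage). Is this a good way to evaluate performance?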
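The percentages in an estimated plan are optimizer estimates relative to the batch, not measurements. One possible complement, offered as a sketch rather than a definitive method, is to run each candidate with SET STATISTICS IO and SET STATISTICS TIME turned on and compare the logical reads and elapsed times reported in the Messages tab. The query used here is the "Where with subquery" variant from this question; any of the candidates can be swapped in.

```sql
-- Report actual I/O and timing for the statements that follow.
set statistics io on;
set statistics time on;

-- Candidate under test: latest answer per member for one event.
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from   Attendees a
where  a.EventId = 68
and    a.Timestamp =
(
  select max(Timestamp)
  from   Attendees
  where  EventId  = a.EventId
  and    MemberId = a.MemberId
);

-- Switch the reporting off again.
set statistics io off;
set statistics time off;
```

On a table of only a few hundred or thousand rows the differences may be too small to be meaningful, so it is probably worth repeating the comparison on a realistic data volume.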

The different queries tested so far:

-----------------------------------------------------------------
-- Subquery to select Answer (does not include Timestamp)
-- Cost: 63 %
-----------------------------------------------------------------
select distinct a.EventId, a.MemberId,
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  order by Timestamp desc
) as Answer
from    Attendees a
where a.EventId = 68

-----------------------------------------------------------------
-- Where with subquery to find max(Timestamp)
-- Cost: 13 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from     Attendees a
where  a.EventId = 68
and    a.Timestamp =
(
  select max(Timestamp)
  from     Attendees
  where  EventId  = a.EventId
  and    MemberId = a.MemberId
)
order by a.TimeStamp;

-----------------------------------------------------------------
-- Group by to find max(Timestamp)
-- Subquery to select Answer matching max(Timestamp)
-- Cost: 23 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, max(a.Timestamp),
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  and   Timestamp = max(a.Timestamp)
) as Answer
from    Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
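order by max(a.TimeStamp);

It would be best to avoid a subquery for each member. In the last query I tried using group by, but I still had to use a subquery for the Answer column. What I really want is something like this, but it is of course not valid SQL: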

select a.EventId, a.MemberId, max(a.Timestamp), a.Answer <-- Picked from the line selected by max(a.Timestamp)
from  Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

SQL Server 2008 supports common table expressions and window functions:

WITH recordsList
AS
(
    SELECT  EventID, MemberID, TimeStamp, Answer,
            ROW_NUMBER() OVER (PARTITION BY EventID, MemberID
                                ORDER BY Timestamp DESC) rn
    FROM    tableName
)
SELECT  EventID, MemberID, TimeStamp, Answer
FROM    recordsList
WHERE   rn = 1
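The query above uses a placeholder table name and does not filter by event. Adapted to the question's setup, it might look like the sketch below, assuming the table is named Attendees and only event 68 is wanted, as in the question. The CTE name LatestAnswers is made up here.

```sql
-- Latest answer per member for a single event, using ROW_NUMBER().
-- Assumes the Attendees table from the question; LatestAnswers is a
-- hypothetical CTE name.
with LatestAnswers as
(
  select EventId, MemberId, Timestamp, Answer,
         row_number() over (partition by EventId, MemberId
                            order by Timestamp desc) as rn
  from   Attendees
  where  EventId = 68          -- filter before numbering the rows
)
select EventId, MemberId, Timestamp, Answer
from   LatestAnswers
where  rn = 1                  -- rn = 1 is the newest row per member
order by Timestamp;
```

Filtering on EventId inside the CTE keeps the numbering limited to the rows of that one event. Because Timestamp is part of the primary key, there are no ties within a member, so rn = 1 identifies exactly one row per member.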

I also prefer the CTE approach, but here is another option using a subquery that should work:

SELECT T.EventId, T.MemberId, T.TimeStamp, T.Answer
FROM TableName T
 JOIN (
   SELECT EventId, MemberId, Max(Timestamp) MaxTimeStamp
   FROM TableName
   GROUP BY EventId, MemberId ) T2 ON T.EventId = T2.EventId 
    AND T.MemberId = T2.MemberId 
    AND T.TimeStamp = T2.MaxTimeStamp

That said, I think the CTE will perform better.

Edit: I am no longer sure about the performance. Here is an example of both, where you can see the execution plan for each.

Good luck.

One more option:

SELECT a.EventId, a.MemberId, a.Timestamp, a.Answer
FROM Attendees a
WHERE a.EventId = 68 AND EXISTS (
              SELECT 1
              FROM Attendees
              WHERE EventId = a.EventId             
              GROUP BY MemberId
              HAVING  MAX(Timestamp) = a.Timestamp                      
                      AND MemberId  = a.MemberId
              )

Demo

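Why store the history of all answers rather than just the "current" answer?

Mainly for history and traceability.

Nice, I have now put real data on SQL Fiddle (question edited). Running my "Where with subquery" query, JW's, and yours in SSMS gives costs of 31 %, 38 %, and 31 %, if that is an appropriate performance measure.

Thanks. According to SSMS, your query has the same cost as SGEDES's and my "Where with subquery" query.

You're welcome. (With your data) all my cost results are different ;)

That's interesting. My table actually contains about 1600 rows, but I assumed that selecting one event at a time (where EventId = 68) would make the work equally easy for all the queries. Well, maybe not.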