SQL Server: GROUP BY with a running count of DISTINCT values over multiple columns

sql-server

This may be a performance problem, or my approach may simply be wrong.

TableA:

StudentID | Date  
----------|------  
1         | 20140101  
1         | 20140102  
1         | 20170103  
2         | 20140101  
2         | 20170103  
3         | 20140101  
3         | 20170103  
3         | 20170104

The primary key of TableA is (StudentID, Date).

TableB:  
StudentID|Date     |Class   | Warning | Instructor  
---------|---------|--------|---------|-----------  
1        |20140101 |History |Tardy    | Mr.H  
1        |20140101 |History |Homework | Mr.H  
1        |20140101 |Biology |Tardy    | Mr.B 
1        |20140102 |Biology |Homework | Mr.B   
1        |20140102 |History |Tardy    | Mr.H  
2        |20140101 |Math    |Test     | Mr.M 
2        |20140101 |Art     |Test     | Mr.A 
3        |20140101 |History |Tardy    | Mr.H  
3        |20170103 |History |Tardy    | Mr.H

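Hopefully this is enough data. Goal: for each StudentID and Date in TableA, count the distinct (Class, Warning, Instructor) combinations in TableB for that student up to and including that date.

Expected result: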

StudentID | Date    | Count  
----------|---------|--------  
1         |20140101 |3        
1         |20140102 |4       
1         |20170103 |4  
2         |20140101 |2  
2         |20170103 |2        
3         |20140101 |1         
3         |20170103 |1  
3         |20170104 |1

Here is what I have:

select A.studentID, A.date, count(1)
from TableA A
cross apply (select distinct B.class,B.warning,B.instructor
             from TableB B
             where A.studentID = B.studentID
               and B.date <= A.date) Z
group by A.studentID, A.date
order by A.studentID, A.date

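Is there a better or alternative way to get this result for large data sets? Problem: I cannot get the final result set for a data set of 1 million rows. The query just keeps running.

Thanks

Solution: I removed the ORDER BY clause. TableB already had an index on (Date, StudentID), and I added another index on TableB for (StudentID, Date). Timing: before: > 15 minutes; now: < 30 seconds.

I'd give this a try. Since TableB is self-contained, I would use a join instead of CROSS APPLY: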
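As a rough sketch of the index described in the solution: the index name below is made up, and the `include` list is an assumption on my part (it lets the distinct lookup be answered from the index alone). The original post only says that an index on (StudentID, Date) was added.

```sql
-- Hypothetical index name; key columns follow the solution text.
-- The INCLUDE list is an assumption, not something the original post states.
create nonclustered index IX_TableB_StudentID_Date
    on TableB (StudentID, [Date])
    include (Class, Warning, Instructor);
```

With StudentID as the leading key, the `A.studentID = B.studentID and B.date <= A.date` predicate can be resolved as a range seek per TableA row rather than a scan.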

select
    A.studentID,
    A.date,
    sum(isnull(B.record_count, 0)) as record_count  -- aggregate required because of the outer group by
from TableA A
left join (
    select
        studentID,
        date,
        count(*) as [record_count]
    from TableB
    group by
        studentID,
        date) B
    on A.studentID = b.studentID
    and A.date >= B.date
group by
    A.studentID,
    A.date
order by
    A.studentID,
    A.date

If you are still having performance problems, I would check your execution plan and look for indexes that may be missing on TableB.
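One possible way to follow that advice, as a sketch only: turn on I/O and timing statistics before running the query, then look at what the missing-index DMV has recorded for the current database. The DMV content depends on which queries have run since the last restart, so treat its suggestions as hints rather than instructions.

```sql
-- Report logical reads and CPU/elapsed time for statements in this session.
set statistics io on;
set statistics time on;

-- ... run the query under investigation here ...

-- Index suggestions the optimizer has recorded for the current database.
select d.statement,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns
from sys.dm_db_missing_index_details as d
where d.database_id = db_id();
```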

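Why is TableA needed?

Not all dates are in TableB. In other words, TableA contains some dates that are not in TableB. You can see this in the result set for date 20170103. TableA is a master table that covers almost every day.

How does this work, going by your example? 1 | 20140102 | 4

3 come from 20140101 and 1 from 20140102. One combination appears on both 20140101 and 20140102, so it is counted only once as a distinct value.

It should be 2 for 1 | 20140102, right? These are the records:
1 | 20140102 | Biology | Homework | Mr.B
1 | 20140102 | History | Tardy | Mr.H

Working on the index, since it will definitely help. Also, removing the ORDER BY clause helped a lot. Thanks.

Will come back after testing. Thanks for your help. It was an index problem. Solution added above.

Also, this answer does not take DISTINCT into account, so it will not produce the same results.

The GROUP BY clause should return distinct results.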
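For what it's worth, a join-based variant that does respect the distinct requirement could look like the sketch below. It is not from the original answer. The idea is to reduce TableB to one row per (student, class, warning, instructor) with the first date that combination appears, then count the combinations whose first date is on or before each TableA date. The aliases `F`, `first_date`, and `combo_count` are made up for illustration.

```sql
select
    A.studentID,
    A.date,
    count(F.first_date) as combo_count   -- counts only matched rows, so unmatched dates give 0
from TableA A
left join (
    -- one row per distinct combination, with the first date it occurs
    select
        studentID,
        class,
        warning,
        instructor,
        min(date) as first_date
    from TableB
    group by
        studentID,
        class,
        warning,
        instructor) F
    on A.studentID = F.studentID
    and F.first_date <= A.date
group by
    A.studentID,
    A.date
```

On the sample data in the question this gives 3, 4, 4 for student 1, 2, 2 for student 2, and 1, 1, 1 for student 3, which matches the expected result. Unlike the CROSS APPLY version, the left join also keeps TableA rows that have no earlier TableB rows, returning 0 for them.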