SQL Server: running a nested query takes 40 minutes
I have a very large web forum application, with about 20 million posts dating back to 2001, running on a SQL Server 2012 database. The data file is about 40 GB. I have added indexes on the relevant fields in the tables, but this query, which shows the date range of the posts in each forum, takes about 40 minutes to run:
SELECT
T2.ForumId,
Forums.Title,
T2.ForumThreads,
T2.ForumPosts,
T2.ForumStart,
T2.ForumStop
FROM
Forums
INNER JOIN (
SELECT
Min(ThreadStart) As ForumStart,
Max(ThreadStop) As ForumStop,
Count(*) As ForumThreads,
Sum(ThreadPosts) As ForumPosts,
Threads.ForumId
FROM
Threads
INNER JOIN (
SELECT
Min(Posts.DateTime) As ThreadStart,
Max(Posts.DateTime) As ThreadStop,
Count(*) As ThreadPosts,
Posts.ThreadId
FROM
Posts
GROUP BY
Posts.ThreadId
) As P2 ON Threads.ThreadId = P2.ThreadId
GROUP BY
Threads.ForumId
) AS T2 ON T2.ForumId = Forums.ForumId
How can I speed this up?
Update:
This is the estimated execution plan, read from right to left:
[Path 1]
Clustered Index Scan (Clustered) [Posts].[PK_Posts], Cost: 98%
Hash Match (Partial Aggregate), Cost: 2%
Parallelism (Repartition Streams), Cost: 0%
Hash Match (Aggregate), Cost: 0%
Compute Scalar, Cost: 0%
Bitmap (Bitmap Create), Cost: 0%
[Path 2]
Index Scan (NonClustered) [Threads].[IX_ForumId], Cost: 0%
Parallelism (Repartition Streams), Cost: 0%
[Path 1 and 2 converge into Path 3]
Hash Match (Inner Join), Cost: 0%
Hash Match (Partial Aggregate), Cost: 0%
Parallelism (Repartition Streams), Cost: 0%
Sort, Cost: 0%
Stream Aggregate (Aggregate), Cost: 0%
Compute Scalar, Cost: 0%
[Path 4]
Clustered Index Seek (Clustered) [Forums].[PK_Forums], Cost: 0%
[Path 3 and 4 converge into Path 5]
Nested Loops (Inner Join), Cost: 0%
Parallelism (Gather Streams), Cost: 0%
SELECT, Cost: 0%
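Have you tried putting the two derived tables into temp tables? SQL Server will get statistics on their individual columns, and you can also put indexes on them.
Also, at first glance, an indexed view might help here, since you have a lot of aggregates.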
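The temp-table suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the temp table name `#ThreadStats` and the index name `IX_ThreadStats` are made up, and the column names are taken from the query in the question.

```sql
-- Materialize the per-thread aggregates once (the inner derived table).
SELECT
    ThreadId,
    MIN([DateTime]) AS ThreadStart,
    MAX([DateTime]) AS ThreadStop,
    COUNT(*)        AS ThreadPosts
INTO #ThreadStats                 -- hypothetical temp table name
FROM Posts
GROUP BY ThreadId;

-- Index the intermediate result so the join below can use it.
CREATE UNIQUE CLUSTERED INDEX IX_ThreadStats ON #ThreadStats (ThreadId);

-- Roll the per-thread rows up to one row per forum.
SELECT
    f.ForumId,
    f.Title,
    COUNT(*)            AS ForumThreads,
    SUM(ts.ThreadPosts) AS ForumPosts,
    MIN(ts.ThreadStart) AS ForumStart,
    MAX(ts.ThreadStop)  AS ForumStop
FROM Forums AS f
INNER JOIN Threads AS t ON t.ForumId = f.ForumId
INNER JOIN #ThreadStats AS ts ON ts.ThreadId = t.ThreadId
GROUP BY f.ForumId, f.Title;

DROP TABLE #ThreadStats;
```

Note that an indexed view cannot contain MIN or MAX, so on its own it could only precompute the counts, not the date range.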
Do you really need to aggregate twice? Would this query give you the same result?
SELECT
T2.ForumId,
Forums.Title,
T2.ForumThreads,
T2.ForumPosts,
T2.ForumStart,
T2.ForumStop
FROM
Forums
INNER JOIN (
SELECT
-- Single aggregation per forum, joining Threads directly to Posts
Min(Posts.DateTime) As ForumStart,
Max(Posts.DateTime) As ForumStop,
Count(DISTINCT Threads.ThreadId) As ForumThreads,
Count(*) As ForumPosts,
Threads.ForumId
FROM
Threads
INNER JOIN Posts ON Threads.ThreadId = Posts.ThreadId
GROUP BY
Threads.ForumId
) AS T2 ON T2.ForumId = Forums.ForumId
How about something like this? Either way, you get the idea.
SELECT f.ForumID,
f.Title,
MIN(p.[DateTime]) as ForumStart,
MAX(p.[DateTime]) as ForumStop,
COUNT(*) as ForumPosts, -- counts posts; COUNT(DISTINCT f.ForumID) is always 1 per group
COUNT(DISTINCT t.ThreadID) as ForumThreads
FROM Forums f
INNER JOIN Threads t
ON f.ForumID = t.ForumID
INNER JOIN Posts p
ON p.ThreadID = t.ThreadID
GROUP BY f.ForumID, f.Title
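When you select directly FROM the tables, the indexes may come into play, but the results of the subqueries are not indexed. Joining them is probably what kills performance. As buckley suggested, I would try storing the intermediate results in temp tables and adding indexes to them before running the final query.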
However, the outer select does not include any thread-specific information. It looks like the query just selects the min/max dates per forum. If so, you only need the min/max/count of posts grouped by forum.
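On the point about indexes: the estimated plan in the question attributes 98% of the cost to a clustered index scan on Posts, which reads every full row even though only ThreadId and DateTime are needed. A narrower nonclustered index over just those two columns might reduce that I/O. This is a sketch only; the index name is hypothetical, and it assumes no equivalent index already exists.

```sql
-- Hypothetical index name. Keyed on (ThreadId, DateTime) so the per-thread
-- MIN/MAX/COUNT can be read from a much smaller structure than the
-- clustered index, already ordered by ThreadId.
CREATE NONCLUSTERED INDEX IX_Posts_ThreadId_DateTime
ON Posts (ThreadId, [DateTime]);
```

The query would still scan all of this index, since it aggregates every post, but the index is far narrower than the clustered one. Whether the optimizer picks it should be checked against the actual execution plan.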
If you denormalize by adding ForumId to the Posts table, you can query all of the statistics directly from Posts. With the right indexes this would probably perform well. Of course, it requires a small change to the code that inserts into Posts so that it includes ForumId...

I added more indexes to the database, and that sped things up a lot. Execution time is now about 20 seconds! I admit that many of the indexes I added were guesses or simply added at random.

What does the execution plan for the query look like?

40 gigs? Not surprising. Add indexes! It will get better if you add or change indexes so that those scans become seeks. You may also want to split the table into partitions.

+1, I had posted and then deleted my answer when I saw your nearly identical solution. But shouldn't f.ForumID be *? It is the number of posts per ForumID.
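The denormalization suggested above (adding ForumId to Posts) might look roughly like this. It is a sketch under assumptions: ForumId is assumed to be an INT, the index name is made up, and `GO` is the batch separator used by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Assumption: ForumId is an INT; adjust to match Threads.ForumId.
ALTER TABLE Posts ADD ForumId INT NULL;
GO  -- new batch, so the added column is visible to the statements below

-- One-off backfill from Threads; on ~20 million rows consider batching.
UPDATE p
SET p.ForumId = t.ForumId
FROM Posts AS p
INNER JOIN Threads AS t ON t.ThreadId = p.ThreadId;
GO

-- Hypothetical index name; covers every column the query below reads.
CREATE NONCLUSTERED INDEX IX_Posts_ForumId_DateTime
ON Posts (ForumId, [DateTime])
INCLUDE (ThreadId);
GO

-- Per-forum statistics straight from Posts, without touching Threads.
SELECT
    f.ForumId,
    f.Title,
    COUNT(DISTINCT p.ThreadId) AS ForumThreads,
    COUNT(*)                   AS ForumPosts,
    MIN(p.[DateTime])          AS ForumStart,
    MAX(p.[DateTime])          AS ForumStop
FROM Forums AS f
INNER JOIN Posts AS p ON p.ForumId = f.ForumId
GROUP BY f.ForumId, f.Title;
```

As the answer notes, the code that inserts into Posts would also have to set ForumId from then on, and moving a thread between forums would have to update its posts too.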