Azure SQL Data Warehouse: not using all available DWUs


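My organization is currently evaluating Azure Data Warehouse. We have one fact table with 16M rows and another fact table with 5M rows. Both are hash-distributed on the same column (same data type and length).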

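When we run an inner join at 200 DWU using the smallrc resource class, the query takes nearly 6 minutes, yet DWU usage (as shown in the portal) is only a small fraction of the available DWUs. In addition, when the same query is launched several times concurrently (through SSIS Execute SQL Tasks), the data warehouse never executes more than 6 instances at a time.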

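We changed the DWU setting from 200 to 500, then to 1000, and finally to 2000, but each time the maximum number of concurrent runs of the same query stayed at 6, DWU usage remained a small fraction of the total available, and performance did not change noticeably.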

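Is this expected behavior? Surely a large increase in DWU should substantially reduce query execution time? I realize this is quite general, but I am trying to understand whether I am overlooking something important for this kind of analysis.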

Here is the EXPLAIN output:

explain
<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="10" number_distributions="60" number_distributions_per_node="6">
  <sql>select shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type], count(distinct booking_removal.[Booking ID])  from shifts join booking_removal on shifts.[Shift reference] = booking_removal.[Shift Reference] join dimdate on shifts.[Shift Start Date] = date where year in  (2015, 2016)
group by shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type]</sql>
  <dsql_operations total_cost="1.22805816" total_number_operations="5">
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_15134</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>      
<sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15134] ([Date] DATE, [YYYYMM] INT ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="BROADCAST_MOVE">
      <operation_cost cost="1.22805816" accumulative_cost="1.22805816" average_rowsize="7" output_rows="730.987" GroupNumber="15" />                                                                                                                                    
<source_statement>SELECT [T1_1].[Date] AS [Date],
       [T1_1].[YYYYMM] AS [YYYYMM]
FROM   (SELECT [T2_1].[Date] AS [Date],
               [T2_1].[YYYYMM] AS [YYYYMM]
        FROM   [NHSP-Shifts-DW].[dbo].[DimDate] AS T2_1
        WHERE  (([T2_1].[Year] = CAST ((2015) AS INT))
                OR ([T2_1].[Year] = CAST ((2016) AS INT)))) AS T1_1</source_statement>
      <destination_table>[TEMP_ID_15134]</destination_table>
    </dsql_operation>
    <dsql_operation operation_type="RETURN">
      <location distribution="AllDistributions" />
<select>SELECT [T1_1].[Shift Reference] AS [Shift Reference],
       [T1_1].[Trust Code] AS [Trust Code],
       [T1_1].[Ward Code] AS [Ward Code],
       [T1_1].[Location Code] AS [Location Code],
       [T1_1].[Qualification] AS [Qualification],
       [T1_1].[YYYYMM] AS [YYYYMM],
       [T1_1].[Booking Type] AS [Booking Type],
       [T1_1].[Booking Type] AS [Booking Type1],
       [T1_1].[col] AS [col]                                                                                                        
FROM   (SELECT   COUNT([T2_1].[Booking ID]) AS [col],
                 [T2_1].[Shift Reference] AS [Shift Reference],
                 [T2_1].[Trust Code] AS [Trust Code],
                 [T2_1].[Ward Code] AS [Ward Code],
                 [T2_1].[Location Code] AS [Location Code],
                 [T2_1].[Qualification] AS [Qualification],
                 [T2_1].[YYYYMM] AS [YYYYMM],
                 [T2_1].[Booking Type] AS [Booking Type]                                                                   
FROM     (SELECT   [T3_1].[Booking ID] AS [Booking ID],
                           [T3_2].[Shift Reference] AS [Shift Reference],
                           [T3_2].[Trust Code] AS [Trust Code],
                           [T3_2].[Ward Code] AS [Ward Code],
                           [T3_2].[Location Code] AS [Location Code],
                           [T3_2].[Qualification] AS [Qualification],
                           [T3_2].[YYYYMM] AS [YYYYMM],
                           [T3_1].[Booking Type] AS [Booking Type]
FROM     [NHSP-Shifts-DW].[dbo].[Booking_Removal] AS T3_1
                           INNER JOIN
                           (SELECT [T4_2].[Shift Reference] AS [Shift Reference],
                                   [T4_2].[Trust Code] AS [Trust Code],
                                   [T4_2].[Ward Code] AS [Ward Code],
                                   [T4_2].[Location Code] AS [Location Code],
                                   [T4_2].[Qualification] AS [Qualification],
                                   [T4_1].[YYYYMM] AS [YYYYMM]
FROM   [tempdb].[dbo].[TEMP_ID_15134] AS T4_1
                                   INNER JOIN
                                   [NHSP-Shifts-DW].[dbo].[Shifts] AS T4_2
                                   ON ([T4_1].[Date] = [T4_2].[Shift Start Date])) AS T3_2 ON ([T3_1].[Shift Reference] = [T3_2].[Shift Reference])
                  GROUP BY [T3_2].[Shift Reference], [T3_2].[Trust Code], [T3_2].[Ward Code], [T3_2].[Location Code], [T3_2].[Qualification], [T3_2].[YYYYMM], [T3_1].[Booking Type], [T3_1].[Booking ID]) AS T2_1
GROUP BY [T2_1].[Shift Reference], [T2_1].[Trust Code], [T2_1].[Ward Code], [T2_1].[Location Code], [T2_1].[Qualification], [T2_1].[YYYYMM], [T2_1].[Booking Type]) AS T1_1</select>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15134]</sql_operation>
      </sql_operations>
    </dsql_operation>
  </dsql_operations>
</dsql_query>

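The effort on each distribution is basically the same, which is good. Using extra nodes gives you more processing power, but the work needed to process a single distribution is not much affected. If you were CPU-bound, then sure. If you were IO-bound, then sure. But your query is taking a long time because of the grouping and so on. You can improve the query in various ways, but increasing DWU will not help much, other than making more processing power available for other queries. If you want to tune the query so that it runs faster overall, that is a different kind of question.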

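How many rows does your query return? If it returns many rows, the return operation to the client may be the bottleneck, in which case scaling DWU may not help. You mention that concurrency is always 6. Could that be an SSIS setting? In SQL DW, concurrency depends on the number of DWUs allocated. This article explains it in detail. To see how many queries are actually running at one time, you can run the query: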

SELECT * FROM sys.dm_pdw_exec_requests WHERE status = 'Running';
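
A slightly more detailed variant of the query above may make the concurrency question easier to answer. This is only a sketch: it assumes the standard columns of `sys.dm_pdw_exec_requests` and that queued requests are reported with status `'Suspended'`.

```sql
-- Sketch: list requests that are running or waiting for a concurrency slot.
-- Assumes queued requests appear with status 'Suspended'.
SELECT request_id,
       status,
       resource_class,
       submit_time,
       start_time,
       total_elapsed_time,   -- elapsed time in milliseconds
       command
FROM   sys.dm_pdw_exec_requests
WHERE  status IN ('Running', 'Suspended')
ORDER  BY submit_time;
```

If this never shows more than 6 rows and none of them are `'Suspended'`, the warehouse is probably not the one limiting concurrency, and the cap is more likely on the client side (for example, in SSIS).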

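Run the query with EXPLAIN in front of it. It will give you some XML.

Thanks. I can tell from it that it is at 1000 DWU. It looks like data movement is clearly not your problem. According to the plan, your data is distributed on your join condition, [Shift Reference], which is good. You are not getting unnecessary movement. The query could be tuned better, but first, do you have skew? What you describe could well indicate that. Can you run DBCC PDW_SHOWSPACEUSED and put the results in the question?

Hi Rob, thanks for your help so far. This is what I have been trying to understand: if it is not IO and not processing power, then what is the limiting factor? I still do not understand why the DWUs are never fully used for processing. I know this is not the most optimized query, but I want to maximize resource utilization, and that does not seem to happen even with a poorly optimized query.

If you ran the query on a regular SQL box and looked at the plan, you would get a sense of the sixty (parallel) operations that need to happen. But just because you have ten nodes instead of one doing the work does not mean it will happen faster. If you do a regular count instead of a distinct count, does your query return faster?

Yes, a regular count is very fast, because distinct counts do not scale well.

Hi Sonya, you may be right. The limit of 6 concurrent queries is an SSIS setting (maxconcurrentthread): because I have 4 cores, it automatically creates at most 6 threads.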
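
The skew check suggested in the comments could look roughly like this. It is a minimal sketch that assumes the two fact tables live in the `dbo` schema, as the table names in the EXPLAIN plan suggest.

```sql
-- Sketch: report rows and space used per distribution for each fact table.
-- Schema name dbo is taken from the EXPLAIN plan; adjust it if yours differs.
DBCC PDW_SHOWSPACEUSED('dbo.Shifts');
DBCC PDW_SHOWSPACEUSED('dbo.Booking_Removal');
```

Each command returns one row per distribution (60 in total). If a handful of distributions hold far more rows than the rest, the table is skewed, and the query can only finish as fast as its largest distribution, no matter how many nodes are added.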