SQL Teradata performance problem and example


I have a problem in our Teradata QA environment where a simple query that used to run in under 1 minute now takes 12 minutes to complete. This select pulls five fields based on a simple inner join:

select a.material
    , b.season
    , b.theme
    , b.collection
from SalesOrders_view.Allocation_Deliveries_cur a
inner join SalesOrders_view.Material_Attributes_cur b
    on a.material = b.material;
I can run the exact same query in our Prod environment and it returns in under a minute, while running against roughly 200K more records than QA.

SalesOrders.Allocation_Deliveries holds just under 1.1 million records in total, and SalesOrders.Material_Attributes just under 129K records. These are small data sets.

I compared the Explain plans from both environments, and there is a glaring difference in the estimated spool volume at the first join step. The estimate in Production is on the money, while the estimate in QA is off by orders of magnitude. Yet the data and the tables/views are identical in both systems, we have collected statistics in every conceivable way, and we can see that the table demographics in both systems match.

Finally, this query has always returned in under a minute in every environment, including QA, and it still does in Production. This behavior surfaced within the last week or so. I discussed it with our DBA and we have made no changes to the software or configuration. He is new, but seems to know what he is doing and is still getting acclimated to the new environment.

I am looking for suggestions on what to check next. I compared the relevant table/view definitions in QA and Prod and they are identical. The table statistics in each system are also identical (I went over these with our DBA to be sure).
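One way to make that statistics comparison repeatable is to dump the collected statistics on each system and diff the output. A sketch, using the table names above (`SHOW STATISTICS` requires Teradata 14.0 or later):

```sql
-- Summary of collected statistics: column sets, unique value counts,
-- and the date each statistic was last collected.
HELP STATISTICS SalesOrders.Allocation_Deliveries;
HELP STATISTICS SalesOrders.Material_Attributes;

-- On Teradata 14.0+, SHOW STATISTICS VALUES exports the detailed
-- histograms, which can then be diffed between QA and Prod.
SHOW STATISTICS VALUES ON SalesOrders.Material_Attributes;
```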

Any help is appreciated. Thanks in advance. Pat

Here is the Explain plan from QA. Note the terribly low estimate at step 5 (144 rows). In Prod, the same Explain shows > 1M rows, which would be close to what I know to be true.

Explain select a.material
    , b.season
    , b.theme
    , b.collection
from SalesOrders_view.Allocation_Deliveries a
inner join SalesOrders_view.Material_Attributes_cur b
    on a.material = b.material;

  1) First, we lock SalesOrders.Allocation_Deliveries in view
     SalesOrders_view.Allocation_Deliveries for access, and we lock
     SalesOrders.Material_Attributes in view SalesOrders_view.Material_Attributes_cur for
     access. 
  2) Next, we do an all-AMPs SUM step to aggregate from
     SalesOrders.Material_Attributes in view SalesOrders_view.Material_Attributes_cur by way
     of an all-rows scan with no residual conditions
     , grouping by field1 ( SalesOrders.Material_Attributes.material
     ,SalesOrders.Material_Attributes.season ,SalesOrders.Material_Attributes.theme
     ,SalesOrders.Material_Attributes.theme ,SalesOrders.Material_Attributes.af_grdval
     ,SalesOrders.Material_Attributes.af_stcat
     ,SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM).  Aggregate
     Intermediate Results are computed locally, then placed in Spool 4. 
     The size of Spool 4 is estimated with high confidence to be
     129,144 rows (41,713,512 bytes).  The estimated time for this step
     is 0.06 seconds. 
  3) We execute the following steps in parallel. 
       1) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
          way of an all-rows scan into Spool 2 (all_amps), which is
          redistributed by the hash code of (
          SalesOrders.Material_Attributes.Field_9,
          SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
          SalesOrders.Material_Attributes.Field_7, SalesOrders.Material_Attributes.Field_6,
          SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
          SalesOrders.Material_Attributes.season, SalesOrders.Material_Attributes.material)
          to all AMPs.  Then we do a SORT to order Spool 2 by row hash
          and the sort key in spool field1 eliminating duplicate rows. 
          The size of Spool 2 is estimated with low confidence to be
          129,144 rows (23,504,208 bytes).  The estimated time for this
          step is 0.11 seconds. 
       2) We do an all-AMPs RETRIEVE step from SalesOrders.Material_Attributes in
          view SalesOrders_view.Material_Attributes_cur by way of an all-rows scan
          with no residual conditions locking for access into Spool 6
          (all_amps), which is redistributed by the hash code of (
          SalesOrders.Material_Attributes.material, SalesOrders.Material_Attributes.season,
          SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
          SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
          SalesOrders.Material_Attributes.Material_Attributes_UPD_TS, (CASE WHEN (NOT
          (SalesOrders.Material_Attributes.af_stcat IS NULL )) THEN
          (SalesOrders.Material_Attributes.af_stcat) ELSE ('') END )(VARCHAR(16),
          CHARACTER SET UNICODE, NOT CASESPECIFIC), (CASE WHEN (NOT
          (SalesOrders.Material_Attributes.af_grdval IS NULL )) THEN
          (SalesOrders.Material_Attributes.af_grdval) ELSE ('') END )(VARCHAR(8),
          CHARACTER SET UNICODE, NOT CASESPECIFIC)) to all AMPs.  Then
          we do a SORT to order Spool 6 by row hash.  The size of Spool
          6 is estimated with high confidence to be 129,144 rows (
          13,430,976 bytes).  The estimated time for this step is 0.08
          seconds. 
  4) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
     an all-rows scan into Spool 7 (all_amps), which is built locally
     on the AMPs.  Then we do a SORT to order Spool 7 by the hash code
     of (SalesOrders.Material_Attributes.material, SalesOrders.Material_Attributes.season,
     SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
     SalesOrders.Material_Attributes.Field_6, SalesOrders.Material_Attributes.Field_7,
     SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
     SalesOrders.Material_Attributes.Field_9).  The size of Spool 7 is estimated
     with low confidence to be 129,144 rows (13,301,832 bytes).  The
     estimated time for this step is 0.05 seconds. 
  5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
     all-rows scan, which is joined to Spool 7 (Last Use) by way of an
     all-rows scan.  Spool 6 and Spool 7 are joined using an inclusion
     merge join, with a join condition of ("(material = material) AND
     ((season = season) AND ((theme = theme) AND ((theme =
     theme) AND (((( CASE WHEN (NOT (af_grdval IS NULL )) THEN
     (af_grdval) ELSE ('') END ))= Field_6) AND (((( CASE WHEN (NOT
     (AF_STCAT IS NULL )) THEN (AF_STCAT) ELSE ('') END ))= Field_7)
     AND ((Material_Attributes_SRC_SYS_NM = Material_Attributes_SRC_SYS_NM) AND
     (Material_Attributes_UPD_TS = Field_9 )))))))").  The result goes into Spool
     8 (all_amps), which is duplicated on all AMPs.  The size of Spool
     8 is estimated with low confidence to be 144 rows (5,616 bytes). 
     The estimated time for this step is 0.04 seconds. 
  6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
     all-rows scan, which is joined to SalesOrders.Allocation_Deliveries in view
     SalesOrders_view.Allocation_Deliveries by way of an all-rows scan with no
     residual conditions.  Spool 8 and SalesOrders.Allocation_Deliveries are
     joined using a single partition hash join, with a join condition
     of ("SalesOrders.Allocation_Deliveries.material = material").  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs. 
     The size of Spool 1 is estimated with low confidence to be 3,858
     rows (146,604 bytes).  The estimated time for this step is 0.44
     seconds. 
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.70 seconds.
Here is what the record distribution looks like, along with the SQL I used to generate the result set:

SELECT HASHAMP(HASHBUCKET(HASHROW( MATERIAL ))) AS
"AMP#",COUNT(*)
FROM EDW_LND_SAP_VIEW.EMDMMU01_CUR
GROUP BY 1
ORDER BY 2 DESC;
Output:
Highest: AMP 137 with 1,093 rows
Lowest: AMP 72 with 768 rows
Total AMPs: 144

Statistics Recommendations

Run the following in both PROD and QA and post the differences, obfuscating column names if necessary (the DIAGNOSTIC HELPSTATS block reproduced at the end of this post):

When run alongside the EXPLAIN command, this diagnostic produces a list of recommended statistics that could help the optimizer produce the lowest-cost query plan. It may yield no differences, or it may point to something that differs (data or otherwise) between the environments.

View and Join Conditions

Based on your Explain plan, it appears that one or both of the views in the SalesOrders_view database are using an EXISTS clause. This EXISTS clause relies on a COALESCE condition (or explicit CASE logic) to accommodate a comparison between a column defined as NOT NULL in one table and a column defined to allow NULL values in the other. This can affect the performance of that join.
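The pattern being described looks roughly like this. This is a hypothetical sketch of what the view DDL may contain, not the actual definition, and the history-table name is invented for illustration:

```sql
-- Hypothetical fragment of a _cur view: the CASE expressions let the
-- nullable columns (af_grdval / af_stcat) be compared against NOT NULL
-- columns, which is what surfaces in steps 3 and 5 of the Explain.
SELECT m.material, m.season, m.theme, m.collection
FROM SalesOrders.Material_Attributes m
WHERE EXISTS (
    SELECT 1
    FROM SalesOrders.Material_Attributes_hist h   -- invented name
    WHERE h.material = m.material
      AND (CASE WHEN h.af_grdval IS NOT NULL
                THEN h.af_grdval ELSE '' END) = m.af_grdval
);
```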

Data Distribution

Your distribution results appear to be from the Production environment (based on the number of AMPs and the row counts shown for the highest and lowest AMPs). What does this look like for QA?

Edit - 2013-01-09

This may seem a silly question to ask if the data was copied over from Prod two months ago, but have statistics been collected since then? Stale statistics on replaced data could lead to differences in the query plans between environments.

Are you collecting PARTITION statistics on the tables, even though they are not PPI tables? This helps the optimizer with cardinality estimates.
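For reference, a sketch of collecting those statistics on the tables named above (PARTITION statistics are valid even on non-partitioned tables and are cheap to gather):

```sql
-- PARTITION statistics give the optimizer row counts per AMP/partition,
-- improving cardinality estimates even on non-PPI tables.
COLLECT STATISTICS COLUMN (PARTITION) ON SalesOrders.Allocation_Deliveries;
COLLECT STATISTICS COLUMN (PARTITION) ON SalesOrders.Material_Attributes;

-- Re-collect the join column while you are at it:
COLLECT STATISTICS COLUMN (material) ON SalesOrders.Material_Attributes;
```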

Are you the only workload running on the QA system?

Have you looked at the DBQL metrics to compare the CPU and IO consumption of the query in each environment? Look at the IO skew, CPU skew, and unnecessary IO metrics as well.
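A sketch of that comparison, assuming query logging is enabled and you can read the DBC views (column names are per the standard DBQL log view; verify them against your release). Run it on both systems and compare:

```sql
-- CPU, IO, spool, and per-AMP skew indicators for recent runs of the query.
SELECT QueryID
     , StartTime
     , AMPCPUTime        -- total CPU seconds across all AMPs
     , TotalIOCount      -- total logical IOs
     , MaxAMPCPUTime     -- hottest AMP; compare with the average for CPU skew
     , MaxAmpIO          -- hottest AMP by IO
     , SpoolUsage        -- peak spool in bytes
FROM DBC.QryLogV
WHERE QueryText LIKE '%Allocation_Deliveries%'
ORDER BY StartTime DESC;
```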

Do you have delay throttles in place in the QA environment that could be delaying your workload? That would give you the perception that the query takes longer to run in QA, when in fact the actual CPU and IO consumption are the same between QA and PROD.

Do you have access to Viewpoint?

If so, have you looked at your query with the My Queries and/or Query Spotlight portlets to observe its behavior?

Do you know which step in the query plan is the most expensive or time-consuming? A Viewpoint rewind with the portlets I mentioned, or step-level logging in DBQL, can show you this.

Are the DBS Control settings identical between the environments? Ask your DBA to check. There are settings in there that can influence the join plans the optimizer uses.


In the end, if the data, table structures, indexes, and statistics are the same on two systems with identical hardware and TDBMS patch levels, you should not be getting two different Explain plans. If that turns out to be the case, I would suggest contacting the GSC and getting them involved.

If the Explain plans are different, then something differs between the QA and Production environments. Can you share the Explain plans, view definitions, and table definitions (I don't care if you rename the columns)? What does the data distribution look like for the columns participating in the inner join? Remember, those columns have to land on the same AMP in order to be joined. If you find redistribution and skew in that data, that is where your problem lies.

Rob, I added the Explain plan from QA. The names had to be changed, but they match. I also used WinMerge to diff the two Explain plans, and the only differences are the row count and size estimates. Prod's estimates are accurate when compared against the actual table records or what is expected from the join. The only skewed value is at step 5 (in QA only), which shows 144 rows. It should be more than 1M rows, as it is in Prod.

Are your Prod and QA systems running the same release, and are they the same size? What did those three questions reveal?
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
select a.material
    , b.season
    , b.theme
    , b.collection
from SalesOrders_view.Allocation_Deliveries_cur a
inner join SalesOrders_view.Material_Attributes_cur b
    on a.material = b.material;