SQL: What is a bad skew factor value in Teradata?


I determine the skew factor like this:

SELECT
TABLENAME,
SUM(CURRENTPERM) / (1024*1024) AS CURRENTPERM_MB,  -- table size in MB
(100 - (AVG(CURRENTPERM) / MAX(CURRENTPERM) * 100)) AS SKEWFACTOR
FROM
DBC.TABLESIZE
WHERE DATABASENAME = '<DATABASENAME>'
AND
TABLENAME = '<TABLENAME>'
GROUP BY 1;

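For some tables of about 600 GB the skew factor is 30%. For a 10 GB table it is 98%, which is quite high. How bad are these numbers, really? Is there an official article stating that tables with a skew above 10% should be redistributed? I need it to back up my requests to the data mart developers. I found only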

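There is no magic number, but a table with 98% skew has almost all of its data on a single AMP, which means (1) you lose the performance benefit of a parallel database and (2) you create an unbalanced load on the system.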

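A skew factor of 30 means the largest AMP holds roughly 40% more data than the average AMP. That may still be acceptable (it depends, of course); ask your DBA what they usually consider too high. A skew factor of 98, on the other hand, means the largest AMP holds 40 to 50 times the average amount of data, which means a lot more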
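The arithmetic behind these figures follows from the skew expression used in the question (the same one computed as SkewFactor_WINDDI in the query below). Writing S for that skew factor, P_max for the largest per-AMP CurrentPerm, \overline{P} for the average per-AMP CurrentPerm, r for their ratio and N for the number of AMPs (notation introduced here, not from the original posts):

```latex
S = 100\left(1 - \frac{\overline{P}}{P_{\max}}\right)
\quad\Longleftrightarrow\quad
r = \frac{P_{\max}}{\overline{P}} = \frac{100}{100 - S}

S = 30:\ r = \tfrac{100}{70} \approx 1.43
\qquad
S = 98:\ r = \tfrac{100}{2} = 50

\text{all rows on one of } N \text{ AMPs: } \overline{P} = \tfrac{P_{\max}}{N}
\ \Rightarrow\ S = 100\left(1 - \tfrac{1}{N}\right),
\ \text{e.g. } N = 50 \Rightarrow S = 98
```

So S = 30 corresponds to the largest AMP holding about 43% more than the average, and S = 98 to exactly 50 times the average. The last line shows how 98 can mean "nearly all data on one AMP": on a hypothetical 50-AMP system, a table stored entirely on one AMP scores exactly 98, and on a larger system it would score even higher.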

This query compares two ways of calculating skew:

SELECT
   t.DatabaseName
   ,t.TableName

   -- currently used diskspace in GB
   ,SUM(t.CurrentPerm) / 1024**3 (DEC(9,2)) AS CurrentPermGB

   -- currently needed diskspace in GB to store this table as standalone (due to Skew)
   ,MAX(t.CurrentPerm) / 1024**3 * (HASHAMP() + 1) (DEC(9,2)) AS SkewedPermGB

   ,SkewedPermGB - CurrentPermGB  AS WastedPermGB

   -- AMP with highest disk usage
   ,MAX(t.MaxPermAMP) AS SkewedAMP

   -- skew factor, 1 = even distribution, 1.1 = max AMP needs 10% more space than the average AMP
   ,MAX(t.CurrentPerm) / NULLIF(AVG(t.CurrentPerm),0) (DEC(5,2)) AS SkewFactor

   -- skew factor, between 0 and 99.  Same calculation as WinDDI/ TD Administrator
   ,(100 - (AVG(t.CurrentPerm) / NULLIF(MAX(t.CurrentPerm),0) * 100)) (DEC(3,0)) AS SkewFactor_WINDDI
FROM
 (
   SELECT
      DatabaseName,
      TableName,
      CurrentPerm,
      CASE WHEN CurrentPerm = MAX(CurrentPerm) OVER (PARTITION BY DatabaseName, TableName) THEN vproc END AS MaxPermAMP
   FROM dbc.TableSizeV
   WHERE DatabaseName = '???' --
) AS t
GROUP BY 1,2
HAVING SkewFactor > 1.1 -- or whatever
   AND SkewedPermGB > 10 -- or whatever
ORDER BY WastedPermGB DESC
;
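The per-table aggregation in the query above can be mirrored in plain Python to show how the two skew measures and the wasted-space estimate relate. This is a sketch, not Teradata code: the function `skew_metrics` and the `SkewMetrics` record are hypothetical names, and the input list stands in for one table's per-AMP `CurrentPerm` values from `dbc.TableSizeV`, assumed to contain exactly one entry per AMP so that its length plays the role of `HASHAMP() + 1`.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # bytes per GB, matching 1024**3 in the query


@dataclass
class SkewMetrics:
    current_perm_gb: float     # SUM(CurrentPerm) in GB
    skewed_perm_gb: float      # MAX(CurrentPerm) * number of AMPs, in GB
    wasted_perm_gb: float      # skewed_perm_gb - current_perm_gb
    skew_factor: float         # MAX / AVG; 1.0 = perfectly even
    skew_factor_winddi: float  # 100 - AVG / MAX * 100; 0 = perfectly even


def skew_metrics(perm_per_amp: list[int]) -> SkewMetrics:
    """Compute the query's metrics from one table's per-AMP CurrentPerm (bytes).

    Hypothetical helper: the list is assumed to hold one entry per AMP.
    """
    if not perm_per_amp:
        raise ValueError("need at least one AMP")
    amps = len(perm_per_amp)  # stands in for HASHAMP() + 1
    total = sum(perm_per_amp)
    biggest = max(perm_per_amp)
    avg = total / amps
    current_gb = total / GB
    # Space the table would need if every AMP were as full as the biggest one.
    skewed_gb = biggest * amps / GB
    # NaN plays the role of NULLIF(..., 0): an empty table has no meaningful skew.
    factor = biggest / avg if avg else float("nan")
    winddi = 100 - avg / biggest * 100 if biggest else float("nan")
    return SkewMetrics(current_gb, skewed_gb, skewed_gb - current_gb, factor, winddi)
```

With hypothetical per-AMP sizes of 100, 60, 60 and 60 GB on four AMPs, `skew_metrics([100 * GB, 60 * GB, 60 * GB, 60 * GB])` reports a WinDDI skew of 30, a max/avg factor of about 1.43 and 120 GB of wasted space, which (up to the query's DECIMAL rounding) is what its columns would show for such a table.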

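I can't cite the exact source, but I created a script based on one (possibly from Teradata itself) in which you would look at skew factors of 50+ on large tables. @access_granted - so is 50 OK? 49? 48? 47? Each of them has its implications.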