Google BigQuery: avoiding repeated calculation


I have two tables, and for each region in A I want to find the closest region in B.

Table A:

------------------------
ID | Start | End | Color 
------------------------
 1 |  400  | 500 | White
------------------------
 1 |  10   | 20  | Red 
------------------------
 2 |   2   |  10 | Blue 
------------------------
 4 |   88  |  90 | Color 
------------------------
Table B:

-------------------------------
ID | Start | End | Name | Name2 
-------------------------------
 1 |  1    | 2   | XYZ1 | EWQ
-------------------------------
 1 |  50   | 60  | XYZ4 | EWY
-------------------------------
 2 |  150  | 160 | ABC1 | TRE
-------------------------------
 2 |  50   | 60  | ABC2 | YUE
-------------------------------
 4 |  100  | 120 | EFG  | MMN
-------------------------------
Here is the expected result table:

-------------------------------------------------------
ID | Start | End | Color | Closest Name | Closest Name2
-------------------------------------------------------
 1 |  400  | 500 | White |   XYZ4       |   EWY
-------------------------------------------------------
 1 |  10   | 20  | Red   |   XYZ1       |  EWQ
-------------------------------------------------------
 2 |   2   |  10 | Blue  |   ABC2       |  YUE
-------------------------------------------------------
 4 |   88  |  90 | Color |   EFG        |  MMN
-------------------------------------------------------
Here is my current solution:

#standardSQL
SELECT
  A.ID,
  A.Start,
  A.END,
  ARRAY_AGG(B.name
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)  
  LIMIT
    1)[SAFE_OFFSET(0)] name,
  ARRAY_AGG(B.name2
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)  
  LIMIT
    1)[SAFE_OFFSET(0)] name2

FROM
     A
JOIN
     B
ON
  A.ID = B.ID

  WHERE  (A.start>B.End) OR (B.Start> A.END)
GROUP BY
  A.ID,
  A.start,
  A.END
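In this example there are only two fields (name and name2). If B had N fields, is there a way to avoid repeating the distance calculation for each of them?

Thanks.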

You should be able to use ARRAY_AGG together with STRUCT. Here are a few example expressions:

ARRAY_AGG(
  B
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
      THEN ABS(A.Start-B.END)
      ELSE ABS(A.END-B.Start)
    END)  
  LIMIT
    1)[SAFE_OFFSET(0)].*
This returns all fields of B for the first row of B according to the ordering.

ARRAY_AGG(
  (SELECT AS STRUCT B.* EXCEPT(foo, bar))
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
      THEN ABS(A.Start-B.END)
      ELSE ABS(A.END-B.Start)
    END)  
  LIMIT
    1)[SAFE_OFFSET(0)].*
This returns all fields from B except foo and bar (replace these with whichever names you want to exclude).


A third variant returns only the named fields from B; you can list whichever ones you want.
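The expression that belongs to this last remark is missing from the text. A minimal sketch of what it might look like, assuming it follows the same pattern as the two expressions above and lists the fields inside an explicit STRUCT (the choice of name and name2 is an assumption taken from the question). Like the expressions above, it is a fragment meant to sit in the SELECT list of the question's query:

```sql
ARRAY_AGG(
  -- Assumed field list: keep only the columns of B you actually need.
  STRUCT(B.name, B.name2)
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
      THEN ABS(A.Start-B.END)
      ELSE ABS(A.END-B.Start)
    END)
  LIMIT
    1)[SAFE_OFFSET(0)].*
```

Whichever variant is used, the distance expression appears once per query rather than once per field of B.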

The following should give you an idea:

#standardSQL
WITH A AS (
  SELECT 1 a_id, 400 a_start, 500 a_end, 'White' color UNION ALL
  SELECT 1, 10,  20  , 'Red' UNION ALL
  SELECT 2, 2,   10, 'Blue' UNION ALL
  SELECT 4, 88,  90, 'Color'
), B AS (
  SELECT 1 b_id, 1 b_start, 2 b_end, 'XYZ1' name, 'EWQ' name2 UNION ALL
  SELECT 1, 50, 60,  'XYZ4', 'EWY' UNION ALL
  SELECT 2, 150, 160,'ABC1', 'TRE' UNION ALL
  SELECT 2, 50, 60,  'ABC2', 'YUE' UNION ALL
  SELECT 4, 100, 120,'EFG', 'MMN'
)
SELECT 
  a_id, a_start, a_end, color, names.name, names.name2
FROM (
  SELECT a_id, a_start, a_end, color,  
    ARRAY_AGG(
      STRUCT(name, name2)
      ORDER BY POW(ABS(a_start - b_start), 2) + POW(ABS(a_end - b_end), 2)
      LIMIT 1)[SAFE_OFFSET(0)] names
  FROM A JOIN B ON a_id = b_id
  GROUP BY a_id, a_start, a_end, color
)
ORDER BY a_id  
The result is:

Row a_id    a_start a_end   color   name    name2    
1   1       400     500     White   XYZ4    EWY  
2   1       10      20      Red     XYZ1    EWQ  
3   2       2       10      Blue    ABC2    YUE  
4   4       88      90      Color   EFG     MMN  

Create a struct:

ARRAY_AGG((B.name, B.name2)
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)  
  LIMIT
    1)[SAFE_OFFSET(0)] names,
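
Note that the tuple form (B.name, B.name2) builds a STRUCT with anonymous fields, so the two values cannot be read back by name afterwards. Below is a minimal end-to-end sketch that names the fields instead. It assumes the sample rows from the question, a_/b_ column prefixes (chosen here so that the reserved word END is not needed as a column name), and LEAST as a shorthand for the question's CASE expression. The alias closest is hypothetical.

```sql
#standardSQL
WITH A AS (
  SELECT 1 AS a_id, 400 AS a_start, 500 AS a_end, 'White' AS color UNION ALL
  SELECT 1, 10, 20, 'Red' UNION ALL
  SELECT 2, 2, 10, 'Blue' UNION ALL
  SELECT 4, 88, 90, 'Color'
), B AS (
  SELECT 1 AS b_id, 1 AS b_start, 2 AS b_end, 'XYZ1' AS name, 'EWQ' AS name2 UNION ALL
  SELECT 1, 50, 60, 'XYZ4', 'EWY' UNION ALL
  SELECT 2, 150, 160, 'ABC1', 'TRE' UNION ALL
  SELECT 2, 50, 60, 'ABC2', 'YUE' UNION ALL
  SELECT 4, 100, 120, 'EFG', 'MMN'
)
SELECT
  a_id, a_start, a_end, color,
  closest.*  -- expands to name, name2 because the STRUCT fields are named
FROM (
  SELECT
    a_id, a_start, a_end, color,
    ARRAY_AGG(
      STRUCT(name, name2)
      -- Gap between the two regions: the smaller of the two end-to-start distances,
      -- equivalent to the CASE expression in the question.
      ORDER BY LEAST(ABS(a_start - b_end), ABS(a_end - b_start))
      LIMIT 1)[SAFE_OFFSET(0)] AS closest
  FROM A
  JOIN B ON a_id = b_id
  -- Keep only non-overlapping pairs, as in the question's WHERE clause.
  WHERE a_start > b_end OR b_start > a_end
  GROUP BY a_id, a_start, a_end, color
)
ORDER BY a_id, a_start
```

The distance is written once and the whole closest row travels inside the STRUCT, so adding more fields from B only means adding them to the STRUCT list.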