Google BigQuery: avoiding repeated computation - BigQuery
I have two tables. For each region in A, I want to find the closest region in B.
Table A:
------------------------
ID | Start | End | Color
------------------------
1 | 400 | 500 | White
------------------------
1 | 10 | 20 | Red
------------------------
2 | 2 | 10 | Blue
------------------------
4 | 88 | 90 | Color
------------------------
Table B:
-------------------------------
ID | Start | End | Name | Name2
-------------------------------
1 | 1 | 2 | XYZ1 | EWQ
-------------------------------
1 | 50 | 60 | XYZ4 | EWY
-------------------------------
2 | 150 | 160 | ABC1 | TRE
-------------------------------
2 | 50 | 60 | ABC2 | YUE
-------------------------------
4 | 100 | 120 | EFG | MMN
-------------------------------
Here is the desired result table:
-------------------------------------------------------
ID | Start | End | Color | Closest Name | Closest Name2
-------------------------------------------------------
1 | 400 | 500 | White | XYZ4 | EWY
-------------------------------------------------------
1 | 10 | 20 | Red | XYZ1 | EWQ
-------------------------------------------------------
2 | 2 | 10 | Blue | ABC2 | YUE
-------------------------------------------------------
4 | 88 | 90 | Color | EFG | MMN
-------------------------------------------------------
Here is my current solution:
#standardSQL
SELECT
  A.ID,
  A.Start,
  A.END,
  ARRAY_AGG(B.name
    ORDER BY
      (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)
    LIMIT
      1)[SAFE_OFFSET(0)] name,
  ARRAY_AGG(B.name2
    ORDER BY
      (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)
    LIMIT
      1)[SAFE_OFFSET(0)] name2
FROM
  A
JOIN
  B
ON
  A.ID = B.ID
WHERE (A.start > B.End) OR (B.Start > A.END)
GROUP BY
  A.ID,
  A.start,
  A.END
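A plain-Python model may help show what the ORDER BY key above computes. It is only a sketch that mirrors the SQL by hand, not BigQuery code, and the helper names `gap` and `non_overlapping` are hypothetical. The CASE expression always returns the smaller of |A.End - B.Start| and |A.Start - B.End|, and the WHERE clause keeps only pairs of intervals that do not overlap.

```python
# Plain-Python model of the distance used in the question's query.
# Intervals are (start, end) tuples; this is an illustration, not BigQuery code.

def gap(a, b):
    """Mirror of the CASE expression: the smaller of |A.End - B.Start| and |A.Start - B.End|."""
    a_start, a_end = a
    b_start, b_end = b
    x = abs(a_end - b_start)
    y = abs(a_start - b_end)
    return y if x >= y else x  # equivalent to min(x, y)


def non_overlapping(a, b):
    """Mirror of the WHERE clause: (A.start > B.End) OR (B.Start > A.END)."""
    return a[0] > b[1] or b[0] > a[1]


if __name__ == "__main__":
    a = (400, 500)  # ID 1, White
    for b in [(1, 2), (50, 60)]:  # the two ID-1 rows of table B
        print(b, non_overlapping(a, b), gap(a, b))
```

For the White row (400-500), the ID-1 candidates score 398 (XYZ1) and 340 (XYZ4), so XYZ4 wins, matching the desired result.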
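In this example we only have two fields (name and name2). If B had N fields, is there a way to avoid repeating the same computation for each of them?
Thanks.
You should be able to use ARRAY_AGG with STRUCT. Here are a couple of example expressions: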
ARRAY_AGG(
  B
  ORDER BY
    (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
       THEN ABS(A.Start-B.END)
       ELSE ABS(A.END-B.Start)
     END)
  LIMIT
    1)[SAFE_OFFSET(0)].*
This returns all of the fields of B for the first B row according to the ordering.
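The key idea is that a single ARRAY_AGG over the whole row evaluates the ordering expression once per candidate and carries every field along, however many fields B has. The sketch below models that in plain Python under these assumptions: rows are dicts, and `closest_row` is a hypothetical helper standing in for `ARRAY_AGG(B ORDER BY ... LIMIT 1)[SAFE_OFFSET(0)]`. It returns `None` when an A row has no candidate; with the inner JOIN in the SQL, such a row would simply not appear in the output.

```python
# Plain-Python model of ARRAY_AGG(B ORDER BY key LIMIT 1)[SAFE_OFFSET(0)]:
# the ordering key is computed once per candidate B row, and the whole row
# (every field, however many there are) is returned. Not BigQuery code.

A = [
    {"id": 1, "start": 400, "end": 500, "color": "White"},
    {"id": 1, "start": 10, "end": 20, "color": "Red"},
    {"id": 2, "start": 2, "end": 10, "color": "Blue"},
    {"id": 4, "start": 88, "end": 90, "color": "Color"},
]
B = [
    {"id": 1, "start": 1, "end": 2, "name": "XYZ1", "name2": "EWQ"},
    {"id": 1, "start": 50, "end": 60, "name": "XYZ4", "name2": "EWY"},
    {"id": 2, "start": 150, "end": 160, "name": "ABC1", "name2": "TRE"},
    {"id": 2, "start": 50, "end": 60, "name": "ABC2", "name2": "YUE"},
    {"id": 4, "start": 100, "end": 120, "name": "EFG", "name2": "MMN"},
]


def gap(a, b):
    # min(|A.End - B.Start|, |A.Start - B.End|), as in the question's CASE
    return min(abs(a["end"] - b["start"]), abs(a["start"] - b["end"]))


def closest_row(a, rows):
    # JOIN ON id plus the non-overlap WHERE filter
    candidates = [
        b for b in rows
        if b["id"] == a["id"] and (a["start"] > b["end"] or b["start"] > a["end"])
    ]
    # Whole winning row, or None if nothing qualifies
    return min(candidates, key=lambda b: gap(a, b)) if candidates else None


if __name__ == "__main__":
    for a in A:
        b = closest_row(a, B)
        print(a["color"], b["name"], b["name2"])
```

One difference worth keeping in mind: `min` keeps the first of several tied rows, whereas the row ARRAY_AGG picks among ties is not necessarily deterministic.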
ARRAY_AGG(
  (SELECT AS STRUCT B.* EXCEPT(foo, bar))
  ORDER BY
    (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
       THEN ABS(A.Start-B.END)
       ELSE ABS(A.END-B.Start)
     END)
  LIMIT
    1)[SAFE_OFFSET(0)].*
This returns all of the fields of B except foo and bar (replace these with whichever field names you want to exclude).
You can also build the STRUCT from only the named fields of B; list whichever ones you want.
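The struct shapes above differ only in which fields travel with the winning row; the ordering expression is still written once. As a rough plain-Python analogy (the helpers `except_fields` and `only_fields` are hypothetical and do not reproduce BigQuery semantics exactly), EXCEPT drops the listed keys and a named-field STRUCT keeps only the listed keys:

```python
# Plain-Python analogues of the two struct shapes:
#   SELECT AS STRUCT B.* EXCEPT(foo, bar)  -> drop the listed fields
#   a STRUCT of named fields               -> keep only the listed fields
# Not BigQuery code; the field names below are just sample keys.

def except_fields(row, excluded):
    """Keep every field except the excluded ones (like B.* EXCEPT(...))."""
    return {k: v for k, v in row.items() if k not in excluded}


def only_fields(row, wanted):
    """Keep only the named fields, in the order given."""
    return {k: row[k] for k in wanted}


if __name__ == "__main__":
    b = {"id": 1, "start": 50, "end": 60, "name": "XYZ4", "name2": "EWY"}
    print(except_fields(b, {"id", "start", "end"}))
    print(only_fields(b, ["name", "name2"]))
```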
The following should give you an idea:
#standardSQL
WITH A AS (
SELECT 1 a_id, 400 a_start, 500 a_end, 'White' color UNION ALL
SELECT 1, 10, 20 , 'Red' UNION ALL
SELECT 2, 2, 10, 'Blue' UNION ALL
SELECT 4, 88, 90, 'Color'
), B AS (
SELECT 1 b_id, 1 b_start, 2 b_end, 'XYZ1' name, 'EWQ' name2 UNION ALL
SELECT 1, 50, 60, 'XYZ4', 'EWY' UNION ALL
SELECT 2, 150, 160,'ABC1', 'TRE' UNION ALL
SELECT 2, 50, 60, 'ABC2', 'YUE' UNION ALL
SELECT 4, 100, 120,'EFG', 'MMN'
)
SELECT
  a_id, a_start, a_end, color, names.name, names.name2
FROM (
  SELECT a_id, a_start, a_end, color,
    ARRAY_AGG(STRUCT(name, name2) ORDER BY POW(ABS(a_start - b_start), 2) + POW(ABS(a_end - b_end), 2) LIMIT 1)[SAFE_OFFSET(0)] names
  FROM A JOIN B ON a_id = b_id
  GROUP BY a_id, a_start, a_end, color
)
ORDER BY a_id
The result is:
Row a_id a_start a_end color name name2
1 1 400 500 White XYZ4 EWY
2 1 10 20 Red XYZ1 EWQ
3 2 2 10 Blue ABC2 YUE
4 4 88 90 Color EFG MMN
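For comparison with the question's gap distance, here is a plain-Python model of this answer's ordering key: the sum of the squared differences between the start points and between the end points. It is a hand-written sketch of the query (the helper names `endpoint_distance` and `nearest` are hypothetical), with the sample rows copied from the WITH clause:

```python
# Plain-Python model of the second answer's ordering key:
#   POW(ABS(a_start - b_start), 2) + POW(ABS(a_end - b_end), 2)
# Not BigQuery code; sample rows copied from the WITH clause.

A = [(1, 400, 500, "White"), (1, 10, 20, "Red"), (2, 2, 10, "Blue"), (4, 88, 90, "Color")]
B = [(1, 1, 2, "XYZ1", "EWQ"), (1, 50, 60, "XYZ4", "EWY"),
     (2, 150, 160, "ABC1", "TRE"), (2, 50, 60, "ABC2", "YUE"),
     (4, 100, 120, "EFG", "MMN")]


def endpoint_distance(a_start, a_end, b_start, b_end):
    return (a_start - b_start) ** 2 + (a_end - b_end) ** 2


def nearest(a_rows, b_rows):
    out = []
    for a_id, a_start, a_end, color in a_rows:
        # JOIN ON a_id = b_id (this answer has no overlap filter)
        cands = [b for b in b_rows if b[0] == a_id]
        if not cands:
            continue  # an inner JOIN drops A rows without a match
        best = min(cands, key=lambda b: endpoint_distance(a_start, a_end, b[1], b[2]))
        # STRUCT(name, name2): both fields come from the same winning row
        out.append((a_id, a_start, a_end, color, best[3], best[4]))
    return out


if __name__ == "__main__":
    for row in nearest(A, B):
        print(*row)
```

This key measures how far apart the endpoints are rather than the gap between the intervals, and the query has no overlap filter. On this sample both approaches pick the same B rows, but they need not agree in general.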
Create a struct:
ARRAY_AGG((B.name, B.name2)
  ORDER BY
    (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)
  LIMIT
    1)[SAFE_OFFSET(0)] names,