How do multiple tables in the FROM clause work in Google BigQuery?


google-bigquery


I'm trying to transform my columns from

id, english, math
1,100,200
2,50,100

to a table that looks like

id, subject, marks
1, english, 100
1, math, 200
2, english, 50
2, math, 100

I'm experimenting with these temporary tables in BigQuery, and I have this code:

with marks as (
select 1 as id, 200 as math, 100 as english union all
select 2 as id, 100 as math, 50 as english 
)

, temp as 
(
select 'math' as subject union all
select 'english' as subject
)


select * from marks, temp

I don't understand what BigQuery does when you list two tables at the same time. Does it do some kind of join internally?

It does a cross join. This is what you want:

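with marks as (
  select 1 as id, 200 as math, 100 as english union all
  select 2 as id, 100 as math, 50 as english
)
-- For each row of marks, build a two-element array of (subject, value) structs
-- and unnest it: a correlated cross join that yields one output row per subject.
select id, subject, s.value as marks
from marks
cross join unnest([
  struct('math' as subject, math as value),
  struct('english' as subject, english as value)
]) as s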
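To see what the comma join in the question actually produces, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for BigQuery (an assumption: SQLite is not BigQuery, but in both a comma between FROM items means a cross join). The question's temp CTE is renamed subjects here to avoid any clash with SQLite's built-in temp schema name. The CASE expression is a swapped-in way to finish the question's comma-join approach, not the UNNEST method from the answer.

```python
import sqlite3

# The question's two CTEs (temp renamed to subjects for SQLite).
CTES = """
WITH marks AS (
  SELECT 1 AS id, 200 AS math, 100 AS english UNION ALL
  SELECT 2 AS id, 100 AS math, 50 AS english
),
subjects AS (
  SELECT 'math' AS subject UNION ALL
  SELECT 'english' AS subject
)
"""

def run(sql):
    """Run one query against a fresh in-memory database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# Comma join: every row of marks is paired with every row of subjects (2 x 2 = 4 rows).
comma_rows = run(CTES + "SELECT id, subject FROM marks, subjects ORDER BY id, subject")
# An explicit CROSS JOIN returns exactly the same rows.
cross_rows = run(CTES + "SELECT id, subject FROM marks CROSS JOIN subjects ORDER BY id, subject")

# Finish the unpivot by picking the column whose name matches `subject`.
unpivoted = run(CTES + """
SELECT id, subject,
       CASE subject WHEN 'math' THEN math WHEN 'english' THEN english END AS marks
FROM marks, subjects
ORDER BY id, subject
""")

print(comma_rows == cross_rows)  # True
for row in unpivoted:
    print(row)
```

The four printed rows match the id, subject, marks table the question asks for. The drawback of the CASE version is that every subject column must be named twice, which is why the answers build the (subject, value) pairs per row instead.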


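Below is for BigQuery Standard SQL. It does not reference any column names explicitly, so it is generic enough to work for a table with any number of columns (subjects, in this case).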

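#standardSQL
-- Serialize each row to JSON, strip {, } and ", then split into 'column:value' pairs.
-- Assumes no column value itself contains ',' or ':'.
SELECT id, subject, CAST(marks AS INT64) AS marks
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), '[{}"]', ''))) kv,
-- Split each pair into the column name (subject) and its value (marks).
UNNEST([STRUCT(SPLIT(kv, ':')[OFFSET(0)] AS subject, SPLIT(kv, ':')[OFFSET(1)] AS marks)])
-- Drop the pair produced by the id column itself.
WHERE NOT subject = 'id'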

If you apply it to the sample data from your question, the output is

Row id  subject marks    
1   1   english 100  
2   1   math    200  
3   2   english 50   
4   2   math    100
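The generic query is really a string pipeline: serialize the row, strip the JSON punctuation, split on commas, then split each piece on the colon. A minimal Python sketch of the same steps follows, with json.dumps standing in for TO_JSON_STRING (an assumption: for a flat row of integers, compact separators give the same {"id":1,"english":100,"math":200} layout). The function name unpivot_row is hypothetical.

```python
import json
import re

def unpivot_row(row):
    """Mimic the generic BigQuery query on one row (a dict of column -> value)."""
    # TO_JSON_STRING(t) -> '{"id":1,"english":100,"math":200}'
    s = json.dumps(row, separators=(",", ":"))
    # REGEXP_REPLACE(..., '[{}"]', '') -> 'id:1,english:100,math:200'
    s = re.sub(r'[{}"]', "", s)
    out = []
    # UNNEST(SPLIT(...)) yields one 'column:value' string per column.
    for kv in s.split(","):
        parts = kv.split(":")
        subject, marks = parts[0], parts[1]  # SPLIT(kv, ':')[OFFSET(0)] / [OFFSET(1)]
        if subject != "id":                  # WHERE NOT subject = 'id'
            # id comes from the row itself (t.id), marks is CAST to an integer.
            out.append((row["id"], subject, int(marks)))
    return out

table = [
    {"id": 1, "english": 100, "math": 200},
    {"id": 2, "english": 50, "math": 100},
]

result = [r for row in table for r in unpivot_row(row)]
for r in result:
    print(r)
```

The sketch also makes the limitation visible: a string value containing a comma or colon would be split into the wrong pieces, so the approach suits flat rows of numbers like the ones in the question.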


This is a cross join. Basically, it takes the Cartesian product of the two tables.

Thanks for this. I know BigQuery is fast, but what if the input table marks has 20 million records with 10 subjects? A cross join makes the table grow very quickly. So is this an optimized solution? Is there a way to check this in BigQuery?

Unpivoting cannot be done with a native BigQuery function, so you have to use UNNEST to do it. (BigQuery has since added an UNPIVOT operator.) On the other hand, BigQuery scales very well, so don't worry about performance. I work with tables that have billions of rows and it still works, so 20 million rows is nothing to worry about.
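On the size concern: the correlated UNNEST emits k rows for each input row (one per struct in that row's array), and a cross join with a k-row subjects table does the same. Either way the output has n × k rows, which is exactly the size of the unpivoted result, so the growth comes from the requested output rather than from wasted pairs. A quick sketch of that arithmetic for the numbers in the comment, plus a tiny check with a hypothetical third subject:

```python
from itertools import product

def unpivoted_row_count(n_rows, n_subjects):
    # One output row per (input row, subject) pair.
    return n_rows * n_subjects

print(unpivoted_row_count(20_000_000, 10))  # 200000000

# Small check: a Cartesian product of n rows with k subjects has n * k pairs.
rows = range(5)
subjects = ["english", "math", "physics"]  # "physics" is a hypothetical extra subject
pairs = list(product(rows, subjects))
print(len(pairs))  # 15
```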