SQL array aggregation over a multi-level hierarchy
Consider the following fairly standard denormalized transaction information model:
with transactions as(
select 'T_10000' as trans_id, 'L_1000' as line_item_id, 'P_100' as part_id
union all
select 'T_10000', 'L_1000', 'P_101'
union all
select 'T_10000', 'L_1001', 'P_103'
union all
select 'T_10001', 'L_1002', 'P_104'
)
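I would like to denormalize this table further to eliminate all repeated values. Arrays in BigQuery seem like a good fit.
The following is close, but it still returns repeated values in the second column: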
select trans_id, array_agg(line_item_id), array_agg(part_id)
from transactions
group by 1
The following is also close, but now the first column contains repeated values:
select trans_id, line_item_id, array_agg(part_id)
from transactions
group by 1, 2
Is there a straightforward way to do this?
Is this what you want?
select trans_id, array_agg(distinct line_item_id), array_agg(part_id) as parts
from transactions t
group by trans_id;
Below is a version for BigQuery Standard SQL:
#standardSQL
SELECT trans_id,
ARRAY_AGG(STRUCT(line_item_id, parts)) items
FROM (
SELECT trans_id,
line_item_id,
ARRAY_AGG(part_id) parts
FROM transactions
GROUP BY trans_id, line_item_id
)
GROUP BY trans_id
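When applied to the sample data from the question, the result has one row per trans_id, with an items array that holds one STRUCT per line_item_id together with that line item's parts array. T_10000 gets two items (L_1000 with parts P_100 and P_101, and L_1001 with part P_103); T_10001 gets one item (L_1002 with part P_104).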
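For a version that can be pasted and run on its own, here is a minimal sketch that combines the sample CTE from the question with the nested aggregation from this answer. The ORDER BY clauses (inside ARRAY_AGG and at the end of the query) are an addition that is not part of the original answer; they are there only because ARRAY_AGG does not otherwise guarantee element order.

```sql
#standardSQL
-- Sample rows from the question, inlined so the query runs on its own.
WITH transactions AS (
  SELECT 'T_10000' AS trans_id, 'L_1000' AS line_item_id, 'P_100' AS part_id
  UNION ALL SELECT 'T_10000', 'L_1000', 'P_101'
  UNION ALL SELECT 'T_10000', 'L_1001', 'P_103'
  UNION ALL SELECT 'T_10001', 'L_1002', 'P_104'
)
SELECT
  trans_id,
  -- Outer level: one STRUCT per line item, collected per transaction.
  ARRAY_AGG(STRUCT(line_item_id, parts) ORDER BY line_item_id) AS items
FROM (
  -- Inner level: one row per (trans_id, line_item_id), with its parts as an array.
  SELECT
    trans_id,
    line_item_id,
    -- ORDER BY added here for deterministic output (not in the original answer).
    ARRAY_AGG(part_id ORDER BY part_id) AS parts
  FROM transactions
  GROUP BY trans_id, line_item_id
)
GROUP BY trans_id
ORDER BY trans_id
```

Aggregating in two steps is what removes the repetition at both levels: the inner GROUP BY collapses parts under each line item, and the outer GROUP BY collapses line items under each transaction, so the link between a line item and its parts is kept instead of being split across two flat arrays.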