BigQuery materialized view of a nested STRUCT
We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming web-activity inserts, is multi-tenant, and makes heavy use of BQ's nested column structure.
We want to create subsets of this table so that queries run more efficiently and in near real time, with minimal administrative overhead. We thought the simplest solution would be a materialized view that is just a subset of rows (by client) and columns, but materialized views currently require aggregation.
In addition, the materialized view beta supports only a limited set of aggregate functions and does not support sub-selects or UNNEST operations. We have not found a good way to extract deeply nested structs into a materialized view. A simple example:
SELECT
'7602E3E96349E972' as session_id,
'084F0262' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SAVE50'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'7602E3E96349E972' as session_id,
'01ECB6EF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SPRING','LOVE'] as value),
STRUCT(
'discounts' as name,
['14.99','6.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'508082BC49BAC09F' as session_id,
'038B67CF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['FREESHIP','HOLIDAY25'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'C88AE153C784D910' as session_id,
'EA716BD2' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['CYBER'] as value),
STRUCT(
'discounts' as name,
['9.99','19.99'] as value)
] as modifiers
)] as contexts_transaction
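Ideally we would keep the structure as it is. We are trying to achieve something like the following in a materialized view (recognizing that these features are not supported):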
SELECT
  session_id,
  transaction_id,
  ARRAY_AGG(STRUCT(mods_ARRAY.name, mods_ARRAY.value)) AS modifiers
FROM data,
  UNNEST(contexts_transaction) trans_ARRAY,
  UNNEST(trans_ARRAY.modifiers) mods_ARRAY
GROUP BY 1, 2
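To make the intended reshaping concrete, here is a minimal Python sketch (not BigQuery code) that mimics what the query above would return for the sample rows: the two comma-joined UNNESTs flatten `contexts_transaction` and then its `modifiers`, and `ARRAY_AGG(STRUCT(...))` regroups the `(name, value)` pairs per `(session_id, transaction_id)`. Modelling STRUCTs as dicts and ARRAYs as lists, and the helper name `flatten_modifiers`, are assumptions for illustration only.

```python
from collections import defaultdict

# Two of the question's sample rows: each STRUCT is a dict, each ARRAY a list.
ROWS = [
    {
        "session_id": "7602E3E96349E972",
        "transaction_id": "084F0262",
        "contexts_transaction": [
            {"modifiers": [
                {"name": "promotions", "value": ["SAVE50"]},
                {"name": "discounts", "value": ["9.99"]},
            ]}
        ],
    },
    {
        "session_id": "7602E3E96349E972",
        "transaction_id": "01ECB6EF",
        "contexts_transaction": [
            {"modifiers": [
                {"name": "promotions", "value": ["SPRING", "LOVE"]},
                {"name": "discounts", "value": ["14.99", "6.99"]},
            ]}
        ],
    },
]


def flatten_modifiers(rows):
    """Hypothetical helper mimicking:
    SELECT session_id, transaction_id,
           ARRAY_AGG(STRUCT(mods.name, mods.value)) AS modifiers
    FROM data, UNNEST(contexts_transaction) trans, UNNEST(trans.modifiers) mods
    GROUP BY 1, 2
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["session_id"], row["transaction_id"])  # GROUP BY 1, 2
        for trans in row["contexts_transaction"]:          # first UNNEST
            for mod in trans["modifiers"]:                  # second UNNEST
                # ARRAY_AGG(STRUCT(name, value)) collects one struct per flattened row
                groups[key].append({"name": mod["name"], "value": list(mod["value"])})
    return [
        {"session_id": s, "transaction_id": t, "modifiers": mods}
        for (s, t), mods in groups.items()
    ]


if __name__ == "__main__":
    for out in flatten_modifiers(ROWS):
        print(out)
```

As with the comma (cross) join in the SQL, a row whose `contexts_transaction` array is empty produces no output group in this sketch.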
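We are open to any approach for subsetting this huge table, not just materialized views, as long as it offers the same benefits (low maintenance, automated, low cost). Any suggestions are appreciated, thanks!
As far as I understand from your question, you want an output like this: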
with rawdata AS
(
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;
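For readers less familiar with `ARRAY_CONCAT_AGG`, here is a minimal Python sketch of what the query above computes: for each row, the scalar subquery picks the `value` array of the struct whose `name` matches, and `ARRAY_CONCAT_AGG` concatenates those arrays across all rows with the same `userid`. The helper names `pick_value` and `concat_agg_by_user` are hypothetical. BigQuery does not guarantee element order for `ARRAY_CONCAT_AGG` without an `ORDER BY` inside the aggregate; this sketch simply keeps input order.

```python
# The answer's rawdata rows: (userid, transactions), each struct modelled as a dict.
RAWDATA = [
    (1, [{"name": "transactionIds", "value": ["ABCDEF"]}, {"name": "couponIds", "value": ["123456"]}]),
    (1, [{"name": "transactionIds", "value": ["XYZ", "KLM"]}, {"name": "couponIds", "value": ["789", "567"]}]),
    (2, [{"name": "transactionIds", "value": ["XY", "KL"]}, {"name": "couponIds", "value": ["10", "15"]}]),
    (2, [{"name": "transactionIds", "value": ["X", "K"]}, {"name": "couponIds", "value": ["20", "25"]}]),
]


def pick_value(transactions, name):
    """Mimics the scalar subquery
    (SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = <name>):
    None (NULL) when nothing matches, an error when more than one struct matches."""
    matches = [trx["value"] for trx in transactions if trx["name"] == name]
    if len(matches) > 1:
        raise ValueError("scalar subquery produced more than one element")
    return matches[0] if matches else None


def concat_agg_by_user(rows):
    """Mimics ARRAY_CONCAT_AGG(<scalar subquery>) ... GROUP BY userid, in input order."""
    out = {}
    for userid, transactions in rows:
        acc = out.setdefault(userid, {"transactionIds": [], "couponIds": []})
        for name in ("transactionIds", "couponIds"):
            value = pick_value(transactions, name)
            if value is not None:  # ARRAY_CONCAT_AGG ignores NULL input arrays
                acc[name].extend(value)
    return out


if __name__ == "__main__":
    for userid, cols in concat_agg_by_user(RAWDATA).items():
        print(userid, cols)
```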
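So the input table looks like this:
[input table screenshot]
and the output table looks like this:
[output table screenshot]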
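If your intention is different, please elaborate in the question. With that in mind, I tried to create this query as a materialized view: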
create or replace table project.dataset.rawdata as
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;
create materialized view project.dataset.mview as
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid
However, I got the error: Unsupported aggregate function in materialized view: array_concat_agg.
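Since materialized views are still in beta, we don't know whether this will be supported in the future. With the current capabilities, however, it is not possible.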
Maybe @FHOFA can tell you more.
Could you provide a sample input and the expected output? Please edit the question and show us the query you would like to run.
Thanks Sabri and Felipe. Excellent feedback, and good examples of the detail you were looking for. We are editing the question now. I've edited the question; hopefully my intent is clearer!