SQL: How to simplify and scale a BigQuery sub-filter on Firebase event parameters

sql google-bigquery

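I'm trying to write a BigQuery query that extracts data from the Firebase sub-table event_params. The table events_* holds a list of events, and its event_params field is a struct containing a table of the form key, int_value, plus other fields that are not used in this example.

For example, one event might have the following list of event parameters: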

key    int_value
level  1
time   35
kills  10

I want to filter the list of events by level, but select the time of each matching event. The solution I came up with is:

select user_pseudo_id, x.value.int_value as time
from (

  select user_pseudo_id, event_params
  from `whatever.events_*`, unnest(event_params) as y
  where event_name = 'level_complete' and y.key = 'level' and y.value.int_value = 1

), unnest(event_params) as x
where x.key = 'time'

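This works for this case, but I have a few questions about how to improve it and make it scale:

Is there a way to simplify this into a single query?
What if I want to get the time field, but also the kills field? (The only solution I can think of is to duplicate the query and combine the two results, but what if I want to extract many fields?)
Could this be simplified by creating something like a function that returns the result table, such as
```
filter('level_complete', 'level', 1)
```

Thanks in advance.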

Pivot the data:

# start with some sample data
with data as (
select 1 as user_pseudo_id, 'level_complete' as event_name, [struct('level' as key, 1 as int_value), struct('time' as key, 35 as int_value), struct('kills' as key, 10 as int_value)] as event_params
UNION ALL
select 2 as user_pseudo_id, 'level_complete' as event_name, [struct('level' as key, 4 as int_value), struct('time' as key, 10 as int_value), struct('kills' as key, 1 as int_value)] as event_params
UNION ALL
select 3 as user_pseudo_id, 'level_complete' as event_name, [struct('level' as key, 6 as int_value), struct('time' as key, 30 as int_value), struct('kills' as key, 2 as int_value)] as event_params
UNION ALL
select 3 as user_pseudo_id, 'other_type' as event_name, [struct('bad' as key, 5 as int_value), struct('time' as key, 40 as int_value), struct('kills' as key, 3 as int_value)] as event_params
)
# This is the "code" part
select user_pseudo_id, event_name,
  max(if(params.key = 'level', int_value, null)) level,
  max(if(params.key = 'time', int_value, null)) time,
  max(if(params.key = 'kills', int_value, null)) kills,
  array_agg(distinct if(params.key in ('level','time','kills'), null, params.key) ignore nulls) as unexpected_keys
from data
CROSS JOIN unnest(event_params) as params
group by user_pseudo_id, event_name

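Alternatively, if you want to create a transformed table in this format, try

I have been using this technique in a Python program that queries all the unique values of key and then creates a view with the appropriate dynamic columns. If your values are static, hand-written SQL is simpler.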
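The dynamic-column idea could be sketched roughly as follows. This is not the answerer's actual program, only a minimal illustration: build_pivot_sql is a hypothetical helper, the list of keys is assumed to have been fetched already (for example with a select distinct over the unnested keys), and the value is assumed to live under value.int_value as in the question's query rather than directly under int_value as in the sample data.

```python
import re

# Hypothetical helper, not part of any library: it only builds query text.
_KEY_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_pivot_sql(table, keys):
    """Return a pivot query with one max(if(...)) column per key."""
    for key in keys:
        # Each key is used as a string literal and as a column alias,
        # so only plain identifiers are accepted here.
        if not _KEY_RE.match(key):
            raise ValueError(f"unsupported key: {key!r}")
    columns = ",\n".join(
        f"  max(if(params.key = '{key}', params.value.int_value, null)) as {key}"
        for key in keys
    )
    # The table name is assumed to come from trusted configuration.
    return (
        "select user_pseudo_id, event_name,\n"
        f"{columns}\n"
        f"from `{table}`\n"
        "cross join unnest(event_params) as params\n"
        "group by user_pseudo_id, event_name"
    )


print(build_pivot_sql("whatever.events_*", ["level", "time", "kills"]))
```

The generated text can then be wrapped in a create view statement or submitted with whichever BigQuery client you already use.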

This one does not need as many resources, because it does not perform an aggregation:

select *
from (
  select 
    user_pseudo_id,
    event_name,
    # in the exported events_* tables the integer lives under value.int_value
    (select value.int_value from unnest(event_params) where key = 'level') as level,
    (select value.int_value from unnest(event_params) where key = 'time') as time,
    (select value.int_value from unnest(event_params) where key = 'kills') as kills
  from `whatever.events_*`
  where event_name = 'level_complete'
)
where level = 1
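For the third point in the question, something close to filter('level_complete', 'level', 1) can be approximated on the client side by generating this query from its arguments. The sketch below is only one possible shape: filter_events_sql and its parameters are hypothetical names, and it assumes integer parameters stored under value.int_value and a table name that comes from trusted configuration.

```python
import re

# Hypothetical helper, not part of any library: it only builds query text.
_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def filter_events_sql(table, event_name, filter_key, filter_value, select_keys):
    """Build a query that filters events on one int parameter and selects others."""
    for name in [event_name, filter_key, *select_keys]:
        # Names end up inside string literals and aliases, so keep them simple.
        if not _NAME_RE.match(name):
            raise ValueError(f"unsupported name: {name!r}")

    def param(key):
        # One scalar subquery per parameter, as in the query above.
        return f"(select value.int_value from unnest(event_params) where key = '{key}')"

    columns = "".join(f",\n  {param(key)} as {key}" for key in select_keys)
    return (
        f"select user_pseudo_id{columns}\n"
        f"from `{table}`\n"
        f"where event_name = '{event_name}'\n"
        f"  and {param(filter_key)} = {int(filter_value)}"
    )


print(filter_events_sql("whatever.events_*", "level_complete", "level", 1, ["time", "kills"]))
```

Adding another field to the result is then a matter of appending its key to select_keys instead of duplicating the query.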

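Please provide sample data and expected results.

Can you provide a reference showing that subselects are cheaper than aggregation in BQ? Also, fair warning: if a key is repeated in the event_params of any row, the whole query fails with "Scalar subquery produced more than one element".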
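The difference in behavior on duplicate keys can be modeled in plain Python. This is only an illustration of the semantics, not BigQuery code: both function names are made up for the example, and every int_value is assumed to be a non-null integer.

```python
def pivot_with_max(event_params, key):
    """Mimic max(if(params.key = key, int_value, null)): duplicates collapse to the largest value."""
    values = [p["int_value"] for p in event_params if p["key"] == key]
    return max(values) if values else None


def pivot_with_scalar_subquery(event_params, key):
    """Mimic (select int_value from unnest(event_params) where key = ...): more than one match is an error."""
    values = [p["int_value"] for p in event_params if p["key"] == key]
    if len(values) > 1:
        raise ValueError("Scalar subquery produced more than one element")
    return values[0] if values else None


# One event whose parameters contain the key "time" twice.
params = [
    {"key": "level", "int_value": 1},
    {"key": "time", "int_value": 35},
    {"key": "time", "int_value": 40},
]

print(pivot_with_max(params, "time"))  # prints 40
try:
    pivot_with_scalar_subquery(params, "time")
except ValueError as err:
    print(err)  # prints the error message raised above
```

So the aggregated pivot silently keeps one of the duplicate values, while the subselect form rejects the row.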

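Hi, I fully agree. This one is not suitable when a single row contains duplicate keys. As for resource consumption, I usually check it in the "Execution details" tab (next to "Results").

When you check "Execution details" for similar queries, do you see the subselect outperforming the aggregation? Where is the evidence for your claim that "since it does not perform an aggregation, it is less resource-intensive"?

Sure, just compare the execution details of the aggregation and the subselect. For example, for the sample data in the answer, the subselect shows a lower "Compute" total.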