Google bigquery 在BigQuery中,将字符串化的对象数组转换为非字符串化的对象数组

Google bigquery 在BigQuery中,将字符串化的对象数组转换为非字符串化的对象数组,google-bigquery,Google Bigquery,我正在将.json数据摄取到Google BigQuery中,在摄取过程中,数组和对象的数据类型都被投射到字符串列中。BigQuery中的数据如下所示: select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all select 2 as id, null as stringCol1, null as stringCol2 union all select 3 as id, "{'game': '22', 'year'

我正在将
.json
数据摄取到Google BigQuery中,在摄取过程中,
数组
对象
数据类型都被投射到
字符串
列中。BigQuery中的数据如下所示:

select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all
select 2 as id, null as stringCol1, null as stringCol2 union all
select 3 as id, "{'game': '22', 'year': 'sophomore'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32]" as stringCol2 union all
select 4 as id, "{'game': '17', 'year': 'freshman'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33]" as stringCol2 union all
select 5 as id, "{'game': '9', 'year': 'senior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17]" as stringCol2 union all
select 6 as id, "{'game': '234', 'year': 'junior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25]" as stringCol2

数据有点乱

  • stringCol1
    中,缺少数据的
    null
    '[]'
    值都存在。我想从这个字符串化对象创建两列
    game
    year
  • 对于
    stringCol2
    ,这始终是一个包含两个对象的数组,具有相同的键(
    teamName
    teamAge
    ,在本例中)。然后需要将其转换为4列
    teamName1
    teamAge1
    teamName2
    teamAge2

解决了将基本字符串化数组转换为非字符串化数组的问题,但这里的示例有点复杂。特别是,另一篇文章中的解决方案在这种情况下不起作用。

下面是针对BigQuery标准SQL的

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1, 
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
)])   
如果要应用于您问题中的样本数据

WITH `project.dataset.table` AS (
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
  SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
  SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
  SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
  SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
) 
输出为

Row id  game    year        teamName1   teamAge1    teamName2   teamAge2     
1   1   null    null        null        null        null        null     
2   2   null    null        null        null        null        null     
3   3   22      sophomore   teamA       37          teamB       32   
4   4   17      freshman    teamA       32          teamB       33   
5   5   9       senior      teamC       31          teamD       17   
6   6   234     junior      teamC       42          teamD       25      
例如,为了提高可读性,上面可以有很多变体

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])

下面是BigQuery标准SQL

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1, 
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
)])   
如果要应用于您问题中的样本数据

WITH `project.dataset.table` AS (
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
  SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
  SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
  SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
  SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
) 
输出为

Row id  game    year        teamName1   teamAge1    teamName2   teamAge2     
1   1   null    null        null        null        null        null     
2   2   null    null        null        null        null        null     
3   3   22      sophomore   teamA       37          teamB       32   
4   4   17      freshman    teamA       32          teamB       33   
5   5   9       senior      teamC       31          teamD       17   
6   6   234     junior      teamC       42          teamD       25      
例如,为了提高可读性,上面可以有很多变体

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])

非常有帮助,谢谢
json\u extract\u*
似乎是BigQueryVery中的一个强大功能,非常有用,谢谢
json\u extract\u*
似乎是BigQuery中一个强大的函数