Google BigQuery: converting a stringified array of objects into a non-stringified array of objects
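I am ingesting .json data into Google BigQuery, and during ingestion both the array and the object data types were cast to STRING columns. The data in BigQuery looks like this: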
select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all
select 2 as id, null as stringCol1, null as stringCol2 union all
select 3 as id, "{'game': '22', 'year': 'sophomore'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" as stringCol2 union all
select 4 as id, "{'game': '17', 'year': 'freshman'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" as stringCol2 union all
select 5 as id, "{'game': '9', 'year': 'senior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" as stringCol2 union all
select 6 as id, "{'game': '234', 'year': 'junior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" as stringCol2
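The data is a bit messy:
- In stringCol1, missing data shows up both as null and as '[]'. I want to create two columns, game and year, from this stringified object.
- stringCol2 is always an array of two objects with the same keys (teamName and teamAge in this example). It needs to be turned into 4 columns: teamName1, teamAge1, teamName2, teamAge2.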
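Another post solved the problem of converting a basic stringified array into a non-stringified array, but the example here is a bit more complex. In particular, the solution from that post does not work in this case.

Below is for BigQuery Standard SQL: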
#standardSQL
SELECT id,
JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(
JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1,
JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
)])
If applied to the sample data from your question (prepend this WITH clause to the query):
WITH `project.dataset.table` AS (
SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
)
the output is:
Row  id  game  year       teamName1  teamAge1  teamName2  teamAge2
1    1   null  null       null       null      null       null
2    2   null  null       null       null      null       null
3    3   22    sophomore  teamA      37        teamB      32
4    4   17    freshman   teamA      32        teamB      33
5    5   9     senior     teamC      31        teamD      17
6    6   234   junior     teamC      42        teamD      25
Many variations of the above are possible. For example, for better readability:
#standardSQL
SELECT id,
JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
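Note that JSON_EXTRACT_SCALAR returns STRING values, so game, teamAge1 and teamAge2 in the queries above come back as strings. If numeric columns are wanted, one option is to wrap the extraction in SAFE_CAST, as in the minimal sketch below. It reuses the `project.dataset.table` name and the columns from the question; the two-row WITH clause is only a stand-in so the query runs on its own, and treating game and teamAge as INT64 is an assumption about the data rather than something stated in the question.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Stand-in rows copied from the sample data so the sketch is self-contained
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2
)
SELECT id,
  -- SAFE_CAST returns NULL instead of raising an error when the value
  -- is missing or cannot be converted to INT64 (assumed target type)
  SAFE_CAST(JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS INT64) AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  SAFE_CAST(JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS INT64) AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  SAFE_CAST(JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS INT64) AS teamAge2
FROM `project.dataset.table`,
-- One-element array of STRUCT: exposes the parsed array as column t for each row
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
```

Rows where stringCol1 or stringCol2 is null or '[]' should still produce NULLs in the extracted columns, as in the output shown earlier, because both SAFE_OFFSET and SAFE_CAST return NULL rather than failing.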
Very helpful, thanks. json_extract_* seems to be a powerful set of functions in BigQuery.