Google BigQuery: extract the last item from JSON in a cell
I have a column named submission_date that contains JSON cells like the following:
{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}
{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}
I can easily get the first result from the "submission_canceled" field by doing the following:
json_extract(submission_date, "$.submission_canceled[0]")
I assumed that to get the last value I would use:
json_extract(submission_date, "$.submission_canceled[-1]")
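But this just gives me a null value. As you can see, sometimes the submission_canceled field has multiple dates in a list, and other times it has only a single date that is not in a list. I want to get either the single item or the last item in the list from the submission_canceled part.
Below is for BigQuery Standard SQL. The JSONPath subset supported by JSON_EXTRACT has no negative array indexes, which is why [-1] returns NULL. Instead, the query below splits the extracted JSON text on "," (the separator between list elements), reverses the pieces, takes the first one (the last date) and strips any remaining quotes and brackets. This works the same way whether the field is a list or a single string.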
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}' submission_date UNION ALL
SELECT 2, '{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}'
)
SELECT id, REGEXP_REPLACE(ARRAY_REVERSE(SPLIT(JSON_EXTRACT(submission_date, '$.submission_canceled'), '","'))[OFFSET(0)], r'"|\[|\]', '') last_submission_canceled
FROM `project.dataset.table`
Result
Row id last_submission_canceled
1 1 January 25, 2019
2 2 March 5, 2019
Update - below is a "lighter" version
The result is obviously the same
Row id last_submission_canceled
1 1 January 25, 2019
2 2 March 5, 2019
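On newer BigQuery releases there may be a simpler route that avoids string surgery. The sketch below assumes JSON_EXTRACT_STRING_ARRAY is available in your project (it may not have existed when this answer was written) and relies on two documented behaviors: JSON_EXTRACT_STRING_ARRAY returns NULL when the selected value is not an array, and JSON_EXTRACT_SCALAR returns NULL when it is. COALESCE then picks whichever one applies.

```sql
#standardSQL
-- Sketch only: assumes JSON_EXTRACT_STRING_ARRAY is available in your BigQuery project.
WITH `project.dataset.table` AS (
  SELECT 1 id, '{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}' submission_date UNION ALL
  SELECT 2, '{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}'
)
SELECT
  id,
  COALESCE(
    -- List case: reverse the array and take its first element (the last date).
    -- SAFE_OFFSET yields NULL instead of an error for an empty array.
    ARRAY_REVERSE(JSON_EXTRACT_STRING_ARRAY(submission_date, '$.submission_canceled'))[SAFE_OFFSET(0)],
    -- Single-value case: the field is a plain string, not a list.
    JSON_EXTRACT_SCALAR(submission_date, '$.submission_canceled')
  ) AS last_submission_canceled
FROM `project.dataset.table`;
```

For the two sample rows this should return the same values as above. Because the field is parsed as JSON rather than split as text, a date string that happened to contain "," or brackets would not be mangled.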
Sure, but unfortunately JSON parsing in BigQuery is limited, so another option is to use a JS UDF to emulate regular JSONPath functionality; I have many answers here with such examples.
@ndevito1 - thanks for "forcing" me to revisit my answer - see the update - hopefully it is simpler now :o)
@ndevito1 - did you have a chance to try it?
Yes, we were able to implement this easily! Thank you very much!
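The comment above suggests emulating JSONPath with a JavaScript UDF. Below is a minimal sketch of that idea, assuming a temporary JS UDF is acceptable in your environment. The function name last_item and its parameter names are hypothetical, not taken from the original answer. JS UDFs are typically slower than built-in SQL functions, so this is mainly useful when the built-in JSON functions cannot express what you need.

```sql
#standardSQL
-- Hypothetical helper: returns the last element when the field is a list,
-- the value itself when it is a single value, and NULL when it is missing.
CREATE TEMP FUNCTION last_item(json_str STRING, field_name STRING)
RETURNS STRING
LANGUAGE js AS r"""
  if (json_str === null || field_name === null) return null;
  const parsed = JSON.parse(json_str);
  // Guard against inputs such as 'null' or a bare number.
  if (parsed === null || typeof parsed !== 'object') return null;
  const value = parsed[field_name];
  if (value === undefined || value === null) return null;
  if (Array.isArray(value)) {
    return value.length > 0 ? String(value[value.length - 1]) : null;
  }
  return String(value);
""";

WITH `project.dataset.table` AS (
  SELECT 1 id, '{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}' submission_date UNION ALL
  SELECT 2, '{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}'
)
SELECT id, last_item(submission_date, 'submission_canceled') AS last_submission_canceled
FROM `project.dataset.table`;
```

The same helper works for any top-level field, e.g. last_item(submission_date, 'submitted').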