BigQuery array manipulation

Tags: google-cloud-platform, google-bigquery

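I need some help with array manipulation in BigQuery, as shown below:

Column 1 holds a list of content IDs, and Column 2 holds lists of embedded content IDs, each keyed by the content ID it belongs to.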

    |---------------------------------------------------------------------------------------------------------------------------------------------------------|
    | Column1                                                                                | Column2                                                        |
    |---------------------------------------------------------------------------------------------------------------------------------------------------------|
    |{"contentId":["1.5433912","1.5536755","1.5536970","1.5536380","1.5536809","1.5535567"]} |{'1.5433912':['1.5561001','1.5559520','1.5560946','1.5561026']} |
    |----------------------------------------------------------------------------------------|----------------------------------------------------------------|
    |{"contentId":["1.5536141","1.5535574","1.5534770","1.5535870"]}                       |{'1.5535574':['1.5527726','1.5533354','1.5533093']}             |
    |----------------------------------------------------------------------------------------|----------------------------------------------------------------|
    |{"contentId":["1.5561069","1.5557612","1.5561433"]}                                     |{'1.5561069':['1.5527726'],'1.5561433':['1.5533093']}           |
    |----------------------------------------------------------------------------------------|----------------------------------------------------------------|


The desired output is as follows:


Below is a solution for BigQuery Standard SQL:

#standardSQL
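SELECT ARRAY_CONCAT_AGG(
  -- No matching reference in Column2: keep the content ID itself as a one-element array.
  IF(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", '') IS NULL, [TRIM(item, '"')],
    -- Otherwise substitute the embedded IDs listed after the matching key, in their original order.
    ARRAY(
      SELECT ref
      FROM UNNEST(SPLIT(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", ''))) AS ref WITH OFFSET
      ORDER BY OFFSET
    ))
  ORDER BY OFFSET) AS contentId  -- keep the positions from Column1
FROM `project.dataset.table` t,
-- One row per content ID in Column1, together with its position.
UNNEST(JSON_EXTRACT_ARRAY(Column1, '$.contentId')) AS item WITH OFFSET
-- One 'key':[...] fragment per reference in Column2, joined to the content ID it starts with.
LEFT JOIN UNNEST(REGEXP_EXTRACT_ALL(Column2, r"'.*?':\[.*?\]")) refs
ON STARTS_WITH(refs, "'" || TRIM(item, '"'))
GROUP BY FORMAT('%t', t)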
You can test it against the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '{"contentId":["1.5433912","1.5536755","1.5536970","1.5536380","1.5536809","1.5535567"]}' Column1, "{'1.5433912':['1.5561001','1.5559520','1.5560946','1.5561026']}" Column2 UNION ALL
  SELECT '{"contentId":["1.5536141","1.5535574","1.5534770","1.5535870"]}', " {'1.5535574':['1.5527726','1.5533354','1.5533093']} " UNION ALL
  SELECT '{"contentId":["1.5561069","1.5557612","1.5561433"]}', "{'1.5561069':['1.5527726'],'1.5561433':['1.5533093']}"
)
SELECT ARRAY_CONCAT_AGG(
  IF(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", '') IS NULL, [TRIM(item, '"')],
    ARRAY(
      SELECT ref
      FROM UNNEST(SPLIT(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", ''))) AS ref WITH OFFSET
      ORDER BY OFFSET
    ))
  ORDER BY OFFSET) AS contentId
FROM `project.dataset.table` t,
UNNEST(JSON_EXTRACT_ARRAY(Column1, '$.contentId')) AS item WITH OFFSET 
LEFT JOIN UNNEST(REGEXP_EXTRACT_ALL(Column2, r"'.*?':\[.*?\]")) refs
ON STARTS_WITH(refs, "'" || TRIM(item, '"'))
GROUP BY FORMAT('%t', t) 
The result is exactly what your example expects:

Row contentId    
1   1.5561001    
    1.5559520    
    1.5560946    
    1.5561026    
    1.5536755    
    1.5536970    
    1.5536380    
    1.5536809    
    1.5535567    
2   1.5536141    
    1.5527726    
    1.5533354    
    1.5533093    
    1.5534770    
    1.5535870    
3   1.5527726    
    1.5557612    
    1.5533093    
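
To check the expected output without running BigQuery, the merge rule can be restated in plain Python. This is only a sketch of the intended logic, not a translation of the query: the function name `merge_row` and the variable `SAMPLE_ROWS` are made up for illustration, and it matches keys exactly, whereas the query uses the prefix test `STARTS_WITH`. The two agree on the sample data but may differ on edge cases such as an empty reference list.

```python
import json
import re


def merge_row(column1, column2):
    """Replace each content ID that is a key in Column2 with that key's
    embedded IDs, keeping every other ID at its original position."""
    content_ids = json.loads(column1)["contentId"]

    # Column2 uses single quotes, so it is not valid JSON. Pull out each
    # 'key':[...] pair with a regex, similar in spirit to REGEXP_EXTRACT_ALL.
    refs = {}
    for key, body in re.findall(r"'([^']*)':\[(.*?)\]", column2):
        refs[key] = re.findall(r"'([^']*)'", body)

    merged = []
    for content_id in content_ids:
        # A referenced ID expands to its embedded IDs; others stay as they are.
        merged.extend(refs.get(content_id, [content_id]))
    return merged


# Sample rows copied from the question.
SAMPLE_ROWS = [
    ('{"contentId":["1.5433912","1.5536755","1.5536970","1.5536380","1.5536809","1.5535567"]}',
     "{'1.5433912':['1.5561001','1.5559520','1.5560946','1.5561026']}"),
    ('{"contentId":["1.5536141","1.5535574","1.5534770","1.5535870"]}',
     " {'1.5535574':['1.5527726','1.5533354','1.5533093']} "),
    ('{"contentId":["1.5561069","1.5557612","1.5561433"]}',
     "{'1.5561069':['1.5527726'],'1.5561433':['1.5533093']}"),
]

for col1, col2 in SAMPLE_ROWS:
    print(merge_row(col1, col2))
```

On the three sample rows this prints the same three lists as the query result above.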

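Comments:

Explain the logic of the output so we can help you. Meanwhile, I feel there is a mistake in the output example (based only on my attempts to reverse-engineer the logic, so this is obviously just a wild guess).

The output needs to merge the two columns based on their "reference keys" (i.e. 1.5433912 and 1.5535574). When the merge happens, the output list should also preserve the position of each content ID in the array. For example, on row #1 the reference key appears at position 0, so the output array should be all the content IDs from Column 2, followed by the remaining content IDs from Column 1.

So the references are in Column 2, right? Can it hold several references, not just one as in your example?

It is like pasting the array list from Column 2 into Column 1 at the matching position. I hope I was able to explain it clearly.

So several references are possible, right?

Thanks @mikhail for the answer. Could you explain what the GROUP BY clause does in this logic?

Initially, all elements are unnested into separate rows. After they have all been processed, you need a way to group them back together. In your example there is no ID or other field that can be used for this. In such a case you can create a hash of the whole row and group by that hash. FORMAT('%t', t) serves as that hash: it renders the whole row t of the original table as a string.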
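
The "unnest, then group back by a key built from the whole row" idea from the last comment can be sketched in Python. This is an illustration under assumptions, not BigQuery code: `regroup` is a hypothetical helper, and `repr(row)` stands in for `FORMAT('%t', t)`. It also shows a likely caveat of this approach: two rows with identical content get the same key, so they are merged into one group.

```python
from collections import defaultdict


def regroup(table):
    """Flatten rows into (row_key, offset, item) records, then rebuild
    one list per distinct row_key, ordered by offset."""
    # Step 1, like UNNEST ... WITH OFFSET: one record per element,
    # tagged with a key derived from the whole row.
    flat = [(repr(row), offset, item)
            for row in table
            for offset, item in enumerate(row)]

    # Step 2, like GROUP BY FORMAT('%t', t): collect records by row key.
    groups = defaultdict(list)
    for row_key, offset, item in flat:
        groups[row_key].append((offset, item))

    # Step 3, like the aggregate's ORDER BY OFFSET: restore element order.
    return [[item for _, item in sorted(members)]
            for members in groups.values()]


print(regroup([["a", "b"], ["c"]]))
# [['a', 'b'], ['c']]

# Identical rows share a key, so they collapse into a single group.
print(regroup([["a", "b"], ["c"], ["a", "b"]]))
# [['a', 'a', 'b', 'b'], ['c']]
```

If the real table can contain fully identical rows, grouping by an actual unique ID column, when one exists, would avoid this collapse.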