Casting a string to an array of structs, then looking up values in another SQL table

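I currently have a string column that represents a list of structs in a table. I want to look up values in another table based on the values of the elements in each struct.

For example, the car info struct below is [spare,carType,carcolor]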

╔═══════════════════════════╗
║          CarInfo          ║
╠═══════════════════════════╣
║ “[1,1,1]”                 ║
║ “[1,2,1] [1,1,2]”         ║
║ null                      ║
║ “[1,2,1] [1,1,2] [1,1,1]” ║
╚═══════════════════════════╝

I want to look these values up in the following table:

╔═══════════╦═══════════════╦═════════════╦═════════════════╗
║ CarTypeId ║ CarTypeString ║ CarColourId ║ CarColourString ║
╠═══════════╬═══════════════╬═════════════╬═════════════════╣
║         1 ║ "Hyundai"     ║           1 ║ "Red"           ║
║         1 ║ "Hyundai"     ║           2 ║ "Blue"          ║
║         2 ║ "Toyota"      ║           1 ║ "Green"         ║
║         2 ║ "Toyota"      ║           2 ║ "Yellow"        ║
╚═══════════╩═══════════════╩═════════════╩═════════════════╝

and get the following result:

╔═════════════════════════════════════════════════════╗
║                       CarInfo                       ║
╠═════════════════════════════════════════════════════╣
║ “[1,Hyundai,Red]”                                   ║
║ “[1,Toyota,Green] [1,Hyundai,Blue]”                 ║
║ null                                                ║
║ “[1,Toyota,Green] [1,Hyundai,Blue] [1,Hyundai,Red]” ║
╚═════════════════════════════════════════════════════╝

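I found that I can use someString.split(CarInfo,'') to split the string into an array, but after that I am not sure how to do the struct conversion or the "looped" left join that follows.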

The query below is for BigQuery Standard SQL:

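#standardSQL
-- Rebuild each '[spare,type,colour]' element and join the elements with spaces
SELECT STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo
FROM `project.dataset.cars` t
-- One row per '[a,b,c]' element; the LEFT JOIN keeps rows whose CarInfo is NULL
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
-- Parse the element into a one-row struct so its fields can serve as join keys
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
-- Look up the type and colour names by their ids
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
-- Collapse back to one output row per original row
GROUP BY FORMAT('%t', t)

To apply it to the sample data from your question, see the example below: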

#standardSQL
WITH `project.dataset.cars` AS (
  SELECT '[1,1,1]' CarInfo UNION ALL
  SELECT '[1,2,1] [1,1,2]' UNION ALL
  SELECT NULL UNION ALL
  SELECT '[1,2,1] [1,1,2] [1,1,1]'
), `project.dataset.lookup` AS (
  SELECT 1 CarTypeId, 'Hyundai' CarTypeString, 1 CarColourId, 'Red' CarColourString UNION ALL
  SELECT 1, 'Hyundai', 2, 'Blue' UNION ALL
  SELECT 2, 'Toyota', 1, 'Green' UNION ALL
  SELECT 2, 'Toyota', 2, 'Yellow'
)
SELECT STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare, 
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId, 
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)      

The output is:

Row CarInfo  
1   [1,Hyundai,Red]  
2   [1,Toyota,Green] [1,Hyundai,Blue]    
3   null     
4   [1,Toyota,Green] [1,Hyundai,Blue] [1,Hyundai,Red]    
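
One caveat worth noting: STRING_AGG without an ORDER BY clause does not guarantee the order in which the elements are concatenated. If the original element order matters, a possible variant (a sketch only, not part of the answer above) is to capture each element's position with WITH OFFSET and order the aggregation by it. The alias `pos` is a name introduced here for illustration.

```sql
#standardSQL
WITH `project.dataset.cars` AS (
  SELECT '[1,1,1]' CarInfo UNION ALL
  SELECT '[1,2,1] [1,1,2]' UNION ALL
  SELECT NULL UNION ALL
  SELECT '[1,2,1] [1,1,2] [1,1,1]'
), `project.dataset.lookup` AS (
  SELECT 1 CarTypeId, 'Hyundai' CarTypeString, 1 CarColourId, 'Red' CarColourString UNION ALL
  SELECT 1, 'Hyundai', 2, 'Blue' UNION ALL
  SELECT 2, 'Toyota', 1, 'Green' UNION ALL
  SELECT 2, 'Toyota', 2, 'Yellow'
)
SELECT STRING_AGG(
         '[' || spare || ',' || carTypeString || ',' || carColourString || ']',
         ' ' ORDER BY pos  -- pos (hypothetical alias) keeps the original element order
       ) AS CarInfo
FROM `project.dataset.cars` t
-- WITH OFFSET exposes each element's zero-based position in the split array
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info WITH OFFSET pos,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
```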

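Comments:

To clarify: what exactly is the data type of the column in the CarInfo table? Having the exact table schema would help in understanding your use case!

It is a string.

So are the double quotes part of the string, or did you add them just to emphasize that the values are strings? Also, in the lookup table, for "Hyundai", are the double quotes actually part of the value, or just a way of showing that it is a string?

In both cases I added them only to emphasize that they are strings, sorry!

Thank you very much, this is great. I have one more quick question. Suppose the CarInfo table has another column to begin with and I want to keep it in the final result. What is the best way to preserve it when using GROUP BY? I have seen others elsewhere use MAX(additionalCol) AS additionalCol.

That is exactly what should be done :o) Just add it to the outermost SELECT.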
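
The MAX(additionalCol) suggestion from the last two comments could look roughly like the sketch below. The column name `additionalCol` comes from the comment; its sample values ('a' through 'd') are made up here for illustration. Because the grouping key FORMAT('%t', t) covers the whole source row, every group holds a single value of `additionalCol`, so MAX simply returns that value.

```sql
#standardSQL
WITH `project.dataset.cars` AS (
  -- additionalCol values are hypothetical sample data
  SELECT '[1,1,1]' CarInfo, 'a' additionalCol UNION ALL
  SELECT '[1,2,1] [1,1,2]', 'b' UNION ALL
  SELECT NULL, 'c' UNION ALL
  SELECT '[1,2,1] [1,1,2] [1,1,1]', 'd'
), `project.dataset.lookup` AS (
  SELECT 1 CarTypeId, 'Hyundai' CarTypeString, 1 CarColourId, 'Red' CarColourString UNION ALL
  SELECT 1, 'Hyundai', 2, 'Blue' UNION ALL
  SELECT 2, 'Toyota', 1, 'Green' UNION ALL
  SELECT 2, 'Toyota', 2, 'Yellow'
)
SELECT
  STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo,
  -- Carry the extra column through the aggregation
  MAX(additionalCol) AS additionalCol
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
```

ANY_VALUE(additionalCol) would presumably serve the same purpose here, since each group contains only one distinct value of the column.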