Cast a string to an array of structs, then look up values in another SQL table
Tags: sql, google-bigquery
I currently have a string that represents a list of structs in a table. I want to look up values in another table based on the values of the elements in each struct. For example, the car info struct below is [spare,carType,carcolor]:
╔═══════════════════════════╗
║ CarInfo ║
╠═══════════════════════════╣
║ “[1,1,1]” ║
║ “[1,2,1] [1,1,2]” ║
║ null ║
║ “[1,2,1] [1,1,2] [1,1,1]” ║
╚═══════════════════════════╝
I want to look the values up in this table:
╔═══════════╦═══════════════╦═════════════╦═════════════════╗
║ CarTypeId ║ CarTypeString ║ CarColourId ║ CarColourString ║
╠═══════════╬═══════════════╬═════════════╬═════════════════╣
║ 1         ║ "Hyundai"     ║ 1           ║ "Red"           ║
║ 1         ║ "Hyundai"     ║ 2           ║ "Blue"          ║
║ 2         ║ "Toyota"      ║ 1           ║ "Green"         ║
║ 2         ║ "Toyota"      ║ 2           ║ "Yellow"        ║
╚═══════════╩═══════════════╩═════════════╩═════════════════╝
and get the following result:
╔═════════════════════════════════════════════════════╗
║ CarInfo ║
╠═════════════════════════════════════════════════════╣
║ “[1,Hyundai,Red]” ║
║ “[1,Toyota,Green] [1,Hyundai,Blue]” ║
║ null ║
║ “[1,Toyota,Green] [1,Hyundai,Blue] [1,Hyundai,Red]” ║
╚═════════════════════════════════════════════════════╝
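I found that I can split the string into an array with someString.split(CarInfo,''), but after that I'm not sure how to do the struct conversion or the "looping" left join that follows.

Below is for BigQuery Standard SQL: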
#standardSQL
SELECT STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
If you want to apply this to the sample data from your question, see the example below:
#standardSQL
WITH `project.dataset.cars` AS (
SELECT '[1,1,1]' CarInfo UNION ALL
SELECT '[1,2,1] [1,1,2]' UNION ALL
SELECT NULL UNION ALL
SELECT '[1,2,1] [1,1,2] [1,1,1]'
), `project.dataset.lookup` AS (
SELECT 1 CarTypeId, 'Hyundai' CarTypeString, 1 CarColourId, 'Red' CarColourString UNION ALL
SELECT 1, 'Hyundai', 2, 'Blue' UNION ALL
SELECT 2, 'Toyota', 1, 'Green' UNION ALL
SELECT 2, 'Toyota', 2, 'Yellow'
)
SELECT STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
The output is:
Row CarInfo
1 [1,Hyundai,Red]
2 [1,Toyota,Green] [1,Hyundai,Blue]
3 null
4 [1,Toyota,Green] [1,Hyundai,Blue] [1,Hyundai,Red]
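
To see what the parsing step does on its own, here is a minimal sketch that runs only the SPLIT / TRIM / CAST part of the query above on one literal string taken from the sample data. No tables are needed; the column aliases are the same ones used in the answer.

```sql
#standardSQL
-- Split one CarInfo string on spaces, then break each "[a,b,c]" item into its three parts.
SELECT
  info,                                                             -- e.g. '[1,2,1]'
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,                      -- kept as STRING
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,   -- cast so it can join to the lookup key
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
FROM UNNEST(SPLIT('[1,2,1] [1,1,2]', ' ')) AS info
```

TRIM(info, '[]') strips the surrounding brackets, and SPLIT without a delimiter argument splits a STRING on commas. The full query wraps these three expressions in a one-element array of STRUCT so that their names can be used directly in USING(carTypeId, carColourId).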
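Clarification: what is the exact data type of the column in the carInfo table? Having the exact table schema would help to understand your use case!
It is a string.
So are the double quotes part of the string, or did you add them just to emphasize that these are strings? Also, in the lookup table, for "Hyundai", are the double quotes actually part of the value, or just a way of showing that it is a string?
In both cases I added them to emphasize that they are strings, sorry!
Thank you so much, this is brilliant. I have one more quick question. If I have another column in the CarInfo table to begin with and want to keep it at the end, what is the best way to preserve it when using GROUP BY? I have seen others elsewhere do MAX(additionalCol) AS additionalCol?
That is exactly what to do :o) Just add it to the outermost SELECT.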
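
For reference, the suggestion in the last comment might look roughly like the sketch below. It assumes a hypothetical extra column named additionalCol on the cars table; that column and its sample values 'a', 'b', 'c' are made up for illustration and are not part of the original data.

```sql
#standardSQL
WITH `project.dataset.cars` AS (
  -- additionalCol and its values are hypothetical, added only for this illustration
  SELECT '[1,1,1]' CarInfo, 'a' additionalCol UNION ALL
  SELECT '[1,2,1] [1,1,2]', 'b' UNION ALL
  SELECT NULL, 'c'
), `project.dataset.lookup` AS (
  SELECT 1 CarTypeId, 'Hyundai' CarTypeString, 1 CarColourId, 'Red' CarColourString UNION ALL
  SELECT 1, 'Hyundai', 2, 'Blue' UNION ALL
  SELECT 2, 'Toyota', 1, 'Green' UNION ALL
  SELECT 2, 'Toyota', 2, 'Yellow'
)
SELECT
  STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo,
  MAX(additionalCol) AS additionalCol  -- carries the extra column through the GROUP BY
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
```

Because the grouping key FORMAT('%t', t) is built from the whole source row, additionalCol has a single value within each group, so MAX simply returns that value; ANY_VALUE(additionalCol) should work equally well here.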