SQL: split a row containing an array into multiple rows containing sub-arrays
Suppose I have a table like this:
+-----+-----------------+-------------+
| ID | Points | BreakPoints |
+-----+-----------------+-------------+
| 123 | {6,8,1,3,7,9} | {1,7} |
| 456 | {16,9,78,96,33} | {78} |
+-----+-----------------+-------------+
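To try the queries in this thread, the sample data can be loaded into a real table. This is only a sketch: the table name t (taken from the first answer's query) and the integer[] column types are assumptions, since the question does not show the table definition.

```sql
-- Hypothetical setup: table name and column types are assumptions.
CREATE TABLE t (
    id          integer PRIMARY KEY,
    points      integer[] NOT NULL,
    breakpoints integer[] NOT NULL
);

-- The two sample rows from the question.
INSERT INTO t (id, points, breakpoints) VALUES
    (123, '{6,8,1,3,7,9}',   '{1,7}'),
    (456, '{16,9,78,96,33}', '{78}');
```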
I want to "break" each Points sequence at the points contained in BreakPoints, while keeping the ID of the original row.
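The order of the elements in a sequence matters, so I cannot sort them.
Also note that each breakpoint appears in both of the result rows produced by breaking the original sequence at that breakpoint (as the end point of one and the start point of the next). So the result should look like this: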
+-----+------------+
| ID | Points |
+-----+------------+
| 123 | {6,8,1} |
| 123 | {1,3,7} |
| 123 | {7,9} |
| 456 | {16,9,78} |
| 456 | {78,96,33} |
+-----+------------+
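Of course, I could write a PL/pgSQL function, call it for every row, iterate over the array, and RETURN NEXT each subsequence. But is there another way, without calling a function for every row?

I think this is what you want: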
-- Note: this query compares points with breakpoints, and orders them, by value
-- rather than by position in the array.
select t.id,
       array_agg(point order by point)
from t cross join
     unnest(points) point cross join lateral
     (select lag(breakpoint) over (order by breakpoint) as prev_breakpoint, breakpoint
      from unnest(t.breakpoints) breakpoint
      union all
      select max(breakpoint), null
      from unnest(t.breakpoints) breakpoint
     ) b
where (point >= prev_breakpoint or prev_breakpoint is null) and
      (point <= breakpoint or breakpoint is null)
group by t.id, breakpoint;
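This is included in a db<>fiddle.
The idea is simple: return the position of each point and match it against the breakpoints, then use a window function to define the groups.
The only complication is the aggregation. You want each breakpoint in two records, so this takes some manipulation. I think array manipulation with lead() is simpler than the alternatives.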
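The position-based idea described above could be sketched roughly as follows. This is not the answerer's actual query, only one possible reading of it: the CTE names numbered and grouped and the column names n, grp and seg are made up for illustration, and the sample data is inlined so the query runs on its own. It assumes there are no duplicates in points or breakpoints.

```sql
WITH t(id, points, breakpoints) AS (
    VALUES (123, ARRAY[6,8,1,3,7,9],   ARRAY[1,7]),
           (456, ARRAY[16,9,78,96,33], ARRAY[78])
),
numbered AS (
    -- n is the position of each point; grp counts the breakpoints seen
    -- up to and including that position, so every breakpoint starts a new group
    SELECT t.id, p.point, p.n,
           count(*) FILTER (WHERE p.point = ANY (t.breakpoints))
               OVER (PARTITION BY t.id ORDER BY p.n) AS grp
    FROM t
    CROSS JOIN LATERAL unnest(t.points) WITH ORDINALITY AS p(point, n)
),
grouped AS (
    -- rebuild one array per group, keeping the original element order
    SELECT id, grp, array_agg(point ORDER BY n) AS seg
    FROM numbered
    GROUP BY id, grp
)
SELECT id,
       -- append the first element of the next group (the breakpoint);
       -- a one-element slice is used so that the last group, where lead()
       -- yields NULL, is left unchanged by the array concatenation
       seg || lead(seg[1:1]) OVER (PARTITION BY id ORDER BY grp) AS points
FROM grouped
ORDER BY id, grp;
```

Because membership is tested with = ANY, the order in which the breakpoints are listed should not matter in this sketch.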
WITH data(id, points, breakpoints) AS (
    VALUES (123, ARRAY [6,8,1,3,7,9], ARRAY [7, 1])
         , (456, ARRAY [16,9,78,96,33], ARRAY [78])
),
-- we'll map the breakpoints to the indices where they appear in `points` and sort this array
-- so, ARRAY[1, 7] -> ARRAY[3, 5] (the positions of 1 & 7 in `points`, arrays are 1-based)
-- and ARRAY[7, 1] -> ARRAY[3, 5] (since we sort this new 'breakpoint_indices' array)
sorted_breakpoint_indices(id, points, breakpoint_indices, number_of_breakpoints) AS (
    SELECT id
         , points
         , breakpoint_indices
         , number_of_breakpoints
    FROM data
    JOIN LATERAL (
        SELECT ARRAY_AGG(array_position(points, breakpoint) ORDER BY array_position(points, breakpoint))
             , COUNT(*) -- simply here to avoid multiple `cardinality(breakpoint_indices)` below
        FROM unnest(breakpoints) AS breakpoint
    ) AS f(breakpoint_indices, number_of_breakpoints)
    ON true
)
SELECT id
     , CASE i
           -- first segment, from start to breakpoint #1
           WHEN 0 THEN points[:breakpoint_indices[1]]
           -- last segment, from last breakpoint to end
           WHEN number_of_breakpoints THEN points[breakpoint_indices[number_of_breakpoints]:]
           -- default case, bp i to i+1
           ELSE points[breakpoint_indices[i]:breakpoint_indices[i+1]]
       END AS result
FROM sorted_breakpoint_indices
   , generate_series(0, number_of_breakpoints, 1) AS f(i)
This returns:
+---+----------+
|id |result |
+---+----------+
|123|{6,8,1} |
|123|{1,3,7} |
|123|{7,9} |
|456|{16,9,78} |
|456|{78,96,33}|
+---+----------+
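The two building blocks of the query above, array_position and array slice subscripts, can be tried in isolation. A small sketch using the first sample row (the column aliases are only illustrative; slices with an omitted bound need PostgreSQL 9.6 or later):

```sql
SELECT array_position(ARRAY[6,8,1,3,7,9], 1) AS pos_of_1,        -- 3
       array_position(ARRAY[6,8,1,3,7,9], 7) AS pos_of_7,        -- 5
       (ARRAY[6,8,1,3,7,9])[:3]              AS first_segment,   -- {6,8,1}
       (ARRAY[6,8,1,3,7,9])[3:5]             AS middle_segment,  -- {1,3,7}
       (ARRAY[6,8,1,3,7,9])[5:]              AS last_segment;    -- {7,9}
```

Both slice bounds are inclusive, which is why the breakpoint ends up in two adjacent segments.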
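Note: I wrote several other versions while working on this answer; they can be seen in this post's edit history.

Why is "8" in the first set?

@GordonLinoff, because it is about the position in the array, not the element "value". (The elements are actually IDs themselves; their "values" are irrelevant.) Maybe I should clarify that the order of the elements matters and that I cannot "sort" them. I will add that to the question.

@rouen. Well, I like my first answer, even if it answers the wrong question. You can always ask another question. More seriously, I updated the answer with a similar approach for your actual problem.

Very interesting approach :) One more thing that may not be clear from my example (bummer): I cannot guarantee that the breakpoints are listed in the same order in which they appear in the sequence. So if you change {1,7} in the first row to {7,1}, it should still return the result I proposed. I will try to adapt your solution to handle that.

@rouen: OK, I may have a way to solve this. One question, though: can breakpoints be repeated? I.e. breakpoints=Array[1,1,7], where 1 appears twice in points (and is "consumed" the first time you encounter it).

No, there are no duplicates in the breakpoints. If it matters, there can be no duplicates in the original sequence either :)

You are a SQL beast, man! Thank you very much. Now for the fun part: benchmarking this on millions of rows :)

Hey, just letting you know: it works great on a real dataset. On a fairly ordinary server, it splits 700k sequences in about 5 seconds :)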