SQL: Split a row with an array into multiple rows with sub-arrays


Suppose I have a table like this:

+-----+-----------------+-------------+
| ID  |     Points      | BreakPoints |
+-----+-----------------+-------------+
| 123 | {6,8,1,3,7,9}   | {1,7}       |
| 456 | {16,9,78,96,33} | {78}        |
+-----+-----------------+-------------+

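I would like to "break" these Points sequences at the points contained in BreakPoints, while retaining the ID of the original row. The order of the elements in a sequence matters, so I cannot sort them.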

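Also note that each breakpoint appears in both of the result rows produced by breaking the original sequence at that breakpoint (as the end point of one and the start point of the other). So the result should look like this: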

+-----+------------+
| ID  |   Points   |
+-----+------------+
| 123 | {6,8,1}    |
| 123 | {1,3,7}    |
| 123 | {7,9}      |
| 456 | {16,9,78}  |
| 456 | {78,96,33} |
+-----+------------+

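Of course, I could write a PL/pgSQL function, call it for each row, iterate over the array, and RETURN NEXT for each subsequence. But is there another way, without calling a function for every row?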

I think this is what you want:

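-- Note: this version matches points to breakpoints by value, not by array position.
select t.id,
       array_agg(point order by point)
from t cross join
     unnest(points) point cross join lateral
     -- one (prev_breakpoint, breakpoint) range per breakpoint ...
     (select lag(breakpoint) over (order by breakpoint) as prev_breakpoint, breakpoint
      from unnest(t.breakpoints) breakpoint
      union all
      -- ... plus an open-ended range after the largest breakpoint
      select max(breakpoint), null
      from unnest(t.breakpoints) breakpoint
     ) b
-- keep each point in every range that contains it (bounds inclusive)
where (point >= prev_breakpoint or prev_breakpoint is null) and
      (point <= breakpoint or breakpoint is null)
group by t.id, breakpoint;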
This is demonstrated in a db<>fiddle.
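
To run the query above locally, a table along the following lines can be assumed. The table name t comes from the query itself; the column types are a guess based on the sample data in the question, not something the original post states.

```sql
-- Hypothetical setup: the name t matches the query above;
-- the column types (int, int[]) are assumptions based on the sample data.
CREATE TABLE t (
    id          int PRIMARY KEY,
    points      int[] NOT NULL,
    breakpoints int[] NOT NULL
);

INSERT INTO t (id, points, breakpoints) VALUES
    (123, ARRAY[6,8,1,3,7,9],   ARRAY[1,7]),
    (456, ARRAY[16,9,78,96,33], ARRAY[78]);
```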

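The idea is simple: get the position of each point and match it against the breakpoints, then use window functions to define the groups.

The only complication is the aggregation. You want each breakpoint in two records, so this takes a little manipulation. I think array manipulation with lead() is simpler than the alternatives.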
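
A rough sketch of how this position-based idea could be written follows. It is not the answerer's original query, which is not reproduced in this text. The CTE names (numbered, segments) and column names (pos, grp, pts) are made up for illustration, and the sample rows are inlined so the statement runs on its own. It assumes, as confirmed in the comments, that neither array contains duplicates.

```sql
WITH t(id, points, breakpoints) AS (
    VALUES (123, ARRAY[6,8,1,3,7,9],   ARRAY[7,1])
         , (456, ARRAY[16,9,78,96,33], ARRAY[78])
),
numbered AS (
    -- One row per point with its position; grp counts the breakpoints seen
    -- up to and including this position, so every breakpoint starts a new group.
    SELECT t.id, p.point, p.pos
         , count(*) FILTER (WHERE p.point = ANY (t.breakpoints))
               OVER (PARTITION BY t.id ORDER BY p.pos) AS grp
    FROM t
    CROSS JOIN LATERAL unnest(t.points) WITH ORDINALITY AS p(point, pos)
),
segments AS (
    -- Rebuild each group as an array, preserving the original order.
    SELECT id, grp, array_agg(point ORDER BY pos) AS pts
    FROM numbered
    GROUP BY id, grp
)
-- Append the first element of the next group (the breakpoint) to each group,
-- so the breakpoint ends one segment and starts the following one.
SELECT id
     , pts || COALESCE(lead(pts[1:1]) OVER (PARTITION BY id ORDER BY grp), '{}') AS points
FROM segments
ORDER BY id, grp;
```

Because grouping is driven by array position rather than by value, the order in which the breakpoints are listed (here {7,1}) should not affect the result.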

WITH data(id, points, breakpoints) AS (
    VALUES (123, ARRAY [6,8,1,3,7,9], ARRAY [7, 1])
         , (456, ARRAY [16,9,78,96,33], ARRAY [78])
),
-- we'll map the breakpoints to the indices where they appear in `points` and sort this array
-- so, ARRAY[1, 7] -> ARRAY[3, 5] (the positions of 1 & 7 in `points`, arrays are 1-based)
-- and ARRAY[7, 1] -> ARRAY[3, 5] (since we sort this new 'breakpoint_indices' array)
sorted_breakpoint_indices(id, points, breakpoint_indices, number_of_breakpoints) AS (
    SELECT id
         , points
         , breakpoint_indices
         , number_of_breakpoints
    FROM data
    JOIN LATERAL (
        SELECT ARRAY_AGG(array_position(points, breakpoint) ORDER BY array_position(points, breakpoint))
             , COUNT(*) -- simply here to avoid multiple `cardinality(breakpoint_indices)` below
        FROM unnest(breakpoints) AS breakpoint
    ) AS f(breakpoint_indices, number_of_breakpoints)
    ON true
)
SELECT id
     , CASE i
         -- first segment, from start to breakpoint #1
         WHEN 0 THEN points[:breakpoint_indices[1]]
         -- last segment, from last breakpoint to end
         WHEN number_of_breakpoints THEN points[breakpoint_indices[number_of_breakpoints]:]
         -- default case, bp i to i+1
         ELSE points[breakpoint_indices[i]:breakpoint_indices[i+1]]
       END AS result
FROM sorted_breakpoint_indices
   , generate_series(0, number_of_breakpoints, 1) AS f(i);
Returns:

+---+----------+
|id |result    |
+---+----------+
|123|{6,8,1}   |
|123|{1,3,7}   |
|123|{7,9}     |
|456|{16,9,78} |
|456|{78,96,33}|
+---+----------+
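
The query above relies on two PostgreSQL array features, array_position and array slices. The standalone statements below illustrate them on the first sample row; the column aliases are made up for the example.

```sql
-- array_position returns the 1-based index of the first match (PostgreSQL 9.5+).
SELECT array_position(ARRAY[6,8,1,3,7,9], 1) AS pos_of_1   -- 3
     , array_position(ARRAY[6,8,1,3,7,9], 7) AS pos_of_7;  -- 5

-- Slices include both bounds; an omitted bound (PostgreSQL 9.6+)
-- means "from the start" or "to the end" of the array.
SELECT (ARRAY[6,8,1,3,7,9])[:3]  AS first_segment   -- {6,8,1}
     , (ARRAY[6,8,1,3,7,9])[3:5] AS middle_segment  -- {1,3,7}
     , (ARRAY[6,8,1,3,7,9])[5:]  AS last_segment;   -- {7,9}
```

Since both slice bounds are inclusive, using the same index as the end of one slice and the start of the next is what places each breakpoint in two consecutive segments.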


Note: I wrote several other versions while working on this answer; they can be seen in this post's edit history.

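Why is "8" in the first set?

@GordonLinoff Because it is about the position in the array, not the element "value". (The elements are actually IDs themselves; their "value" is irrelevant.) Maybe I should clarify that the order of the elements matters and I cannot "sort" them. I will add that to the question.

@rouen. Well, I like my first answer, even if it answers the wrong question. You can always ask another question. More seriously, I updated the answer with a similar approach for your actual problem.

Very interesting approach :) One more thing that may not be clear from my example (bummer): I cannot guarantee that the breakpoints are listed in the same order as they appear in the sequence. So if you change {1,7} in the first row to {7,1}, it should still return the result I proposed. I will try to adapt your solution to handle this.

@rouen: OK, I may have a way to solve this. One question, though: can breakpoints be repeated? That is, breakpoints=Array[1,1,7], where 1 appears twice in points (and is "consumed" the first time you encounter it).

No, there are no duplicates in breakpoints. If it matters, there cannot be duplicates in the original sequence either :)

You are an SQL beast, man! Thank you very much. Now for the fun part: benchmarking this on millions of rows :)

Hey, just letting you know: it works great on the real dataset. On a fairly ordinary server, it splits 700k sequences in about 5 seconds :)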