Efficient (linear-time) nested query in SQL

sql mariadb pivot query-optimization gaps-and-islands


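From this table (events):

id | event_date | event_score
-: | ---------: | ----------:
12 | 2020-04-10 | null
13 | 2020-04-11 | null
13 | 2020-04-14 | 8
13 | 2020-04-13 | 6
12 | 2020-04-15 | null
14 | 2020-04-16 | null
14 | 2020-04-17 | null
14 | 2020-04-18 | 11
14 | 2020-04-19 | null
14 | 2020-04-20 | null
14 | 2020-04-22 | null
12 | 2020-04-25 | null
14 | 2020-04-30 | null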

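If you are running MariaDB 10.2.2 or higher, you can treat this as a gaps-and-islands problem. The idea is to count, for each row, how many non-null values appear in the preceding rows and in the following rows. We can then use conditional aggregation to pick out the first non-null value in each direction: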

select id,
    max(case when grp_asc  = 1 then event_score end) as first_score,
    max(case when grp_desc = 1 then event_score end) as last_score
from (
    select e.*,
        -- count() skips nulls, so the running count reaches 1 at the first non-null score in each direction
        count(event_score) over(partition by id order by event_date     ) as grp_asc,
        count(event_score) over(partition by id order by event_date desc) as grp_desc
    from events e
) e
group by id
order by id

I am unable to assess the time complexity of this algorithm, but I suspect that it should run faster than your original query, which requires two subqueries to be executed for each distinct id.

Result for the sample data:

id | first_score | last_score
-: | ----------: | ---------:
12 | null | null
13 | 6 | 8
14 | 11 | 11
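As a rough way to check this result without a MariaDB server, here is a minimal sketch that runs the same windowed query through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption: it needs SQLite 3.25 or newer for window functions, and MariaDB may plan the query differently). The windows are ordered by event_date, on the assumption that "first" and "last" mean earliest and latest by date. The function name first_last_scores is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, event_date, event_score).
ROWS = [
    (12, "2020-04-10", None), (13, "2020-04-11", None),
    (13, "2020-04-14", 8),    (13, "2020-04-13", 6),
    (12, "2020-04-15", None), (14, "2020-04-16", None),
    (14, "2020-04-17", None), (14, "2020-04-18", 11),
    (14, "2020-04-19", None), (14, "2020-04-20", None),
    (14, "2020-04-22", None), (12, "2020-04-25", None),
    (14, "2020-04-30", None),
]

QUERY = """
select id,
    max(case when grp_asc  = 1 then event_score end) as first_score,
    max(case when grp_desc = 1 then event_score end) as last_score
from (
    select e.*,
        count(event_score) over(partition by id order by event_date     ) as grp_asc,
        count(event_score) over(partition by id order by event_date desc) as grp_desc
    from events e
) e
group by id
order by id
"""


def first_last_scores(rows):
    """Load rows into an in-memory SQLite table and run the windowed query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (id integer, event_date text, event_score integer)"
    )
    conn.executemany("insert into events values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_last_scores(ROWS):
        print(row)
```

Rows with a null score can also carry a running count of 1, but the max() aggregate ignores their null values, so only the real score survives.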

If there is an index on (id, event_date, event_score), then this should be very fast:

SELECT id,
       (SELECT event_score
        FROM events AS subquery
        WHERE final_table.id = subquery.id AND event_score IS NOT NULL
        ORDER BY event_date
        LIMIT 1
       ) AS `first score`,
       (SELECT event_score
        FROM events AS subquery
        WHERE final_table.id=subquery.id AND event_score IS NOT NULL
        ORDER BY event_date DESC
        LIMIT 1
       ) AS `last score`
FROM (SELECT DISTINCT e.id
      FROM sensors.events e
     ) as final_table;

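Note that this moves the SELECT DISTINCT into a subquery. This is to ensure that MariaDB does not actually apply a "distinct" algorithm to the whole SELECT DISTINCT, which could happen if the other columns were included.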
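To try the indexed version on a small data set, a sketch along these lines may help. It uses Python's sqlite3 module as a stand-in for MariaDB (an assumption: the optimizer and its use of the index will differ), drops the sensors schema prefix, and adds an ORDER BY id so the output order is deterministic. The index name idx_events_id_date_score and the function name are hypothetical.

```python
import sqlite3

# A subset of the sample rows: (id, event_date, event_score).
ROWS = [
    (12, "2020-04-10", None), (13, "2020-04-11", None),
    (13, "2020-04-14", 8),    (13, "2020-04-13", 6),
    (14, "2020-04-17", None), (14, "2020-04-18", 11),
    (14, "2020-04-19", None), (12, "2020-04-25", None),
]

QUERY = """
SELECT id,
       (SELECT event_score
        FROM events AS subquery
        WHERE final_table.id = subquery.id AND event_score IS NOT NULL
        ORDER BY event_date
        LIMIT 1
       ) AS first_score,
       (SELECT event_score
        FROM events AS subquery
        WHERE final_table.id = subquery.id AND event_score IS NOT NULL
        ORDER BY event_date DESC
        LIMIT 1
       ) AS last_score
FROM (SELECT DISTINCT e.id
      FROM events e
     ) AS final_table
ORDER BY id
"""


def first_last_scores_indexed(rows):
    """Create the table plus the composite index, then run the subquery version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER, event_date TEXT, event_score INTEGER)"
    )
    # Composite index on the three columns named in the answer.
    conn.execute(
        "CREATE INDEX idx_events_id_date_score "
        "ON events (id, event_date, event_score)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_last_scores_indexed(ROWS):
        print(row)
```

On a real MariaDB instance, running EXPLAIN on the query is the way to confirm whether the index is actually used.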

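However, this is O(n log n), because the subqueries need to sort a small amount of data for each id and use the index to find the right place.

I cannot think of a way to achieve O(n) in SQL. I am pretty sure the following constructs are all O(n log n):

- using an index lookup for each row
- sorting any portion of the data
- using any window function with ORDER BY, although this might be fine with the right index

That said, the SQL queries should still be fast, especially with the index.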
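For comparison with the complexity discussion above, here is a minimal sketch, not taken from the answers, of a single pass over the rows in plain Python. It keeps the earliest and latest non-null score per id in a dictionary, so each row costs constant expected work and the input does not need to be sorted. It assumes ISO-formatted date strings, which compare correctly as text. The function name first_last_single_pass is hypothetical.

```python
# A subset of the sample rows: (id, event_date, event_score).
ROWS = [
    (12, "2020-04-10", None), (13, "2020-04-11", None),
    (13, "2020-04-14", 8),    (13, "2020-04-13", 6),
    (14, "2020-04-17", None), (14, "2020-04-18", 11),
    (14, "2020-04-19", None), (12, "2020-04-25", None),
]


def first_last_single_pass(rows):
    """Return {id: (first_score, last_score)} in one pass over the rows."""
    # id -> [first_date, first_score, last_date, last_score]
    best = {}
    for id_, date, score in rows:
        # Register the id even if all of its scores turn out to be null.
        entry = best.setdefault(id_, [None, None, None, None])
        if score is None:
            continue
        # Earliest non-null score seen so far for this id.
        if entry[0] is None or date < entry[0]:
            entry[0], entry[1] = date, score
        # Latest non-null score seen so far for this id.
        if entry[2] is None or date > entry[2]:
            entry[2], entry[3] = date, score
    return {id_: (e[1], e[3]) for id_, e in best.items()}


if __name__ == "__main__":
    for id_, scores in sorted(first_last_single_pass(ROWS).items()):
        print(id_, scores)
```

This only shows what the linear-time baseline looks like outside the database; it says nothing about how long it takes to read the rows out of the table in the first place.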


Is there a table that has one row per id?

Check your guesses first: use EXPLAIN to look at the execution plan.

@HyperActive Unless the data is already sorted, you need to account for the sorting time in Python. If it is sorted, then it does not correspond to a SQL table (which is unordered by definition).