MySQL: optimizing a query that extracts multiple columns from the same table


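This is a follow-up to an earlier question.

I have two database tables (more tables are omitted):

All the id columns and datetime are indexed.

Assuming I will not change the database structure, I need to extract the data this way:

  • Rows grouped by datetime
  • One column of data.value for each selected acq.id_cu - data.id_meas - data.id_elab combination (see the note at the bottom of the post)
  • Empty cells are allowed if data is missing for some columns but present for other columns at that datetime
My current query is generated like this:

At present (edit: checked today) there are 390k acquisitions and 9.2 million data values (and growing), and extracting a table with 59 columns takes about 10 minutes. I know that the previous software took 1 hour to extract the data.

Thank you for your patience in reading this far :)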


UPDATE After Denis's answer I tried his changes 1. and 2.; this is the resulting new query:

SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

UNION ALL

SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

UNION ALL

SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

) AS T GROUP BY datetime
Without a doubt, this is a big improvement in performance.


UPDATE (2) This is the query with point 3. added:

EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

UNION ALL

SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

UNION ALL

SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

) AS T GROUP BY datetime;
It is slightly slower. Is this supposed to pay off with a large number of columns? I will try it.


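UPDATE (3) I tried the 60-column query with and without MAX(data.value) ... GROUP BY datetime, and I got better results with it. The results differ from run to run; this is one of them:

  • Original query: 9m12.144s
  • Denis's 1. and 2.: 4m6.597s
  • Denis's 1., 2. and 3.: 4m0.210s
The time required dropped by about 57%.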
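
As a quick check of the figures above, the reported timings can be converted to seconds and compared. This is only arithmetic on the three numbers quoted in the update; the helper name reduction is made up for this sketch.

```python
# Timings reported in the update, converted to seconds.
original = 9 * 60 + 12.144       # 9m12.144s
changes_1_2 = 4 * 60 + 6.597     # 4m6.597s
changes_1_2_3 = 4 * 60 + 0.210   # 4m0.210s


def reduction(before, after):
    """Percentage of run time saved when going from `before` to `after`."""
    return 100.0 * (before - after) / before


print(round(reduction(original, changes_1_2), 1))    # 55.3
print(round(reduction(original, changes_1_2_3), 1))  # 56.5
```

Both variants save a little more than half of the original run time, and pre-aggregation (point 3.) adds only about one percentage point in this particular run.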


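UPDATE (4) I tried Andiry's solution, but it is much slower than Denis's optimizations.

Tested on 3 combinations/columns:

  • Unoptimized: 1m3s
  • Denis's optimizations: 1.7s
  • Andiry's CASE: 9.3s
I also tested with 12 combinations/columns:

  • Unoptimized: not tested
  • Denis's optimizations: 3.6s
  • Andiry's CASE: 13.7s
On top of that, Andiry's solution also returns acquisition datetimes for which none of the selected combinations has data but other combinations do.

Imagine control unit 1 acquires data every 30 minutes at :00 and :30, while control unit 2 does so at :15 and :45: I would double the number of rows, with the extra ones filled with NULLs.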
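
The extra-rows effect described above can be reproduced on a tiny data set. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere) with cut-down versions of the acquisitions and data tables; the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (id INTEGER PRIMARY KEY, id_cu INTEGER, datetime TEXT);
CREATE TABLE data (id INTEGER PRIMARY KEY, id_acq INTEGER, id_meas INTEGER,
                   id_elab INTEGER, value REAL);
-- Control unit 1 samples at :00 and :30, control unit 2 at :15 and :45.
INSERT INTO acquisitions VALUES (1, 1, '2011-03-01 00:00:00'),
                                (2, 2, '2011-03-01 00:15:00'),
                                (3, 1, '2011-03-01 00:30:00'),
                                (4, 2, '2011-03-01 00:45:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 1, 2, 10.0), (2, 1, 2, 20.0), (3, 1, 2, 11.0), (4, 1, 2, 21.0);
""")

# CASE variant: the WHERE clause only limits the date range, so datetimes
# that belong to unselected control units still form groups.
case_rows = conn.execute("""
SELECT acq.datetime,
       MAX(CASE WHEN acq.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
                THEN data.value END) AS v1
FROM acquisitions AS acq
INNER JOIN data ON acq.id = data.id_acq
WHERE acq.datetime >= '2011-03-01 00:00:00'
  AND acq.datetime <= '2011-03-01 23:59:59'
GROUP BY acq.datetime
ORDER BY acq.datetime
""").fetchall()

# Filtered variant: the combination is part of the WHERE clause, as in each
# UNION ALL branch, so only matching datetimes appear.
filtered_rows = conn.execute("""
SELECT acq.datetime, MAX(data.value) AS v1
FROM acquisitions AS acq
INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
  AND acq.datetime >= '2011-03-01 00:00:00'
  AND acq.datetime <= '2011-03-01 23:59:59'
GROUP BY acq.datetime
ORDER BY acq.datetime
""").fetchall()

print(len(case_rows), len(filtered_rows))  # 4 2
```

With only control unit 1 selected, the CASE variant returns four rows, two of them all NULL, while the filtered variant returns the two rows that actually carry data.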


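Note:

This is all about a sensor system: there are several control units (one per id_cu) and many sensors.

A single sensor is identified by an id_cu/id_meas pair and sends several elaborations for each measurement, e.g. MIN (id_elab=1), MAX (id_elab=2), AVERAGE (id_elab=3), INSTANT (id_elab=…) and so on, one per id_elab.

The user is free to choose whichever elaborations he wants, for example:

  • the AVERAGE (3) of sensor 3 of control unit 1, so the resulting column is id_cu=1/id_meas=3/id_elab=3
  • the AVERAGE (3) of sensor 5 of control unit 1, so the resulting column is id_cu=1/id_meas=5/id_elab=3
  • the MIN (1) of sensor 2 of control unit 4, so another column is id_cu=4/id_meas=2/id_elab=1
  • (put in any valid id_cu, id_meas, id_elab combination)
and so on, up to dozens of choices.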
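
Since the query is generated from whatever combinations the user picks, the generation step can be sketched as follows. This is a minimal illustration, not the generator used in the question: the function name build_pivot_query is hypothetical, sqlite3 stands in for MySQL so the example runs (a MySQL driver would typically use %s placeholders instead of ?), the ORDER BY is added only to make the output deterministic, and the sample rows are invented.

```python
import sqlite3


def build_pivot_query(combos, start, end):
    """Return (sql, params) with one UNION ALL branch per selected combination.

    combos is a list of (id_cu, id_meas, id_elab) tuples chosen by the user.
    """
    n = len(combos)
    branches, params = [], []
    for i, (id_cu, id_meas, id_elab) in enumerate(combos):
        # Branch i carries data.value in column v{i+1}; the others are NULL.
        cols = ", ".join(
            ("data.value" if j == i else "NULL") + f" AS v{j + 1}"
            for j in range(n)
        )
        branches.append(
            f"SELECT acq.datetime AS datetime, {cols} "
            "FROM acquisitions AS acq "
            "INNER JOIN data ON acq.id = data.id_acq "
            "WHERE acq.id_cu = ? AND data.id_meas = ? AND data.id_elab = ? "
            "AND acq.datetime >= ? AND acq.datetime <= ?"
        )
        params += [id_cu, id_meas, id_elab, start, end]
    outer = ", ".join(f"MAX(v{j + 1}) AS v{j + 1}" for j in range(n))
    sql = (
        f"SELECT datetime, {outer} FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n) AS t GROUP BY datetime ORDER BY datetime"
    )
    return sql, params


# Tiny in-memory stand-in for the real tables (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (id INTEGER PRIMARY KEY, id_cu INTEGER, datetime TEXT);
CREATE TABLE data (id INTEGER PRIMARY KEY, id_acq INTEGER, id_meas INTEGER,
                   id_elab INTEGER, value REAL);
INSERT INTO acquisitions VALUES (1, 1, '2011-03-01 00:00:00'),
                                (2, 4, '2011-03-01 00:00:00'),
                                (3, 1, '2011-03-01 00:30:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 3, 3, 10.0), (2, 2, 1, 20.0), (3, 3, 3, 11.0);
""")

sql, params = build_pivot_query(
    [(1, 3, 3), (4, 2, 1)], "2011-03-01 00:00:00", "2011-04-30 23:59:59"
)
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

Each selected combination becomes one output column, and a datetime for which only some combinations have data gets NULL (None in Python) in the remaining cells.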

Here is the partial DDL (unrelated tables omitted); the CREATE TABLE statements appear at the end of this page.


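There are three main issues:

  • Use union all rather than union. You are grouping and taking the min/max anyway, so introducing a sort step to remove duplicate rows makes no sense.

  • The where clause can be placed inside each union sub-statement: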

    select ...
    from (
    select ... from ...  where ...
    union all
    select ... from ...  where ...
    union all
    ...
    ) as t  -- MySQL requires an alias on a derived table
    group by ...
    
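    The way you wrote it, the query starts by fetching all rows, appends them all, and only then filters out the rows you need. Injecting the where clause into the union sub-statements makes it fetch only the needed rows before appending them.

  • Along the same lines, pre-aggregate the aggregates: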

    select ..., max(foo) as foo
    from (
    select ..., max(foo) as foo from ...  where ... group by ...
    union all
    select ..., max(foo) as foo from ...  where ... group by ...
    union all
    ...
    ) as t  -- MySQL requires an alias on a derived table
    group by ...
    
    The optimizer will make better use of the existing indexes, and you end up appending only a few rows rather than millions.
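
The first point can be checked on toy data: when the outer query takes MAX per group, UNION and UNION ALL give the same final result, so the duplicate removal done by UNION is wasted work. The sketch below uses sqlite3 as a stand-in for MySQL and a made-up readings table, both assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (datetime TEXT NOT NULL, value REAL);
INSERT INTO readings VALUES ('2011-03-01 00:00:00', 1.5),
                            ('2011-03-01 00:00:00', 1.5),
                            ('2011-03-01 00:30:00', 2.5);
""")

# Two overlapping branches, so the combined set contains duplicate rows.
BRANCHES = """
    SELECT datetime, value FROM readings WHERE value < 2
    {op}
    SELECT datetime, value FROM readings WHERE value >= 1
"""


def appended_row_count(op):
    """Number of rows the outer query has to aggregate."""
    sql = "SELECT COUNT(*) FROM (" + BRANCHES.format(op=op) + ") AS t"
    return conn.execute(sql).fetchone()[0]


def grouped_max(op):
    """Final result: MAX(value) per datetime."""
    sql = ("SELECT datetime, MAX(value) FROM (" + BRANCHES.format(op=op)
           + ") AS t GROUP BY datetime ORDER BY datetime")
    return conn.execute(sql).fetchall()


print(appended_row_count("UNION"), appended_row_count("UNION ALL"))  # 2 5
print(grouped_max("UNION") == grouped_max("UNION ALL"))              # True
```

UNION ALL passes more rows to the outer GROUP BY here, but it skips the deduplication pass, and the grouped maxima are identical either way.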


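  • Basically, I think you will get better results with a single select that uses CASE to handle the conditions. In any case, you may want to benchmark and compare.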

    SELECT acq.datetime AS datetime, 
           MAX(
               CASE acq.id_cu
               WHEN 1 THEN data.value
               END 
           ) as v1,
           MAX(
               CASE acq.id_cu
               WHEN 4 THEN data.value
               END 
           ) as v2,
           MAX(
               CASE acq.id_cu
               WHEN 8 THEN data.value
               END 
           ) as v3
    FROM 
           acq INNER JOIN data ON acq.id = data.id_acq
    WHERE 
           data.id_meas = 1 AND data.id_elab = 2 AND
           datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59"
    GROUP BY
           acq.datetime
    
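    This should do a clean range scan. In addition, more can be done with a composite index.

    Finally, is there anything wrong with simply using GROUP BY, for example: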

    SELECT data.id_meas, acq.datetime AS datetime, MAX(data.value)
    FROM 
           acq INNER JOIN data ON acq.id = data.id_acq
    WHERE 
           data.id_elab = 2 AND
           datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
           data.id_meas IN (1,4,8)
    GROUP BY
           data.id_meas, acq.datetime
    
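    This is the simplest form (and also the most flexible), even though it does not turn rows into columns (for the different values of data.id_meas). It will, however, give you the best idea of what performance to expect and which indexes are most useful for the query.

    EDIT: To get the maximum data.value for each acq.id_cu - data.id_meas - data.id_elab combination, you should be able to simply use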

    SELECT 
           acq.id_cu, data.id_meas, data.id_elab, acq.datetime AS datetime, MAX(data.value)
    FROM 
           acq INNER JOIN data ON acq.id = data.id_acq
    WHERE 
           data.id_elab = 2 AND
           datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
           data.id_meas IN (1,4,8)
    GROUP BY
           acq.id_cu, data.id_meas, data.id_elab, acq.datetime
    
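    This gives you max(data.value) for every combination of acq.id_cu, data.id_meas, data.id_elab, acq.datetime (after filtering with the values in the where clause; adjusting the where affects the result). It will not show NULLs for combinations that have no rows, but there is a workaround if this is the right direction. GROUP BY also determines the ordering, so change the order of the columns in GROUP BY to change it.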
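
The answer does not spell out the workaround it has in mind. One possibility, sketched here purely as an assumption, is to pivot the long-format result in application code and fill missing cells with None. The function name pivot and the sample rows are invented for the example.

```python
def pivot(rows, combos):
    """Turn (id_cu, id_meas, id_elab, datetime, value) rows into one row per datetime.

    combos lists the selected (id_cu, id_meas, id_elab) tuples in column order;
    cells with no matching row stay None.
    """
    column = {combo: i for i, combo in enumerate(combos)}
    table = {}
    for id_cu, id_meas, id_elab, dt, value in rows:
        i = column.get((id_cu, id_meas, id_elab))
        if i is None:
            continue  # combination not selected by the user
        table.setdefault(dt, [None] * len(combos))[i] = value
    return [(dt, *table[dt]) for dt in sorted(table)]


# Rows shaped like the output of the grouped query (values invented).
long_rows = [
    (1, 1, 2, "2011-03-01 00:00:00", 10.0),
    (4, 1, 2, "2011-03-01 00:00:00", 20.0),
    (1, 1, 2, "2011-03-01 00:30:00", 11.0),
    (9, 1, 2, "2011-03-01 00:30:00", 99.0),  # unselected combination, skipped
]
wide_rows = pivot(long_rows, [(1, 1, 2), (4, 1, 2), (8, 1, 2)])
for row in wide_rows:
    print(row)
```

This keeps the SQL in its simplest grouped form and moves the rows-to-columns step to the client, at the cost of transferring one row per combination and datetime.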

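    If my answer still misses the point, some sample data or a test case would be useful.

    The confusing part of your example is when you say

    each column of data.value corresponds to a selected acq.id_cu - data.id_meas - data.id_elab combination

    However, when you select data in your example queries, you use ...

    Partial DDL from the question (unrelated tables omitted):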
    CREATE TABLE acquisitions (
        id INTEGER NOT NULL AUTO_INCREMENT, 
        id_cu INTEGER NOT NULL, 
        datetime DATETIME NOT NULL, 
        PRIMARY KEY (id), 
        UNIQUE (id_cu, datetime), 
        FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
    )
    
    CREATE TABLE data (
        id INTEGER NOT NULL AUTO_INCREMENT, 
        id_acq INTEGER NOT NULL, 
        id_meas INTEGER NOT NULL, 
        id_elab INTEGER NOT NULL, 
        value FLOAT, 
        PRIMARY KEY (id), 
        FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
    )
    
    CREATE TABLE ctrl_units (
        id INTEGER NOT NULL, 
        name VARCHAR(40) NOT NULL, 
        PRIMARY KEY (id)
    )
    
    CREATE TABLE sensors (
        id_cu INTEGER NOT NULL, 
        id_meas INTEGER NOT NULL, 
        id_elab INTEGER NOT NULL, 
        name VARCHAR(40) NOT NULL, 
        `desc` VARCHAR(80), 
        PRIMARY KEY (id_cu, id_meas), 
        FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
    )
    
    SELECT
      acq.datetime,
      MAX(CASE WHEN acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1 THEN data.value END) AS v1,
      MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6 THEN data.value END) AS v2,
      MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8 THEN data.value END) AS v3
    FROM acq
      INNER JOIN data ON acq.id = data.id_acq
    WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY acq.datetime