MySQL: optimizing a query that extracts multiple columns from the same table
Tags: mysql, sql, query-optimization

This is a follow-up to an earlier question. I have two database tables (more tables are omitted; the table definitions appear later in this post). All possible id columns and datetime are indexed.

Assuming that I will not change the database structure, I need to extract the data this way:
- rows grouped by datetime
- one column per selected acq.id_cu - data.id_meas - data.id_elab combination, holding the corresponding data.value (see the note at the bottom of the post)
- empty cells are allowed when data is missing for some columns but exists for other columns at the same datetime

My current query is generated this way (see …). Currently (edit: checked today) there are 390k acquisitions and 9.2M data values (and still …
UPDATE: After Denis's answer I tried his changes 1. and 2.; this is the resulting query:
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
) AS T GROUP BY datetime
Without any doubt, this is a big improvement in performance.
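To make the shape of the result concrete, here is a minimal sketch that runs a two-column version of the query above against a tiny in-memory database. It uses Python's `sqlite3` module as a stand-in for MySQL, which is an assumption made only so the example is self-contained; the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acq (
  id INTEGER PRIMARY KEY,
  id_cu INTEGER NOT NULL,
  datetime TEXT NOT NULL,
  UNIQUE (id_cu, datetime)
);
CREATE TABLE data (
  id INTEGER PRIMARY KEY,
  id_acq INTEGER NOT NULL REFERENCES acq (id),
  id_meas INTEGER NOT NULL,
  id_elab INTEGER NOT NULL,
  value REAL
);
-- Invented sample rows: unit 3 reports twice, unit 5 only once.
INSERT INTO acq (id, id_cu, datetime) VALUES
  (1, 3, '2011-03-01 00:00:00'),
  (2, 5, '2011-03-01 00:00:00'),
  (3, 3, '2011-03-01 00:10:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
  (1, 2, 1, 10.5),
  (2, 4, 6, 20.0),
  (3, 2, 1, 11.0);
""")

# Each branch fills exactly one value column and pads the others with NULL;
# the outer MAX() then folds the branches into one row per datetime.
PIVOT_SQL = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM (
  SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2
  FROM acq INNER JOIN data ON acq.id = data.id_acq
  WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
  UNION ALL
  SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2
  FROM acq INNER JOIN data ON acq.id = data.id_acq
  WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
) AS T GROUP BY datetime ORDER BY datetime
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

The second timestamp has no value for the second combination, so that cell comes back as NULL (`None` in Python), which is the "empty cell" behaviour asked for in the question.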
UPDATE (2): This is with point 3. added:
EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
) AS T GROUP BY datetime;
Slightly slower. Is this supposed to pay off with a large number of columns? I'll try.
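UPDATE (3): I tried with and without MAX(data.value) ... GROUP BY datetime on a 60-column query, and I got better results with it. The timings vary from one attempt to the next; this is one set of them:
- original query: 9m12.144s
- with Denis's 1. and 2.: 4m6.597s
- with Denis's 1., 2. and 3.: 4m0.210s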
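UPDATE (4): I tried Andriy's solution, but it is much slower than Denis's optimization.

Tested on 3 combinations/columns:
- unoptimized: 1m3s
- Denis's optimization: 1.7s
- Andriy's CASE: 9.3s

A second test:
- unoptimized: not tested
- Denis's optimization: 3.6s
- Andriy's CASE: 13.7s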
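NOTE: This is all about a sensor system: there are several control units (one for each id_cu) and many sensors.
A single sensor is identified by an id_cu/id_meas pair and sends several elaborations of each measurement, for example MIN (id_elab=1), MAX (id_elab=2), AVERAGE (id_elab=3), INSTANT (id_elab=…) and so on, one for each id_elab.
The user is free to select any elaboration he wants, for example:
- the average (3) of sensor 3 of control unit 1, so id_cu=1/id_meas=3/id_elab=3 for one result column
- the average (3) of sensor 5 of control unit 1, so id_cu=1/id_meas=5/id_elab=3 for another result column
- the minimum (1) of sensor 2 of control unit 4, so id_cu=4/id_meas=2/id_elab=1 for another column
- (put in any valid id_cu, id_meas, id_elab combination)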
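Since the query is generated from whatever combinations the user picks, a small generator may help illustrate the idea. The sketch below is hypothetical: `build_pivot_query` is an invented helper name, the `?` placeholders are the style used by Python's `sqlite3` (MySQL drivers typically use a different placeholder), and SQLite stands in for MySQL so the example can run on its own. It emits one UNION ALL branch per combination, with the WHERE clause and GROUP BY inside each branch.

```python
import sqlite3


def build_pivot_query(combos, start, end):
    """Return (sql, params): one UNION ALL branch per (id_cu, id_meas, id_elab)."""
    if not combos:
        raise ValueError("at least one combination is required")
    n = len(combos)
    branches, params = [], []
    for i, (id_cu, id_meas, id_elab) in enumerate(combos):
        # Branch i fills column v(i+1) and pads every other column with NULL.
        cols = ", ".join(
            ("MAX(data.value)" if j == i else "NULL") + f" AS v{j + 1}"
            for j in range(n)
        )
        branches.append(
            f"SELECT acq.datetime AS datetime, {cols}\n"
            "FROM acq INNER JOIN data ON acq.id = data.id_acq\n"
            "WHERE acq.id_cu = ? AND data.id_meas = ? AND data.id_elab = ?\n"
            "AND acq.datetime >= ? AND acq.datetime <= ?\n"
            "GROUP BY acq.datetime"
        )
        params += [id_cu, id_meas, id_elab, start, end]
    outer = ", ".join(f"MAX(v{j + 1}) AS v{j + 1}" for j in range(n))
    sql = (
        f"SELECT datetime, {outer} FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n) AS T GROUP BY datetime"
    )
    return sql, params


# The three example combinations from the note above.
combos = [(1, 3, 3), (1, 5, 3), (4, 2, 1)]
sql, params = build_pivot_query(combos, "2011-03-01 00:00:00", "2011-04-30 23:59:59")

# Tiny invented data set to exercise the generated query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acq (id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL,
                  datetime TEXT NOT NULL, UNIQUE (id_cu, datetime));
CREATE TABLE data (id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL,
                   id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
INSERT INTO acq (id, id_cu, datetime) VALUES
  (1, 1, '2011-03-01 00:00:00'),
  (2, 4, '2011-03-01 00:00:00'),
  (3, 1, '2011-03-01 00:10:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
  (1, 3, 3, 1.5), (1, 5, 3, 2.5), (2, 2, 1, 0.5), (3, 3, 3, 1.75);
""")
rows = conn.execute(sql + " ORDER BY datetime", params).fetchall()
for row in rows:
    print(row)
```

Binding the ids and the date range as parameters, rather than formatting them into the SQL text, keeps the generated statement safe even when the combinations come from user input.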
There are three main issues. Use UNION ALL inside a subquery, with the WHERE clause in each branch:
select ...
from (
select ... from ... where ...
union all
select ... from ... where ...
union all
...
)
group by ...
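The way you wrote it, the query starts by fetching all rows, appends them all, and only at the end filters the rows you need. Injecting the WHERE clause into each UNION sub-statement makes it fetch only the needed rows and then append them.

Also group inside each sub-select:

select ..., max(foo) as foo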
from (
select ..., max(foo) as foo from ... where ... group by ...
union all
select ..., max(foo) as foo from ... where ... group by ...
union all
...
)
group by ...
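The optimizer will then make better use of the existing indexes, and you end up appending a few rows instead of millions.

Basically, I think you will get better results with a single SELECT and a CASE handling the conditions. In any case, you will want to benchmark and compare: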
SELECT acq.datetime AS datetime,
  MAX(CASE acq.id_cu WHEN 1 THEN data.value END) AS v1,
  MAX(CASE acq.id_cu WHEN 4 THEN data.value END) AS v2,
  MAX(CASE acq.id_cu WHEN 8 THEN data.value END) AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE data.id_meas = 1 AND data.id_elab = 2
  AND acq.datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59"
-- one output row per timestamp; without GROUP BY the aggregates collapse everything into a single row
GROUP BY acq.datetime
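This should do a clean range scan.
Also, more could be gained with composite indexes.
Finally, is there anything wrong with simply using GROUP BY, for example: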
SELECT data.id_meas, acq.datetime AS datetime, MAX(data.value)
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE data.id_elab = 2
  AND acq.datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59"
  AND data.id_meas IN (1,4,8)
-- acq.datetime is selected, so it is grouped as well; otherwise its value is arbitrary
GROUP BY data.id_meas, acq.datetime
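This is the simplest (and most flexible) form, even though it does not pivot the rows into columns (one per distinct value of data.id_meas). It will, however, give you the best idea of the performance to expect and of which indexes are most useful for the query.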
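As a rough illustration of the composite-index suggestion, the sketch below creates one such index and prints the query plan. Everything specific here is an assumption: the index name `idx_data_meas_elab_acq` and its column order are hypothetical, SQLite (via Python's `sqlite3`) stands in for MySQL, and `EXPLAIN QUERY PLAN` is SQLite's counterpart of MySQL's `EXPLAIN`. Adding an index also counts as a schema change, which the question wanted to avoid, so treat this as an experiment rather than a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acq (id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL,
                  datetime TEXT NOT NULL, UNIQUE (id_cu, datetime));
CREATE TABLE data (id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL,
                   id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
-- Hypothetical composite index: the two equality filters first, the join key last.
CREATE INDEX idx_data_meas_elab_acq ON data (id_meas, id_elab, id_acq);
""")

QUERY = """
SELECT acq.datetime, MAX(data.value)
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = ? AND data.id_meas = ? AND data.id_elab = ?
  AND acq.datetime BETWEEN ? AND ?
GROUP BY acq.datetime
"""
params = (1, 1, 2, "2011-03-01 00:00:00", "2011-04-30 23:59:59")

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)]
for step in plan:
    print(step)

# List the indexes that now exist on the data table.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('data')")]
print(index_names)
```

Whether the planner actually picks the index depends on the engine and on the data distribution, so the plan should be checked against the real tables before drawing conclusions.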
EDIT:
To get the maximum data.value for each acq.id_cu - data.id_meas - data.id_elab combination, you should be able to simply use:
SELECT
acq.id_cu, data.id_meas, data.id_elab, acq.datetime AS datetime, MAX(data.value)
FROM
acq INNER JOIN data ON acq.id = data.id_acq
WHERE
data.id_elab = 2 AND
datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
data.id_meas IN (1,4,8)
GROUP BY
acq.id_cu, data.id_meas, data.id_elab, acq.datetime
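This gives max(data.value) for every combination of acq.id_cu, data.id_meas, data.id_elab and acq.datetime (after filtering with the values in the WHERE clause; adjusting the WHERE clause changes the result).
It does not show NULLs for combinations that have no rows, but there is a workaround if this is the right direction.
GROUP BY also determines the order (in MySQL versions before 8.0, which sort GROUP BY results implicitly), so changing the order of the columns in GROUP BY changes the order of the results.
If my answer still misses the point, some sample data or test cases would be useful.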
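The confusing part of your example is that you ask for one column per data.value for each selected acq.id_cu - data.id_meas - data.id_elab combination, but when you select the data in your example query, you use …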
Table definitions:

-- ctrl_units is created first so that the foreign keys below can reference it
CREATE TABLE ctrl_units (
  id INTEGER NOT NULL,
  name VARCHAR(40) NOT NULL,
  PRIMARY KEY (id)
);
CREATE TABLE acquisitions (
  id INTEGER NOT NULL AUTO_INCREMENT,
  id_cu INTEGER NOT NULL,
  datetime DATETIME NOT NULL,
  PRIMARY KEY (id),
  UNIQUE (id_cu, datetime),
  FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
);
CREATE TABLE data (
  id INTEGER NOT NULL AUTO_INCREMENT,
  id_acq INTEGER NOT NULL,
  id_meas INTEGER NOT NULL,
  id_elab INTEGER NOT NULL,
  value FLOAT,
  PRIMARY KEY (id),
  FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
);
CREATE TABLE sensors (
  id_cu INTEGER NOT NULL,
  id_meas INTEGER NOT NULL,
  id_elab INTEGER NOT NULL,
  name VARCHAR(40) NOT NULL,
  `desc` VARCHAR(80),
  PRIMARY KEY (id_cu, id_meas),
  FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
);
Another answer proposes a single query with one CASE expression per column:

SELECT
  acq.datetime,
  MAX(CASE WHEN acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1 THEN data.value END) AS v1,
  MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6 THEN data.value END) AS v2,
  MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8 THEN data.value END) AS v3
FROM acq
  -- the ON keyword and the quotes around the datetime literals were missing
  INNER JOIN data ON acq.id = data.id_acq
WHERE acq.datetime >= "2011-03-01 00:00:00" AND acq.datetime <= "2011-04-30 23:59:59"
GROUP BY acq.datetime
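Before comparing timings of the CASE form and the UNION ALL form, it is worth confirming that both return the same rows. The following is a minimal sketch of such a check, again using Python's `sqlite3` as a stand-in for MySQL (an assumption) and synthetic data invented for the purpose; the timings it prints say nothing about MySQL on the real 9.2M-row table.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acq (id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL,
                  datetime TEXT NOT NULL, UNIQUE (id_cu, datetime));
CREATE TABLE data (id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL,
                   id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
""")

# Synthetic data: three units, one acquisition per unit per minute, with gaps.
acq_rows, data_rows, acq_id = [], [], 0
for minute in range(200):
    stamp = f"2011-03-01 {minute // 60:02d}:{minute % 60:02d}:00"
    for id_cu, id_meas, id_elab in [(2, 2, 1), (5, 4, 6), (7, 9, 8)]:
        acq_id += 1
        acq_rows.append((acq_id, id_cu, stamp))
        if (minute + id_cu) % 7:  # skip some values so NULL cells appear
            data_rows.append((acq_id, id_meas, id_elab, float(minute + id_cu)))
conn.executemany("INSERT INTO acq (id, id_cu, datetime) VALUES (?, ?, ?)", acq_rows)
conn.executemany(
    "INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES (?, ?, ?, ?)", data_rows
)

RANGE = {"start": "2011-03-01 00:00:00", "end": "2011-04-30 23:59:59"}

CASE_SQL = """
SELECT acq.datetime,
  MAX(CASE WHEN acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1 THEN data.value END) AS v1,
  MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6 THEN data.value END) AS v2,
  MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8 THEN data.value END) AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.datetime >= :start AND acq.datetime <= :end
GROUP BY acq.datetime ORDER BY acq.datetime
"""

UNION_SQL = """
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
  SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
  FROM acq INNER JOIN data ON acq.id = data.id_acq
  WHERE acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1
    AND acq.datetime >= :start AND acq.datetime <= :end
  UNION ALL
  SELECT acq.datetime, NULL, data.value, NULL
  FROM acq INNER JOIN data ON acq.id = data.id_acq
  WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
    AND acq.datetime >= :start AND acq.datetime <= :end
  UNION ALL
  SELECT acq.datetime, NULL, NULL, data.value
  FROM acq INNER JOIN data ON acq.id = data.id_acq
  WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
    AND acq.datetime >= :start AND acq.datetime <= :end
) AS T GROUP BY datetime ORDER BY datetime
"""


def timed(sql):
    """Run one query and return (rows, elapsed seconds)."""
    t0 = time.perf_counter()
    result = conn.execute(sql, RANGE).fetchall()
    return result, time.perf_counter() - t0


case_rows, case_seconds = timed(CASE_SQL)
union_rows, union_seconds = timed(UNION_SQL)
print("same rows:", case_rows == union_rows)
print(f"CASE: {case_seconds:.4f}s  UNION ALL: {union_seconds:.4f}s")
```

The two forms agree here only because every data row belongs to one of the three selected combinations. The CASE form filters on the date range alone, so on the real tables it reads every data row in that range and can return all-NULL rows for timestamps where only other sensors reported. That extra work may be one reason it was slower in the tests above.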