MySQL: Can this subquery use an index?

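First, apologies in advance for the wall of text. I have read through every similar question and answer I could find, but either the answers do not seem to apply to my problem, or I need a clearer understanding of the underlying problem and solution.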

I have a table of file sizes with the associated file dates and observation timestamps. All dates are UNIX epoch time integers (in seconds):

mysql> describe name_servers;
+-----------------------+------------------+------+-----+---------+----------------+
| Field                 | Type             | Null | Key | Default | Extra          |
+-----------------------+------------------+------+-----+---------+----------------+
| server_name           | varchar(255)     | YES  |     | NULL    |                |
| file_date             | int(10) unsigned | YES  |     | NULL    |                |
| file_size             | int(10) unsigned | YES  |     | NULL    |                |
| time                  | int(10) unsigned | YES  | MUL | NULL    |                |
| poll_id               | int(11)          | NO   | PRI | NULL    | auto_increment |
+-----------------------+------------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)


mysql> show index from name_servers;
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name                 | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| name_servers |          0 | PRIMARY                  |            1 | poll_id     | A         |     3523218 |     NULL | NULL   |      | BTREE      |         |               |
| name_servers |          0 | index_time_servername    |            1 | time        | A         |      503316 |     NULL | NULL   | YES  | BTREE      |         |               |
| name_servers |          0 | index_time_servername    |            2 | server_name | A         |     3523218 |     NULL | NULL   | YES  | BTREE      |         |               |
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
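I need to track changes in file size to detect whether a file shrinks by more than 20% within any 48-hour period. Normally I would use MySQL window functions for this, but the MySQL version on my server (5.6.37) does not support them, and I cannot change that because the server is not managed by my team. For now, I get the current size and the maximum size over the preceding 48 hours with an outer query that reads the file size of the current row and an inner subquery that finds the maximum file size in the preceding 48 hours (172800 seconds):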

mysql> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (name_servers_outside.time - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
    -> limit 10;
+-------------------+-------------------+-------------------+------------+-----------------------+
| server_name       | file_size         | file_date         | time       | max_file_size         |
+-------------------+-------------------+-------------------+------------+-----------------------+
| example_server    |           1159544 |        1550382945 | 1550382985 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383195 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383255 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383316 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383376 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383435 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383496 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383555 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383616 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383676 |               1159580 |
+-------------------+-------------------+-------------------+------------+-----------------------+
10 rows in set (16.11 sec)
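Retrieving just these 10 rows takes 16 seconds, and in production this query has to retrieve more than 150 rows. The inner query performs a full scan of all 3+ million table rows, and EXPLAIN reports "Range checked for each record (index map: 0x2)":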

mysql> explain
    -> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (name_servers_outside.time - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800);
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
| id | select_type        | table                | type  | possible_keys            | key                      | key_len | ref  | rows    | Extra                                          |
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
|  1 | PRIMARY            | name_servers_outside | range | index_time_servername    | index_time_servername    | 5       | NULL |   47302 | Using index condition; Using MRR               |
|  2 | DEPENDENT SUBQUERY | name_servers         | ALL   | index_time_servername    | NULL                     | NULL    | NULL | 3533883 | Range checked for each record (index map: 0x2) |
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
2 rows in set (0.01 sec)
The problem appears to be this condition:

time > (name_servers_outside.time - 172800)
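If I run a similar query that uses a static integer value instead of the name_servers_outside.time column reference in the subquery, the index is used as expected and the query is fast: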

time > (UNIX_TIMESTAMP() - 172800)
The modified query:

mysql> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (UNIX_TIMESTAMP() - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
    -> limit 10;
+--------------------+-------------------+-------------------+------------+-----------------------+
| server_name        | file_size         | file_date         | time       | max_file_size         |
+--------------------+-------------------+-------------------+------------+-----------------------+
| example_server     |           1159544 |        1550382945 | 1550382985 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383195 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383255 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383316 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383376 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383435 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383496 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383555 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383616 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383676 |               1159580 |
+--------------------+-------------------+-------------------+------------+-----------------------+
10 rows in set (0.01 sec)


mysql> explain
    -> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (UNIX_TIMESTAMP() - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
    -> limit 10;
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
| id | select_type | table                | type  | possible_keys            | key                      | key_len | ref  | rows  | Extra                            |
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
|  1 | PRIMARY     | name_servers_outside | range | index_time_servername    | index_time_servername    | 5       | NULL | 49042 | Using index condition; Using MRR |
|  2 | SUBQUERY    | name_servers         | range | index_time_servername    | index_time_servername    | 5       | NULL | 49042 | Using index condition; Using MRR |
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
2 rows in set (0.00 sec)
Thank you for reading this far. Again, I apologize for the huge wall of text, but I wanted to include enough detail to define the problem clearly.

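Now, the problem I am trying to solve is that I need to retrieve, for each row, the maximum file size over the 48 hours preceding that row. Each row therefore has its own time range for the max(file_size) calculation. That maximum is then used to compute the percentage change in file size. As mentioned above, I would normally use window functions for this, but my MySQL version (5.6.37) does not support them, and I cannot upgrade to 8.0 because I do not own this server.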
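To make the target calculation concrete outside of SQL, here is a minimal Python sketch of the per-row 48-hour maximum and the ">20% shrink" check. It is only an illustration, not part of the original setup: the function name `rolling_max_with_shrink` is hypothetical, the samples are assumed to be sorted by time for a single server, and the window is assumed to be (time - 48h, time], so the current row counts toward its own maximum.

```python
from collections import deque

WINDOW = 172800  # 48 hours in seconds


def rolling_max_with_shrink(samples, window=WINDOW, threshold=0.20):
    """Yield (time, file_size, max_file_size, shrank) for each sample.

    `samples` is an iterable of (time, file_size) pairs for ONE server,
    sorted by time ascending (an assumption of this sketch).
    """
    # Candidates for the window maximum, oldest first, sizes strictly decreasing.
    candidates = deque()
    for t, size in samples:
        # Drop candidates that fell out of the window (time <= t - window).
        while candidates and candidates[0][0] <= t - window:
            candidates.popleft()
        # Drop candidates that can never be the maximum again.
        while candidates and candidates[-1][1] <= size:
            candidates.pop()
        candidates.append((t, size))
        max_size = candidates[0][1]
        # "Shrank" means the current size is more than `threshold` below the max.
        shrank = size < max_size * (1 - threshold)
        yield t, size, max_size, shrank
```

Each sample enters and leaves the deque at most once, so the whole pass is linear in the number of rows, which is the behaviour a window function would give in a newer MySQL version.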


As always, any suggestions would be greatly appreciated. Thank you for reading.

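I would first try adding file_size to the index_time_servername index, but I suspect the real problem is that the subquery has to reference name_servers_outside.time, which comes from a different alias, and that may be confusing the query planner.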

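So how about dropping the subquery and joining the table to itself on the rows whose time falls between the current row's time and 48 hours before it?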

Something like:

SELECT
  ns.server_name,
  ns.file_size,
  ns.file_date,
  ns.time,
  MAX(previous.file_size) AS max_file_size
FROM
   name_servers AS ns
INNER JOIN name_servers AS previous
   -- only earlier samples of the same server, within the preceding 48 hours
   ON previous.server_name = ns.server_name
   AND previous.time BETWEEN (ns.time - 172800) AND (ns.time - 1)
WHERE
   ns.server_name = 'example_server'
   AND ns.time > (UNIX_TIMESTAMP() - 172800)
GROUP BY
   ns.server_name,
   ns.file_size,
   ns.file_date,
   ns.time
LIMIT 10;
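
The self-join idea can be tried out on a small data set before running it against production. The following is a minimal sketch, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for MySQL 5.6; the helper names `build_demo_db` and `max_size_per_row` are hypothetical, the table only mirrors the column names from the question, and the sketch says nothing about how MySQL's optimizer will plan the same join.

```python
import sqlite3

WINDOW = 172800  # 48 hours in seconds


def build_demo_db(rows):
    """Create an in-memory table shaped like name_servers and load `rows`.

    Each row is (server_name, file_date, file_size, time).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE name_servers (
            poll_id     INTEGER PRIMARY KEY AUTOINCREMENT,
            server_name TEXT,
            file_date   INTEGER,
            file_size   INTEGER,
            time        INTEGER
        )
        """
    )
    conn.execute(
        "CREATE UNIQUE INDEX index_time_servername "
        "ON name_servers (time, server_name)"
    )
    conn.executemany(
        "INSERT INTO name_servers (server_name, file_date, file_size, time) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def max_size_per_row(conn, server_name, window=WINDOW):
    """Return (time, file_size, max_file_size) using the self-join approach."""
    sql = """
        SELECT ns.time, ns.file_size, MAX(previous.file_size) AS max_file_size
        FROM name_servers AS ns
        INNER JOIN name_servers AS previous
            ON previous.server_name = ns.server_name
           AND previous.time BETWEEN ns.time - ? AND ns.time - 1
        WHERE ns.server_name = ?
        GROUP BY ns.poll_id, ns.time, ns.file_size
        ORDER BY ns.time
    """
    return conn.execute(sql, (window, server_name)).fetchall()
```

Note that with an INNER JOIN, a row that has no earlier sample inside its 48-hour window drops out of the result entirely; a LEFT JOIN would keep such rows with a NULL maximum.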

I apologize for the delayed reply; the solution ended up involving several components, and it took time to implement and test them.

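The main problem I was trying to solve was query performance. Strictly speaking, my original query returned the expected data, but it took so long to complete that it was impractical. The solution was therefore to find as many ways as possible to reduce the execution time.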

Here is what the solution ended up being:

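  1. As Dazz Knowles suggested, I replaced the subquery with an inner join, which cleaned up the code and made it easier to understand.
  2. As Progman suggested, I changed the index to a single-column index on the server_name field.
  3. I moved the rows involved in this query into their own table, which reduced the working set of columns.
  4. I lowered the rate at which the application writes rows to the table from 1 data point (1 row) per minute to 1 data point (1 row) per hour, which reduced the working set of rows to 1/60 of its previous size. Together, items 1-4 cut the query execution time from several minutes to a few milliseconds.
  5. I had previously been computing max_file_size at run time, with the application client submitting queries to the MySQL server for about 100 different servers and 3 different files on each server at once (about 300 query instances every time the application refreshed). This pinned the MySQL server's CPU at 100%, which made it unsuitable for real use, especially with several end users running the client application at the same time. I changed it so that the query runs only from a server-side script, and only when new rows are inserted. The query therefore runs once an hour and computes about 300 max_file_size values in a few milliseconds. max_file_size is then written to the MySQL table as a static column. None of the values that max_file_size depends on should ever change, so I am not worried about having to re-run the query to update max_file_size after it has been written for a given row. The application's client side now only reads data from MySQL; it no longer sends queries to compute the maximum file size. In hindsight this approach seems obvious from the start, but sometimes you have to do it wrong first to understand what the right way is.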
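
The "compute once on insert, then only read" approach from the last item can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual script: SQLite (Python's standard `sqlite3` module) stands in for MySQL, the table name `file_sizes`, the index name `index_servername`, and the functions `create_schema` and `insert_sample` are all hypothetical, and the window is assumed to be (time - 48h, time].

```python
import sqlite3

WINDOW = 172800  # 48 hours in seconds


def create_schema(conn):
    """Create a dedicated table with a stored max_file_size column."""
    conn.execute(
        """
        CREATE TABLE file_sizes (
            poll_id       INTEGER PRIMARY KEY AUTOINCREMENT,
            server_name   TEXT,
            file_date     INTEGER,
            file_size     INTEGER,
            time          INTEGER,
            max_file_size INTEGER
        )
        """
    )
    # Single-column index on server_name, as described in the list above.
    conn.execute("CREATE INDEX index_servername ON file_sizes (server_name)")


def insert_sample(conn, server_name, file_date, file_size, time, window=WINDOW):
    """Insert one poll result and store its 48-hour maximum alongside it."""
    # Largest size already recorded for this server inside the window.
    (prior_max,) = conn.execute(
        "SELECT MAX(file_size) FROM file_sizes "
        "WHERE server_name = ? AND time > ? AND time <= ?",
        (server_name, time - window, time),
    ).fetchone()
    # The new sample itself also counts toward the window maximum.
    max_file_size = file_size if prior_max is None else max(prior_max, file_size)
    conn.execute(
        "INSERT INTO file_sizes "
        "(server_name, file_date, file_size, time, max_file_size) "
        "VALUES (?, ?, ?, ?, ?)",
        (server_name, file_date, file_size, time, max_file_size),
    )
    conn.commit()
    return max_file_size
```

Because the stored value never needs recomputing, readers can run a plain `SELECT` on `max_file_size` and compare it with `file_size` to flag a shrink of more than 20%.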

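  • Welcome to SO, and don't apologize for the wall of text. This is a well-formatted question that contains all the details needed to solve the problem. Proof: you got upvotes. Do you get different performance when you add extra single-column indexes, for example on time or even on the file_size column? Or some odd index on the column sequence (server_name, file_size)? Can you create a playground for it on one of the online SQL playground sites? Use a CREATE TABLE statement and insert enough rows for proper testing. Then we can easily test whatever we want.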
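
One way to produce such a playground is to generate the DDL and some synthetic rows with a short script. The sketch below is only a suggestion: the function `make_playground_sql` and all generated values are hypothetical, the CREATE TABLE statement is reconstructed from the DESCRIBE and SHOW INDEX output in the question, and the server names are assumed to be simple strings without quotes.

```python
import random


def make_playground_sql(servers, polls_per_server,
                        start_time=1550000000, step=60, seed=0):
    """Return MySQL statements that create name_servers and fill it with fake polls."""
    rng = random.Random(seed)  # fixed seed so the script is reproducible
    lines = [
        "CREATE TABLE name_servers (",
        "  server_name VARCHAR(255) NULL,",
        "  file_date INT UNSIGNED NULL,",
        "  file_size INT UNSIGNED NULL,",
        "  `time` INT UNSIGNED NULL,",
        "  poll_id INT NOT NULL AUTO_INCREMENT,",
        "  PRIMARY KEY (poll_id),",
        "  UNIQUE KEY index_time_servername (`time`, server_name)",
        ");",
    ]
    for i in range(polls_per_server):
        t = start_time + i * step  # one poll per `step` seconds
        for name in servers:
            size = 1_000_000 + rng.randint(0, 200_000)  # made-up file size
            lines.append(
                "INSERT INTO name_servers (server_name, file_date, file_size, `time`) "
                f"VALUES ('{name}', {t - 40}, {size}, {t});"
            )
    return "\n".join(lines)
```

For example, `print(make_playground_sql(["example_server"], 3000))` emits a script with 3000 rows, about two days of per-minute samples, that can be pasted into a playground or piped into the `mysql` client.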