MySQL queries using GROUP BY are very slow


I have a database with the following schema:

CREATE TABLE IF NOT EXISTS `sessions` (
  `starttime` datetime NOT NULL,
  `ip` varchar(15) NOT NULL default '',
  `country_name` varchar(45) default '',
  `country_iso_code` varchar(2) default '',
  `org` varchar(128) default '',
  KEY (`ip`),
  KEY (`starttime`),
  KEY (`country_name`)
);

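(The actual table contains more columns; I have included only the ones I query.) The engine is InnoDB.

As you can see, there are 3 indexes: on ip, starttime, and country_name.

The table is quite large, with about 1.5 million rows. I am running various queries against it, trying to extract one month of information (August 2018 in the examples below).

Queries like these: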

SELECT
  UNIX_TIMESTAMP(starttime) as time_sec,
  country_iso_code AS metric,
  COUNT(country_iso_code) AS value
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY metric;

SELECT
  country_name AS Country,
  COUNT(country_name) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY Country;

are unbearably slow. I let one run for about half an hour and then gave up without getting any results.

The results of EXPLAIN:

+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys                      | key          | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,starttime_2,country_name | country_name | 138     | NULL | 14771687 |    35.81 | Using where |
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+

+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys            | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,ip,starttime_2 | ip   | 47      | NULL | 14771780 |    35.81 | Using where |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+

+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys             | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,starttime_2,org | org  | 387     | NULL | 14771800 |    35.81 | Using where |
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+

+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys            | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,ip,starttime_2 | ip   | 47      | NULL | 14771914 |    35.81 | Using where |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+

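What exactly is the problem? Should I index something else? Perhaps a composite index on (starttime, country_name)? I have read a guide on optimizing queries, but maybe I misunderstood it.

Here are some other queries that are just as slow and probably suffer from the same problem:

Query 2:

Query 3:

Query 4: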

In general, queries of the form

  SELECT column, COUNT(column)
    FROM tbl
   WHERE datestamp >= a AND datestamp <= b
   GROUP BY column

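perform best when the table has a compound index on (datestamp, column), in that order. So create one.

Note two things. First: single-column indexes do not necessarily help the performance of aggregate queries.

Second: without seeing the whole query, it is hard to guess the right index for an index scan. Simplified queries often lead to oversimplified indexes.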
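Applied to the sessions table from the question, the compound-index advice might look like the sketch below. The index names idx_start_iso and idx_start_name are made up for illustration, and the sketch assumes the two queries shown in the question are the ones being tuned.

```sql
-- Hypothetical index names; the column order (range column first,
-- grouped column second) follows the advice above.
ALTER TABLE sessions
  ADD INDEX idx_start_iso  (starttime, country_iso_code),
  ADD INDEX idx_start_name (starttime, country_name);

-- Re-check the plan afterwards. No specific EXPLAIN output is claimed here.
EXPLAIN
SELECT
  country_name AS Country,
  COUNT(country_name) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY Country;
```

Because each index contains every column its query touches, the query can in principle be answered from the index alone. In the new EXPLAIN output, a range access on one of these indexes and "Using index" in the Extra column would be the signs to look for.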

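Even better:

Note that you have no PRIMARY KEY; that is naughty. Having a PK does not inherently improve performance, but having the PK start with starttime will. Let's do this: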

CREATE TABLE IF NOT EXISTS `sessions` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,                   -- note: new column
  `starttime` datetime NOT NULL,
  `ip` varchar(39) CHARACTER SET ascii NOT NULL default '',    -- note: wide enough for IPv6
  `country_name` varchar(45) default '',
  `country_iso_code` char(2) CHARACTER SET ascii default '',   -- note: fixed width, ascii
  `org` varchar(128) default '',
  PRIMARY KEY (starttime, id),  -- in this order
  INDEX (id),                   -- to keep AUTO_INCREMENT happy
  -- The rest are unnecessary for the queries in question:
  KEY (`ip`),
  KEY (`starttime`),
  KEY (`country_name`)
) ENGINE=InnoDB;        -- just in case you are accidentally getting MyISAM

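Why? This takes advantage of the "clustering" of the PK with the data. That way, only the part of the table that falls within the time range is scanned, and there is no bouncing back and forth between the index and the data. You also do not need many indexes to handle all the cases efficiently.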
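If the table already exists and holds data, the same clustered layout could be reached with ALTER TABLE instead of recreating it. This is only a sketch: it assumes the live table has no primary key and no column named id yet, and the index name idx_id is invented. On a table this size the statement rebuilds the whole table, so expect it to take a while.

```sql
-- Assumption: no existing PRIMARY KEY and no existing `id` column.
ALTER TABLE sessions
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (starttime, id),   -- cluster rows by time
  ADD INDEX idx_id (id);             -- AUTO_INCREMENT column must lead some index

-- A month-long range now maps to one contiguous slice of the clustered index.
SELECT
  country_iso_code AS metric,
  COUNT(country_iso_code) AS value
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY metric;
```

The design choice is that InnoDB stores rows in primary key order, so any column the query needs is already sitting next to the starttime values being scanned.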

IPv6 needs up to 39 bytes. Note that VARCHAR will not let you do any range (CIDR) tests. I can discuss this further if you like.
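One way to make range tests possible, offered here as an assumption rather than as what the answer had in mind, is to keep a binary copy of the address produced by MySQL's INET6_ATON(). The column name ip_bin and the index name idx_ip_bin are hypothetical, and 192.0.2.0/24 is a documentation-only example network.

```sql
-- Hypothetical extra column holding the packed address
-- (4 bytes for IPv4, 16 bytes for IPv6).
ALTER TABLE sessions
  ADD COLUMN ip_bin VARBINARY(16) NULL,
  ADD INDEX idx_ip_bin (ip_bin);

-- Convert only rows whose text is a valid address; others stay NULL.
UPDATE sessions
   SET ip_bin = INET6_ATON(ip)
 WHERE IS_IPV4(ip) OR IS_IPV6(ip);

-- Range test for the IPv4 block 192.0.2.0/24.
-- LENGTH(...) = 4 keeps IPv6 values with the same leading bytes out.
SELECT COUNT(*)
  FROM sessions
 WHERE ip_bin >= INET6_ATON('192.0.2.0')
   AND ip_bin <= INET6_ATON('192.0.2.255')
   AND LENGTH(ip_bin) = 4;
```

Packed addresses compare byte by byte, so the start and end of a network bracket every address inside it. The text form does not have this property because its digit groups are not zero-padded.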

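Most of your queries are logically invalid, because you are selecting non-aggregate columns that are not mentioned in the GROUP BY clause. That said, aggregation takes time, and assuming your table is fairly large, the aggregation will be slow as well.

The first step is to use EXPLAIN to get the query plan and see what is happening there. Please include that too.

You have not provided enough information for us to help you. Please pay attention to the section about query performance, then edit your question.

@TimBiegeleisen, I'm afraid I don't understand. I am definitely selecting (and counting) what is mentioned in the GROUP BY clause. @SamiKuhmonen, I have edited the question and added the results of EXPLAIN.

1) From the guide on optimizing queries that I mentioned in my post, I got the impression that such a composite index should be used only when the WHERE clause contains an "=" comparison. I will try creating a composite index and see what happens. 2) These are the complete queries. Only the database schema was simplified.

OK, I have created the composite index and now the queries are much faster. Thanks!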
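The GROUP BY objection in the comments applies to the first query in the question, which selects UNIX_TIMESTAMP(starttime) without aggregating it or grouping by it. With ONLY_FULL_GROUP_BY enabled, MySQL rejects that form. A sketch of one possible repair follows; taking MIN(starttime) is an assumption about what time_sec is meant to represent, not something the question states.

```sql
-- Assumption: time_sec should be the earliest session time per country.
SELECT
  UNIX_TIMESTAMP(MIN(starttime)) AS time_sec,
  country_iso_code AS metric,
  COUNT(country_iso_code) AS value
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY country_iso_code;
```

The second query in the question is already consistent, since its only non-aggregated column is the one it groups by.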


Index statement for the general query form in the first answer:

ALTER TABLE tbl ADD INDEX date_col (datestamp, column);
