MySQL query with GROUP BY is very slow
I have a database with the following schema:
CREATE TABLE IF NOT EXISTS `sessions` (
`starttime` datetime NOT NULL,
`ip` varchar(15) NOT NULL default '',
`country_name` varchar(45) default '',
`country_iso_code` varchar(2) default '',
`org` varchar(128) default '',
KEY (`ip`),
KEY (`starttime`),
KEY (`country_name`)
);
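(The actual table contains more columns; I have included only the ones I query.) The engine is InnoDB.
As you can see, there are 3 indexes: on ip, starttime, and country_name.
The table is very large, containing about 1.5 million rows. I am running various queries against it, trying to extract information for one month (August 2018 in the examples below).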
Queries like these
SELECT
UNIX_TIMESTAMP(starttime) as time_sec,
country_iso_code AS metric,
COUNT(country_iso_code) AS value
FROM
sessions
WHERE
starttime >= FROM_UNIXTIME(1533070800) AND
starttime <= FROM_UNIXTIME(1535749199)
GROUP BY metric;
SELECT
country_name AS Country,
COUNT(country_name) AS Attacks
FROM
sessions
WHERE
starttime >= FROM_UNIXTIME(1533070800) AND
starttime <= FROM_UNIXTIME(1535749199)
GROUP BY Country;
are unbearably slow. I let one run for about half an hour and then gave up without getting any result.
Results of EXPLAIN:
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | sessions | NULL | index | starttime,starttime_2,country_name | country_name | 138 | NULL | 14771687 | 35.81 | Using where |
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | sessions | NULL | index | starttime,ip,starttime_2 | ip | 47 | NULL | 14771780 | 35.81 | Using where |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | sessions | NULL | index | starttime,starttime_2,org | org | 387 | NULL | 14771800 | 35.81 | Using where |
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | sessions | NULL | index | starttime,ip,starttime_2 | ip | 47 | NULL | 14771914 | 35.81 | Using where |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
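What exactly is the problem? Should I index something else? Perhaps a composite index on (starttime, country_name)? I have read a guide on optimizing queries, but maybe I misunderstood it.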
Here are some other queries that are just as slow and probably suffer from the same problem:
Query 2:
Query 3:
Query 4:
In general, queries of the form
SELECT column, COUNT(column)
FROM tbl
WHERE datestamp >= a AND datestamp <= b
GROUP BY column
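are sped up by creating a composite index on (datestamp, column).
Notice two things. First: single-column indexes do not necessarily help aggregate query performance.
Second: it is hard to guess the right index for an index scan without seeing the entire query. Simplified queries often lead to oversimplified indexes.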
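Applied to the sessions table from the question, the advice above might look like the following sketch. The index name starttime_country is hypothetical; the column pair is the one the asker proposed. With such an index in place, one would hope to see EXPLAIN pick it and read only the requested time range, but the actual plan depends on the server version and table statistics.

```sql
-- Hypothetical index name; (starttime, country_name) is the pair proposed in the question.
ALTER TABLE sessions ADD INDEX starttime_country (starttime, country_name);

-- Re-check the plan for the country query after adding the index.
EXPLAIN
SELECT
  country_name AS Country,
  COUNT(country_name) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY Country;
```

Because the index contains both the filtered column and the grouped column, the query can in principle be answered from the index alone, without touching the table rows.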
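Note that you have no PRIMARY KEY; that is naughty. Having a PK does not inherently improve performance, but having the PK start with starttime will. Let's do this: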
CREATE TABLE IF NOT EXISTS `sessions` (
id INT UNSIGNED NOT NULL AUTO_INCREMENT, -- note
`starttime` datetime NOT NULL,
`ip` varchar(39) CHARACTER SET ascii NOT NULL default '', -- note
`country_name` varchar(45) default '',
`country_iso_code` char(2) CHARACTER SET ascii default '', -- note
`org` varchar(128) default '',
PRIMARY KEY(starttime, id), -- in this order
INDEX(id), -- to keep AUTO_INCREMENT happy
-- The rest are unnecessary for the queries in question:
KEY (`ip`),
KEY (`starttime`),
KEY (`country_name`)
) ENGINE=InnoDB; -- just in case you are accidentally getting MyISAM
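Why? This takes advantage of the "clustering" of the PK with the data. That way, only the part of the table that falls in the time range is scanned, and there is no bouncing between the index and the data. You also do not need lots of indexes to handle all the cases efficiently.
IPv6 needs up to 39 bytes. Note that VARCHAR does not let you do any range (CIDR) tests. I can discuss that further.

Comments:

Most of your queries are logically invalid, because you select non-aggregate columns that are not mentioned in the GROUP BY clause. That said, aggregation takes time, and given that your table is fairly large, the aggregation will be slow as well.

The first step is to use EXPLAIN to get the query plan and see what is happening there. Please include that too.

You have not provided enough information for us to help you. Please pay attention to the section on query performance, then edit your question.

@TimBiegeleisen, I am afraid I do not understand. I am definitely selecting (and counting) what is mentioned in the GROUP BY clause. @SamiKuhmonen, I have edited the question to add the results of EXPLAIN.

1) From the guide on optimizing queries that I mentioned in my post, I got the impression that such a composite index should be used only when the WHERE clause contains "=" comparisons. I will try creating a composite index and see what happens. 2) These are the complete queries. Only the database schema is simplified.

OK, I have created the composite index, and the queries are much faster now. Thanks!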
The composite index for the general query form in the first answer can be created with:
ALTER TABLE tbl ADD INDEX date_col (datestamp, column);