MySQL: DISTINCT vs GROUP BY

I have two tables like this. The "order" table has 21886 rows.

CREATE TABLE `order` (
  `id` bigint(20) unsigned NOT NULL,
  `reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_reg_date` (`reg_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci


CREATE TABLE `order_detail_products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `order_id` bigint(20) unsigned NOT NULL,
  `order_detail_id` int(11) NOT NULL,
  `prod_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
  KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=572375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Here is my question:

MariaDB [test]> explain
    -> SELECT DISTINCT A.id
    -> FROM order A
    -> JOIN order_detail_products B ON A.id = B.order_id
    -> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| id   | select_type | table | type  | possible_keys | key          | key_len | ref               | rows  | Extra                                        |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
|    1 | SIMPLE      | A     | index | PRIMARY       | idx_reg_date | 8       | NULL              | 22151 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | B     | ref   | idx_order_id  | idx_order_id | 8       | bom_20140804.A.id |     2 | Using index; Distinct                        |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [test]> explain
    -> SELECT A.id
    -> FROM order A
    -> JOIN order_detail_products B ON A.id = B.order_id
    -> GROUP BY A.id
    -> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| id   | select_type | table | type  | possible_keys | key          | key_len | ref               | rows | Extra                        |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
|    1 | SIMPLE      | A     | index | PRIMARY       | idx_reg_date | 8       | NULL              |   65 | Using index; Using temporary |
|    1 | SIMPLE      | B     | ref   | idx_order_id  | idx_order_id | 8       | bom_20140804.A.id |    2 | Using index                  |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
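
The two queries above return the same result, but the DISTINCT one is much slower (EXPLAIN reports far more rows for it).
What is the difference?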

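I think your SELECT DISTINCT is slower because matching against another table defeats the index. In most cases SELECT DISTINCT would be faster. Here, however, you are matching on a column of another table, so the index cannot be used as intended and the query is much slower.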

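It is usually recommended to use DISTINCT rather than GROUP BY, because that is what you actually want, and to let the optimizer choose the "best" execution plan. However, no optimizer is perfect. With DISTINCT the optimizer has more options for an execution plan. But that also means it has more options to choose a bad plan.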

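You write that the DISTINCT query is "slow", but you don't give any numbers. In my test (with 10 times as many rows as yours, on MariaDB 10.0.19 and 10.3.13) the DISTINCT query is (only) 25% slower (562 ms vs 453 ms). The EXPLAIN result is no help at all. It is even "lying": with LIMIT 100, 30 the server needs to read at least 130 rows (which is what my EXPLAIN actually shows for GROUP BY), but yours shows 65.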

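I can't explain the 25% difference in execution time, but it looks like the engine does a full table/index scan in either case and sorts the result before it can skip 100 rows and pick 30.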
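
One way to check the full-scan suspicion without relying on EXPLAIN is to compare the session's handler counters before and after the query. This is only a minimal sketch: it assumes the tables from the question and a session with the RELOAD privilege (needed for FLUSH STATUS), and which Handler_read_* counters grow depends on the plan and the server version.

```sql
-- Reset the session status counters (requires the RELOAD privilege).
FLUSH STATUS;

-- Run the query under test.
SELECT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
GROUP BY A.id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;

-- Show how many row reads the storage engine actually performed.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

If the counters are in the order of the table sizes rather than a few hundred, the query read everything before applying the LIMIT.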

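The best plan would probably be:

  • Read rows from the idx_reg_date index (table A) one by one in descending order
  • Check whether there is a match in the idx_order_id index (table B)
  • Skip the first 100 matching rows
  • Send the next 30 matching rows
  • Exit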

If about 10% of the rows in A have no match in B, this plan would read about 143 rows from A.
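
The 10% figure is only an assumption. A quick way to see how many orders really have no detail rows is sketched below against the tables from the question; the aliases total_orders and orders_without_details are made up for illustration.

```sql
-- Count all orders and the orders without any row in order_detail_products.
SELECT COUNT(*) AS total_orders,
       SUM(B.order_id IS NULL) AS orders_without_details
FROM `order` A
LEFT JOIN (
    SELECT DISTINCT order_id
    FROM order_detail_products
) B ON B.order_id = A.id;
```

With p = orders_without_details / total_orders, the plan above would have to read roughly 130 / (1 - p) rows from A. For p = 0.1 that is about 144, in line with the rough estimate.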

The best I could do to somehow force this plan is:

SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100

This query returns the same result in 156 ms (3 times faster than GROUP BY). But that is still too slow, and it is probably still reading all rows in table A.

We can prove that a better plan exists with a "little" subquery trick:

SELECT A.id
FROM (
    SELECT id, reg_date
    FROM `order`
    ORDER BY reg_date DESC
    LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
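
This query executes in "no time" (~0 ms) and returns the same result on my test data. Although it is not 100% reliable, it shows that the optimizer is not doing a good job.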
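
The trick is unreliable because the inner LIMIT 1000 is a guess: it only works when the newest 1000 orders contain at least 130 orders that have detail rows. A sketch of a check for that, using the same tables (the alias matching_rows is made up here):

```sql
-- How many of the newest 1000 orders have at least one detail row?
SELECT COUNT(*) AS matching_rows
FROM (
    SELECT id
    FROM `order`
    ORDER BY reg_date DESC
    LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id);
```

If matching_rows is below 130 (OFFSET 100 plus LIMIT 30), the shortcut returns fewer rows than the full query and the inner limit has to be raised.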

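So, what are my conclusions:

  • The optimizer does not always do the best job and sometimes needs help
  • Even when we know "the best plan", we cannot always enforce it
  • DISTINCT is not always faster than GROUP BY
  • When no single index can be used for all clauses, things get quite tricky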
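
For the love of humanity, do not name a table "order".

Possible duplicate of another question.

@morten.c But the answers are the opposite: the answer here suggests GROUP BY, while the other one suggests DISTINCT.

If you want to get an answer here, please run RESET QUERY CACHE

Test schema and dummy data, followed by the queries with their timings:
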
drop table if exists `order`;
CREATE TABLE `order` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_reg_date` (`reg_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

insert into `order`(reg_date)
    select from_unixtime(floor(rand(1) * 1000000000)) as reg_date
    from information_schema.COLUMNS a
       , information_schema.COLUMNS b
    limit 218860;

drop table if exists `order_detail_products`;
CREATE TABLE `order_detail_products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `order_id` bigint(20) unsigned NOT NULL,
  `order_detail_id` int(11) NOT NULL,
  `prod_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
  KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

insert into order_detail_products(id, order_id, order_detail_id, prod_id)
    select null as id
    , floor(rand(2)*218860)+1 as order_id
    , 0 as order_detail_id
    , 0 as prod_id
    from information_schema.COLUMNS a
       , information_schema.COLUMNS b
    limit 437320;

Queries:

SELECT DISTINCT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 562 ms

SELECT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
GROUP BY A.id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 453 ms

SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 156 ms

SELECT A.id
FROM (
    SELECT id, reg_date
    FROM `order`
    ORDER BY reg_date DESC
    LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- ~ 0 ms