MySQL subquery vs. JOIN: neither works well for me
I am using MySQL. I have three tables.
The people table consists of two columns:
- id: the primary key of the table
- name: the person's name
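The income table holds the income of the people in the people table.
Each record in this table represents one income of one person.
A person may have zero or many income records in this table.
The table structure is:
- person_id (foreign key to the people table)
- amount (DECIMAL: the amount of money)
- number_of_hours_for_amount (INTEGER: the number of hours worked to earn this income)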
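The expenses table holds the expenses of the people.
Each record in this table represents one expense of one person:
an amount of money spent on some number of items.
A person may have zero or many expense records in this table.
The table structure is:
- person_id (foreign key to the people table)
- amount (DECIMAL: the amount of money)
- number_of_items_bought (INTEGER: the number of items bought in this expense)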
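For each person, I want to get:
- the person's name
- the sum of all his income
- the total number of hours he worked
- the sum of all his expenses
- the total number of items he bought
This is my current query: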
SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM (people
LEFT JOIN
(SELECT person_id, sum(amount) as income_sum,
sum(number_of_hours_for_amount) as work_hours_sum
FROM income
GROUP BY person_id) as income_subquery
ON people.id = income_subquery.person_id)
LEFT JOIN
(SELECT person_id, sum(amount) as expenses_sum,
sum(number_of_items_bought) as items_count
FROM expenses
GROUP BY person_id) as expenses_subquery
ON people.id = expenses_subquery.person_id
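As far as I understand, the problem with this query is that once I get the data from the subqueries, the join is very inefficient:
because the subquery results are temporary derived tables, indexes are not used well on them.
The best way to take full advantage of the existing indexes would be to join the three tables directly
rather than through subqueries.
But that is not a correct solution, because it creates a Cartesian product: each income row is paired with every expense row of the same person,
so the aggregate sums include duplicated values from records that appear more often than they should.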
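To make the duplication concrete, here is a minimal sketch of the direct three-way join on a tiny dataset. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income   (person_id INTEGER, amount NUMERIC, number_of_hours_for_amount INTEGER);
    CREATE TABLE expenses (person_id INTEGER, amount NUMERIC, number_of_items_bought INTEGER);
    INSERT INTO people   VALUES (1, 'Groucho');
    INSERT INTO income   VALUES (1, 10, 10), (1, 10, 10);            -- 2 rows, true sum 20
    INSERT INTO expenses VALUES (1, 10, 1), (1, 10, 1), (1, 10, 2);  -- 3 rows, true sum 30
""")

# Direct join: every income row pairs with every expense row (2 x 3 = 6 rows),
# so each amount is counted several times before SUM() runs.
naive = conn.execute("""
    SELECT SUM(i.amount), SUM(e.amount)
    FROM people p
    LEFT JOIN income i   ON i.person_id = p.id
    LEFT JOIN expenses e ON e.person_id = p.id
    GROUP BY p.id
""").fetchone()

# Reference totals computed from each table on its own.
true_income = conn.execute(
    "SELECT SUM(amount) FROM income WHERE person_id = 1").fetchone()[0]
true_expenses = conn.execute(
    "SELECT SUM(amount) FROM expenses WHERE person_id = 1").fetchone()[0]

print("direct join:", naive)
print("true totals:", (true_income, true_expenses))
```

With two income rows and three expense rows, the income sum is tripled and the expense sum is doubled, which is why the totals have to be aggregated per table before they are combined.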
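(Another option I tried was computing each person's income and expense values as a select_expr
in the SELECT part, that is, as dependent subqueries. That was not fast enough either.)
I am looking for an efficient query that gives these results.

Try this. Both joins should use the index on people.id: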
SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM people
LEFT JOIN
(SELECT person_id, sum(amount) as income_sum,
sum(number_of_hours_for_amount) as work_hours_sum
FROM income
GROUP BY person_id) as income_subquery
ON people.id = income_subquery.person_id
LEFT JOIN
(SELECT person_id, sum(amount) as expenses_sum,
sum(number_of_items_bought) as items_count
FROM expenses
GROUP BY person_id) as expenses_subquery
ON people.id = expenses_subquery.person_id
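Ideally, a good query optimizer would recognize that your original SQL is equivalent to this. But you are using MySQL, so I would not expect ideal optimization.
Make sure you have indexes on income.person_id and expenses.person_id so that the grouping in the subqueries is efficient.

Something like this will get you very close: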
select id, name, (select sum(amount) from income i where i.person_id = p.id) as 'total_income_amount',
(select sum(number_of_hours_for_amount) from income i where i.person_id = p.id) as 'total_number_of_hours_for_amount',
(select sum(amount) from expenses e where e.person_id = p.id) as 'total_expenses_amount',
(select sum(number_of_items_bought) from expenses e where e.person_id = p.id) as 'total_number_of_items_bought'
from people p;
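
You are right, joining all three tables directly produces an unavoidable Cartesian product. You can break the problem into two subqueries.
One for income: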
SELECT p.id, p.name, SUM(i.amount) AS income_sum, SUM(number_of_hours_for_amount) AS work_hours_sum
FROM people p
LEFT JOIN income i ON p.id = i.person_id
GROUP BY p.id;
+----+---------+------------+----------------+
| id | name | income_sum | work_hours_sum |
+----+---------+------------+----------------+
| 1 | Groucho | 20.00 | 20 |
| 2 | Harpo | 40.00 | 40 |
| 3 | Chico | 60.00 | 60 |
+----+---------+------------+----------------+
Here is the EXPLAIN output for that query:
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using temporary; Using filesort |
| 1 | SIMPLE | i | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
And one for expenses:
SELECT p.id, SUM(e.amount) AS expenses_sum, SUM(number_of_items_bought) AS items_count
FROM people p
LEFT JOIN expenses e ON p.id = e.person_id
GROUP BY p.id;
+----+--------------+-------------+
| id | expenses_sum | items_count |
+----+--------------+-------------+
| 1 | 30.00 | 4 |
| 2 | 30.00 | 4 |
| 3 | 30.00 | 4 |
+----+--------------+-------------+
Here is its EXPLAIN output:
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using temporary; Using filesort |
| 1 | SIMPLE | e | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
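The EXPLAIN reports above show that the queries do table scans on the income and expenses tables (type "ALL") and perform the joins without an index ("Using join buffer"). The red flag is a join involving two tables where both use access type "ALL". Once the tables hold more than a trivial number of rows, the cost becomes very high. It is usually accompanied by "Using join buffer", which is another warning sign of an expensive query.
Finally, the GROUP BY is done inefficiently, with a temporary table and a filesort. That is another performance killer.
The "(Block Nested Loop)" note is a MySQL 5.6 thing. You will not see it if you use an earlier version of MySQL.
The following indexes should help make these queries better: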
ALTER TABLE income ADD KEY (person_id, amount, number_of_hours_for_amount);
ALTER TABLE expenses ADD KEY (person_id, amount, number_of_items_bought);
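Now the EXPLAIN reports no longer show the inefficient access. The joins are done with an index (type "ref"), and the temporary table and filesort are gone. "Using index" means the joined table is accessed only through the columns in the index, without touching the table rows at all.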
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 4 | NULL | 3 | NULL |
| 1 | SIMPLE | i | ref | person_id | person_id | 5 | test.p.id | 1 | Using index |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 4 | NULL | 3 | NULL |
| 1 | SIMPLE | e | ref | person_id | person_id | 5 | test.p.id | 1 | Using index |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
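The covering-index effect can also be observed outside MySQL. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite's EXPLAIN QUERY PLAN output is formatted differently from MySQL's EXPLAIN, and the index name idx_income_cover is made up, but the idea is the same: once every column the query needs is in the index, the plan reads the index alone.

```python
import sqlite3

# SQLite stands in for MySQL; its plan output differs from MySQL's EXPLAIN.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income ("
    "person_id INTEGER, amount NUMERIC, number_of_hours_for_amount INTEGER)"
)

query = """
    SELECT person_id, SUM(amount), SUM(number_of_hours_for_amount)
    FROM income
    GROUP BY person_id
"""

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)

# Same column order as the ALTER TABLE ... ADD KEY statement for income;
# the index name is hypothetical.
conn.execute(
    "CREATE INDEX idx_income_cover "
    "ON income (person_id, amount, number_of_hours_for_amount)"
)
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

Putting person_id first lets the index serve both the join lookup and the grouping, while the trailing columns make it covering for the two SUM() expressions.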
You said you want to do this in a single query, so here is how.
We can combine the two separate queries into one query that returns one row per person:
SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM
(SELECT p.id, p.name, SUM(i.amount) AS income_sum, SUM(number_of_hours_for_amount) AS work_hours_sum
FROM people p
LEFT OUTER JOIN income i ON p.id = i.person_id
GROUP BY p.id) AS subq_i
INNER JOIN
(SELECT p.id, SUM(e.amount) AS expenses_sum, SUM(number_of_items_bought) AS items_count
FROM people p
LEFT OUTER JOIN expenses e ON p.id = e.person_id
GROUP BY p.id) AS subq_e
USING (id);
+---------+------------+----------------+--------------+-------------+
| name | income_sum | work_hours_sum | expenses_sum | items_count |
+---------+------------+----------------+--------------+-------------+
| Groucho | 20.00 | 20 | 30.00 | 4 |
| Harpo | 40.00 | 40 | 30.00 | 4 |
| Chico | 60.00 | 60 | 30.00 | 4 |
+---------+------------+----------------+--------------+-------------+
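Even for this combined query, the EXPLAIN does not look too bad. There are no temporary tables, filesorts, or join buffers, and the covering indexes are put to good use.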
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | NULL |
| 1 | PRIMARY | <derived3> | ref | <auto_key0> | <auto_key0> | 4 | subq_i.id | 2 | NULL |
| 3 | DERIVED | p | index | PRIMARY | PRIMARY | 4 | NULL | 3 | Using index |
| 3 | DERIVED | e | ref | person_id | person_id | 5 | test.p.id | 1 | Using index |
| 2 | DERIVED | p | index | PRIMARY | PRIMARY | 4 | NULL | 3 | NULL |
| 2 | DERIVED | i | ref | person_id | person_id | 5 | test.p.id | 1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+

Perhaps you can skip the joins entirely:
SELECT person_id
, MIN(name) AS name
, SUM(income_sum) AS income_sum
, SUM(work_hours_sum) AS work_hours_sum
, SUM(expenses_sum) AS expenses_sum
, SUM(items_count) AS items_count
FROM (
SELECT id AS person_id
, name
, NULL AS income_sum
, NULL AS work_hours_sum
, NULL AS expenses_sum
, NULL AS items_count
FROM people
UNION ALL
SELECT person_id
, NULL AS name
, sum(amount) AS income_sum
, sum(number_of_hours_for_amount) AS work_hours_sum
, NULL AS expenses_sum
, NULL AS items_count
FROM income
GROUP BY person_id
UNION ALL
SELECT person_id
, NULL AS name
, NULL AS income_sum
, NULL AS work_hours_sum
, sum(amount) AS expenses_sum
, sum(number_of_items_bought) AS items_count
FROM expenses
GROUP BY person_id
) as d
WHERE person_id IS NOT NULL -- my sql generates this row
GROUP BY person_id
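As a quick sanity check of the UNION ALL approach, here is a small sketch that runs the same query shape with Python's built-in sqlite3 module standing in for MySQL. The sample rows are assumptions chosen so the totals match the result tables shown earlier in the thread, and a fourth person with no records (Zeppo) is added to show that such people are still returned.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income   (person_id INTEGER, amount NUMERIC, number_of_hours_for_amount INTEGER);
    CREATE TABLE expenses (person_id INTEGER, amount NUMERIC, number_of_items_bought INTEGER);
    INSERT INTO people VALUES (1, 'Groucho'), (2, 'Harpo'), (3, 'Chico'), (4, 'Zeppo');
    INSERT INTO income VALUES (1, 10, 10), (1, 10, 10), (2, 20, 20),
                              (2, 20, 20), (3, 30, 30), (3, 30, 30);
    INSERT INTO expenses VALUES (1, 15, 2), (1, 15, 2), (2, 15, 2),
                                (2, 15, 2), (3, 15, 2), (3, 15, 2);
""")

# Each branch fills only its own columns; the outer GROUP BY folds the
# branches into one row per person. SUM() and MIN() skip the NULL padding.
rows = conn.execute("""
    SELECT person_id, MIN(name), SUM(income_sum), SUM(work_hours_sum),
           SUM(expenses_sum), SUM(items_count)
    FROM (
        SELECT id AS person_id, name, NULL AS income_sum, NULL AS work_hours_sum,
               NULL AS expenses_sum, NULL AS items_count
        FROM people
        UNION ALL
        SELECT person_id, NULL, SUM(amount), SUM(number_of_hours_for_amount), NULL, NULL
        FROM income GROUP BY person_id
        UNION ALL
        SELECT person_id, NULL, NULL, NULL, SUM(amount), SUM(number_of_items_bought)
        FROM expenses GROUP BY person_id
    ) AS d
    WHERE person_id IS NOT NULL
    GROUP BY person_id
    ORDER BY person_id
""").fetchall()

for row in rows:
    print(row)
```

Because every branch is aggregated before the branches are stacked, no row is ever paired with rows from another table, so the duplication caused by a direct join cannot occur.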
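
How often do you update income and expenses? Could you aggregate their sums into another table? How many rows are in the tables? How slow is "not fast enough"? Can you post the EXPLAIN output?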
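The idea of keeping the sums in another table could look roughly like the sketch below. It is only an illustration under assumptions: the income_totals table and the refresh_income_totals helper are hypothetical names, SQLite (via Python's sqlite3) stands in for MySQL, and a full rebuild is used for simplicity where a real system might update totals incrementally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE income (person_id INTEGER, amount NUMERIC, number_of_hours_for_amount INTEGER);
    -- Hypothetical summary table: one pre-aggregated row per person.
    CREATE TABLE income_totals (
        person_id INTEGER PRIMARY KEY,
        income_sum NUMERIC,
        work_hours_sum INTEGER
    );
    INSERT INTO income VALUES (1, 10, 10), (1, 10, 10), (2, 20, 20);
""")

def refresh_income_totals(conn):
    # Rebuild the summary from scratch inside a single transaction, so
    # readers never observe a half-filled table.
    with conn:
        conn.execute("DELETE FROM income_totals")
        conn.execute("""
            INSERT INTO income_totals
            SELECT person_id, SUM(amount), SUM(number_of_hours_for_amount)
            FROM income
            GROUP BY person_id
        """)

refresh_income_totals(conn)
totals = conn.execute(
    "SELECT * FROM income_totals ORDER BY person_id").fetchall()
print(totals)
```

Whether this pays off depends on the answers to the questions above: a summary table trades extra work on every write or refresh for a report query that becomes a plain primary-key join.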