MySQL: Is there a way to optimize a GROUP BY median query?
I wrote a query that finds the median for each month. That was hard enough, because MySQL has no built-in median function, so I really had to think outside the box with my intermediate SQL skills. The problem now is that the query takes a long time to run (sometimes 1 or 2 minutes). Is there a way to optimize this query? Or should I write a Python script that finds the median and pushes it to the database through a connector? Here is the query:
SET @row_num_pos := 0;
SET @median_group_pos := '';
SET @row_num_neg := 0;
SET @median_group_neg := '';
SELECT
p.month_num AS 'month_num',
CASE
WHEN p.month_num = 1 THEN 'Jan'
WHEN p.month_num = 2 THEN 'Feb'
WHEN p.month_num = 3 THEN 'Mar'
WHEN p.month_num = 4 THEN 'Apr'
WHEN p.month_num = 5 THEN 'May'
WHEN p.month_num = 6 THEN 'Jun'
WHEN p.month_num = 7 THEN 'Jul'
WHEN p.month_num = 8 THEN 'Aug'
WHEN p.month_num = 9 THEN 'Sep'
WHEN p.month_num = 10 THEN 'Oct'
WHEN p.month_num = 11 THEN 'Nov'
WHEN p.month_num = 12 THEN 'Dec'
END AS 'Timeline',
p.ck_pos_median AS 'CK+ Median',
n.ck_neg_median AS 'CK- Median'
FROM
(SELECT
s.median_month_pos AS 'month_num',
ROUND(AVG(ck_pos), 1) AS 'ck_pos_median'
FROM
(SELECT
@row_num_pos:=CASE
WHEN @median_group_pos = q.month_num THEN @row_num_pos + 1
ELSE 1
END AS 'count_of_group',
@median_group_pos:=q.month_num AS 'median_month_pos',
q.month_num,
q.ck_pos,
(SELECT
COUNT(*)
FROM
Biocept_DB.result_management_report
WHERE
ck_pos IS NOT NULL
AND MONTH(order_date) = q.month_num) AS total_month
FROM
(SELECT
MONTH(order_date) AS 'month_num', ck_pos
FROM
Biocept_DB.result_management_report
WHERE
ck_pos IS NOT NULL
ORDER BY MONTH(order_date) , ck_pos ASC) AS q) AS s
WHERE
s.count_of_group BETWEEN (s.total_month / 2.0) AND (s.total_month / 2.0 + 1)
GROUP BY s.median_month_pos) AS p
JOIN
(SELECT
s.median_month_neg AS 'month_num',
ROUND(AVG(ck_neg), 1) AS 'ck_neg_median'
FROM
(SELECT
@row_num_neg:=CASE
WHEN @median_group_neg = q.month_num THEN @row_num_neg + 1
ELSE 1
END AS 'count_of_group',
@median_group_neg:=q.month_num AS 'median_month_neg',
q.month_num,
q.ck_neg,
(SELECT
COUNT(*)
FROM
Biocept_DB.result_management_report
WHERE
ck_neg IS NOT NULL
AND MONTH(order_date) = q.month_num) AS total_month
FROM
(SELECT
MONTH(order_date) AS 'month_num', ck_neg
FROM
Biocept_DB.result_management_report
WHERE
ck_neg IS NOT NULL
ORDER BY MONTH(order_date) , ck_neg ASC) AS q) AS s
WHERE
s.count_of_group BETWEEN (s.total_month / 2.0) AND (s.total_month / 2.0 + 1)
GROUP BY s.median_month_neg) AS n ON p.month_num = n.month_num
ORDER BY p.month_num;
SET @row_num_pos := NULL;
SET @median_group_pos := NULL;
SET @row_num_neg := NULL;
SET @median_group_neg := NULL;
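Here is the table it produces:
I made some changes to your query. I hope the calculation is correct; for your sample data the results are the same.
In my environment your query takes 6.22 seconds and mine takes only 20 milliseconds, so it looks about 300 times faster.
Please test my query and let me know whether it works for you. If that is still not fast enough, we can optimize further with virtual columns.
Please do not forget to set a large enough value for group_concat_max_len, since GROUP_CONCAT truncates its result at that length: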
SET SESSION group_concat_max_len = 1000000;
Query
SELECT r.Timeline AS `month_number`
, SUBSTRING_INDEX(SUBSTRING_INDEX( 'Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec', ',' , r.Timeline ) , ',' , -1)
AS Timeline
, ( SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_pos, ',' , (r.cnt_ck_pos>>1)+1 ) , ',' , -1) +
SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_pos, ',' , (r.cnt_ck_pos>>1) + 1 - ((r.cnt_ck_pos&1) XOR 1) ) , ',' , -1)
) / 2 AS 'ck_pos_median'
, ( SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_neg, ',' , (r.cnt_ck_neg>>1)+1 ) , ',' , -1) +
SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_neg, ',' , (r.cnt_ck_neg>>1) + 1 - ((r.cnt_ck_neg&1) XOR 1) ) , ',' , -1)
) / 2 AS 'ck_neg_median'
FROM (
SELECT MONTH(`order_date`) AS 'Timeline'
, SUM(IF(ck_pos is NULL,0,1)) AS cnt_ck_pos
, GROUP_CONCAT(ck_pos ORDER BY ck_pos) as grp_ck_pos
, SUM(IF(ck_neg is NULL,0,1)) AS cnt_ck_neg
, GROUP_CONCAT(ck_neg ORDER BY ck_neg) as grp_ck_neg
FROM result_management_report
where (ck_pos is not null or ck_neg is not null) = 1
GROUP BY Timeline
) r;
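The position arithmetic in the query above is easy to check outside the database. The following is a minimal Python sketch, not part of the original answer: `substring_index` and `median_from_concat` are hypothetical helper names that mimic what the nested SUBSTRING_INDEX calls do to a sorted, comma-joined list, and the sample lists are made up for illustration.

```python
import statistics


def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX(s, delim, count) for illustration."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th delimiter from the end
    return ""


def median_from_concat(grp: str, cnt: int) -> float:
    """Pick the middle element(s) the same way the SQL expression does."""
    upper_pos = (cnt >> 1) + 1                # 1-based position of the upper middle element
    lower_pos = upper_pos - ((cnt & 1) ^ 1)   # one position lower only when cnt is even
    upper = substring_index(substring_index(grp, ",", upper_pos), ",", -1)
    lower = substring_index(substring_index(grp, ",", lower_pos), ",", -1)
    return (float(upper) + float(lower)) / 2


if __name__ == "__main__":
    # Made-up sorted lists with an odd and an even number of values.
    for values in ([1, 2, 3, 4, 5], [1, 2, 3, 4]):
        grp = ",".join(str(v) for v in values)
        print(grp, median_from_concat(grp, len(values)), statistics.median(values))
```

For an odd count both positions coincide, so the average is just the middle value; for an even count the two middle values are averaged. This only holds if the concatenated list is complete, which is why group_concat_max_len matters.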
Table definition
Sample run of the new query
mysql> SELECT r.Timeline AS `month_number`
-> , SUBSTRING_INDEX(SUBSTRING_INDEX( 'Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec', ',' , r.Timeline ) , ',' , -1)
-> AS Timeline
-> , ( SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_pos, ',' , (r.cnt_ck_pos>>1)+1 ) , ',' , -1) +
-> SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_pos, ',' , (r.cnt_ck_pos>>1) + 1 - ((r.cnt_ck_pos&1) XOR 1) ) , ',' , -1)
-> ) / 2 AS 'ck_pos_median'
-> , ( SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_neg, ',' , (r.cnt_ck_neg>>1)+1 ) , ',' , -1) +
-> SUBSTRING_INDEX( SUBSTRING_INDEX(r.grp_ck_neg, ',' , (r.cnt_ck_neg>>1) + 1 - ((r.cnt_ck_neg&1) XOR 1) ) , ',' , -1)
-> ) / 2 AS 'ck_neg_median'
-> FROM (
-> SELECT MONTH(`order_date`) AS 'Timeline'
-> , SUM(IF(ck_pos is NULL,0,1)) AS cnt_ck_pos
-> , GROUP_CONCAT(ck_pos ORDER BY ck_pos) as grp_ck_pos
-> , SUM(IF(ck_neg is NULL,0,1)) AS cnt_ck_neg
-> , GROUP_CONCAT(ck_neg ORDER BY ck_neg) as grp_ck_neg
-> FROM result_management_report
-> where (ck_pos is not null or ck_neg is not null) = 1
-> GROUP BY Timeline
-> ) r;
+--------------+----------+---------------+---------------+
| month_number | Timeline | ck_pos_median | ck_neg_median |
+--------------+----------+---------------+---------------+
| 1 | Jan | 2 | 2 |
| 2 | Feb | 2 | 3 |
| 3 | Mar | 2 | 3 |
| 4 | Apr | 4 | 4 |
| 5 | May | 2 | 3 |
| 6 | Jun | 3 | 3 |
| 7 | Jul | 4 | 4 |
| 8 | Aug | 3 | 7 |
| 9 | Sep | 4 | 12 |
| 10 | Oct | 5 | 8 |
| 11 | Nov | 4 | 9 |
| 12 | Dec | 2 | 12 |
+--------------+----------+---------------+---------------+
12 rows in set (0.02 sec)
mysql>
Sample run of the old query
mysql> SET @row_num_pos := 0;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @median_group_pos := '';
Query OK, 0 rows affected (0.00 sec)
mysql> SET @row_num_neg := 0;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @median_group_neg := '';
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> SELECT
-> p.month_num AS 'month_num',
-> CASE
-> WHEN p.month_num = 1 THEN 'Jan'
-> WHEN p.month_num = 2 THEN 'Feb'
-> WHEN p.month_num = 3 THEN 'Mar'
-> WHEN p.month_num = 4 THEN 'Apr'
-> WHEN p.month_num = 5 THEN 'May'
-> WHEN p.month_num = 6 THEN 'Jun'
-> WHEN p.month_num = 7 THEN 'Jul'
-> WHEN p.month_num = 8 THEN 'Aug'
-> WHEN p.month_num = 9 THEN 'Sep'
-> WHEN p.month_num = 10 THEN 'Oct'
-> WHEN p.month_num = 11 THEN 'Nov'
-> WHEN p.month_num = 12 THEN 'Dec'
-> END AS 'Timeline',
-> p.ck_pos_median AS 'CK+ Median',
-> n.ck_neg_median AS 'CK- Median'
-> FROM
-> (SELECT
-> s.median_month_pos AS 'month_num',
-> ROUND(AVG(ck_pos), 1) AS 'ck_pos_median'
-> FROM
-> (SELECT
-> @row_num_pos:=CASE
-> WHEN @median_group_pos = q.month_num THEN @row_num_pos + 1
-> ELSE 1
-> END AS 'count_of_group',
-> @median_group_pos:=q.month_num AS 'median_month_pos',
-> q.month_num,
-> q.ck_pos,
-> (SELECT
-> COUNT(*)
-> FROM
-> result_management_report
-> WHERE
-> ck_pos IS NOT NULL
-> AND MONTH(order_date) = q.month_num) AS total_month
-> FROM
-> (SELECT
-> MONTH(order_date) AS 'month_num', ck_pos
-> FROM
-> result_management_report
-> WHERE
-> ck_pos IS NOT NULL
-> ORDER BY MONTH(order_date) , ck_pos ASC) AS q) AS s
-> WHERE
-> s.count_of_group BETWEEN (s.total_month / 2.0) AND (s.total_month / 2.0 + 1)
-> GROUP BY s.median_month_pos) AS p
-> JOIN
-> (SELECT
-> s.median_month_neg AS 'month_num',
-> ROUND(AVG(ck_neg), 1) AS 'ck_neg_median'
-> FROM
-> (SELECT
-> @row_num_neg:=CASE
-> WHEN @median_group_neg = q.month_num THEN @row_num_neg + 1
-> ELSE 1
-> END AS 'count_of_group',
-> @median_group_neg:=q.month_num AS 'median_month_neg',
-> q.month_num,
-> q.ck_neg,
-> (SELECT
-> COUNT(*)
-> FROM
-> result_management_report
-> WHERE
-> ck_neg IS NOT NULL
-> AND MONTH(order_date) = q.month_num) AS total_month
-> FROM
-> (SELECT
-> MONTH(order_date) AS 'month_num', ck_neg
-> FROM
-> result_management_report
-> WHERE
-> ck_neg IS NOT NULL
-> ORDER BY MONTH(order_date) , ck_neg ASC) AS q) AS s
-> WHERE
-> s.count_of_group BETWEEN (s.total_month / 2.0) AND (s.total_month / 2.0 + 1)
-> GROUP BY s.median_month_neg) AS n ON p.month_num = n.month_num
-> ORDER BY p.month_num;
+-----------+----------+------------+------------+
| month_num | Timeline | CK+ Median | CK- Median |
+-----------+----------+------------+------------+
| 1 | Jan | 2.0 | 2.0 |
| 2 | Feb | 2.0 | 3.0 |
| 3 | Mar | 2.0 | 3.0 |
| 4 | Apr | 4.0 | 4.0 |
| 5 | May | 2.0 | 3.0 |
| 6 | Jun | 3.0 | 3.0 |
| 7 | Jul | 4.0 | 4.0 |
| 8 | Aug | 3.0 | 7.0 |
| 9 | Sep | 4.0 | 12.0 |
| 10 | Oct | 5.0 | 8.0 |
| 11 | Nov | 4.0 | 9.0 |
| 12 | Dec | 2.0 | 12.0 |
+-----------+----------+------------+------------+
12 rows in set (6.22 sec)
mysql>
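In MariaDB there is a median function, see: . I am not sure whether it also exists in the MySQL implementation.
There is no median function in MySQL. For that reason I have been considering switching for a while.
Do you have some raw data and the CREATE TABLE definition for me to test with (pastebin)?
Is it possible to get the data from you? I am almost sure I can optimize it.
I will export it and send it to both of you.
Hi Bernard, I just checked and it works brilliantly! I will try to understand what you did and adapt it for other years and datasets, but thank you very much.
@ Thanks for the reply. Let me know if you would like the query explained. My email is also in my profile.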
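On the Python route raised in the question: the per-month median itself is short to compute once the rows are in memory. The sketch below is only an illustration under stated assumptions. It assumes rows have already been fetched as (order_date, ck_pos, ck_neg) tuples with NULLs arriving as None; `monthly_medians` and `SAMPLE_ROWS` are hypothetical names with made-up data, and fetching from and writing back to the database are left out.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Made-up rows in the assumed shape: (order_date, ck_pos, ck_neg).
SAMPLE_ROWS = [
    (date(2016, 1, 5), 1, 4),
    (date(2016, 1, 9), 3, None),
    (date(2016, 1, 20), 2, 6),
    (date(2016, 2, 2), 5, 1),
    (date(2016, 2, 14), 7, 3),
]


def monthly_medians(rows):
    """Return {month: (ck_pos_median, ck_neg_median)}, skipping None per column."""
    pos = defaultdict(list)
    neg = defaultdict(list)
    for order_date, ck_pos, ck_neg in rows:
        if ck_pos is not None:
            pos[order_date.month].append(ck_pos)
        if ck_neg is not None:
            neg[order_date.month].append(ck_neg)
    result = {}
    for month in sorted(set(pos) | set(neg)):
        pos_values = pos.get(month)
        neg_values = neg.get(month)
        result[month] = (
            median(pos_values) if pos_values else None,
            median(neg_values) if neg_values else None,
        )
    return result


if __name__ == "__main__":
    for month, (pos_median, neg_median) in monthly_medians(SAMPLE_ROWS).items():
        print(month, pos_median, neg_median)
```

Unlike the inner JOIN in the original query, this sketch keeps a month even when one of the two columns has no values, and reports None for that column. It also pulls every row to the client, so whether it beats an in-database query depends on the table size.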