Using MySQL, how do I calculate the median with left-joined records?

Given the following two tables, I would like to know how to calculate the weekly median for reviews.

reviews

(id, user_id, completed_at)

reviews.completed_at -- lets us know the user submitted the review, it's not a draft.

reviews_areas

(created_at, review_id, rating)

reviews_areas.rating = INT between 0…10

Sample data:

reviews:

+----+---------+---------------------+
| id | user_id |    completed_at     |
+----+---------+---------------------+
|  1 |     100 | 2019-07-20 11:34:40 |
|  2 |     100 | 2019-07-22 11:34:40 |
|  3 |     500 | 2019-07-30 16:34:40 |
+----+---------+---------------------+

reviews_areas:

+------------+-----------+--------+
| created_at | review_id | rating |
+------------+-----------+--------+
| 1:34:40    |         1 |      0 |
| 12:34:40   |         1 |      5 |
| 11:34:40   |         1 |     10 |
| 5:34:40    |         1 |      9 |
| 6:34:40    |         2 |      1 |
| 1:34:40    |         2 |      5 |
| 2:32:40    |         3 |      5 |
+------------+-----------+--------+
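
To try the queries against the sample data, the two tables can be recreated roughly as follows. This is only a sketch: the column types, the nullable completed_at (assumed to be NULL for drafts), and the foreign key are assumptions, since the question gives only column names and sample values.

```sql
-- Assumed schema: types and constraints are guesses based on the sample data.
create table reviews (
    id           int primary key,
    user_id      int not null,
    completed_at datetime null      -- assumed NULL while the review is a draft
);

create table reviews_areas (
    created_at time not null,       -- the sample shows a time with no date
    review_id  int  not null,
    rating     int  not null,       -- 0 to 10 according to the question
    foreign key (review_id) references reviews (id)
);

insert into reviews (id, user_id, completed_at) values
    (1, 100, '2019-07-20 11:34:40'),
    (2, 100, '2019-07-22 11:34:40'),
    (3, 500, '2019-07-30 16:34:40');

insert into reviews_areas (created_at, review_id, rating) values
    ('01:34:40', 1, 0),
    ('12:34:40', 1, 5),
    ('11:34:40', 1, 10),
    ('05:34:40', 1, 9),
    ('06:34:40', 2, 1),
    ('01:34:40', 2, 5),
    ('02:32:40', 3, 5);
```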

The question is not entirely clear. I'm going to use reviews.completed_at as the date, because reviews_areas.created_at contains only a time.

We need to join reviews, for the dates, with reviews_areas, for the ratings.

To keep the same week in different years from overlapping, we convert the date to year + week using yearweek().
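
As a quick illustration of why year + week is used instead of the week number alone, the sketch below compares MySQL's week() and yearweek() on the same calendar day in two different years. The 2020 date is made up for the comparison; only the 2019 date comes from the sample data.

```sql
-- week() drops the year, so rows from different years could land in the
-- same group; yearweek() keeps the year as part of the value.
select
    week('2019-07-22 11:34:40')     as week_only_2019,
    week('2020-07-22 11:34:40')     as week_only_2020,   -- hypothetical date
    yearweek('2019-07-22 11:34:40') as yearweek_2019,
    yearweek('2020-07-22 11:34:40') as yearweek_2020;    -- hypothetical date
```

Both functions accept an optional mode argument that controls which day starts the week; the queries here rely on the default.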

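To get the median, we need to find the middle row of each week (or the middle two rows, if the week has an even number of rows). There are many ways to do this. I'm going to crib an existing one. We calculate row_number() in both ascending and descending order. The rows where the two numberings overlap, +/- 1, are the middle rows. Then we average them.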

1 2 3 4 5 6
6 5 4 3 2 1
    ^^^
    median rows
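
The idea can be tried on its own before bringing in the tables. The following is a minimal sketch that uses the four ratings of the first sample week as literal values; the names vals, numbered, and v are made up for the illustration.

```sql
-- Number the values in both directions, keep the rows where the two
-- numberings meet (+/- 1), and average them.
with vals as (
    select 0 as v union all
    select 5 union all
    select 9 union all
    select 10
),
numbered as (
    select
        v,
        row_number() over (order by v asc)  as row_asc,
        row_number() over (order by v desc) as row_desc
    from vals
)
select avg(v) as median
from numbered
where row_asc in (row_desc, row_desc - 1, row_desc + 1);
```

With an even number of values the filter keeps the two middle rows (5 and 9 here), so the average should come out to 7. With an odd number of values only the single middle row matches, because its neighbours differ by 2 between the two numberings.
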
First, we calculate the row numbers within each week.

select
    yearweek(completed_at) as week,
    rating,
    row_number() over(
        partition by yearweek(completed_at)
        order by rating asc, id asc
    ) as row_asc,
    row_number() over(
        partition by yearweek(completed_at)
        order by rating desc, id desc
    ) as row_desc
from reviews_areas ra
join reviews r on r.id = ra.review_id
The row numbers are ordered by rating asc, id asc. id is the secondary sort, used to disambiguate rows with the same rating.

+--------+--------+---------+----------+
| week   | rating | row_asc | row_desc |
+--------+--------+---------+----------+
| 201928 |     10 |       4 |        1 |
| 201928 |      9 |       3 |        2 |
| 201928 |      5 |       2 |        3 |
| 201928 |      0 |       1 |        4 |
| 201929 |      5 |       2 |        1 |
| 201929 |      1 |       1 |        2 |
| 201930 |      5 |       1 |        1 |
+--------+--------+---------+----------+
Then we use that as a CTE to average the middle rows of each week. A subquery works just as well.

with rating_weeks as (
    select
        yearweek(completed_at) as week,
        rating,
        row_number() over(
            partition by yearweek(completed_at)
            order by rating asc, id asc
        ) as row_asc,
        row_number() over(
            partition by yearweek(completed_at)
            order by rating desc, id desc
        ) as row_desc
    from reviews_areas ra
    join reviews r on r.id = ra.review_id
)
select
    week,
    -- Take the average of the possibly 2 median rows
    avg(rating)
from rating_weeks
where
    -- Find the rows which overlap +/- 1. These are the median rows.
    row_asc in (row_desc, row_desc - 1, row_desc + 1)
group by week
order by week
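
For reference, the subquery form mentioned above might look like the sketch below. It is the same logic written as a derived table. The alias names (rating_weeks, median_rating) and the filter on completed_at are additions for illustration, the latter assuming drafts have a NULL completed_at.

```sql
select
    week,
    -- Take the average of the possibly 2 median rows
    avg(rating) as median_rating
from (
    select
        yearweek(r.completed_at) as week,
        ra.rating,
        row_number() over (
            partition by yearweek(r.completed_at)
            order by ra.rating asc, r.id asc
        ) as row_asc,
        row_number() over (
            partition by yearweek(r.completed_at)
            order by ra.rating desc, r.id desc
        ) as row_desc
    from reviews_areas ra
    join reviews r on r.id = ra.review_id
    -- Assumption: drafts have completed_at = NULL and should be skipped
    where r.completed_at is not null
) as rating_weeks
where
    -- Find the rows which overlap +/- 1. These are the median rows.
    row_asc in (row_desc, row_desc - 1, row_desc + 1)
group by week
order by week;
```

On the sample data this should give 7 for week 201928 (middle rows 5 and 9), 3 for 201929 (rows 1 and 5), and 5 for 201930.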

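Median of what? I guess rating?

Sorry, what do you mean by the median of the reviews_areas table?

created_at is stored, but without a date? Do we need to pull the week from reviews?

Extract the week from the completed_at column, reviews.completed_at.

reviews.completed_at -- lets us know the user submitted the review, it's not a draft.

Sorry, but an average is not a median.

@Schwern, FYI, this is the average, not the MEDIAN.

@AnApprentice My mistake. I changed it to the median. I learned a lot writing this answer. Like, why doesn't SQL have a median function?!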