Using MySQL, how do I calculate the median with left-joined records?
Given the following two tables, I'd like to know how to calculate the median of reviews per week.
reviews (id, user_id, completed_at)
reviews.completed_at -- lets us know the user submitted the review, it's not a draft.
reviews_areas (created_at, review_id, rating)
reviews_areas.rating = INT between 0…10
Sample data:
reviews:
+----+---------+---------------------+
| id | user_id | completed_at        |
+----+---------+---------------------+
|  1 |     100 | 2019-07-20 11:34:40 |
|  2 |     100 | 2019-07-22 11:34:40 |
|  3 |     500 | 2019-07-30 16:34:40 |
+----+---------+---------------------+
reviews_areas:
+------------+-----------+--------+
| created_at | review_id | rating |
+------------+-----------+--------+
| 1:34:40    |         1 |      0 |
| 12:34:40   |         1 |      5 |
| 11:34:40   |         1 |     10 |
| 5:34:40    |         1 |      9 |
| 6:34:40    |         2 |      1 |
| 1:34:40    |         2 |      5 |
| 2:32:40    |         3 |      5 |
+------------+-----------+--------+
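To try the queries against this sample data, the tables can be set up roughly as follows. This is only a sketch: the column types, the primary key, and the nullable completed_at are assumptions, since the question gives column names but no DDL.

```sql
-- Assumed schema; only the column names come from the question.
create table reviews (
  id           int primary key,
  user_id      int not null,
  completed_at datetime null   -- assumption: null would mean "still a draft"
);

create table reviews_areas (
  created_at time,             -- the sample data holds a time only, no date
  review_id  int not null,
  rating     tinyint not null  -- 0…10 according to the question
);

insert into reviews (id, user_id, completed_at) values
  (1, 100, '2019-07-20 11:34:40'),
  (2, 100, '2019-07-22 11:34:40'),
  (3, 500, '2019-07-30 16:34:40');

insert into reviews_areas (created_at, review_id, rating) values
  ('01:34:40', 1, 0),
  ('12:34:40', 1, 5),
  ('11:34:40', 1, 10),
  ('05:34:40', 1, 9),
  ('06:34:40', 2, 1),
  ('01:34:40', 2, 5),
  ('02:32:40', 3, 5);
```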
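The question is a bit unclear. I'm going to use reviews.completed_at as the date, since reviews_areas.created_at only contains a time.
We need to join reviews, for the date, with reviews_areas, for the rating.
To avoid the same week in different years overlapping, we use yearweek to turn the date into year + week.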
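As a small illustration of why a bare week number is not enough, the sketch below compares week() and yearweek() on the same calendar day in two different years. The 2018 date is made up for the example and is not part of the sample data; mode 0 is passed to week() explicitly so the result does not depend on the server's default_week_format.

```sql
-- week() returns the same number for both years, so the two weeks would be
-- grouped together; yearweek() keeps them apart.
select
  week('2018-07-20', 0)  as week_2018,      -- 28
  week('2019-07-20', 0)  as week_2019,      -- 28
  yearweek('2018-07-20') as yearweek_2018,  -- 201828
  yearweek('2019-07-20') as yearweek_2019;  -- 201928
```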
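To get the median, we need to find the middle row of each week (or the two middle rows, if there is an even number of rows). There are many ways to do this; I'm going to crib an existing one. We number the rows in both ascending and descending order with row_number(). The rows where the two numbers overlap +/- 1 are the middle rows. Then we average them.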
1 2 3 4 5 6
6 5 4 3 2 1
    ^^^
    median rows
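The overlap trick can be tried in isolation on six made-up values before it is applied to the real tables. This is a minimal sketch, assuming MySQL 8.0 or later (window functions and CTEs); the names vals, numbered, v and median_v are illustrative only.

```sql
with vals as (
  select 10 as v union all select 20 union all select 30
  union all select 40 union all select 50 union all select 60
),
numbered as (
  select
    v,
    row_number() over (order by v asc)  as row_asc,
    row_number() over (order by v desc) as row_desc
  from vals
)
select avg(v) as median_v
from numbered
-- Only 30 (row_asc 3, row_desc 4) and 40 (row_asc 4, row_desc 3) pass the filter.
where row_asc in (row_desc, row_desc - 1, row_desc + 1);
-- median_v = 35.0000
```

With an odd number of rows, only the single middle row has row_asc = row_desc, and its neighbours differ by 2, so the average is taken over one row.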
First, we number the rows within each week.
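select
  yearweek(r.completed_at) as week,
  ra.rating,
  -- r.id only breaks ties between reviews; equal ratings inside one review
  -- would need a column that is unique per reviews_areas row
  row_number() over(
    partition by yearweek(r.completed_at)
    order by ra.rating asc, r.id asc
  ) as row_asc,
  row_number() over(
    partition by yearweek(r.completed_at)
    order by ra.rating desc, r.id desc
  ) as row_desc
from reviews_areas ra
join reviews r on r.id = ra.review_id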
The rows are numbered in order of rating asc, id asc. id is a secondary sort key that disambiguates rows with the same rating.
+--------+--------+---------+----------+
|   week | rating | row_asc | row_desc |
+--------+--------+---------+----------+
| 201928 |     10 |       4 |        1 |
| 201928 |      9 |       3 |        2 |
| 201928 |      5 |       2 |        3 |
| 201928 |      0 |       1 |        4 |
| 201929 |      5 |       2 |        1 |
| 201929 |      1 |       1 |        2 |
| 201930 |      5 |       1 |        1 |
+--------+--------+---------+----------+
Then we use that as a CTE to average the middle rows of each week. A subquery would work just as well.
with rating_weeks as (
  select
    yearweek(r.completed_at) as week,
    ra.rating,
    row_number() over(
      partition by yearweek(r.completed_at)
      order by ra.rating asc, r.id asc
    ) as row_asc,
    row_number() over(
      partition by yearweek(r.completed_at)
      order by ra.rating desc, r.id desc
    ) as row_desc
  from reviews_areas ra
  join reviews r on r.id = ra.review_id
)
select
  week,
  -- Take the average of the possibly 2 median rows
  avg(rating) as median_rating
from rating_weeks
where
  -- Find the rows which overlap +/- 1. These are the median rows.
  row_asc in (row_desc, row_desc - 1, row_desc + 1)
group by week
order by week
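For reference, here is a sketch of the same query with a derived table in place of the CTE. The alias median_rating is my own choice. Worked through by hand on the sample data, the middle rows are 5 and 9 for week 201928, 1 and 5 for week 201929, and the single 5 for week 201930, so the result should be 7, 3 and 5.

```sql
select
  week,
  avg(rating) as median_rating
from (
  select
    yearweek(r.completed_at) as week,
    ra.rating,
    row_number() over(
      partition by yearweek(r.completed_at)
      order by ra.rating asc, r.id asc
    ) as row_asc,
    row_number() over(
      partition by yearweek(r.completed_at)
      order by ra.rating desc, r.id desc
    ) as row_desc
  from reviews_areas ra
  join reviews r on r.id = ra.review_id
) rating_weeks
-- Keep only the middle row, or the two middle rows, of each week.
where row_asc in (row_desc, row_desc - 1, row_desc + 1)
group by week
order by week;
```

Note that the sample data has no two equal ratings within the same week. When ties are possible, the two row_number() orderings need a tiebreaker that is unique per row to stay exact mirrors of each other.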
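Median of what? Rating, I guess.
Sorry, what do you mean by the median of the reviews_areas table? created_at stores a time but no date? Do we need to take the week from the completed_at column in reviews?
reviews.completed_at -- lets us know the user submitted the review, it's not a draft.
Sorry, but an average is not a median.
@Schwern, FYI, this is the average, not the MEDIAN.
@AnApprentice My mistake. I changed it to the median. I learned a lot writing this answer. Like, why doesn't SQL have a median function?!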