Percentiles of histogram data in SQL
The table below shows students' grade data for some exams:
CREATE TABLE grades
AS
SELECT name, exams, grade_poor, grade_fair, grade_good, grade_vgood
FROM ( VALUES
( 'arun' , 8 , 1 , 4 , 2 , 1 ),
( 'neha' , 10 , 3 , 2 , 1 , 4 ),
( 'ram' , 5 , 1 , 1 , 3 , 0 ),
( 'radha' , 8 , 0 , 3 , 1 , 4 )
) AS t(name,exams,grade_poor,grade_fair,grade_good,grade_vgood);
The grades are ordered as follows: vgood > good > fair > poor.
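Is it possible (or does it even make sense) to find each student's 50th-percentile grade from this data? For example, for the student arun, if we treat the data as a series of grade categories, the 50th percentile would be grade_fair.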
SELECT name, exams,
-- The median falls in the first category whose running total reaches half of the exams
CASE WHEN 0.5 * exams <= grade_poor
THEN 'grade_poor'
WHEN 0.5 * exams <= grade_poor + grade_fair
THEN 'grade_fair'
WHEN 0.5 * exams <= grade_poor + grade_fair + grade_good
THEN 'grade_good'
ELSE 'grade_vgood' END AS median_grade
FROM grades;
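The CASE query works by comparing half the exam count with running totals across the ordered categories. As a sketch (PostgreSQL assumed), the query below just prints those running totals so you can see where each student crosses the halfway mark; for arun, half is 4 and the running total first reaches it at fair (5). The sample rows are inlined as a CTE so it runs without the grades table, and the temp view name grade_running_totals is hypothetical, used only so the result can be queried again.

```sql
-- Hypothetical view: running totals per grade category, in poor < fair < good < vgood order
CREATE TEMP VIEW grade_running_totals AS
WITH grades(name, exams, grade_poor, grade_fair, grade_good, grade_vgood) AS (
  VALUES ('arun', 8, 1, 4, 2, 1),
         ('neha', 10, 3, 2, 1, 4),
         ('ram', 5, 1, 1, 3, 0),
         ('radha', 8, 0, 3, 1, 4)
)
SELECT name,
       0.5 * exams AS half,                                            -- the value the CASE compares against
       grade_poor AS upto_poor,
       grade_poor + grade_fair AS upto_fair,
       grade_poor + grade_fair + grade_good AS upto_good,
       grade_poor + grade_fair + grade_good + grade_vgood AS upto_vgood -- equals exams
FROM grades;

-- The median category is the first upto_* column that is >= half
SELECT * FROM grade_running_totals ORDER BY name;
```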
First, you need to unpivot this. We can do that like this:
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades;
name | array
-------+-----------
arun | {1,4,2,1}
neha | {3,2,1,4}
ram | {1,1,3,0}
radha | {0,3,1,4}
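Then we need to index into the grades array, so we use a CROSS JOIN LATERAL. We have 4 rows, each with a 4-element array, so we want 4 * 4 = 16 rows: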
SELECT name, grades, gs1.x, grades[gs1.x] AS gradeqty
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
ORDER BY name, x;
name | grades | x | gradeqty
-------+-----------+---+----------
arun | {1,4,2,1} | 1 | 1
arun | {1,4,2,1} | 2 | 4
arun | {1,4,2,1} | 3 | 2
arun | {1,4,2,1} | 4 | 1
neha | {3,2,1,4} | 1 | 3
neha | {3,2,1,4} | 2 | 2
neha | {3,2,1,4} | 3 | 1
neha | {3,2,1,4} | 4 | 4
radha | {0,3,1,4} | 1 | 0
radha | {0,3,1,4} | 2 | 3
radha | {0,3,1,4} | 3 | 1
radha | {0,3,1,4} | 4 | 4
ram | {1,1,3,0} | 1 | 1
ram | {1,1,3,0} | 2 | 1
ram | {1,1,3,0} | 3 | 3
ram | {1,1,3,0} | 4 | 0
(16 rows)
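A possible alternative to indexing the array with generate_series(1,4) is unnest ... WITH ORDINALITY (PostgreSQL 9.4+), which returns each element together with its 1-based position; that position plays the role of x. This is a swapped-in technique, not the one used above, and should yield the same 16 (name, x, gradeqty) rows. Sample data is inlined, and the view name grade_unpivot is hypothetical.

```sql
-- Hypothetical view: one row per (student, grade category) with the category's count
CREATE TEMP VIEW grade_unpivot AS
WITH grades(name, grade_poor, grade_fair, grade_good, grade_vgood) AS (
  VALUES ('arun', 1, 4, 2, 1),
         ('neha', 3, 2, 1, 4),
         ('ram', 1, 1, 3, 0),
         ('radha', 0, 3, 1, 4)
)
SELECT g.name, u.x, u.gradeqty
FROM grades AS g
-- unnest emits the element first, then (WITH ORDINALITY) its position as a bigint
CROSS JOIN LATERAL unnest(ARRAY[g.grade_poor, g.grade_fair, g.grade_good, g.grade_vgood])
     WITH ORDINALITY AS u(gradeqty, x);

SELECT * FROM grade_unpivot ORDER BY name, x;
```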
All that's left is another CROSS JOIN LATERAL that replicates x (our grade) gradeqty times:
SELECT name,
gs1.x
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
ORDER BY name, gs1.x;
name | x
-------+---
arun | 1
arun | 2
arun | 2
arun | 2
arun | 2
arun | 3
arun | 3
arun | 4
neha | 1
neha | 1
neha | 1
neha | 2
neha | 2
neha | 3
neha | 4
neha | 4
neha | 4
neha | 4
radha | 2
radha | 2
radha | 2
radha | 3
radha | 4
radha | 4
radha | 4
radha | 4
ram | 1
ram | 2
ram | 3
ram | 3
ram | 3
(31 rows)
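One way to sanity-check the expansion is to count the generated rows per student and compare that with the exams column; every student should match, which is also why the total is 8 + 10 + 5 + 8 = 31 rows. A sketch in PostgreSQL, reusing the same two LATERAL joins with the sample data inlined; the view name expansion_check is hypothetical.

```sql
-- Hypothetical view: expanded row count per student next to the recorded number of exams
CREATE TEMP VIEW expansion_check AS
WITH grades(name, exams, grade_poor, grade_fair, grade_good, grade_vgood) AS (
  VALUES ('arun', 8, 1, 4, 2, 1),
         ('neha', 10, 3, 2, 1, 4),
         ('ram', 5, 1, 1, 3, 0),
         ('radha', 8, 0, 3, 1, 4)
)
SELECT t.name, t.exams, count(*) AS expanded_rows
FROM (
  SELECT name, exams, ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
  FROM grades
) AS t(name, exams, grades)
CROSS JOIN LATERAL generate_series(1, 4) AS gs1(x)
-- generate_series(1, 0) returns no rows, so empty categories contribute nothing
CROSS JOIN LATERAL generate_series(1, t.grades[gs1.x]) AS gs2(x)
GROUP BY t.name, t.exams;

SELECT name, exams, expanded_rows, expanded_rows = exams AS matches
FROM expansion_check
ORDER BY name;
```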
Now we GROUP BY name and apply the ordered-set aggregate percentile_disc:
SELECT name, percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
GROUP BY name ORDER BY name;
name | percentile_disc
-------+-----------------
arun | 2
neha | 2
radha | 3
ram | 3
(4 rows)
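Why percentile_disc rather than percentile_cont? percentile_disc returns the first input value whose position in the ordering reaches the requested fraction, so the result is always an actual grade index. percentile_cont interpolates instead. On neha's expanded sequence from the 31-row output above, percentile_disc(0.5) gives 2 (Fair), while percentile_cont(0.5) gives 2.5, halfway between Fair and Good, which maps to no category. A small PostgreSQL sketch; the view name neha_median is hypothetical.

```sql
-- Hypothetical view: both percentile flavours over neha's expanded grade indexes
CREATE TEMP VIEW neha_median AS
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x) AS disc_median,  -- picks an existing value
       percentile_cont(0.5) WITHIN GROUP (ORDER BY x) AS cont_median   -- interpolates (double precision)
FROM unnest(ARRAY[1, 1, 1, 2, 2, 3, 4, 4, 4, 4]) AS v(x);

SELECT * FROM neha_median;
```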
To go one step further and make it prettier, map the resulting index back to a grade label:
SELECT name, (ARRAY['Poor', 'Fair', 'Good', 'Very Good'])[percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)]
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
GROUP BY name
ORDER BY name;
name | array
-------+-------
arun | Fair
neha | Fair
radha | Good
ram | Good
(4 rows)
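If we add a new user, the output becomes a bit more varied (exams is left NULL for Bob; the percentile query does not use that column):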
INSERT INTO grades (name,grade_poor,grade_fair,grade_good,grade_vgood)
VALUES ('Bob', 0,0,0,100);
name | array
-------+-----------
arun | Fair
Bob | Very Good
neha | Fair
radha | Good
ram | Good
(5 rows)
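How do you want to handle ties, as in the case of neha and radha?
Not really sure. Flip a coin?
Sorry, you're out of luck. PostgreSQL can't flip a coin.
I guess we can conclude there's no solution!!
When there's a tie, flip a coin => the probability can be 0.5.
Thanks for the detailed explanation. I got through every step up to the ordered-set aggregate function. This one gives an error, and I can't even see why. But that may be the problem.
SQL Fiddle is bad in every way. Never use it, ever.
SQL Fiddle uses PG 9.3, where ordered-set aggregate functions don't exist (they were added in 9.4).
This is a great answer that nicely explains how this very complex query handles such a simple problem. It also produces exactly the same results as the lowly CASE WHEN query.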
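On the tie question: with an even number of exams, the two middle positions (exams/2 and exams/2 + 1) can fall into different categories, and percentile_disc(0.5) simply reports the lower one. Assuming "tie" means exactly that, the sketch below computes both the lower and upper median category positionally and flags students where they differ; on the sample data it flags neha (Fair/Good) and radha (Good/Very Good), the two students raised in the comments. PostgreSQL is assumed, the data is inlined, and the view name median_ties is hypothetical.

```sql
-- Hypothetical view: lower and upper median category per student
CREATE TEMP VIEW median_ties AS
WITH grades(name, exams, grade_poor, grade_fair, grade_good, grade_vgood) AS (
  VALUES ('arun', 8, 1, 4, 2, 1),
         ('neha', 10, 3, 2, 1, 4),
         ('ram', 5, 1, 1, 3, 0),
         ('radha', 8, 0, 3, 1, 4)
),
positions AS (
  SELECT name,
         (exams + 1) / 2 AS lower_pos,  -- integer division: ceil(exams / 2)
         exams / 2 + 1   AS upper_pos,  -- equals lower_pos when exams is odd
         grade_poor AS upto_poor,
         grade_poor + grade_fair AS upto_fair,
         grade_poor + grade_fair + grade_good AS upto_good
  FROM grades
)
SELECT name,
       -- a position belongs to the first category whose running total reaches it
       CASE WHEN lower_pos <= upto_poor THEN 'Poor'
            WHEN lower_pos <= upto_fair THEN 'Fair'
            WHEN lower_pos <= upto_good THEN 'Good'
            ELSE 'Very Good' END AS lower_median,
       CASE WHEN upper_pos <= upto_poor THEN 'Poor'
            WHEN upper_pos <= upto_fair THEN 'Fair'
            WHEN upper_pos <= upto_good THEN 'Good'
            ELSE 'Very Good' END AS upper_median
FROM positions;

SELECT name, lower_median, upper_median, lower_median <> upper_median AS is_tie
FROM median_ties
ORDER BY name;
```

The lower_median column should agree with the percentile_disc results above (Fair, Fair, Good, Good); what to do when is_tie is true remains a policy decision, as the comments note.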