SQL: splitting values into quartiles using CASE and aggregation statements
I have an employee dataset like the one below. StartDate is the employee's hire date and EndDate is the date the employee left the employer. Both are of the date data type.
Gender StartDate EndDate
M 2010-04-30 2013-06-18
F 2010-01-09 2015-06-19
M 2009-09-08 2014-08-13
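To reproduce the question locally, the three sample rows can be loaded into a scratch table. This is a sketch: the column types and the EmployeeKey surrogate key are assumptions (EmployeeKey only appears later, in the poster's own query), and a temporary table #DimEmployee is used so nothing in the real dbo schema is touched.

```sql
-- Hypothetical scratch copy of the question's table; column types are assumed.
IF OBJECT_ID('tempdb..#DimEmployee') IS NOT NULL
    DROP TABLE #DimEmployee;

CREATE TABLE #DimEmployee (
    EmployeeKey int IDENTITY(1, 1) PRIMARY KEY,  -- assumed surrogate key
    Gender      char(1) NOT NULL,                -- 'M' or 'F'
    StartDate   date    NOT NULL,                -- hire date
    EndDate     date    NULL                     -- NULL while still employed
);

-- The three sample rows from the question
INSERT INTO #DimEmployee (Gender, StartDate, EndDate)
VALUES ('M', '2010-04-30', '2013-06-18'),
       ('F', '2010-01-09', '2015-06-19'),
       ('M', '2009-09-08', '2014-08-13');
```

Since all three rows have an EndDate, DATEDIFF(MONTH, StartDate, EndDate) counts 38, 65 and 59 month boundaries for them, in that order. Queries written against dbo.DimEmployee can be pointed at #DimEmployee to try them on this data.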
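I want to split the employee data into quartiles based on the average number of months employed. The result should also include the total number of employees (the Employee column), the percentage of male employees, the percentage of female employees and the average number of months employed. Here is the expected result: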
Quartile Employee %Male %Female AvgMonths
1 20 60.00 40.00 8.75
2 25 50.00 50.00 28.5
3 10 40.00 60.00 41.25
I want to get the 25%, 50% and 75% quartiles based on the number of months employed, and then average the months to get AvgMonths.
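The 25%, 50% and 75% cut points themselves can be read off with PERCENTILE_CONT, available in SQL Server 2012 and later. A minimal sketch, assuming dbo.DimEmployee has the columns shown in the question and that @current plays the same role as in the poster's query (the reference date for rows whose EndDate is NULL):

```sql
DECLARE @current date = '2012-12-31';  -- reference date for current employees

-- PERCENTILE_CONT is an analytic function here, so it returns the same value
-- on every row; DISTINCT collapses the result to a single row of cut points.
SELECT DISTINCT
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY m.EmployedMonths) OVER () AS P25,
       PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY m.EmployedMonths) OVER () AS P50,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY m.EmployedMonths) OVER () AS P75
FROM (SELECT DATEDIFF(MONTH, StartDate, COALESCE(EndDate, @current)) AS EmployedMonths
      FROM dbo.DimEmployee) AS m;
```

PERCENTILE_CONT interpolates between neighbouring values; PERCENTILE_DISC would instead return an actual month count from the data.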
Below is my query. I don't know where in it to add the quartile calculation:
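declare @current date;
set @current = '2012-12-31';  -- reference date for employees who have not left

select count(*) as Employees,
       -- share of male/female employees as a percentage string
       -- (nvarchar(10), because '100.00' does not fit in nvarchar(5))
       cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Male,
       cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Female,
       -- months employed; still-employed staff are measured up to @current
       -- (1.0 * avoids integer averaging, which would drop the decimals)
       AVG(1.0 * CASE WHEN EndDate is null then DATEDIFF(MONTH, StartDate, @current)
                      when EndDate is not null then DATEDIFF(MONTH, StartDate, EndDate)
                 end) as AvgMonths
from dbo.DimEmployee;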
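One way to place the quartile calculation is to compute the months per employee first, assign a quartile per employee with NTILE(4), and only then aggregate by the quartile number. This is a sketch using NTILE, a built-in T-SQL ranking function, rather than the approach from the thread; it assumes dbo.DimEmployee has the Gender, StartDate and EndDate columns shown above. NTILE makes the four groups as equal in size as possible (sizes differ by at most one), so employees with the same month count can be split across two quartiles.

```sql
DECLARE @current date = '2012-12-31';  -- reference date for current employees

WITH Months AS (
    -- months employed; still-employed staff are measured up to @current
    SELECT Gender,
           DATEDIFF(MONTH, StartDate, COALESCE(EndDate, @current)) AS EmployedMonths
    FROM dbo.DimEmployee
),
Tiles AS (
    -- quartile number 1-4, assigned per employee before any grouping
    SELECT Gender, EmployedMonths,
           NTILE(4) OVER (ORDER BY EmployedMonths) AS Quartile
    FROM Months
)
SELECT Quartile,
       COUNT(*) AS Employees,
       CAST(AVG(CASE WHEN Gender = 'M' THEN 100.0 ELSE 0 END) AS decimal(5, 2)) AS PctMale,
       CAST(AVG(CASE WHEN Gender = 'F' THEN 100.0 ELSE 0 END) AS decimal(5, 2)) AS PctFemale,
       AVG(EmployedMonths * 1.0) AS AvgMonths
FROM Tiles
GROUP BY Quartile
ORDER BY Quartile;
```

The key design choice is that the window function runs in its own step: if NTILE is evaluated in the same SELECT as a GROUP BY, it ranks the grouped rows instead of the employees.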
----------------------- Update -------------------------------------
I figured it out myself. Please find the code below:
declare @current date;
set @current = '2012-12-31';  -- reference date for employees who have not left

select count(*) as Employees,  -- employees sharing this EmployedMonths value
       cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Male,
       cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Female,
       AVG(t.EmployedMonths) as AvgMonths,
       -- evaluated after GROUP BY, so NTILE buckets the distinct EmployedMonths
       -- values rather than individual employees; NTILE(3) yields three buckets
       Ntile(3) over (order by t.EmployedMonths asc) as Quartiles
from
    (select EmployeeKey, Gender,
            -- months employed; still-employed staff are measured up to @current
            CASE WHEN EndDate is null then abs(DATEDIFF(MONTH, StartDate, @current))
                 when EndDate is not null then abs(DATEDIFF(MONTH, StartDate, EndDate))
            end as EmployedMonths
     from dbo.DimEmployee) t
group by t.EmployedMonths;
You can use rank() and count(*) as window functions to calculate the quartiles:
declare @current date = '2012-12-31';  -- reference date for current employees

select floor( rnk * 4.0 / cnt ) as quartile,  -- 0-based: 0, 1, 2 or 3
       count(*) as Employees,
       cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Male,
       cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Female,
       AVG(num_months * 1.0) as AvgMonths
from (select e.*,
             -- 0-based rank by months employed (rank() requires ORDER BY; ties share a rank)
             rank() over (order by datediff(month, startdate, coalesce(enddate, @current))) - 1 as rnk,
             count(*) over () as cnt,  -- total number of employees
             datediff(month, startdate, coalesce(enddate, @current)) as num_months
      from dbo.DimEmployee e
     ) e
group by floor( rnk * 4.0 / cnt )
order by quartile;
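The bucket number in this answer is plain arithmetic. Writing r for the 0-based rank (rnk) and n for the number of employees (cnt), the quartile label q works out as follows:

```latex
q = \left\lfloor \frac{4r}{n} \right\rfloor,
\qquad 0 \le r \le n - 1 \;\Rightarrow\; 0 \le \frac{4r}{n} < 4 \;\Rightarrow\; q \in \{0, 1, 2, 3\}

\text{e.g. } n = 8,\; r = 0, 1, \dots, 7 \;\Rightarrow\; q = 0, 0, 1, 1, 2, 2, 3, 3
```

Adding 1 to q gives the labels 1 to 4 used in the expected result. Because tied rows share the same r, they always land in the same bucket.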
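What is the criterion for splitting the employees into quartiles? You mentioned the average number of months employed, but in your result set both records have AvgMonths=10. How would you handle ties?
@GMB Hi, I want to split the values into the 25%, 50% and 75% quartiles. Sorry for my bad example; I have made a new one. Thanks.
I didn't make myself clear. The result is not what I want. I want to get the 25%, 50% and 75% quartiles based on the average months. In your example, it doesn't calculate all of them. I don't understand why you multiply the rank by 4?
@vivianna *4 is the same as /0.25, i.e. the quarter-sized groups you want to split the data into. Note: if you have ties, you may not get proper quartiles.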
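The effect of ties on the rank()-based buckets can be seen on a small hypothetical data set: eight employees, five of whom share 10 months (echoing the AvgMonths=10 tie raised in the comments). The sketch below applies the answer's formula to rank() and, for comparison, to row_number(), which breaks ties arbitrarily; it is an illustration only, not part of either answer.

```sql
-- Hypothetical data: eight month counts, five of them tied at 10.
WITH Sample AS (
    SELECT EmployedMonths
    FROM (VALUES (3), (10), (10), (10), (10), (10), (20), (30)) AS v(EmployedMonths)
),
Ranked AS (
    SELECT EmployedMonths,
           RANK()       OVER (ORDER BY EmployedMonths) - 1 AS rnk,  -- ties share a rank
           ROW_NUMBER() OVER (ORDER BY EmployedMonths) - 1 AS rn,   -- ties get distinct numbers
           COUNT(*)     OVER ()                            AS cnt
    FROM Sample
)
SELECT EmployedMonths,
       FLOOR(rnk * 4.0 / cnt) AS RankBucket,       -- the formula from the answer
       FLOOR(rn  * 4.0 / cnt) AS RowNumberBucket   -- same formula, ties broken
FROM Ranked
ORDER BY EmployedMonths, rn;
```

With rank(), six rows fall into bucket 0, buckets 1 and 2 stay empty and bucket 3 gets two rows; with row_number(), each bucket gets exactly two rows. Whether tied employees should stay together or be split to keep the groups equal is a choice the data alone cannot make.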