SQL: splitting values into quartiles using CASE and aggregation statements

Tags: sql, sql-server, aggregation, quartile

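I have an employee dataset like the one below. StartDate is the date the employee was hired, and EndDate is the date the employee left the employer. Both columns are of the date data type.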

Gender   StartDate       EndDate
 M       2010-04-30      2013-06-18
 F       2010-01-09      2015-06-19
 M       2009-09-08      2014-08-13
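I want to divide the employee data into quartiles based on the number of months employed. The result should also include the total number of employees (the Employee column), the percentage of male employees, the percentage of female employees, and the average number of months employed. Here is the expected result: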

Quartile    Employee    %Male   %Female   AvgMonths
1             20        60.00    40.00     8.75
2             25        50.00    50.00     28.5
3             10        40.00    60.00     41.25
I want to get the 25%, 50%, and 75% quartiles based on the months employed, and average the months within each quartile to get AvgMonths.

My query is below. I don't know where to add the quartile calculation to it.

declare @current date;
set @current='2012-12-31';

select count(*) as Employees,
       cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(5)) +'%' as Male,
       cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(5)) +'%' as Female,
       AVG(CASE WHEN EndDate is null then DATEDIFF(MONTH, StartDate, @current)
                when EndDate is not null then DATEDIFF(MONTH, StartDate, EndDate)
           end) as AvgMonths
from dbo.DimEmployee
----------------------- Update -------------------------------------

I figured it out myself. Here is the code:

declare @current date;
set @current='2012-12-31';
select count(*) as Employees,
       cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) +'%' as Male,
       cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) +'%' as Female,
       AVG(t.EmployedMonths) as AvgMonths,
       Ntile(3) over (order by t.EmployedMonths asc) as Quartiles
from
      (select EmployeeKey, Gender,
       CASE WHEN EndDate is null then abs(DATEDIFF(MONTH, StartDate, @current))
            when EndDate is not null then abs(DATEDIFF(MONTH, StartDate, EndDate))
            end as EmployedMonths
        from dbo.DimEmployee)t
group by t.EmployedMonths

You can use rank() and count(*) as window functions to calculate the quartiles:

select floor( rnk * 4.0 / cnt ) as quartile,
       count(*) as Employees,
       cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Male,
       cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) + '%' as Female,
       AVG(num_months * 1.0) as AvgMonths
from (select e.*,
             rank() over (order by datediff(month, startdate, coalesce(enddate, @current))) - 1 as rnk,
             count(*) over () as cnt,
             datediff(month, startdate, coalesce(enddate, @current)) as num_months
      from dbo.DimEmployee e
     ) e
group by floor( rnk * 4.0 / cnt )
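
As an alternative to the rank()-based bucketing, here is a minimal sketch that assigns each employee to a tile with ntile(4) in a subquery and then aggregates per tile. It assumes the dbo.DimEmployee columns from the question (Gender, StartDate, EndDate) and the same @current cutoff. The output column names PctMale and PctFemale are made up for illustration, and the query has not been run against the asker's data.

```sql
-- Cutoff date for employees who have no EndDate (value taken from the question).
declare @current date = '2012-12-31';

select q.Quartile,
       count(*) as Employees,
       -- Averaging 100.0/0 per row yields the percentage directly.
       cast(avg(case when q.Gender = 'M' then 100.0 else 0 end) as decimal(5,2)) as PctMale,
       cast(avg(case when q.Gender = 'F' then 100.0 else 0 end) as decimal(5,2)) as PctFemale,
       -- Multiply by 1.0 so the average is not truncated to an integer.
       cast(avg(q.EmployedMonths * 1.0) as decimal(10,2)) as AvgMonths
from (select e.Gender,
             m.EmployedMonths,
             -- One tile number (1..4) per employee, ordered by months employed.
             ntile(4) over (order by m.EmployedMonths) as Quartile
      from dbo.DimEmployee e
      cross apply (select datediff(month, e.StartDate,
                                   coalesce(e.EndDate, @current)) as EmployedMonths) m
     ) q
group by q.Quartile
order by q.Quartile;
```

ntile(4) keeps the four groups as equal in size as possible (sizes differ by at most one row), which means employees with the same number of months can end up in different tiles. The rank()-based version keeps tied employees together, at the cost of uneven group sizes.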

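Comments:

What is the criterion for dividing the employees into quartiles? You mention average months of employment, but in your result set two records both have AvgMonths=10. How would you handle ties?

@GMB Hi, I want to split the values into the 25%, 50%, and 75% quartiles. Sorry for my bad example, I made up a new one. Thanks.

I didn't make myself clear. The result is not what I want. I want to get the 25%, 50%, and 75% quartiles based on the average months. In your example, it does not calculate all of them. I don't understand why you multiply the rank by 4?

@vivianna *4 is the same as /0.25; it is the number of groups you want to split things into. Note: if you have ties, you may not get proper quartiles.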
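
To make the *4 point concrete: floor(rnk * 4.0 / cnt) maps a 0-based rank onto the buckets 0 to 3. With eight employees and no ties, ranks 0 to 7 land in buckets 0, 0, 1, 1, 2, 2, 3, 3. The sketch below uses eight made-up month values (not the asker's data) in which the first four are tied, to show how ties distort the buckets.

```sql
-- Hypothetical month values; the first four rows are tied at 10 months.
select v.months,
       -- rank() gives tied rows the same rank: 0, 0, 0, 0, 4, 5, 6, 7.
       rank() over (order by v.months) - 1 as rnk,
       -- Scale the 0-based rank into 4 buckets (0..3).
       floor((rank() over (order by v.months) - 1) * 4.0
             / count(*) over ()) as quartile
from (values (10), (10), (10), (10), (20), (30), (40), (50)) as v(months)
order by v.months;
```

Here the four tied rows all fall into bucket 0, the next two rows into bucket 2, and the last two into bucket 3, so bucket 1 is empty. That is the situation the note about ties refers to.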