SQL: irregular grouping of a timestamp variable
I have a table laid out like this:
id lateAt
1231235 2019/09/14
1242123 2019/09/13
3465345 NULL
5676548 2019/09/28
8986475 2019/09/23
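Here lateAt is the timestamp at which a loan's payment became late. For each current date (I need to look at these numbers every day), I want the number of entries that are 0-15, 15-30, 30-45, 45-60, 60-90 and 90+ days late.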
This is the output I want:
lateGroup Count
0-15 20
15-30 22
30-45 25
45-60 32
60-90 47
90+ 57
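For comparison, here is a minimal client-side sketch of the requested grouping, written in Python rather than R. It assumes half-open bins (lower bound inclusive, upper bound exclusive, as the asker confirms in the comments). It also pins "today" to 2019-10-16, the date mentioned in the comments, so the result is reproducible. `late_group` and `count_late_groups` are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Hypothetical names. Assumption: half-open bins, lower bound inclusive,
# upper bound exclusive; each bin ends where the next one starts.
LOWER_BOUNDS = [0, 15, 30, 45, 60, 90]
LABELS = ["0-15", "15-30", "30-45", "45-60", "60-90", "90+"]


def late_group(days_late):
    """Return the label of the bin containing days_late, or None if not late yet."""
    if days_late < 0:
        return None
    # bisect_right counts the lower bounds <= days_late, so day 15 maps to "15-30".
    return LABELS[bisect_right(LOWER_BOUNDS, days_late) - 1]


def count_late_groups(late_dates, today):
    """Count rows per bin, skipping None (NULL) dates; empty bins are reported as 0."""
    counts = Counter()
    for late_at in late_dates:
        if late_at is None:
            continue
        group = late_group((today - late_at).days)
        if group is not None:
            counts[group] += 1
    return [(label, counts[label]) for label in LABELS]


sample = [date(2019, 9, 14), date(2019, 9, 13), None,
          date(2019, 9, 28), date(2019, 9, 23)]
# Assumption: "today" pinned to 2019-10-16 so the result is reproducible.
print(count_late_groups(sample, today=date(2019, 10, 16)))
# [('0-15', 0), ('15-30', 2), ('30-45', 2), ('45-60', 0), ('60-90', 0), ('90+', 0)]
```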
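This is something I can easily compute in R, but to get the result back into the BI dashboard I would have to create a new table in the database, which I don't think is good practice. What is the SQL-native way to solve this?
You didn't mention which DBMS you are using, but almost every DBMS has a construct called a table value constructor, like this: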
select bins.lateGroup, bins.minVal, bins.maxVal FROM
(VALUES
('0-15',0,15),
('15-30',15.0001,30), -- increase by a small fraction so bins don't overlap
('30-45',30.0001,45),
('45-60',45.0001,60),
('60-90',60.0001,90),
('90-99999',90.0001,99999)
) AS bins(lateGroup,minVal,maxVal)
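A quick way to check the "so bins don't overlap" comment: the sketch below builds the same bin list and tests every whole day count from 0 to 365 against it with BETWEEN. It runs on SQLite through Python's standard `sqlite3` module only because that needs no server. SQLite does not accept a column list on a derived-table alias such as `AS bins(lateGroup,minVal,maxVal)`, so the columns are named in a CTE instead. `bins_matching` is a hypothetical helper.

```python
import sqlite3

# SQLite has no "AS bins(col, ...)" alias for a VALUES derived table,
# so this sketch names the columns through a CTE instead.
BINS_CTE = """
WITH bins(lateGroup, minVal, maxVal) AS (
    VALUES ('0-15', 0, 15),
           ('15-30', 15.0001, 30),
           ('30-45', 30.0001, 45),
           ('45-60', 45.0001, 60),
           ('60-90', 60.0001, 90),
           ('90-99999', 90.0001, 99999)
)
"""


def bins_matching(conn, days):
    """Return the labels of every bin whose BETWEEN range contains a whole day count."""
    rows = conn.execute(
        BINS_CTE + "SELECT lateGroup FROM bins WHERE ? BETWEEN minVal AND maxVal",
        (days,),
    ).fetchall()
    return [label for (label,) in rows]


conn = sqlite3.connect(":memory:")
# Every whole number of days from 0 to 365 should land in exactly one bin.
overlaps_or_gaps = [d for d in range(0, 366) if len(bins_matching(conn, d)) != 1]
print(overlaps_or_gaps)  # []
print(bins_matching(conn, 15), bins_matching(conn, 16))  # ['0-15'] ['15-30']
```

Because the day count is always a whole number, the .0001 offsets only decide which bin owns each shared boundary: day 15 lands in '0-15' and day 16 in '15-30'.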
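If your DBMS doesn't have a table value constructor, you can probably build the same derived table from single-row SELECTs combined with UNION ALL.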
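A sketch of that UNION ALL variant, assuming the same six bins: each bin becomes a one-row SELECT, and the script checks on SQLite (through Python's `sqlite3`) that the result matches the VALUES version row for row. The query text is illustrative, not taken from the original answer. Engines that require a FROM clause in every SELECT would need one on each branch.

```python
import sqlite3

VALUES_BINS = """
WITH bins(lateGroup, minVal, maxVal) AS (
    VALUES ('0-15', 0, 15), ('15-30', 15.0001, 30), ('30-45', 30.0001, 45),
           ('45-60', 45.0001, 60), ('60-90', 60.0001, 90), ('90-99999', 90.0001, 99999)
)
SELECT lateGroup, minVal, maxVal FROM bins
"""

# The same derived table built without a table value constructor:
# one single-row SELECT per bin, glued together with UNION ALL.
# Column names come from the first branch.
UNION_ALL_BINS = """
SELECT lateGroup, minVal, maxVal FROM (
              SELECT '0-15' AS lateGroup, 0 AS minVal, 15 AS maxVal
    UNION ALL SELECT '15-30', 15.0001, 30
    UNION ALL SELECT '30-45', 30.0001, 45
    UNION ALL SELECT '45-60', 45.0001, 60
    UNION ALL SELECT '60-90', 60.0001, 90
    UNION ALL SELECT '90-99999', 90.0001, 99999
) AS bins
"""

conn = sqlite3.connect(":memory:")
via_values = sorted(conn.execute(VALUES_BINS).fetchall())
via_union_all = sorted(conn.execute(UNION_ALL_BINS).fetchall())
print(via_values == via_union_all)  # True
```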
Your full query, together with the sample data you provided, then looks like this:
--- example from SQL Server 2012 SP1
--- first let's set up some sample data
create table #temp (id int, lateAt datetime);
INSERT #temp (id, lateAt) values
(1231235,'2019-09-14'),
(1242123,'2019-09-13'),
(3465345,NULL),
(5676548,'2019-09-28'),
(8986475,'2019-09-23');
--- here's the actual query
select lateGroup, count(*) as Count
from #temp as T,
(VALUES
('0-15',0,15),
('15-30',15.0001,30), -- increase by a small fraction so bins don't overlap
('30-45',30.0001,45),
('45-60',45.0001,60),
('60-90',60.0001,90),
('90-99999',90.0001,99999)
) AS bins(lateGroup,minVal,maxVal)
where datediff(day,lateAt,getdate()) between minVal and maxVal
group by lateGroup
order by lateGroup
--- remove our sample data
drop table #temp;
Here is the output:
lateGroup Count
15-30 2
30-45 2
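To reproduce this without SQL Server, here is a sketch of the same query on SQLite through Python's `sqlite3`. It rests on these assumptions:

* `GETDATE()` is replaced by a fixed date of 2019-10-16 (from the comments).
* `DATEDIFF(day, ...)` is replaced by a difference of `julianday()` values.
* The temporary table is a plain table with the hypothetical name `temp_late`.
* The bins are named in a CTE, because SQLite lacks derived column lists.

With that date it returns the same two rows. The NULL row drops out because its day count is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical plain table standing in for SQL Server's #temp.
conn.execute("CREATE TABLE temp_late (id INTEGER, lateAt TEXT)")
conn.executemany(
    "INSERT INTO temp_late (id, lateAt) VALUES (?, ?)",
    [(1231235, "2019-09-14"), (1242123, "2019-09-13"), (3465345, None),
     (5676548, "2019-09-28"), (8986475, "2019-09-23")],
)

# Assumption: "today" is pinned to 2019-10-16 instead of GETDATE(), and the
# julianday() difference stands in for DATEDIFF(day, lateAt, today).
QUERY = """
WITH bins(lateGroup, minVal, maxVal) AS (
    VALUES ('0-15', 0, 15), ('15-30', 15.0001, 30), ('30-45', 30.0001, 45),
           ('45-60', 45.0001, 60), ('60-90', 60.0001, 90), ('90-99999', 90.0001, 99999)
)
SELECT lateGroup, COUNT(*) AS lateCount
FROM temp_late AS T
CROSS JOIN bins
WHERE CAST(julianday(:today) - julianday(T.lateAt) AS INTEGER)
      BETWEEN minVal AND maxVal
GROUP BY lateGroup
ORDER BY lateGroup
"""
rows = conn.execute(QUERY, {"today": "2019-10-16"}).fetchall()
print(rows)  # [('15-30', 2), ('30-45', 2)]
```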
Note: rows where lateAt is NULL are not counted.
A quick way to do this in SQL is:
SELECT '0-15' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 0
AND (CURRENT_DATE - t.lateAt) < 15
UNION
SELECT '15-30' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 15
AND (CURRENT_DATE - t.lateAt) < 30
UNION
SELECT '30-45' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 30
AND (CURRENT_DATE - t.lateAt) < 45
-- Etc...
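The same per-bucket idea can be generated instead of typed out. This sketch runs on SQLite via Python's `sqlite3`, with today pinned to 2019-10-16. It builds one aggregate SELECT per half-open bucket and joins them with UNION ALL. UNION would also work here, since the labels differ, but it adds a needless duplicate check. `BUCKETS` and `bucket_select` are hypothetical. The SQL is assembled from constant integers only, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, lateAt TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1231235, "2019-09-14"), (1242123, "2019-09-13"), (3465345, None),
     (5676548, "2019-09-28"), (8986475, "2019-09-23")],
)

# Hypothetical bucket list: half-open [lo, hi); hi=None means no upper bound.
BUCKETS = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 90), (90, None)]


def bucket_select(lo, hi):
    """Build one aggregate SELECT for a bucket, mirroring the hand-written query."""
    label = f"{lo}-{hi}" if hi is not None else f"{lo}+"
    # Assumption: julianday() difference stands in for CURRENT_DATE - t.lateAt.
    days = "CAST(julianday(:today) - julianday(t.lateAt) AS INTEGER)"
    cond = f"{days} >= {lo}"
    if hi is not None:
        cond += f" AND {days} < {hi}"
    return (f"SELECT '{label}' AS lateGroup, COUNT(*) AS lateGroupCount "
            f"FROM my_table t WHERE {cond}")


query = "\nUNION ALL\n".join(bucket_select(lo, hi) for lo, hi in BUCKETS)
# Assumption: today pinned to 2019-10-16 for a reproducible result.
rows = conn.execute(query, {"today": "2019-10-16"}).fetchall()
print(rows)
```

Because each branch is an aggregate with no GROUP BY, it always returns exactly one row, so empty buckets appear with a count of 0. The VALUES-and-WHERE query above, by contrast, only reports buckets that contain rows.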
For production code you will probably want something more like Ross's answer.
I think you can do it all in one clear query:
with cte_lategroup as
(
select *
from (values(0,15,'0-15'),(15,30,'15-30'),(30,45,'30-45')) as t (mini, maxi, designation)
)
select
t2.designation
, count(*)
from test t
left outer join cte_lategroup t2
on current_date - t.lateat >= t2.mini
and current_date - lateat < t2.maxi
group by t2.designation;
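Note which side of the LEFT JOIN the data is on. This sketch runs on SQLite via Python's `sqlite3`, with today pinned to 2019-10-16 and `julianday()` differences standing in for `current_date - t.lateat`. It shows that rows matching no group, here the NULL `lateat` row, are collected under a NULL designation. Groups with no rows do not appear at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, lateat TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1231235, "2019-09-14"), (1242123, "2019-09-13"), (3465345, None),
     (5676548, "2019-09-28"), (8986475, "2019-09-23")],
)

# Assumption: today pinned to 2019-10-16; julianday() difference replaces
# PostgreSQL's current_date - t.lateat.
QUERY = """
WITH cte_lategroup(mini, maxi, designation) AS (
    VALUES (0, 15, '0-15'), (15, 30, '15-30'), (30, 45, '30-45')
)
SELECT t2.designation, COUNT(*) AS n
FROM test t
LEFT OUTER JOIN cte_lategroup t2
  ON CAST(julianday(:today) - julianday(t.lateat) AS INTEGER) >= t2.mini
 AND CAST(julianday(:today) - julianday(t.lateat) AS INTEGER) <  t2.maxi
GROUP BY t2.designation
"""
counts = dict(conn.execute(QUERY, {"today": "2019-10-16"}).fetchall())
print(counts)  # the NULL-date row is counted under the key None
```

If empty groups should show up with a count of 0, the usual fix is to put the group list on the left of the LEFT JOIN and count a column from the data side instead of `*`.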
I would define the late groups as a VALUES list of integer ranges and join on the number of days:
with groups (grp) as (
values
(int4range(0,15, '[)')),
(int4range(15,30, '[)')),
(int4range(30,45, '[)')),
(int4range(45,60, '[)')),
(int4range(60,90, '[)')),
(int4range(90,null, '[)'))
)
select grp, count(t.id)
from groups g
left join the_table t on g.grp @> current_date - t.late_at
group by grp
order by grp;
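SQLite has no range types, so this sketch (run through Python's `sqlite3`, with today pinned to 2019-10-16) spells each `int4range(lo, hi, '[)')` as a pair of columns. The `@>` test becomes `lo <= days AND days < hi`, and a NULL upper bound stands in for the unbounded last group. It is an emulation for checking the logic, not PostgreSQL itself. The groups are on the left of the LEFT JOIN and the count is over a column from the data side, so empty groups come back with 0. The CTE is named `late_groups` here, a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the answer's placeholders (the_table, late_at).
conn.execute("CREATE TABLE the_table (id INTEGER, late_at TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [(1231235, "2019-09-14"), (1242123, "2019-09-13"), (3465345, None),
     (5676548, "2019-09-28"), (8986475, "2019-09-23")],
)

# Emulation, not PostgreSQL: each int4range(lo, hi, '[)') becomes two columns,
# "grp @> days" becomes lo <= days AND days < hi, and a NULL hi stands in for
# the unbounded int4range(90, null, '[)'). Today is pinned to 2019-10-16.
QUERY = """
WITH late_groups(lo, hi) AS (
    VALUES (0, 15), (15, 30), (30, 45), (45, 60), (60, 90), (90, NULL)
)
SELECT g.lo, g.hi, COUNT(t.id) AS n
FROM late_groups g
LEFT JOIN the_table t
  ON CAST(julianday(:today) - julianday(t.late_at) AS INTEGER) >= g.lo
 AND (g.hi IS NULL
      OR CAST(julianday(:today) - julianday(t.late_at) AS INTEGER) < g.hi)
GROUP BY g.lo, g.hi
ORDER BY g.lo
"""
rows = conn.execute(QUERY, {"today": "2019-10-16"}).fetchall()
for lo, hi, n in rows:
    print(f"[{lo},{'' if hi is None else hi})", n)
```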
int4range(0, 15, '[)') creates a range from 0 (inclusive) to 15 (exclusive).
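PostgreSQL stores discrete ranges such as `int4range` in canonical `[)` form, so `int4range(0, 15, '[]')` is the same range as `[0,16)`. Here is a small Python sketch of that normalization and of the containment check that `@>` performs for a single integer. `canonical_int_range` and `contains` are hypothetical names, not PostgreSQL code.

```python
def canonical_int_range(lower, upper, bounds="[)"):
    """Normalize an integer range to the half-open '[)' form (hypothetical helper).

    None means an unbounded side. The result (lo, hi) contains x exactly when
    lo <= x < hi; if lo >= hi the range holds no integers at all (empty).
    """
    if bounds not in ("[)", "[]", "()", "(]"):
        raise ValueError("bounds must be '[)', '[]', '()' or '(]'")
    if lower is not None and bounds[0] == "(":
        lower += 1  # exclusive lower bound -> next integer, inclusive
    if upper is not None and bounds[1] == "]":
        upper += 1  # inclusive upper bound -> next integer, exclusive
    return lower, upper


def contains(rng, x):
    """Mimic 'range @> integer' for a canonical (lo, hi) pair."""
    lo, hi = rng
    return (lo is None or lo <= x) and (hi is None or x < hi)


print(canonical_int_range(0, 15, "[]"))  # (0, 16)
print(canonical_int_range(0, 15, "()"))  # (1, 15)
print(contains(canonical_int_range(0, 15), 0),
      contains(canonical_int_range(0, 15), 15))  # True False
print(contains(canonical_int_range(90, None), 10_000))  # True
```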
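So the late group is computed from the difference between today and the lateAt column? Then id=1231235 belongs to the 30-45 group, because today (2019-10-16) it is 32 days late.
I'm using PostgreSQL, and your second comment is right.
Your groups overlap. Does an entry that is exactly 15 days late belong to the 0-15 group, the 15-30 group, or both?
Good question. It should be inclusive at the lower bound and exclusive at the upper bound.
If you don't use BETWEEN, you don't need the small fractional increment.
@a_horse_with_no_name - You are of course right. It comes down to choosing between two kinds of unwanted verbosity: adding a fraction to each lower bound, or repeating the datediff expression in the WHERE clause. I find that when the change is the same on every line, repeating it actually makes the query clearer on re-reading. I really wish SQL Server had ranges.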
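The boundary question from the comments can be made concrete. This Python sketch classifies the shared boundary days under both rules: BETWEEN with the .0001 offsets on the lower bounds, and the half-open `lo <= days < hi` test the asker calls for. The function names and the `90+` label are illustrative.

```python
# Illustrative bins; hi=None marks the open-ended last group.
BINS = [("0-15", 0, 15), ("15-30", 15, 30), ("30-45", 30, 45),
        ("45-60", 45, 60), ("60-90", 60, 90), ("90+", 90, None)]


def group_between_with_offset(days):
    """BETWEEN lo AND hi, with every lower bound after the first raised by 0.0001."""
    for i, (label, lo, hi) in enumerate(BINS):
        lo_adj = lo if i == 0 else lo + 0.0001
        hi_adj = 99999 if hi is None else hi
        if lo_adj <= days <= hi_adj:
            return label
    return None


def group_half_open(days):
    """lo <= days < hi: no offsets needed, boundary days go to the higher bin."""
    for label, lo, hi in BINS:
        if lo <= days and (hi is None or days < hi):
            return label
    return None


for boundary in (15, 30, 45, 60, 90):
    print(boundary, group_between_with_offset(boundary), group_half_open(boundary))
```

Under BETWEEN with offsets, each boundary day goes to the lower bin (15 lands in '0-15'). Under the half-open rule it goes to the higher one (15 lands in '15-30'), which matches the "inclusive lower, exclusive upper" requirement.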