SQL duplicate analysis
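Tags: sql, oracle, plsql

I am trying to write some logic in Oracle SQL, but I am having a hard time getting it right. First, I need my script to identify duplicate items. Then it needs to identify the latest item among the duplicates.

The database I am working with has a high volume of manual data inserts made outside of the application. This causes items to be out of order when using the ID number. I am using the start date and the ID number as a way to measure order, because there is no other way to do so within the table.

If I needed to identify the latest role for employee 12311, how would I do so?

Here is what I have so far:
Table
Code

Instead of looking at all of the records for each employee and identifying the latest one, I want the script to use only the duplicate start dates. Basically, if the latest STARTDATE is a duplicate, then identify which ID is the highest.

So it should look like this: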
ID   | EMPLOYEE | ROLE       | STARTDATE            | MAX Date | Max ID
-----|----------|------------|----------------------|----------|-------
3432 | 12311    | Supervisor | 2016-07-12T00:00:00Z | 1        | 1
3421 | 12311    | Analyst    | 2016-07-12T00:00:00Z | 1        | 0
4321 | 12311    | Help Desk  | 2014-05-12T00:00:00Z | 0        | 0
5432 | 23432    | Manager    | 2012-11-02T00:00:00Z | 1        | 1
3452 | 23432    | Associate  | 2011-04-23T00:00:00Z | 0        | 0
7652 | 54332    | Analyst    | 2015-10-15T00:00:00Z | 1        | 1
5691 | 54332    | Assistant  | 2013-10-15T00:00:00Z | 0        | 0
I am completely open to better approaches. Any help would be greatly appreciated.
Edit with solution:
Thanks to @Littlefoot for the help. I was able to modify my script to include the following:
SELECT "ID", "EMPLOYEE", "ROLE", "STARTDATE",
       ROW_NUMBER() OVER (PARTITION BY "EMPLOYEE" ORDER BY "STARTDATE" DESC, "ID" DESC) RN
FROM (
    -- The alias must be "EMPLOYEE" so that it matches the name used in the outer query.
    SELECT DISTINCT EMPLOYEE "EMPLOYEE",
           E.ID "ID",
           LR.DESCRIPTION "ROLE",
           ROLE_START_DATE "STARTDATE"
    FROM EMPLOYEES E
         JOIN ROLES R ON E.EMPLOYEE_ID = R.EMPLOYEE_ID
         JOIN LU_ROLES LR ON R.ROLE_ID = LR.ROLE_ID
    WHERE ROLE_START_DATE <= DATE '2017-12-03')
ORDER BY 2
I then filter the results on RN = 1.
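"If I needed to identify the latest role for employee 12311, how would I do so?"
The one with the lowest RN? Why do you need two MAX columns when a single column does the job? For example: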
SQL> with test (id, empid, role, startdate) as
2 (select 3432, 12311, 'supervisor', date '2016-07-12' from dual union
3 select 3421, 12311, 'analyst' , date '2016-07-12' from dual union
4 select 4321, 12311, 'help desk' , date '2014-05-12' from dual union
5 --
6 select 5432, 23432, 'manager' , date '2012-11-02' from dual union
7 select 3452, 23432, 'associate' , date '2011-04-23' from dual
8 )
9 select id, empid, role, startdate,
10 row_number() over (partition by empid order by startdate desc, id desc) rn
11 from test;
ID EMPID ROLE STARTDATE RN
---------- ---------- ---------- ---------- ----------
3432 12311 supervisor 2016-07-12 1
3421 12311 analyst 2016-07-12 2
4321 12311 help desk 2014-05-12 3
5432 23432 manager 2012-11-02 1
3452 23432 associate 2011-04-23 2
SQL>
That query would then be the source for another query that filters with a WHERE clause, i.e.
<snip>
9 select id, empid, role, startdate
10 from (select id, empid, role, startdate,
11 row_number() over (partition by empid order by startdate desc, id desc) rn
12 from test
13 )
14 where rn = 1;
ID EMPID ROLE STARTDATE
---------- ---------- ---------- ----------
3432 12311 supervisor 2016-07-12
5432 23432 manager 2012-11-02
SQL>
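If the two flag columns from the question are still wanted, or if it helps to see whether the ID tie-break was actually needed, the same analytic approach can probably be extended. The following is only a sketch on the sample data above; the column names max_date, max_id and same_date_cnt are made up for illustration and do not come from the original posts.

```sql
with test (id, empid, role, startdate) as
  (select 3432, 12311, 'supervisor', date '2016-07-12' from dual union all
   select 3421, 12311, 'analyst'   , date '2016-07-12' from dual union all
   select 4321, 12311, 'help desk' , date '2014-05-12' from dual union all
   select 5432, 23432, 'manager'   , date '2012-11-02' from dual union all
   select 3452, 23432, 'associate' , date '2011-04-23' from dual
  )
select id, empid, role, startdate,
       -- 1 when the row carries the employee's latest start date
       case when startdate = max(startdate) over (partition by empid)
            then 1 else 0 end as max_date,
       -- 1 only for the single "latest" row (highest id among tied dates)
       case when row_number() over (partition by empid
                                    order by startdate desc, id desc) = 1
            then 1 else 0 end as max_id,
       -- number of rows sharing this employee and start date; > 1 means a duplicate
       count(*) over (partition by empid, startdate) as same_date_cnt
from test
order by empid, startdate desc, id desc;
```

For employee 12311, both 2016-07-12 rows get max_date = 1 and same_date_cnt = 2, but only ID 3432 gets max_id = 1, which matches the table in the question.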
You can do this in a single step with the MAX aggregate and KEEP; in simplified form:
select employee,
max(role) keep (dense_rank last order by startdate, id) as role
from employees
group by employee
This uses startdate and id to find the "latest" role; the id only matters when there is a tie on startdate.
Demo with the sample data in a CTE:
with employees (ID, EMPLOYEE, ROLE, STARTDATE) as (
select 3432, 12311, 'Supervisor', timestamp '2016-07-12 00:00:00 UTC' from dual
union all select 3421, 12311, 'Analyst', timestamp '2016-07-12 00:00:00 UTC' from dual
union all select 4321, 12311, 'Help Desk', timestamp '2014-05-12 00:00:00 UTC' from dual
union all select 5432, 23432, 'Manager', timestamp '2012-11-02 00:00:00 UTC' from dual
union all select 3452, 23432, 'Associate', timestamp '2011-04-23 00:00:00 UTC' from dual
union all select 7652, 54332, 'Analyst', timestamp '2015-10-15 00:00:00 UTC' from dual
union all select 5691, 54332, 'Assistant', timestamp '2013-10-15 00:00:00 UTC' from dual
)
select employee,
max(role) keep (dense_rank last order by startdate, id) as role
from employees
group by employee
order by employee;
EMPLOYEE ROLE
---------- ----------
12311 Supervisor
23432 Manager
54332 Analyst
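If the ID and start date of the winning row are needed as well, the same KEEP clause can be repeated for each column. This is a sketch under the assumption that every column should come from the same "latest" row; it reuses a subset of the sample data and plain DATE literals for brevity.

```sql
with employees (id, employee, role, startdate) as (
            select 3432, 12311, 'Supervisor', date '2016-07-12' from dual
  union all select 3421, 12311, 'Analyst',    date '2016-07-12' from dual
  union all select 4321, 12311, 'Help Desk',  date '2014-05-12' from dual
  union all select 5432, 23432, 'Manager',    date '2012-11-02' from dual
  union all select 3452, 23432, 'Associate',  date '2011-04-23' from dual
)
select employee,
       -- same ordering in both KEEP clauses, so id and role come from the same row
       max(id)   keep (dense_rank last order by startdate, id) as id,
       max(role) keep (dense_rank last order by startdate, id) as role,
       -- the latest row always carries the maximum start date
       max(startdate) as startdate
from employees
group by employee
order by employee;
```

On this sample it should return ID 3432 (Supervisor) for employee 12311 and ID 5432 (Manager) for employee 23432.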
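You can apply the same function to the joined tables, without computing the rank manually. I would use KEEP (see the query at the end):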
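Comments:

Thank you! I needed it to be more scalable, so I pulled it out of the nested select statement. So far it seems to be working. I like this because it is not simply 1, 0; it is a ranking. SELECT ID, EMPLOYEE, ROLE, STARTDATE, ROW_NUMBER() OVER (PARTITION BY empid ORDER BY STARTDATE DESC, ID DESC) rn FROM (SELECT DISTINCT EMPLOYEE E.EMPLOYEE ..... I am still trying some other approaches to see which is the most efficient. But this is great! Thanks again!

You're welcome; I'm glad if it helped. If I may suggest: get rid of the bad habit of using double quotes to name Oracle objects and columns. It only causes problems. By default, they are all created in upper case, but you can reference them in any case. However, if you enclose them in double quotes, you must always follow the lower/upper/mixed case that was used when those objects were created.

Thanks for the tip!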
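The double-quote advice in the comment above can be illustrated with a small sketch. The table quote_demo and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, used only to illustrate quoted vs. unquoted identifiers.
create table quote_demo (
  startdate  date,          -- unquoted: stored in the dictionary as STARTDATE
  "RoleName" varchar2(20)   -- quoted: stored exactly as RoleName
);

select startdate   from quote_demo;  -- works: unquoted names are folded to upper case
select StartDate   from quote_demo;  -- works for the same reason
select "STARTDATE" from quote_demo;  -- works: matches the stored upper-case name
select "RoleName"  from quote_demo;  -- works: exact case, in double quotes

select rolename    from quote_demo;  -- fails with ORA-00904 (invalid identifier)
select "startdate" from quote_demo;  -- fails with ORA-00904 (invalid identifier)

drop table quote_demo;
```

Once a name has been created in mixed case inside double quotes, every later reference has to repeat both the quotes and the exact case, which is why unquoted names are usually easier to live with.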
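-- KEEP applied directly to the joined tables; no ROW_NUMBER subquery is needed.
-- E.ID is aggregated as well: a bare E.ID next to GROUP BY EMPLOYEE raises ORA-00979.
SELECT EMPLOYEE AS "EMPLOYEE",
       MAX(E.ID) KEEP (DENSE_RANK FIRST ORDER BY ROLE_START_DATE DESC, E.ID DESC) AS "ID",
       MAX(LR.DESCRIPTION) KEEP (DENSE_RANK FIRST ORDER BY ROLE_START_DATE DESC, E.ID DESC) AS "ROLE",
       MAX(ROLE_START_DATE) AS "STARTDATE"
FROM EMPLOYEES E
     JOIN ROLES R ON E.EMPLOYEE_ID = R.EMPLOYEE_ID
     JOIN LU_ROLES LR ON R.ROLE_ID = LR.ROLE_ID
WHERE ROLE_START_DATE <= DATE '2017-12-03'
GROUP BY EMPLOYEE;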