SQL duplicate analysis


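I'm trying to write some logic in Oracle SQL, but I'm having a hard time getting it right. First, I need my script to identify duplicate items. Then it needs to determine which of the duplicates is the most recent. The database I'm working with has a lot of data inserted manually, outside the application. Because of that, items end up out of order when I sort by ID number. I'm using the start date plus the ID number to determine order, because the table offers no other way to do it.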

If I need to determine the most recent role for employee 12311, how would I do that?

Here is what I have so far:

Table

Code

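Rather than looking at every record for each employee to find the most recent one, I want the script to look only at duplicate start dates.

Basically, if the most recent STARTDATE is duplicated, then determine which ID is the highest.

So it should look like this: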

  ID | EMPLOYEE | ROLE       | STARTDATE            | MAX Date | Max ID
-----|----------|------------|----------------------|----------|-------
3432 |    12311 | Supervisor | 2016-07-12T00:00:00Z |        1 |      1
3421 |    12311 | Analyst    | 2016-07-12T00:00:00Z |        1 |      0
4321 |    12311 | Help Desk  | 2014-05-12T00:00:00Z |        0 |      0
5432 |    23432 | Manager    | 2012-11-02T00:00:00Z |        1 |      1
3452 |    23432 | Associate  | 2011-04-23T00:00:00Z |        0 |      0
7652 |    54332 | Analyst    | 2015-10-15T00:00:00Z |        1 |      1
5691 |    54332 | Assistant  | 2013-10-15T00:00:00Z |        0 |      0

I'm completely open to better approaches. Any help would be greatly appreciated.
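A minimal sketch of how the MAX Date and Max ID flags in the table above could be computed. It uses Python's built-in `sqlite3` module as a runnable stand-in for Oracle, which assumes SQLite 3.25 or later for window functions. The table name `employees`, the lowercase column names, and the storage of dates as ISO text are all illustrative assumptions. The `MAX(...) OVER` and `ROW_NUMBER() OVER` expressions are also valid Oracle analytic functions.

```python
import sqlite3

# Sample rows copied from the expected-output table: (id, employee, role, startdate).
ROWS = [
    (3432, 12311, "Supervisor", "2016-07-12"),
    (3421, 12311, "Analyst", "2016-07-12"),
    (4321, 12311, "Help Desk", "2014-05-12"),
    (5432, 23432, "Manager", "2012-11-02"),
    (3452, 23432, "Associate", "2011-04-23"),
    (7652, 54332, "Analyst", "2015-10-15"),
    (5691, 54332, "Assistant", "2013-10-15"),
]

# max_date: 1 when the row's start date equals that employee's latest start date.
# max_id:   1 only for the single row that sorts first by startdate DESC, id DESC.
FLAG_QUERY = """
SELECT id, employee, role, startdate,
       CASE WHEN startdate = MAX(startdate) OVER (PARTITION BY employee)
            THEN 1 ELSE 0 END AS max_date,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY employee
                                    ORDER BY startdate DESC, id DESC) = 1
            THEN 1 ELSE 0 END AS max_id
FROM employees
ORDER BY employee, startdate DESC, id DESC
"""


def flag_rows(rows):
    """Load rows into an in-memory table and return them with both flags."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees "
            "(id INTEGER, employee INTEGER, role TEXT, startdate TEXT)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
        return conn.execute(FLAG_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in flag_rows(ROWS):
        print(row)
```

As the answers below point out, the `max_id` flag alone already identifies the latest role, so the `max_date` column is only needed to see which rows share the top start date.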

Edit with solution:

Thanks to @Littlefoot for the help. I was able to modify my script to include the following:

   SELECT "ID", "EMPLOYEE", "ROLE", "STARTDATE",
    ROW_NUMBER() OVER (PARTITION BY "EMPLOYEE" ORDER BY "STARTDATE" DESC, "ID" DESC) RN
    FROM (
    SELECT DISTINCT E.EMPLOYEE "EMPLOYEE",
    E.ID "ID",
    LR.DESCRIPTION "ROLE", 
    ROLE_START_DATE "STARTDATE"
    FROM EMPLOYEES E
    JOIN ROLES R ON E.EMPLOYEE_ID = R.EMPLOYEE_ID
    JOIN LU_ROLES LR ON R.ROLE_ID = LR.ROLE_ID
    WHERE ROLE_START_DATE <= DATE '2017-12-03')
    ORDER BY 2
Then I filter the results on RN = 1.
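Filtering on RN = 1 has to happen in an outer query. Analytic functions such as ROW_NUMBER() are evaluated after the WHERE clause, so `rn` cannot be tested in the same SELECT that computes it. The sketch below shows that nested shape, including a start-date cut-off like the `DATE '2017-12-03'` condition. It again uses Python's `sqlite3` module as a stand-in for Oracle and runs on the flat sample data. The table and column names are illustrative assumptions, not the real EMPLOYEES/ROLES/LU_ROLES schema.

```python
import sqlite3

# Sample rows from the question: (id, employee, role, startdate).
ROWS = [
    (3432, 12311, "Supervisor", "2016-07-12"),
    (3421, 12311, "Analyst", "2016-07-12"),
    (4321, 12311, "Help Desk", "2014-05-12"),
    (5432, 23432, "Manager", "2012-11-02"),
    (3452, 23432, "Associate", "2011-04-23"),
    (7652, 54332, "Analyst", "2015-10-15"),
    (5691, 54332, "Assistant", "2013-10-15"),
]

# Inner query: apply the cut-off, then rank each employee's roles
# (latest start date first; on a tie, highest id first).
# Outer query: keep only the top-ranked row per employee.
LATEST_ROLE_QUERY = """
SELECT id, employee, role, startdate
FROM (
    SELECT id, employee, role, startdate,
           ROW_NUMBER() OVER (PARTITION BY employee
                              ORDER BY startdate DESC, id DESC) AS rn
    FROM employees
    WHERE startdate <= ?
)
WHERE rn = 1
ORDER BY employee
"""


def latest_roles(rows, cutoff):
    """Return the latest role per employee among rows starting on or before cutoff."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees "
            "(id INTEGER, employee INTEGER, role TEXT, startdate TEXT)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
        return conn.execute(LATEST_ROLE_QUERY, (cutoff,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_roles(ROWS, "2017-12-03"):
        print(row)
```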

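If I need to determine the most recent role for employee 12311, how would I do that?

The one with the lowest RN? Why do you need two MAX columns when a single column does the job on its own? For example: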

SQL> with test (id, empid, role, startdate) as
  2    (select 3432, 12311, 'supervisor', date '2016-07-12' from dual union
  3     select 3421, 12311, 'analyst'   , date '2016-07-12' from dual union
  4     select 4321, 12311, 'help desk' , date '2014-05-12' from dual union
  5     --
  6     select 5432, 23432, 'manager'   , date '2012-11-02' from dual union
  7     select 3452, 23432, 'associate' , date '2011-04-23' from dual
  8    )
  9  select id, empid, role, startdate,
 10    row_number() over (partition by empid order by startdate desc, id desc) rn
 11  from test;

        ID      EMPID ROLE       STARTDATE          RN
---------- ---------- ---------- ---------- ----------
      3432      12311 supervisor 2016-07-12          1
      3421      12311 analyst    2016-07-12          2
      4321      12311 help desk  2014-05-12          3
      5432      23432 manager    2012-11-02          1
      3452      23432 associate  2011-04-23          2

SQL>
That query would then be the source for another query, which filters it with a WHERE clause, i.e.

  <snip>
  9  select id, empid, role, startdate
 10  from (select id, empid, role, startdate,
 11          row_number() over (partition by empid order by startdate desc, id desc) rn
 12        from test
 13       )
 14  where rn = 1;

        ID      EMPID ROLE       STARTDATE
---------- ---------- ---------- ----------
      3432      12311 supervisor 2016-07-12
      5432      23432 manager    2012-11-02

SQL>
You can do this in one step with the MAX aggregate function and KEEP. In simplified form:

select employee,
  max(role) keep (dense_rank last order by startdate, id) as role
from employees
group by employee
This uses startdate and id to find the "latest" role. The id only matters when there is a tie on startdate.
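To make the semantics of `MAX(role) KEEP (DENSE_RANK LAST ORDER BY startdate, id)` concrete, here is a plain-Python sketch of what it computes for each group. It finds the highest (startdate, id) sort key in each group and then takes MAX(role) over only the rows that share that key. The function name `keep_dense_rank_last` is hypothetical. The sketch assumes there are no NULL start dates or ids. In Oracle, ascending order puts NULLs last by default, so a NULL start date would be treated as the "last" value.

```python
from collections import defaultdict

# Sample rows from the question: (id, employee, role, startdate).
ROWS = [
    (3432, 12311, "Supervisor", "2016-07-12"),
    (3421, 12311, "Analyst", "2016-07-12"),
    (4321, 12311, "Help Desk", "2014-05-12"),
    (5432, 23432, "Manager", "2012-11-02"),
    (3452, 23432, "Associate", "2011-04-23"),
    (7652, 54332, "Analyst", "2015-10-15"),
    (5691, 54332, "Assistant", "2013-10-15"),
]


def keep_dense_rank_last(rows):
    """Emulate MAX(role) KEEP (DENSE_RANK LAST ORDER BY startdate, id) GROUP BY employee."""
    groups = defaultdict(list)
    for row_id, employee, role, startdate in rows:
        groups[employee].append((startdate, row_id, role))

    result = {}
    for employee, items in groups.items():
        # DENSE_RANK LAST: the rows whose (startdate, id) key is the highest in the group.
        last_key = max((startdate, row_id) for startdate, row_id, _ in items)
        # MAX(role) is then applied only to those rows (it matters only on a full tie).
        result[employee] = max(
            role for startdate, row_id, role in items if (startdate, row_id) == last_key
        )
    return result


if __name__ == "__main__":
    print(keep_dense_rank_last(ROWS))
```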

Demo with your sample data in a CTE:

with employees (ID, EMPLOYEE, ROLE, STARTDATE) as (
            select 3432, 12311, 'Supervisor', timestamp '2016-07-12 00:00:00 UTC' from dual
  union all select 3421, 12311, 'Analyst', timestamp '2016-07-12 00:00:00 UTC' from dual
  union all select 4321, 12311, 'Help Desk', timestamp '2014-05-12 00:00:00 UTC' from dual
  union all select 5432, 23432, 'Manager', timestamp '2012-11-02 00:00:00 UTC' from dual
  union all select 3452, 23432, 'Associate', timestamp '2011-04-23 00:00:00 UTC' from dual
  union all select 7652, 54332, 'Analyst', timestamp '2015-10-15 00:00:00 UTC' from dual
  union all select 5691, 54332, 'Assistant', timestamp '2013-10-15 00:00:00 UTC' from dual
)
select employee,
  max(role) keep (dense_rank last order by startdate, id) as role
from employees
group by employee
order by employee;

  EMPLOYEE ROLE      
---------- ----------
     12311 Supervisor
     23432 Manager   
     54332 Analyst   
You can use the same function against your joined tables, without having to calculate the ranking manually.

I would use KEEP:

SELECT EMPLOYEE as "E.EMPLOYEE",
       MAX(E.ID) KEEP (DENSE_RANK FIRST ORDER BY ROLE_START_DATE DESC, E.ID DESC) as "ID",
       MAX(LR.DESCRIPTION) KEEP (DENSE_RANK FIRST ORDER BY ROLE_START_DATE DESC, E.ID DESC) as "ROLE",
       MAX(ROLE_START_DATE) as "STARTDATE"
FROM EMPLOYEES E JOIN
     ROLES R
     ON E.EMPLOYEE_ID = R.EMPLOYEE_ID JOIN
     LU_ROLES LR
     ON R.ROLE_ID = LR.ROLE_ID
WHERE ROLE_START_DATE <= DATE '2017-12-03'
GROUP BY EMPLOYEE;

Thank you! I needed it to be more scalable, so I pulled it out of the nested SELECT statement. So far it seems to be working. I like this because it isn't just a simple 1/0; it's a ranking. SELECT ID, EMPLOYEE, ROLE, STARTDATE, ROW_NUMBER() OVER (PARTITION BY empid ORDER BY STARTDATE DESC, ID DESC) RN FROM (SELECT DISTINCT EMPLOYEE E.EMPLOYEE ..... I'm still trying some other approaches to see which one is the most efficient. But this is great! Thanks again!

You're welcome; I'm glad it helped. If I may suggest: drop the bad habit of using double quotes when naming Oracle objects and columns. It only causes problems. By default, unquoted names are stored in uppercase, and you can reference them in any case you like. If you enclose them in double quotes, however, you must always use exactly the lowercase/uppercase/mixed case that was used when the objects were created.

Thanks for the tip!