Creating a unique string from first and last name in Oracle

I could do this programmatically, but I've been looking for a cleaner solution.

Suppose I have the following table:

Last Name       First Name
Smith           Albert       
Smith           Alphonse    
Smith           Jason         
Johnson         Charles
Roberts         Chris
Roberts         Christian
I want to create a unique name using the following rules:

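  • If the last name is already unique, just return the last name.
  • If the same last name occurs more than once, return the first initial (or more leading letters of the first name), followed by a period, then the last name.

For Albert Smith I would return Alb. Smith.
For Charles Johnson I would return Johnson.
For Christian Roberts I would return Christ. Roberts.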

Does anyone have an idea how to do this directly in an Oracle SQL statement, or should I stick to doing it in the program?

Try the following:

with
last_names as (
  select last_name, count(*) as last_name_count 
  from table_name 
  group by last_name )

select case 
         when b.last_name_count = 1 then a.last_name 
         else substr(a.first_name,1,1)||'. '||a.last_name 
       end as name
from table_name a
join last_names b
on a.last_name = b.last_name;
Replace table_name with the actual name of your table.

A version using a recursive CTE, which requires 11gR2:

with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
  select last_name, first_name,
    row_number() over (order by last_name, first_name),
    cast (null as varchar2(20)), 0, length(first_name)
  from t42
  union all
  select last_name, first_name, orig_rn,
    part || substr(first_name, part_length + 1, 1),
    part_length + 1,
    remaining - 1
  from t
  where remaining > 0
),
u as (
  select last_name, first_name, orig_rn, part, part_length,
    count(distinct orig_rn) over (partition by last_name) as last_name_count,
    count(distinct orig_rn) over (partition by last_name, part) as part_count
  from t
),
v as (
  select last_name, first_name, orig_rn, part, last_name_count,
  row_number() over (partition by orig_rn order by part_length) as rn
  from u
  where (part_count = 1 or part = first_name)
)
select case when last_name_count = 1 then null
  when part = first_name then first_name || ' '
  else part || '. '
  end || last_name as condensed_name
from v
where rn = 1
order by orig_rn;
Which gives:

CONDENSED_NAME                               
----------------------------------------------
Johnson                                        
Chris Roberts                                  
Christ. Roberts                                
Alb. Smith                                     
Alp. Smith                                     
J. Smith                                       

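The t CTE is recursive. It starts with the original table rows and generates an additional row for each possible contraction of the first name: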

with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
  select last_name, first_name,
    row_number () over (order by last_name, first_name),
    cast (null as varchar2(20)), 0, length(first_name)
  from t42
  union all
  select last_name, first_name, orig_rn,
    part || substr(first_name, part_length + 1, 1),
    part_length + 1,
    remaining - 1
  from t
  where remaining > 0
)
select last_name, first_name, part
from t
where last_name = 'Johnson'
order by orig_rn, part_length;

LAST_NAME            FIRST_NAME           PART                   
-------------------- -------------------- ------------------------
Johnson              Charles                                       
Johnson              Charles              C                        
Johnson              Charles              Ch                       
Johnson              Charles              Cha                      
Johnson              Charles              Char                     
Johnson              Charles              Charl                    
Johnson              Charles              Charle                   
Johnson              Charles              Charles                  
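
To make the expansion step easier to picture outside the database, here is a small sketch in Python rather than SQL. The function name contractions is made up for illustration; it only mimics the rows that t emits for a single person, with an empty string standing in for the NULL part of the anchor row.

```python
def contractions(first_name):
    """Return every leading substring of first_name, shortest first.

    Mirrors the rows the recursive CTE emits for one person: the anchor
    row has an empty part (NULL in Oracle), and each recursive step
    appends the next character until the whole name has been used.
    """
    return [first_name[:n] for n in range(len(first_name) + 1)]


print(contractions("Charles"))
```

For "Charles" this yields eight values, from the empty part up to the full name, the same as the eight Johnson rows above.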
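The next CTE, u (yes, sorry about the names, I wasn't feeling inspired), compares the part values across all the rows and counts how many times each one appears. Any with a count of 1 are unique: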

...
u as (
  select last_name, first_name, orig_rn, part, part_length,
    count(distinct orig_rn) over (partition by last_name) as last_name_count,
    count(distinct orig_rn) over (partition by last_name, part) as part_count
  from t
)
select last_name, first_name, part, last_name_count, part_count
from u
where last_name = 'Roberts'
order by orig_rn, part_length;

LAST_NAME            FIRST_NAME           PART                     LAST_NAME_COUNT PART_COUNT
-------------------- -------------------- ------------------------ --------------- ----------
Roberts              Chris                                                       2          2 
Roberts              Chris                C                                      2          2 
Roberts              Chris                Ch                                     2          2 
Roberts              Chris                Chr                                    2          2 
Roberts              Chris                Chri                                   2          2 
Roberts              Chris                Chris                                  2          2 
Roberts              Christian                                                   2          2 
Roberts              Christian            C                                      2          2 
Roberts              Christian            Ch                                     2          2 
Roberts              Christian            Chr                                    2          2 
Roberts              Christian            Chri                                   2          2 
Roberts              Christian            Chris                                  2          2 
Roberts              Christian            Christ                                 2          1 
Roberts              Christian            Christi                                2          1 
Roberts              Christian            Christia                               2          1 
Roberts              Christian            Christian                              2          1 
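
The counting can be sketched the same way in Python, assuming the two Roberts rows from the question. The helper part_counts is hypothetical; it plays the role of the part_count column, counting how many distinct rows share each (last_name, part) pair.

```python
from collections import defaultdict

# Sample rows for one last name, taken from the question.
people = [("Roberts", "Chris"), ("Roberts", "Christian")]


def part_counts(rows):
    """Map (last_name, part) to the number of distinct rows sharing it."""
    owners = defaultdict(set)
    for row_id, (last, first) in enumerate(rows):
        # Every leading substring of the first name, including the empty one.
        for n in range(len(first) + 1):
            owners[(last, first[:n])].add(row_id)
    return {key: len(ids) for key, ids in owners.items()}


counts = part_counts(people)
print(counts[("Roberts", "Chris")])   # shared by both rows
print(counts[("Roberts", "Christ")])  # only Christian gets this far
```

The row index stands in for orig_rn, so two people with identical names would still be counted as two distinct rows.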
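The third CTE, v, looks only at the unique values (plus the full first name, as a fallback) and then ranks them by length, so for each record the shortest contraction of the first name that is unique across all the records is ranked 1: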

...
v as (
  select last_name, first_name, orig_rn, part, last_name_count,
  row_number() over (partition by orig_rn order by part_length) as rn
  from u
  where (part_count = 1 or part = first_name)
)
select last_name, first_name, part, last_name_count
from v
where rn = 1
order by orig_rn;

LAST_NAME            FIRST_NAME           PART                     LAST_NAME_COUNT
-------------------- -------------------- ------------------------ ---------------
Johnson              Charles                                                     1 
Roberts              Chris                Chris                                  2 
Roberts              Christian            Christ                                 2 
Smith                Albert               Alb                                    3 
Smith                Alphonse             Alp                                    3 
Smith                Jason                J                                      3 
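
As a cross-check of the ranking step, here is a rough Python equivalent run against the six sample rows. The name shortest_unique_parts is invented for this sketch, and the result comes back in input order rather than sorted by last name.

```python
from collections import Counter

people = [
    ("Smith", "Albert"), ("Smith", "Alphonse"), ("Smith", "Jason"),
    ("Johnson", "Charles"), ("Roberts", "Chris"), ("Roberts", "Christian"),
]


def shortest_unique_parts(rows):
    """For each row, pick the shortest first-name prefix no other row shares."""
    # Count how many rows produce each (last_name, prefix) pair.
    counts = Counter(
        (last, first[:n]) for last, first in rows for n in range(len(first) + 1)
    )
    chosen = []
    for last, first in rows:
        # Prefixes are tried shortest first; the full first name is the
        # fallback, like "part_count = 1 or part = first_name" in the query.
        for n in range(len(first) + 1):
            part = first[:n]
            if counts[(last, part)] == 1 or part == first:
                chosen.append((last, first, part))
                break
    return chosen


for row in shortest_unique_parts(people):
    print(row)
```

Charles Johnson ends up with an empty part because his last name is already unique, and Chris Roberts falls back to his full first name because every shorter prefix is shared with Christian.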
Then the final query pulls out only the rows ranked 1, which are the shortest unique values, and formats them the way you want.

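If two people have exactly the same name, both are shown with the first name spelled out in full, which seems to be what you want from your comments.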


Not sure this really qualifies as "cleaner", except that it only hits the original table once.

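Interesting, but I'm curious how you would do it programmatically. If it's possible in a program, it's possible in SQL. For example, if there are two "Robert Haddock" and one "Rob Haddock", would you trim the last 3 letters by default, or how would the system generate a nickname?

If two have the same name, I think I would just return Robert Haddock, without the period. For Rob Haddock, it would give me back Rob Haddock, since that is his full name. So yes, I guess I need more rules... Programmatically, it currently re-queries the table, adding one letter to the first name until it gets a unique return. I'm trying to avoid multiple SQL calls.

I understand your programmatic approach; please add more sample data and the expected output. That will get you a better solution.

Very cool. I'll have to think this one through. It seems to meet all the requirements, so I'm going to mark it as the answer.

You have two A.Smith and two C.Roberts?

This works, but only for appending the first initial when needed. It goes no deeper.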
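
To see the collision those last comments point at, here is a quick Python sketch of what the first answer's CASE expression produces on the sample data. The function initial_only is a stand-in written for this illustration, not part of either answer.

```python
from collections import Counter

people = [
    ("Smith", "Albert"), ("Smith", "Alphonse"), ("Smith", "Jason"),
    ("Johnson", "Charles"), ("Roberts", "Chris"), ("Roberts", "Christian"),
]


def initial_only(rows):
    """Bare last name if it is unique, otherwise first initial plus last name."""
    last_name_counts = Counter(last for last, _ in rows)
    return [
        last if last_name_counts[last] == 1 else f"{first[:1]}. {last}"
        for last, first in rows
    ]


names = initial_only(people)
# Any display name produced more than once is a collision.
duplicates = sorted(name for name, n in Counter(names).items() if n > 1)
print(duplicates)
```

On this data both Albert and Alphonse Smith come out as "A. Smith", and both Roberts rows as "C. Roberts", which is why a single initial is not enough here.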