Creating a unique string from first and last name in Oracle
I can do this programmatically, but I'm looking for a cleaner solution. Suppose I have the following table:
Last Name  First Name
Smith      Albert
Smith      Alphonse
Smith      Jason
Johnson    Charles
Roberts    Chris
Roberts    Christian
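I want to create a unique string using the following rules:
- If the last name is already unique, just return the last name.
- If the last name is shared, return the first initial (or as many leading letters of the first name as are needed), followed by a period, then the last name.
For Christian Roberts, I would return Christ. Roberts. Does anyone have an idea how to do this directly in an Oracle SQL statement, or should I stick to doing it in the program?

Try the following: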
with
last_names as (
select last_name, count(*) as last_name_count
from table_name
group by last_name )
select case
when b.last_name_count = 1 then a.last_name
else substr(a.first_name,1,1)||'. '||a.last_name
end as name
from table_name a
join last_names b
on a.last_name = b.last_name;
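Replace table_name with the actual name of your table.

A version using recursive subquery factoring (a recursive CTE), which requires 11gR2: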
with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
select last_name, first_name,
row_number() over (order by last_name, first_name),
cast (null as varchar2(20)), 0, length(first_name)
from t42
union all
select last_name, first_name, orig_rn,
part || substr(first_name, part_length + 1, 1),
part_length + 1,
remaining - 1
from t
where remaining > 0
),
u as (
select last_name, first_name, orig_rn, part, part_length,
count(distinct orig_rn) over (partition by last_name) as last_name_count,
count(distinct orig_rn) over (partition by last_name, part) as part_count
from t
),
v as (
select last_name, first_name, orig_rn, part, last_name_count,
row_number() over (partition by orig_rn order by part_length) as rn
from u
where (part_count = 1 or part = first_name)
)
select case when last_name_count = 1 then null
when part = first_name then first_name || ' '
else part || '. '
end || last_name as condensed_name
from v
where rn = 1
order by orig_rn;
Which gives:
CONDENSED_NAME
----------------------------------------------
Johnson
Chris Roberts
Christ. Roberts
Alb. Smith
Alp. Smith
J. Smith
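For anyone who wants to sanity-check that output outside the database, here is a minimal Python sketch of the same rule: take the shortest leading part of the first name that is unique within a last name, and fall back to the full first name when no shorter part is unique. It is only an illustration, not part of the original answer; the function name condensed_names and the (last_name, first_name) tuple layout are assumptions made for the example.

```python
from collections import Counter


def condensed_names(people):
    """people: iterable of (last_name, first_name) tuples (hypothetical layout)."""
    # Same ordering the query uses for orig_rn: last name, then first name.
    ordered = sorted(people)
    last_name_count = Counter(last for last, _ in ordered)
    # How many rows share each (last name, leading part of the first name).
    part_count = Counter(
        (last, first[:n])
        for last, first in ordered
        for n in range(len(first) + 1)
    )
    result = []
    for last, first in ordered:
        if last_name_count[last] == 1:
            result.append(last)  # unique last name: nothing to add
            continue
        # Shortest part that is unique, or the whole first name as a fallback.
        part = next(
            first[:n]
            for n in range(len(first) + 1)
            if part_count[(last, first[:n])] == 1 or first[:n] == first
        )
        # A full first name gets no period, a contraction does.
        result.append(f"{part} {last}" if part == first else f"{part}. {last}")
    return result


people = [
    ("Smith", "Albert"), ("Smith", "Alphonse"), ("Smith", "Jason"),
    ("Johnson", "Charles"), ("Roberts", "Chris"), ("Roberts", "Christian"),
]
for name in condensed_names(people):
    print(name)
```

With the six sample rows this prints the same six names, in the same order, as the query output above.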
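The t CTE is recursive. It starts with the original table rows and generates an additional row for each possible contraction of the first name: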
with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
select last_name, first_name,
row_number () over (order by last_name, first_name),
cast (null as varchar2(20)), 0, length(first_name)
from t42
union all
select last_name, first_name, orig_rn,
part || substr(first_name, part_length + 1, 1),
part_length + 1,
remaining - 1
from t
where remaining > 0
)
select last_name, first_name, part
from t
where last_name = 'Johnson'
order by orig_rn, part_length;
LAST_NAME  FIRST_NAME PART
---------- ---------- ----------
Johnson    Charles
Johnson    Charles    C
Johnson    Charles    Ch
Johnson    Charles    Cha
Johnson    Charles    Char
Johnson    Charles    Charl
Johnson    Charles    Charle
Johnson    Charles    Charles
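The expansion step is easy to picture outside SQL too. Here is a minimal Python sketch, assuming a hypothetical helper named contractions (not something from the answer). One difference to keep in mind: Oracle shows the starting value as null, whereas the sketch uses an empty string.

```python
def contractions(first_name):
    """Return every leading part of first_name, shortest first.

    Mirrors the rows the recursive CTE emits for one person: an empty
    starting value, then one more character per recursion step.
    """
    return [first_name[:n] for n in range(len(first_name) + 1)]


print(contractions("Charles"))
```

For "Charles" that is eight values, matching the eight rows listed for Johnson above.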
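The next CTE, u (yes, sorry about the names, I wasn't feeling inspired), compares the values across all the rows and counts how many rows share each one. Anything with a count of 1 is unique: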
...
u as (
select last_name, first_name, orig_rn, part, part_length,
count(distinct orig_rn) over (partition by last_name) as last_name_count,
count(distinct orig_rn) over (partition by last_name, part) as part_count
from t
)
select last_name, first_name, part, last_name_count, part_count
from u
where last_name = 'Roberts'
order by orig_rn, part_length;
LAST_NAME  FIRST_NAME PART       LAST_NAME_COUNT PART_COUNT
---------- ---------- ---------- --------------- ----------
Roberts    Chris                               2          2
Roberts    Chris      C                        2          2
Roberts    Chris      Ch                       2          2
Roberts    Chris      Chr                      2          2
Roberts    Chris      Chri                     2          2
Roberts    Chris      Chris                    2          2
Roberts    Christian                           2          2
Roberts    Christian  C                        2          2
Roberts    Christian  Ch                       2          2
Roberts    Christian  Chr                      2          2
Roberts    Christian  Chri                     2          2
Roberts    Christian  Chris                    2          2
Roberts    Christian  Christ                   2          1
Roberts    Christian  Christi                  2          1
Roberts    Christian  Christia                 2          1
Roberts    Christian  Christian                2          1
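The counting step above can be sketched in Python as well, which makes it easier to see which rows share a given part. This is an illustration under assumptions: part_counts is a hypothetical name, and the list index of each person stands in for orig_rn.

```python
from collections import defaultdict


def part_counts(people):
    """Map (last_name, part) to the number of distinct rows sharing that part.

    people is a list of (last_name, first_name) tuples; the list index plays
    the role of orig_rn, so two rows with identical names still count as two.
    """
    owners = defaultdict(set)
    for row_id, (last, first) in enumerate(people):
        for n in range(len(first) + 1):
            owners[(last, first[:n])].add(row_id)
    return {key: len(rows) for key, rows in owners.items()}


counts = part_counts([("Roberts", "Chris"), ("Roberts", "Christian")])
print(counts[("Roberts", "Chris")], counts[("Roberts", "Christ")])
```

"Chris" is shared by both rows, while "Christ" belongs to one row only, which is what the PART_COUNT column shows.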
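The third CTE, v, only looks at the unique values (plus the full first name, as a fallback) and ranks them by length; so for each record, the shortest contraction of the first name that is unique across all the records is ranked 1: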
...
v as (
select last_name, first_name, orig_rn, part, last_name_count,
row_number() over (partition by orig_rn order by part_length) as rn
from u
where (part_count = 1 or part = first_name)
)
select last_name, first_name, part, last_name_count
from v
where rn = 1
order by orig_rn;
LAST_NAME  FIRST_NAME PART       LAST_NAME_COUNT
---------- ---------- ---------- ---------------
Johnson    Charles                             1
Roberts    Chris      Chris                    2
Roberts    Christian  Christ                   2
Smith      Albert     Alb                      3
Smith      Alphonse   Alp                      3
Smith      Jason      J                        3
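The final query then picks out only the rows ranked 1, which are the shortest unique values, and formats them the way you wanted.
If two people have exactly the same name, both are spelled out in full, which seems to be what you wanted from your comment.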
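Not sure this really qualifies as "cleaner", except that it only hits the original table once.

Interesting, but I'm curious how you would do it programmatically. If it can be done in a program, it can be done in SQL. For example, if there are two "Robert Haddock" and one "Rob Haddock", would you trim the last 3 letters by default, or how would the system generate the short name?

If two people have the same name, I think I would just return Robert Haddock, with no period. For Rob Haddock, it would return Rob Haddock, since that is his full name. So yes, I guess I need more rules... The program currently re-queries the table, adding one letter of the first name at a time until it gets a unique result. I'm trying to avoid the multiple SQL calls.

I understand your programmatic approach. Please add more sample data and the expected output; that will get you a better solution.

Very cool. I'll have to think this one through. It seems to meet all the requirements, so I'll mark it as the answer.

Wouldn't you end up with two A. Smith and two C. Roberts?

This works, but only for adding the first initial when needed. It doesn't go any deeper.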
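The limitation raised in the last two comments is easy to reproduce. Below is a small Python sketch of the first-initial-only logic from the first answer; initial_only is a hypothetical name used only for this illustration.

```python
from collections import Counter


def initial_only(people):
    """Prefix the first initial whenever a last name is shared, never more."""
    last_name_count = Counter(last for last, _ in people)
    return [
        last if last_name_count[last] == 1 else f"{first[:1]}. {last}"
        for last, first in people
    ]


people = [
    ("Smith", "Albert"), ("Smith", "Alphonse"), ("Smith", "Jason"),
    ("Johnson", "Charles"), ("Roberts", "Chris"), ("Roberts", "Christian"),
]
duplicates = [n for n, c in Counter(initial_only(people)).items() if c > 1]
print(duplicates)
```

With the sample rows, "A. Smith" and "C. Roberts" each come out twice, so a single initial is not enough to make the strings unique.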