SQL: Is there an Oracle function to determine the number of repeated characters?

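I need to check how many repeated characters an email address contains.

I tried the code below, which gives the percentage of repeated characters, but it only works when the repeated characters are adjacent to each other. One possibility would therefore be to sort the characters of the email first to get my result.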

SELECT
  round(
    ( ( REGEXP_COUNT(
          regexp_replace(SUBSTR('999824123@HOTMAIL.COM', 1, INSTR('989824123@HOTMAIL.COM', '@', 1) - 1), '(.)\1+', '&'),
          '&')
        + length(SUBSTR('989824123@HOTMAIL.COM', 1, INSTR('989824123@HOTMAIL.COM', '@', 1) - 1))
        - length(regexp_replace(SUBSTR('989824123@HOTMAIL.COM', 1, INSTR('989824123@HOTMAIL.COM', '@', 1) - 1), '(.)\1+', '\1'))
      ) * 100
    ) / length(SUBSTR('989824123@HOTMAIL.COM', 1, INSTR('989824123@HOTMAIL.COM', '@', 1) - 1)),
    2) AS PORCENTAJE_IGUAL
FROM DUAL A;
I expect this email, 989824123@HOTMAIL.COM, to show 60% repeated characters, not counting the domain name.

Please help.


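PS: Sorry.

In the email, the digits 9, 8 and 2 are repeated, so we have 6 repeated characters (9, 9, 8, 8, 2, 2) and 3 unique ones (1, 3, 4). 6/9 is 66.67%. You can use this query to calculate it: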

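with 
  t(email) as (select '989824123@hotmail.com' from dual),
  -- keep only the part before the '@'
  a(email) as (select substr(email, 1, instr(email, '@', 1) - 1) from t),
  -- one row per character
  l as (select substr(email, level, 1) ltr from a connect by level <= length(email))
-- characters occurring more than once, divided by the total;
-- nvl returns 0 instead of NULL when nothing repeats
select nvl(sum(case when cnt <> 1 then cnt end), 0) / sum(cnt) as repeated_ratio
  from (select ltr, count(1) cnt from l group by ltr);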
If you wrap this logic in a function (named rpt_similarity here), use it like this:

select rpt_similarity('abxabc@pqr.com') from dual;
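The call above relies on a function rpt_similarity whose definition is not shown here. A minimal sketch of what such a function might look like, assuming it simply wraps the query above and returns the ratio (not the percentage); the parameter and variable names are hypothetical:

```sql
-- Hypothetical sketch: the body of rpt_similarity is an assumption,
-- built from the character-counting query shown earlier.
create or replace function rpt_similarity(p_email in varchar2)
  return number
is
  v_local  varchar2(4000);
  v_result number;
begin
  -- Keep only the part before the '@' (the domain is not counted).
  v_local := substr(p_email, 1, instr(p_email, '@') - 1);

  -- Split into single characters, count each one, then divide the
  -- number of characters that occur more than once by the total.
  select nvl(sum(case when cnt > 1 then cnt end), 0) / sum(cnt)
    into v_result
    from (select substr(v_local, level, 1) ltr, count(*) cnt
            from dual
         connect by level <= length(v_local)
           group by substr(v_local, level, 1));

  return v_result;
end;
/
```

For 'abxabc@pqr.com' the local part is abxabc, where a and b each occur twice and x and c once, so this sketch would return 4/6.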
Alternatively, you can apply the solution above directly in a select, without a function, for example:

create table test(id, email) as (  
  select 101, '989824123@hotmail.com'      from dual union all
  select 102, 'hsimpson@gmail.com'         from dual union all
  select 103, 'msimpson@gmail.com'         from dual union all
  select 104, 'bsimpson121314@hotmail.com' from dual union all
  select 105, 'abxabx@hotmail.com'         from dual );

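with 
  -- local part of each address (everything before the '@')
  a(id, email) as (select id, substr(email, 1, instr(email, '@', 1) - 1) from test),
  -- one row per character; "prior id = id" keeps each hierarchy inside its own row,
  -- and "prior sys_guid() is not null" prevents Oracle from reporting a loop
  l as (
    select id, email, substr(email, level, 1) ltr from a 
      connect by level <= length(email) 
        and prior id = id and prior sys_guid() is not null) 
-- nvl returns 0 instead of NULL when no character repeats
select id, email, nvl(sum(case when cnt <> 1 then cnt end), 0) / sum(cnt) as repeated_ratio
  from (select id, email, ltr, count(1) cnt from l group by id, ltr, email)
  group by id, email;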

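On large data sets, connect by queries tend to be slow. Maybe you can adapt your regexp function, which would be faster. I tried to do that, but your regexp_replace turns 99 into a single replacement character, and 999 into a single one as well.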
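One variation that might help on larger tables is to generate the character positions once and join them to the emails, instead of running a hierarchy per row with prior sys_guid(). The following is only a sketch: the limit of 64 characters for the local part and the names n, pos, local_part and pct_repeated are assumptions, and whether it is actually faster would need to be measured on real data.

```sql
with
  -- Positions 1..64, generated once (64 is an assumed maximum local-part length).
  n as (select level pos from dual connect by level <= 64),
  -- Local part of each address from the test table above.
  a as (select id, substr(email, 1, instr(email, '@') - 1) local_part from test),
  -- One row per character of each local part.
  l as (select a.id, a.local_part, substr(a.local_part, n.pos, 1) ltr
          from a
          join n on n.pos <= length(a.local_part))
select id, local_part,
       -- Percentage of characters that occur more than once.
       round(100 * nvl(sum(case when cnt > 1 then cnt end), 0) / sum(cnt), 2) pct_repeated
  from (select id, local_part, ltr, count(*) cnt
          from l
         group by id, local_part, ltr)
 group by id, local_part
 order by id;
```

For id 101 (989824123) this gives the same 66.67 as the calculation above. Addresses without an '@' yield a NULL local part and drop out of the join, so they would need separate handling.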

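So how is the 60% calculated?

Hi Gordon, this is what I did: number of repeated characters in the variable * 100 / length of the variable.

Hi Ponder, this works well, but how can something like this be applied when updating or selecting from a large database with many emails?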