SQL Server: string search with the words in any order

Tags: sql-server, sql-server-2008-r2

I have the following two tables:

Table 1:

CREATE TABLE tbl_str_match_1
(
    enumber int,
    ename varchar(100),
    eaddress varchar(500)
);

INSERT INTO tbl_str_match_1 VALUES(1,'John Mak','Hno 12 Street Road, USA');
INSERT INTO tbl_str_match_1 VALUES(2,'Shai Lee','UK');
INSERT INTO tbl_str_match_1 VALUES(3,'Smith Watson','Street X01 UAE');
INSERT INTO tbl_str_match_1 VALUES(4,'Ray Gibbs','SA 124');

Table 2:

CREATE TABLE tbl_str_match_4
(
    name varchar(100),
    [address] varchar(500)
);

INSERT INTO tbl_str_match_4 VALUES('Mak John','Street Road, Hno 12, USA');
INSERT INTO tbl_str_match_4 VALUES('Shai A Lee','UK');
INSERT INTO tbl_str_match_4 VALUES('A watson Smeeth ','UAE Street X01');
INSERT INTO tbl_str_match_4 VALUES('Henry Jay','RUS OP124');

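I want to look up the name in tbl_str_match_1 by a passed-in number (enumber), then use that name as the input to a second search that finds the name and address in another table, tbl_str_match_4.

Notes:

The parts of a name can appear in any order (first, middle, and last name in any arrangement); every permutation is possible.

I want to return the name and address from the second table, with one extra column: the percentage match of the string.

There will be two searches: the first on tbl_str_match_1 to get the name, the second on tbl_str_match_4 to get the name and address.

For the first record, John Mak, it should show a 100% match with Mak John.

For the second record, Shai Lee, it should show a 90% match with Shai A Lee, because that name has the extra middle initial A.

The last record, Ray Gibbs, will not appear in the result set because it does not match any value in the other table.

My query: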

WITH CTE1 AS
(   
    SELECT ename FROM tbl_str_match_1 WHERE enumber = 1
)
SELECT name,[address] FROM tbl_str_match_4 WHERE name LIKE '%'+(SELECT ename from CTE1)+'%'

Expected results:

Scenario 1: if I pass enumber = 1, the result should be:

    Name        Address                     Matching Percentage
    ------------------------------------------------------------
    Mak John    Street Road, Hno 12, USA    100

Scenario 2: if I pass enumber = 2, the result should be:

    Name        Address                     Matching Percentage
    ------------------------------------------------------------
    Shai A Lee  UK                          90

Scenario 3: if I pass enumber = 3, the result should be:

    Name                Address             Matching Percentage
    ------------------------------------------------------------
    A watson Smeeth     UAE Street X01      70

Scenario 4: if I pass enumber = 4, there should be no result, because there is no relevant match:

    Name        Address                     Matching Percentage
    ------------------------------------------------------------

I hope this helps.

WITH CTE1 AS
(
    -- Split ename into first, middle, and last parts on the space characters
    SELECT enumber,
           LTRIM(SUBSTRING(ename, 1, ISNULL(NULLIF(CHARINDEX(' ', ename), 0), 1000))) AS Firstename,
           LTRIM(SUBSTRING(ename, CHARINDEX(' ', ename),
                 CASE WHEN (CHARINDEX(' ', ename, CHARINDEX(' ', ename) + 1) - CHARINDEX(' ', ename)) <= 0 THEN 0
                      ELSE CHARINDEX(' ', ename, CHARINDEX(' ', ename) + 1) - CHARINDEX(' ', ename) END)) AS Middleename,
           LTRIM(SUBSTRING(ename, ISNULL(NULLIF(CHARINDEX(' ', ename, CHARINDEX(' ', ename) + 1), 0), CHARINDEX(' ', ename)),
                 CASE WHEN CHARINDEX(' ', ename) = 0 THEN 0 ELSE LEN(ename) END)) AS Lastename
    FROM tbl_str_match_1
),
CTE2 AS
(
    -- Same split applied to name in the second table
    SELECT *,
           LTRIM(SUBSTRING(name, 1, ISNULL(NULLIF(CHARINDEX(' ', name), 0), 1000))) AS FirstName,
           LTRIM(SUBSTRING(name, CHARINDEX(' ', name),
                 CASE WHEN (CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)) <= 0 THEN 0
                      ELSE CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name) END)) AS MiddleName,
           LTRIM(SUBSTRING(name, ISNULL(NULLIF(CHARINDEX(' ', name, CHARINDEX(' ', name) + 1), 0), CHARINDEX(' ', name)),
                 CASE WHEN CHARINDEX(' ', name) = 0 THEN 0 ELSE LEN(name) END)) AS LastName
    FROM tbl_str_match_4
)
-- Note: the join compares first with first and last with last, so names whose
-- parts are reordered (for example 'John Mak' vs 'Mak John') are not matched.
SELECT CTE2.name, CTE2.address
FROM CTE1
INNER JOIN CTE2
    ON CTE1.Firstename = CTE2.FirstName
   AND CTE1.Lastename = CTE2.LastName
WHERE CTE1.enumber = 1

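Hopefully the following helps.

I first split the names in tbl_1 and tbl_4 into tokens.

After that, I compared the tokens from tbl_1 with those from tbl_4.

A question about the matching percentage: in the Shai A Lee example, 2 tokens match (Shai, Lee) out of a total of 3 (Shai, A, Lee), so shouldn't the matching percentage be 66.67?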

with split_ename_1 
  as (
        SELECT a.enumber
            ,a.ename
            ,a.eaddress      
            ,split.a.value('.', 'VARCHAR(100)') AS Data  
        FROM  
        (
            SELECT enumber
                ,ename
                ,eaddress
                ,CAST ('<M>' + REPLACE(rtrim(ename), ' ', '</M><M>') + '</M>' AS XML) AS Data  
            FROM  tbl_str_match_1
        ) AS A CROSS APPLY Data.nodes ('/M') AS Split(a)
     )
,split_ename_4
   as (SELECT a.name            
             ,a.address      
             ,split.a.value('.', 'VARCHAR(100)') AS Data  
             ,COUNT(*) over(partition by a.name) as  tot_cnt
        FROM  
        (
            SELECT name
                   ,address
                   ,CAST ('<M>' + REPLACE(rtrim(name), ' ', '</M><M>') + '</M>' AS XML) AS Data  
              FROM  tbl_str_match_4
        ) AS A CROSS APPLY data.nodes ('/M') AS split(a)
       )
   select a.ename
         ,count(a.data) as tokens_1
         ,count(b.data) as tokens_4
         ,max(b.tot_cnt) as tot_tokens_4
         ,case when count(b.data)=0 then 0 else count(b.data)*1.00/max(b.tot_cnt)*1.00 end as matching_percentage
     from split_ename_1 a
left join split_ename_4 b
       on a.data=b.data
group by a.ename
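
The token comparison above can be narrowed to a single enumber and grouped per row of tbl_str_match_4, so that the name and address come back alongside a percentage. The following is only a minimal sketch under some assumptions: the variable @enumber, the CTE names src and tgt, and the alias matching_percentage are illustrative; names are assumed to contain no XML-special characters such as & or <; the database collation is assumed to be case-insensitive; and the percentage is defined as matched tokens divided by the number of tokens in the tbl_str_match_4 name, which is one possible definition and not necessarily the one the question expects.

```sql
-- Sketch only: @enumber, src, tgt and matching_percentage are illustrative names.
DECLARE @enumber int = 2;

WITH src AS
(
    -- Tokens of the single ename being searched for
    SELECT s.a.value('.', 'varchar(100)') AS token
    FROM (SELECT CAST('<M>' + REPLACE(RTRIM(ename), ' ', '</M><M>') + '</M>' AS xml) AS x
          FROM tbl_str_match_1
          WHERE enumber = @enumber) AS t
    CROSS APPLY t.x.nodes('/M') AS s(a)
),
tgt AS
(
    -- One row per token of every name in the second table
    SELECT t.name, t.[address], s.a.value('.', 'varchar(100)') AS token
    FROM (SELECT name, [address],
                 CAST('<M>' + REPLACE(RTRIM(name), ' ', '</M><M>') + '</M>' AS xml) AS x
          FROM tbl_str_match_4) AS t
    CROSS APPLY t.x.nodes('/M') AS s(a)
)
SELECT tgt.name,
       tgt.[address],
       -- matched tokens / all tokens of the candidate name, as a percentage
       CAST(100.0 * COUNT(src.token) / COUNT(*) AS decimal(5, 2)) AS matching_percentage
FROM tgt
LEFT JOIN src ON src.token = tgt.token
GROUP BY tgt.name, tgt.[address]
HAVING COUNT(src.token) > 0;   -- drop names with no token in common
```

Because tokens are compared as a set, word order does not matter: with the sample data, @enumber = 1 pairs John Mak with Mak John at 100.00, and @enumber = 2 pairs Shai Lee with Shai A Lee at 66.67 (2 of 3 tokens). Exact token equality does not catch misspellings such as Smith vs Smeeth; that would need a fuzzy comparison on top of this.
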

You can combine CTEs with string splitting to do this.

I added an identity column (id) to tbl_str_match_4 to make this easier.

-- Note: STRING_SPLIT requires SQL Server 2016 or later (database compatibility level 130)
DECLARE @enumber INT = 2

;WITH c1 AS 
( 
  --To split the ename from first  table 

   SELECT s.value AS name
   FROM tbl_str_match_1 t
   CROSS APPLY STRING_SPLIT(t.ename, ' ') AS s
   WHERE enumber=@enumber
)
,c2 AS
( 
   --To split the matching names from second table of matched records

   SELECT t.id,s.value AS name 
   FROM tbl_str_match_4 t
   CROSS APPLY STRING_SPLIT(t.name, ' ') AS s
   WHERE EXISTS(SELECT 1 FROM c1 c WHERE t.name LIKE '%'+c.name+'%')
)
,c3 AS 
( 
   --To calculate the percentage of match

   SELECT id,
   CAST (COUNT(c1.name) AS FLOAT )/ CAST (COUNT(c2.name) AS FLOAT ) * 100 As Percentage
   FROM c2
   LEFT JOIN  c1 on c1.name =c2.name
   GROUP BY id
) 
--display the details
SELECT t.*,c3.Percentage FROM tbl_str_match_4 t
JOIN c3 ON t.Id=c3.Id

For

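Comments:

Levenshtein distance???

Posting sample data properly is good, but you should also post the expected results.

@ZoharPeled, I have added the expected results.

I can't get a result for the first record, enumber=1; please check the expected results I added.

Do you mean that the first, middle, and last names are not in a fixed order?

Yes. Names may also be misspelled: for example, one table has Santhana and the other has Santana. I want to show those records too, with the string match percentage, as in the expected result for the third record.

It works, but it needs some performance tuning: it takes 23 seconds for 1 million records, with indexes on the search columns.

At the moment the query splits the tokens on the fly. If the values are static, it may be worth creating new tables and loading them with the tokens. For example: create tables from the contents of split_ename_1 and split_ename_4, and create an index on the data column.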
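
The last suggestion, storing the tokens once instead of splitting them on every query, could look roughly like the sketch below. The table name tbl_str_match_4_tokens, its column token, and the index name are hypothetical and not part of the original schema; the split reuses the XML technique from the answer above and assumes names contain no XML-special characters.

```sql
-- Hypothetical token table for tbl_str_match_4; names and types are assumptions.
CREATE TABLE tbl_str_match_4_tokens
(
    name  varchar(100),
    token varchar(100)
);

-- Load one row per (name, token) pair, split on spaces
INSERT INTO tbl_str_match_4_tokens (name, token)
SELECT t.name, s.a.value('.', 'varchar(100)')
FROM (SELECT name,
             CAST('<M>' + REPLACE(RTRIM(name), ' ', '</M><M>') + '</M>' AS xml) AS x
      FROM tbl_str_match_4) AS t
CROSS APPLY t.x.nodes('/M') AS s(a);

-- Index the token column so token lookups can seek instead of scanning
CREATE INDEX IX_tbl_str_match_4_tokens_token
    ON tbl_str_match_4_tokens (token)
    INCLUDE (name);

-- Example lookup: candidate names sharing at least one token with 'Shai Lee'
SELECT DISTINCT k.name
FROM tbl_str_match_4_tokens AS k
WHERE k.token IN ('Shai', 'Lee');
```

The trade-off is that the token table has to be reloaded, or kept in sync by some other means, whenever tbl_str_match_4 changes, so this fits best when the names are static, as the comment suggests.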