Search and replace strings in T-SQL
I am trying to write a query that removes certain words whenever they occur at the end of a string. I have a list of noise words (104, to be exact), and if any of them appear at the end of a string, they need to be removed from it. For example, two of the noise words are Company and LLC. Here are some examples with the expected output:
American Company, LLC --Expected output --American (both noise words should be removed)
American LLC,LLC --Expected output -- American
American Company American Company --Expected output -- American Company American (one noise word occurs in between other words, so it should not be removed)
Currently I have the following query:
DECLARE @NEWSTRING VARCHAR(max)
DECLARE @NEWSTRINGlength nvarchar(max)
SET @NEWSTRING = 'American Company American Company Company, LLC LLC' ;
SET @NEWSTRINGlength = len(@newstring)
SELECT @NEWSTRINGlength
CREATE TABLE #item (item Nvarchar(250) null)
INSERT INTO #item
SELECT 'Company' as item
UNION ALL
SELECT 'LLC' as item
DECLARE @unwantedCharecters VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!, ]%'
WHILE PATINDEX( @unwantedCharecters, @NEWSTRING ) > 0
SELECT @NEWSTRING = ltrim(rtrim(Replace(REPLACE( @NEWSTRING, SUBSTRING( @NEWSTRING, PATINDEX( @unwantedCharecters, @NEWSTRING ), 1 ),''),'-',' ')))
SELECT @NEWSTRING = substring(rtrim(@NEWSTRING), 1, len(@newstring) - len(ITEM)) FROM #item WHERE rtrim(@NEWSTRING) LIKE '%' + ITEM
Every occurrence of a noise word should be removed, unless it occurs between other words.
The following should do the trick:
WITH
DirtyValues AS (
    SELECT * FROM (VALUES
          (1, 'American Company, LLC')             -- Expected output: American (both noise words should be removed)
        , (2, 'American LLC,LLC')                  -- Expected output: American
        , (3, 'American Company American Company') -- Expected output: American Company American (one noise word occurs in between other words, so it should not be removed)
    ) AS T(ID, Dirty)
),
NoisyWords AS (
    SELECT * FROM (VALUES
          (' ')  -- Just append the characters to be filtered to your noise word list
        , (',')
        , ('LLC')
        , ('Company')
    ) AS T(Noisy)
),
DoSomeMagic AS (
    -- Anchor: reverse each string so that its end becomes its start
    SELECT ID
         , Result = REVERSE(Dirty)
    FROM DirtyValues
    UNION ALL
    -- Recursive step: cut one reversed noise word off the start
    SELECT ID
         , Result = SUBSTRING(Result, DATALENGTH(Noisy) + 1, DATALENGTH(Result))
    FROM DoSomeMagic
    CROSS APPLY (
        SELECT Noisy = REVERSE(Noisy)
        FROM NoisyWords
    ) AS T
    WHERE PATINDEX('%' + Noisy + '%', Result) = 1
),
PickBestResult AS (
    -- Reverse back and rank the candidates per ID, shortest first
    SELECT DoSomeMagic.ID
         , [clean as a whistle] = REVERSE(DoSomeMagic.Result)
         , [Rank] = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATALENGTH(Result) ASC)
    FROM DoSomeMagic
)
SELECT *
FROM PickBestResult
WHERE [Rank] = 1
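What it does:
- The first two CTEs are your data sets; you will of course want to swap them for your own tables.
- DoSomeMagic is a recursive CTE. It first reverses the string so that the search can start from the end, then cross-applies all noise words and checks whether the reversed string now starts with a reversed noise word. If so, it removes that word and carries on until no noise word is found at the start.
- PickBestResult then ranks every row ([Rank]); the row with the shortest result gets rank 1.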
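
To make the recursion easier to follow, here is a minimal sketch that lists every intermediate step for a single input. It is only an illustration of the steps described above, not part of the original answer: the CTE name Steps, the Step counter, the VARCHAR(250) width and the use of LIKE instead of PATINDEX are all assumptions made for this example.

```sql
-- Illustrative trace only; Steps, Step and VARCHAR(250) are assumed names/sizes.
WITH NoisyWords AS (
    SELECT Noisy
    FROM (VALUES (' '), (','), ('LLC'), ('Company')) AS T(Noisy)
),
Steps AS (
    -- Anchor: the reversed input is step 0.
    SELECT Step   = 0
         , Result = CAST(REVERSE('American Company, LLC') AS VARCHAR(250))
    UNION ALL
    -- Recursive step: if the reversed string starts with a reversed
    -- noise word, cut that word off the front and count one more step.
    -- Note: a noise word containing LIKE wildcards (%, _, [) would need escaping.
    SELECT S.Step + 1
         , CAST(SUBSTRING(S.Result, DATALENGTH(N.Noisy) + 1, 250) AS VARCHAR(250))
    FROM Steps AS S
    JOIN NoisyWords AS N
      ON S.Result LIKE REVERSE(N.Noisy) + '%'
)
-- Reverse back so each row shows what is left of the original string.
SELECT Step, Remaining = REVERSE(Result)
FROM Steps
ORDER BY Step;
```

Each row shows the string after one more trailing noise word (or separator character) has been stripped, and the last row should read American. This is why picking the shortest result per ID gives the fully cleaned value. If you have many noise words and long strings, keep in mind that a recursive CTE stops after 100 levels by default; OPTION (MAXRECURSION n) raises that limit.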