Sql server: best way to extract segments/values from a VARCHAR field in set-based SQL
Take the following data as an example:
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]
What I want to do is extract the following data (Ref, Type and OurRef), which I currently do like this:
WITH ExampleData
AS ( SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113'
)
SELECT dbo.fnParseString(2, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Ref] ,
dbo.fnParseString(3, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Type] ,
dbo.fnParseString(4, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [OurRef]
FROM ExampleData
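As you can see, I need to extract Ref, Type and OurRef into separate columns so that the generated emails can be processed efficiently with set-based SQL.
Normally, in this situation I would use a function like this: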
CREATE FUNCTION dbo.fnParseString (
    @Section   SMALLINT ,    -- 1-based index of the part to return (a negative value behaves like its absolute value)
    @Delimiter CHAR(1) ,     -- single-character delimiter
    @Text      VARCHAR(MAX)
)
RETURNS VARCHAR(8000)
AS
BEGIN
    -- INT rather than SMALLINT so positions beyond 32,767 characters do not overflow
    DECLARE @NextPos INT;
    DECLARE @LastPos INT;
    DECLARE @Found   INT;

    SELECT @NextPos = CHARINDEX(@Delimiter, @Text, 1) ,
           @LastPos = 0 ,
           @Found   = 1;

    -- Walk from delimiter to delimiter until the requested section is reached
    WHILE @NextPos > 0
      AND ABS(@Section) <> @Found
    BEGIN
        SET @LastPos = @NextPos;
        SET @NextPos = CHARINDEX(@Delimiter, @Text, @NextPos + 1);
        SET @Found   = @Found + 1;
    END;

    -- NULL when the section does not exist; otherwise the trimmed text between the
    -- previous delimiter and the next one (or the end of the string)
    RETURN LTRIM(RTRIM(CASE
        WHEN @Found <> ABS(@Section) OR @Section = 0 THEN NULL
        ELSE SUBSTRING(@Text, @LastPos + 1,
                       CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos
                            ELSE @NextPos - @LastPos - 1 END)
    END));
END
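As you can see, I have a solution that gives me the end result I want, but using a messy UDF isn't ideal, and I'd like to know whether there is a better way to handle this kind of thing, perhaps an inline regex? I.e. I thought PATINDEX() accepted a regular expression as the search string; combined with SUBSTRING() that could do what I need, but I really don't know where to start.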
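Note that PATINDEX() does not accept regular expressions: its pattern uses the same wildcards as LIKE (`%`, `_`, `[...]`, `[^...]`), with no quantifiers or alternation. It can still locate a fixed-shape token, so the PATINDEX() + SUBSTRING() idea does work on the sample data without a UDF. A minimal sketch, assuming the reference is always `F` followed by exactly seven digits and that the remaining fields are separated by `', '` (the CROSS APPLY aliases `p`, `c1`, `c2` are illustrative):

```sql
WITH ExampleData AS (
    SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
    UNION ALL
    SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
    UNION ALL
    SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113'
)
SELECT
    SUBSTRING(d.[Subject], p.RefStart, 8)                            AS [Ref],
    SUBSTRING(d.[Subject], c1.Comma1 + 2, c2.Comma2 - c1.Comma1 - 2) AS [Type],
    -- A length past the end is fine: SUBSTRING simply stops at the end of the string
    SUBSTRING(d.[Subject], c2.Comma2 + 2, LEN(d.[Subject]))          AS [OurRef]
FROM ExampleData AS d
-- LIKE-style wildcards: 'F' followed by exactly seven digits
CROSS APPLY (SELECT PATINDEX('%F[0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', d.[Subject]) AS RefStart) AS p
-- First and second ', ' after the reference. If one is missing, CHARINDEX returns 0,
-- the computed length can go negative and SUBSTRING raises an error.
CROSS APPLY (SELECT CHARINDEX(', ', d.[Subject], p.RefStart) AS Comma1) AS c1
CROSS APPLY (SELECT CHARINDEX(', ', d.[Subject], c1.Comma1 + 1) AS Comma2) AS c2;
```

Each CROSS APPLY computes one position per row, so later expressions can reuse it by name; this is a common set-based substitute for the loop inside a scalar UDF.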
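Edit: Please note that this is a simplified example. The subject is variable, and I will also use the same technique to parse the body, which will contain 8 items of data that I need to parse using various delimiters. That rules out PARSENAME(), since it only allows 4 parts, and I can't use fixed lengths (i.e. SUBSTRING() with hard-coded positions) because the lengths will vary a lot (especially when different helpdesks are involved, which they are). That's why I was thinking along the lines of PATINDEX() & SUBSTRING().
I'd suggest using this: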
;WITH CTE
AS
(
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]
)
, CTEPart
as
(
SELECT [Subject], REPLACE(SUBSTRING([Subject], 25, 1000), ', ', '.') Part
FROM CTE
)
SELECT
[Subject],
PARSENAME(Part, 3) AS [Ref],    -- PARSENAME numbers parts from the right
PARSENAME(Part, 2) AS [Type],
PARSENAME(Part, 1) AS [OurRef]
FROM CTEPart
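PARSENAME() only understands up to four dot-separated parts, which the question's edit rules out for the 8-item body, and it numbers parts from the right. As an alternative sketch (not part of this answer), a recursive CTE can split on an arbitrary delimiter with no four-part limit and number items from the left; a final pivot turns the numbered items back into columns. `@Delim`, `#Items` and the column names are illustrative, and the prefix is normalised with the same REPLACE trick the question uses:

```sql
-- Split each subject on @Delim into numbered items, then pivot them back into columns.
IF OBJECT_ID('tempdb..#Items') IS NOT NULL DROP TABLE #Items;

DECLARE @Delim VARCHAR(10) = ', ';

WITH ExampleData AS (
    SELECT 1 AS Id, 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
    UNION ALL
    SELECT 2, 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
    UNION ALL
    SELECT 3, 'HelpDesk Call Reference F0012347, New call logged, 40113'
),
Normalised AS (
    -- Turn the fixed prefix into a delimiter and append a trailing delimiter,
    -- so every item (including the last) is followed by one
    SELECT Id,
           CAST(REPLACE([Subject], 'HelpDesk Call Reference ',
                        'HelpDesk Call Reference' + @Delim) + @Delim AS VARCHAR(8000)) AS Txt
    FROM ExampleData
),
Split AS (
    -- Anchor member: the first item runs from position 1 to the first delimiter
    SELECT Id, Txt, 1 AS ItemNo, 1 AS StartPos, CHARINDEX(@Delim, Txt) AS EndPos
    FROM Normalised
    UNION ALL
    -- Recursive member: the next item starts just after the previous delimiter.
    -- DATALENGTH (not LEN) because LEN ignores the trailing space in ', '.
    SELECT Id, Txt, ItemNo + 1,
           EndPos + DATALENGTH(@Delim),
           CHARINDEX(@Delim, Txt, EndPos + DATALENGTH(@Delim))
    FROM Split
    WHERE CHARINDEX(@Delim, Txt, EndPos + DATALENGTH(@Delim)) > 0
)
SELECT Id, ItemNo, SUBSTRING(Txt, StartPos, EndPos - StartPos) AS Item
INTO #Items
FROM Split;

-- Items are numbered left to right: 1 = prefix, 2 = Ref, 3 = Type, 4 = OurRef
SELECT Id,
       MAX(CASE WHEN ItemNo = 2 THEN Item END) AS [Ref],
       MAX(CASE WHEN ItemNo = 3 THEN Item END) AS [Type],
       MAX(CASE WHEN ItemNo = 4 THEN Item END) AS [OurRef]
FROM #Items
GROUP BY Id
ORDER BY Id;
```

The default MAXRECURSION limit of 100 caps each string at roughly 100 items; add OPTION (MAXRECURSION n) to the splitting statement if a body could have more.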
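This is an Oracle query. INSTR and SUBSTR are Oracle functions, but every major SQL dialect has equivalents (in SQL Server, CHARINDEX and SUBSTRING). The example only cuts out the REF part of the string; you just need to repeat the same steps for Type, OurRef, etc. It assumes that your REF always contains a 0 (zero) and is always followed by ','. The ',' could be replaced with whitespace using NVL(), e.g. INSTR(str, NVL(',', '')). I think this approach is more generic than hard-coding values into SUBSTR...: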
SELECT str, SUBSTR(str, ref_start_pos, ref_end_pos) final_ref
FROM
(
SELECT str, ref_start_pos, INSTR(str, ',', ref_start_pos)-ref_start_pos AS ref_end_pos
FROM
(
SELECT str, INSTR(str, '0')-1 AS ref_start_pos
FROM
(
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS str
FROM dual
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
FROM dual
)
)
)
/
SQL>
STR | FINAL_REF
------------------------------------------------------------------------
HelpDesk Call Reference F0012345, Call Update, 40111 | F0012345
HelpDesk Call Reference F0012346, Call Resolved, 40112 | F0012346
SQL Server version (added by OP):
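The OP's SQL Server code is not preserved in this copy. As a hedged sketch only, a direct T-SQL translation of the Oracle query above swaps INSTR for CHARINDEX (whose arguments come in the opposite order: the string to find first), SUBSTR for SUBSTRING, drops the Oracle-only `dual` table, and adds the derived-table aliases SQL Server requires. It keeps the same assumption that the first '0' directly follows the leading 'F':

```sql
-- T-SQL translation of the Oracle query above (a sketch, not the OP's code).
SELECT str, SUBSTRING(str, ref_start_pos, ref_len) AS final_ref
FROM
(
    -- Length of the reference: from its start up to the next ','
    SELECT str, ref_start_pos, CHARINDEX(',', str, ref_start_pos) - ref_start_pos AS ref_len
    FROM
    (
        -- Same assumption as the Oracle version: the first '0' directly follows the 'F'
        SELECT str, CHARINDEX('0', str) - 1 AS ref_start_pos
        FROM
        (
            -- No DUAL needed: SQL Server allows SELECT without FROM
            SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS str
            UNION ALL
            SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
        ) AS src
    ) AS with_start
) AS with_len;
```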
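After doing some additional work, we decided not to use the approach from Art's answer (even though it works). We needed a more robust way to validate and extract the substrings, so I went the CLR route with regular expressions (thanks for pointing me in the right direction). The approach I took is as follows: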
First, I compiled the following CLR code (converted from a C# example to VB):
Then I wrote the following wrapper functions for ease of use:
EXEC sp_configure
'clr enabled' ,
'1'
GO
RECONFIGURE
USE [db_Utility]
GO
CREATE ASSEMBLY SQL_CLR_RegExp FROM 'D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\SQL_CLR_RegExp.dll' WITH
PERMISSION_SET = SAFE
GO
-- =============================================
-- Returns 1 or 0 if input matches pattern
-- VB function: RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
-- =============================================
CREATE FUNCTION [dbo].[RegexMatch]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX)
)
RETURNS [bit]
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexMatch]
GO
-- =============================================
-- Returns @expression with each match of @pattern replaced by @replace
-- VB function: RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexReplace]
(
@expression [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@replace [nvarchar](MAX)
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexReplace]
GO
-- =============================================
-- Returns a comma separated string of found objects
-- VB function: RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectAll]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@matchDelimiter [nvarchar](MAX)
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectAll]
GO
-- =============================================
-- Returns the match found at the zero-based position @matchIndex
-- RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOne]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@matchIndex [int]
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectOne]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Jordon Pilling>
-- Create date: <30/01/2013>
-- Description: <Calls RegexSelectOne with start and end text and cleans the result>
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOneWithScrub]
(
@Haystack VARCHAR(MAX),
@StartNeedle VARCHAR(MAX),
@EndNeedle VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @ReturnStr VARCHAR(MAX)
--#### Extract text from HayStack using Start and End Needles
SET @ReturnStr = dbo.RegexSelectOne(@Haystack, REPLACE(@StartNeedle, ' ','\s') + '((.|\n)+?)' + REPLACE(@EndNeedle, ' ','\s'), 0)
--#### Remove the Needles
SET @ReturnStr = REPLACE(@ReturnStr, @StartNeedle, '')
SET @ReturnStr = REPLACE(@ReturnStr, @EndNeedle, '')
--#### Trim White Space
SET @ReturnStr = LTRIM(RTRIM(@ReturnStr))
--#### Trim Line Breaks and Carriage Returns (dbo.SuperTrim is a separate helper, not shown)
SET @ReturnStr = dbo.SuperTrim(@ReturnStr)
RETURN @ReturnStr
END
GO
DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111'
DECLARE @Ref VARCHAR(250) = NULL
IF dbo.RegexMatch(@Subject, '^HelpDesk\sCall\sReference\sF[0-9]{7},\s(Call\sResolved|Call\sUpdate|New\scall\slogged),(|\s+)([0-9]+|unknown)$') = 1
SET @Ref = ISNULL(dbo.RegexSelectOneWithScrub(@Subject, 'HelpDesk Call Reference', ','), 'Invalid (#1)')
ELSE
SET @Ref = 'Invalid (#2)'
SELECT @Ref
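For comparison, a minimal sketch of how far plain LIKE gets on the same validation without CLR, which illustrates why the regex route was needed. LIKE has no alternation, no `{7}` quantifier and no `\s`, so the digits are spelled out, each allowed call type needs its own pattern, and the trailing `([0-9]+|unknown)$` part can only be approximated with `%`. Under a case-insensitive collation (the usual default) LIKE also ignores case, whereas .NET regular expressions are case-sensitive by default. The variable names are illustrative:

```sql
-- LIKE-only approximation of the RegexMatch check above (no CLR required).
DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111';
-- 'F' plus exactly seven digits, then ', ' (the trailing space is kept in a VARCHAR variable)
DECLARE @RefPattern VARCHAR(100) = 'HelpDesk Call Reference F[0-9][0-9][0-9][0-9][0-9][0-9][0-9], ';
DECLARE @IsValidShape BIT;

SET @IsValidShape = CASE
    WHEN @Subject LIKE @RefPattern + 'Call Resolved,%'
      OR @Subject LIKE @RefPattern + 'Call Update,%'
      OR @Subject LIKE @RefPattern + 'New call logged,%'
    THEN 1 ELSE 0
END;

SELECT @IsValidShape AS IsValidShape;
```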
This is much faster when used across multiple searches, and far more powerful when processing large amounts of text with varying start and end phrases.
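@Ben Thanks :) Yours too, no hacks involved.
+1 Nice, but this solution only works as long as the data is consistent and has no more than 4 parts.
Very sorry, you answered my question while I was editing it (see the edit section of the question). Basically, I don't have the luxury of using PARSENAME or fixed-length substrings :'(
If you have complex parsing logic, I'd suggest using a CLR function: CLR and Regex. If your formats don't all share the same structure, you should post them.
Why do you want to do this in TSQL? As others have suggested, using .NET Regex code would probably be much easier than TSQL, which is a poor language for text processing. I'd consider a CLR procedure or an external script/program to parse the data. Since you mention emails, an external program might be the easier option here.
@Pondlife We originally wanted to do this in SQL because it's an existing system that already has lots of SQL procedures for handling different types of email (and it works very well), but yes, I agree this is a more complex problem and Regex & CLR is probably the way to go. Someone just needs to post that as an answer so I can mark it accordingly.
@HeavenCore OK, let's make the problem concrete. You have strings separated by arbitrary delimiters. Can each string have a different number of parts, or is the count known in advance?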