SQL string manipulation: finding all permutations


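So I have a set of strings in a column, in the format '/A/B/C/D/E'.

A, B, C, D and E stand for strings of varying lengths. How do I go from the string above to a set of strings that always keep the same left-to-right order? By that I mean A can only be followed by B, and B can only be followed by C and must be preceded by A. The result should look like this: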

Result:

'/A'

'/A/B'

'/A/B/C'

'/A/B/C/D'

'/A/B/C/D/E'

You can use a recursive CTE:

with x as (
      select '/A/B/C/D/E' as col
     ),
     cte as (
      select col
      from x
      union all
      select left(col, len(col) - charindex('/', reverse(col))) as col
      from cte
      where col like '/%/%'
     )
select *
from cte;
If a string has more than 100 parts, you will need the MAXRECURSION option.


Here is a SQL Fiddle.
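
As a minimal sketch of the MAXRECURSION note above: the hint goes in an OPTION clause at the end of the statement that consumes the CTE. The value 0 is my own choice for illustration; it removes the default limit of 100 recursion levels entirely, so a bounded value (up to 32767) may be the safer choice if a runaway recursion is possible.

```sql
with x as (
      select '/A/B/C/D/E' as col
     ),
     cte as (
      select col
      from x
      union all
      -- strip the last '/segment' from the previous row
      select left(col, len(col) - charindex('/', reverse(col))) as col
      from cte
      where col like '/%/%'
     )
select col
from cte
option (maxrecursion 0);  -- 0 = no limit; the default is 100 levels
```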

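This is actually quite simple. For the best performance you want a set-based approach; I suggest using a tally (numbers) table. A simple, easy solution is: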

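declare @string varchar(100) = '/A/B/C/D/E';

-- Assumes every segment is a single character, so each prefix ends on an
-- even position. iTally returns those positions: 2, 4, 6, ...
with iTally(n) as 
( select top (len(@string)/2) (row_number() over (order by (select null))-1)*2+2
  from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x),
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(x)) -- 10 x 10 = up to 100 rows
select txt = substring(@string,1,n) -- the prefix ending at position n
from iTally;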
Returns:

txt
-----------
/A
/A/B
/A/B/C
/A/B/C/D
/A/B/C/D/E
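
The query above relies on every segment being exactly one character long. Since the question mentions segments of varying lengths, here is a rough sketch of how the same tally idea could be adapted: number every character position and keep only the positions that are immediately followed by a '/' (or by the end of the string). The sample value is made up for illustration, and this is only one possible variation, not part of the original answer.

```sql
declare @string varchar(100) = '/alpha/be/c/delta';  -- hypothetical sample value

with iTally(n) as
( -- one row per character position: 1, 2, 3, ... len(@string)
  select top (len(@string)) row_number() over (order by (select null))
  from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x),
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(x)) -- up to 100 rows
select txt = substring(@string, 1, n)
from iTally
-- appending '/' makes the end of the string look like a segment boundary too
where substring(@string + '/', n + 1, 1) = '/'
order by n;
```

For the sample value this keeps positions 6, 9, 11 and 17, which are the ends of '/alpha', '/alpha/be', '/alpha/be/c' and the full string.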
To make sure the text is sorted correctly, you can do this:

declare @string varchar(100) = '/B/D/A/E/C';

-- iTally returns the even positions 2, 4, 6, ... where the single-character segments sit
with iTally(n) as 
( select top (len(@string)/2) (row_number() over (order by (select null))-1)*2+2
  from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x),
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(x)) -- 10 x 10 = up to 100 rows
select txt = substring(reOrder.newString, 1, n)
from iTally
cross apply
( -- rebuild the string with its segments in alphabetical order
  select '/'+substring(@string,n,1)
  from iTally
  order by substring(@string,n,1)
  for xml path('')
) reOrder(newString);
Returns:

txt
-----------
/A
/A/B
/A/B/C
/A/B/C/D
/A/B/C/D/E

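It just occurred to me that a few days ago I wrote a function that is a great fit for this kind of thing. The function is shown below; see my code comments for more details about it.

Solution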

select itemNumber = tokenlen/2, leftToken
from dbo.edgeNgrams8k('/A/B/C/D/E')
where tokenlen % 2 = 0;
Results

itemNumber           leftToken
-------------------- -----------
1                    /A
2                    /A/B
3                    /A/B/C
4                    /A/B/C/D
5                    /A/B/C/D/E
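
Because the question is about strings stored in a column, here is a hedged sketch of how the function might be applied to a whole table with CROSS APPLY. The table variable @paths, its columns and its sample rows are hypothetical; the query assumes dbo.edgeNgrams8k has been created as defined in this answer and, like the query above, that every segment is a single character.

```sql
-- Hypothetical sample table standing in for the real column of paths
declare @paths table (id int identity(1,1), path varchar(8000));
insert @paths (path) values ('/A/B/C/D/E'), ('/X/Y/Z');

select p.id,
       itemNumber = e.tokenlen/2,
       e.leftToken
from @paths p
cross apply dbo.edgeNgrams8k(p.path) e   -- one row per prefix of p.path
where e.tokenlen % 2 = 0                 -- keep prefixes that end on a segment
order by p.id, e.tokenlen;
```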
Function

if object_id('dbo.edgeNgrams8k', 'IF') is not null drop function dbo.edgeNgrams8k;
go
create function dbo.edgeNgrams8k(@string varchar(8000))
/*****************************************************************************************
Purpose
  edgeNgrams8k is an inline table valued function (itvf) that accepts a varchar(8000) 
  input string (@string) and returns a series of character-level left and right edge 
  n-grams. An edge n-gram (referred to herein as an "edge-gram" for brevity) is a type of 
  n-gram (see https://en.wikipedia.org/wiki/N-gram). Instead of a contiguous series of 
  n-sized tokens (n-grams), however, an edge n-gram is a series of tokens that begin 
  with the input string's first character then increase by one character, the next in the
  string, until the token is as long as the input string. 

  Left edge-grams start at the beginning of the string and grow from left-to-right. Right
  edge-grams begin at the end of the string and grow from right-to-left. Note this query
  and the result-set:

  select * from dbo.edgeNgrams8k('ABC');

  tokenlen   leftToken    rightTokenIndex  righttoken
  ---------- ------------ ---------------- ----------
  1          A            3                C
  2          AB           2                BC
  3          ABC          1                ABC

Developer Notes:
 1. For more about N-Grams in SQL Server see: http://www.sqlservercentral.com/articles/Tally+Table/142316/
    For more about Edge N-Grams see the documentation by Elastic here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html 

 2. dbo.edgeNgrams8k is deterministic. For more about determinism see: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions

 3. If you need to sort this data without getting a sort in your execution plan you can 
    sort by tokenLen for ascending order, or by rightTokenIndex for descending order.

------------------------------------------------------------------------------------------
Usage Examples:
  I need to turn /A/B/C/D/E into:
  /A
  /A/B
  .....
  /A/B/C/D/E

  select leftToken 
    from dbo.edgeNgrams8k('/A/B/C/D/E')
  where tokenLen % 2 = 0

------------------------------------------------------------------------------------------
History:
 20171125 - Initial Development - Developed by Alan Burstein  
*****************************************************************************************/
returns table with schemabinding as return
with iTally(n) as 
(
  select top (len(@string)) row_number() over (order by (select $))
  from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x), -- 10^1 = 10
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(x), -- 10^2 = 100
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) c(x), -- 10^3 = 1000
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) d(x)  -- 10^4 = 10000
)
select top (convert(bigint, len(@string), 0))
  tokenlen        = n,
  leftToken       = substring(@string,1,n),
  rightTokenIndex = len(@string)+1-n,
  righttoken      = substring(@string,len(@string)+1-n, n)
from iTally;
go
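Update - performance test

To underline my point, I put together a 100K-row test.

First, here are the recursive CTE solution and my first solution, each wrapped in an inline table-valued function. You want inline table-valued functions because they can benefit from parallel processing, which I will demonstrate in a moment. The functions: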

-- Gordon's logic as an inline table valued function
create function dbo.rCTE_GL (@string varchar(8000))
returns table as return
with x as (select @string as col),
     cte as (
      select col
      from x
      union all
      select left(col, len(col) - charindex('/', reverse(col))) as col
      from cte
      where col like '/%/%'
     )
select *
from cte;
GO

-- My logic as a table valued function
create function dbo.tally_AB(@string varchar(8000))
returns table as return    
with iTally(n) as 
( select top (len(@string)/2) (row_number() over (order by (select null))-1)*2+2
  from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x),
       (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(x)) -- up to 100 character
select txt = substring(reOrder.newString, 1, n)
from iTally
cross apply
(
  select '/'+substring(@string,n,1)
  from iTally
  order by substring(@string,n,1)
  for xml path('')
) reOrder(newString);
GO
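Performance test

I am generating 100K rows with an id, so that we can tell which string each result came from. First comes the rCTE solution, then each of my solutions with a serial execution plan and with a parallel execution plan (using trace flag 8649).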
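
The test script itself is not shown here, so the following is only a sketch of what such a harness might look like, not the script that produced the numbers below. The temp table name #strings appears in the statistics output; its columns (id, string), the sample value and the use of MAXDOP 1 for the serial run are my assumptions. Trace flag 8649 is undocumented, and the QUERYTRACEON hint normally requires elevated permissions.

```sql
-- Assumed layout: 100,000 rows, each with an id and a slash-delimited string
if object_id('tempdb..#strings') is not null drop table #strings;

select top (100000)
       row_number() over (order by (select null)) as id,
       cast('/B/D/A/E/C' as varchar(8000))        as string
into #strings
from sys.all_columns a cross join sys.all_columns b;

set statistics io on;
set statistics time on;

-- Recursive CTE version
select s.id, f.col
from #strings s
cross apply dbo.rCTE_GL(s.string) f;

-- Tally version, serial plan
select s.id, f.txt
from #strings s
cross apply dbo.tally_AB(s.string) f
option (maxdop 1);

-- Tally version, parallel plan requested through trace flag 8649
select s.id, f.txt
from #strings s
cross apply dbo.tally_AB(s.string) f
option (querytraceon 8649);

set statistics io off;
set statistics time off;
```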

Results

Gordon (unsorted)
------------------------------------------------------------------------------------------------------------------------------------------------------
Table 'Worktable'. **Scan count 100001, logical reads 3492199,** physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#strings____________________________________________________________________________________________________________00000000004C'. 
Scan count 1, logical reads 346, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   **CPU time = 4625 ms,  elapsed time = 4721 ms.**

Alan (sorted) - serial
------------------------------------------------------------------------------------------------------------------------------------------------------
Table 'Worktable'. **Scan count 20979, logical reads 563853**, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#strings____________________________________________________________________________________________________________00000000004C'. 
Scan count 1, logical reads 346, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   **CPU time = 1782 ms,  elapsed time = 1790 ms.**

Alan (sorted) - parallel
------------------------------------------------------------------------------------------------------------------------------------------------------
Table '#strings____________________________________________________________________________________________________________00000000004C'. 
Scan count 9, logical reads 346, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. **Scan count 20979, logical reads 563860**, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   **CPU time = 3762 ms,  elapsed time = 992 ms.**

Alan (unsorted) - serial
------------------------------------------------------------------------------------------------------------------------------------------------------
Table '#strings____________________________________________________________________________________________________________00000000004C'. 
Scan count 9, logical reads 346, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   **CPU time = 219 ms,  elapsed time = 217 ms.**

Alan (unsorted) - parallel
------------------------------------------------------------------------------------------------------------------------------------------------------
Table '#strings____________________________________________________________________________________________________________00000000004C'. 
Scan count 9, logical reads 346, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   **CPU time = 393 ms,  elapsed time = 101 ms.**

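There you have it. Depending on what you need and how many CPUs you are working with, the tally table solution is 4 to 40 times faster than the recursive CTE, and it does only a fraction of the reads.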

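Tag your question with the database you are using.

@GordonLinoff Done.

To elaborate on Dr. Linoff's request: it is helpful to tag database questions with the appropriate software (MySQL, Oracle, DB2, ...) and version, e.g. sql-server-2014. Differences in syntax and features often affect the answers. Note that tsql narrows the choices, but does not specify the database.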