SQL Server 2008: Split and match rows in SQL Server

I am using SQL Server 2008.

My table looks like this:

ID      Column
----------------------------------
1       This is a Sample Text
2       Sample Text is typed here
3       Here the sample text is
4       Typing a sample

I need the output to look like this:

ID Column                     MostCommon  Common1  Common2  NonCommon
---------------------------------------------------------------------------------------
1  This is a Sample Text      Sample      Text     is       This a
2  Sample Text is typed here  Sample      Text     is       typed here
3  Here the sample text is    Sample      Text     is       Here the
4  Typing a sample            sample      NULL     NULL     Typing A

Can anyone help me write an sp/function/query for this in SQL Server 2008?

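"Sample" appears in every row, so I can keep it as the most common word. "Text" and "is" are the second most common words; they are found in rows 1, 2 and 3. All other words do not match any other row and are moved to the non-common category.

Here is how to do it. First you have to create a function that splits a string, then count the occurrences and display them as needed. Because a variable number of columns has to be displayed, this makes it more complicated: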

/*
-- Run once to create the split function: uncomment it and execute it
-- in its own batch (CREATE FUNCTION must be the only statement in a batch).
CREATE FUNCTION dbo.SplitStrings_XML
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
   RETURN 
   (  
      SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
      FROM 
      ( 
        SELECT x = CONVERT(XML, '<i>' + 
                   REPLACE(@List, @Delimiter, '</i><i>') + '</i>').query('.')
      ) AS a CROSS APPLY x.nodes('i') AS y(i)
   );
*/

CREATE TABLE #t(ID INT, Col VARCHAR(1000))
INSERT #t
VALUES
(1,       'This is a Sample Text'),
(2,       'Sample Text is typed here'),
(3,       'Here the sample text is'),
(4,       'Typing a sample')

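-- A word is treated as common only if it occurs MORE than this many times
-- (the filters below compare with >, not >=).
DECLARE @MinimumNumberOfOccurances INT = 2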

SELECT  a.ID,
        a.Col,
        b.Item
INTO    #SplitedStrings
FROM    #t a
CROSS APPLY dbo.SplitStrings_XML(a.Col, N' ') b


SELECT  b.Item,
        COUNT(*) cnt
INTO    #SplitedStringsGrouped
FROM    #t a
CROSS   APPLY dbo.SplitStrings_XML(a.Col, N' ') b
GROUP   BY b.Item

SELECT      b.*,
            a.cnt
INTO        #ResultTable
FROM        #SplitedStringsGrouped a
RIGHT JOIN  #SplitedStrings b ON 
            b.Item = a.Item
            AND a.cnt > @MinimumNumberOfOccurances
ORDER BY    b.ID, a.cnt DESC, LEN(a.Item) DESC

DECLARE @ColumnNames VARCHAR(MAX) = STUFF(
(
    SELECT  ',[' + Item + ']'
    FROM    #SplitedStringsGrouped
    WHERE   cnt > @MinimumNumberOfOccurances
    FOR     XML PATH('')
)
, 1, 1, '')

DECLARE @TableHeader VARCHAR(MAX) = STUFF(
(
    SELECT  ',MAX([' + Item + ']) AS [Common' + 
            CAST((ROW_NUMBER() OVER 
                (ORDER BY cnt DESC, LEN(Item) DESC) - 1) 
                    AS VARCHAR(5)) 
            + ']'
    FROM    #SplitedStringsGrouped
    WHERE   cnt > @MinimumNumberOfOccurances
    FOR     XML PATH('')
)
, 1, 1, '')

SELECT  ID,
        Item,
        ROW_NUMBER() OVER 
            (PARTITION BY ID ORDER BY ID) Num
INTO    #NonCommon
FROM    #ResultTable
WHERE   cnt IS NULL

-- VARCHAR(MAX) so that a long generated statement is not silently truncated.
DECLARE @sql VARCHAR(MAX) = 
'
SELECT  MAX(pvt.ID) ID, MAX(pvt.Col) [Column],
        '+@TableHeader+',
        RTRIM((
            SELECT  a.Item + '' ''
            FROM    #NonCommon a
            WHERE   a.ID = pvt.ID
            FOR     XML PATH('''')
        )) NonCommon        
FROM    #ResultTable a
PIVOT   (
    MAX(Item) FOR Item IN ('+@ColumnNames+')
) pvt
GROUP  BY pvt.ID
'

EXEC(@sql)

DROP TABLE #t
DROP TABLE #SplitedStringsGrouped
DROP TABLE #SplitedStrings
DROP TABLE #ResultTable
DROP TABLE #NonCommon
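
Before running the whole script, it may help to check the split function on its own. The following is only a minimal sketch; it assumes dbo.SplitStrings_XML has already been created from the commented-out definition at the top of the script.

```sql
-- Assumes dbo.SplitStrings_XML exists (see the commented-out definition above).
-- Splits one sample sentence on spaces and returns one row per word.
SELECT Item
FROM   dbo.SplitStrings_XML(N'This is a Sample Text', N' ');
```

For this input the query should return five rows, one per word. Note that the function builds XML from the raw string, so input containing characters such as & or < will make the CONVERT(XML, ...) fail unless they are escaped first.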

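Can you explain in English what your goal is?? Just giving us the input and output and letting us guess what you want to do isn't very helpful...

Why does "Text" take precedence for Common1? Is it based on length? What is the rule for defining something as common? Is it present in 75% of the rows? All but one? Is there any row count of 4?

Find the common patterns between the rows. "Sample" appears in all rows, so I can keep it as the most common word. "Text" and "is" are the second most common words and can be found in rows 1, 2 and 3. All other words do not match other rows and will be moved to the non-common category!!

You're welcome, I'm glad it works. I wish there was a simpler solution, but this is the only one I could think of. You can set @MinimumNumberOfOccurances to a different value to define how many times a word must be found to be considered common.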
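
To see how the threshold changes the classification, the counting step can be tried in isolation. This is a rough sketch and not part of the answer's script: the table variable @Words is a hypothetical stand-in for #SplitedStrings holding the already split words, and the words are lower-cased so the result does not depend on the database collation.

```sql
-- Hypothetical pre-split words (ID, Item), standing in for #SplitedStrings.
DECLARE @Words TABLE (ID INT, Item VARCHAR(100));

INSERT @Words (ID, Item)
VALUES (1, 'This'),   (1, 'is'),   (1, 'a'),      (1, 'Sample'), (1, 'Text'),
       (2, 'Sample'), (2, 'Text'), (2, 'is'),     (2, 'typed'),  (2, 'here'),
       (3, 'Here'),   (3, 'the'),  (3, 'sample'), (3, 'text'),   (3, 'is'),
       (4, 'Typing'), (4, 'a'),    (4, 'sample');

-- Same strict comparison (>) as in the script above.
DECLARE @MinimumNumberOfOccurances INT = 2;

SELECT  w.Item,
        w.cnt,
        CASE WHEN w.cnt > @MinimumNumberOfOccurances
             THEN 'common'
             ELSE 'non-common'
        END AS Category
FROM    (
            -- Count each word once per occurrence, ignoring case.
            SELECT  LOWER(Item) AS Item,
                    COUNT(*)    AS cnt
            FROM    @Words
            GROUP BY LOWER(Item)
        ) AS w
ORDER BY w.cnt DESC, LEN(w.Item) DESC;
```

With the value 2, only "sample" (4 occurrences), "text" (3) and "is" (3) are classified as common. Lowering it to 1 would also make "a" and "here" common, since each of them occurs twice.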