SQL Server: leaderboard of word popularity from a message board in SQL Server

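In a SQL Server database, I have a Messages table with the following columns:

Id INT(1,1)
Detail VARCHAR(5000)
DatetimeEntered DATETIME
PersonEntered VARCHAR(25)

Messages are pretty basic, and only allow alphanumeric characters and a handful of special characters, which are as follows: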

`¬!"£$%^&*()-_=+[{]};:'@#~\|,<.>/?
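It doesn't need to be especially clever. It is perfectly fine if a word written with and without its apostrophe (for example "don't" and "dont") is treated as two separate words.

I am having trouble splitting the words out into a temporary table called #Words.

Once I have the temporary table, I would apply the following query: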

SELECT 
    Word, 
    SUM(Word) AS WordCount 
FROM #Words 
GROUP BY Word 
ORDER BY SUM(Word) DESC

Please help.

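Personally, I would strip out almost all the special characters, and then use a splitter on the space character. Of your permitted characters, only ' is going to appear inside a word; anything else is going to be grammatical.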

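You haven't posted which version of SQL Server you're using, so I'm going to use SQL Server 2017 syntax. If you don't have the latest version, you'll need to replace TRANSLATE with a nested REPLACE, so REPLACE(REPLACE(REPLACE(REPLACE(... REPLACE(M.Detail, '¬', ' '), ...), '/', ' '), '?', ' '), and find a string splitter (for example, Jeff Moden's DelimitedSplit8K, http://www.sqlservercentral.com/articles/Tally+Table/72993/).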
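The nested REPLACE fallback mentioned above could look roughly like the following. This is a minimal sketch, not a full solution: it assumes SQL Server 2016, where STRING_SPLIT is available but TRANSLATE is not, the sample message in @Detail is made up, and only five of the permitted special characters are handled to keep the nesting readable.

```sql
-- Hypothetical sample message; in the real query this would be M.Detail.
DECLARE @Detail varchar(5000) = 'Hello, world! Is this (really) a test?';

-- Each REPLACE swaps one special character for a space. A complete version
-- needs one nesting level per character in the permitted list.
DECLARE @Stripped varchar(5000) =
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Detail,
        ',', ' '),
        '!', ' '),
        '(', ' '),
        ')', ' '),
        '?', ' ');

-- Split on spaces and discard the empty strings left by consecutive spaces.
SELECT SS.[value] AS Word
FROM STRING_SPLIT(@Stripped, ' ') SS
WHERE LEN(SS.[value]) > 0;
```

On versions older than SQL Server 2016, STRING_SPLIT would also have to be swapped for a user-defined splitter such as the one linked above.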


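As a note, this is going to perform awfully. SQL Server is not designed for this type of work. I also imagine you'll get some odd results, and the output will include numbers. Things like dates are going to get split up, numbers like 9,000,000 would be treated as the words 9 and 000, and hyperlinks will be separated.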
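The splitting of numbers and dates can be seen with a small test. This is only an illustrative sketch, assuming SQL Server 2017 or later; the sample sentence is invented, and only the comma and the slash are translated.

```sql
-- Hypothetical message containing a formatted number and a date.
DECLARE @Detail varchar(5000) = 'Paid 9,000,000 on 01/02/2018';

-- Turn ',' and '/' into spaces, exactly as the full query does for every special character.
DECLARE @Stripped varchar(5000) = TRANSLATE(@Detail, ',/', '  ');

-- Yields the "words" Paid, 9, 000, 000, on, 01, 02 and 2018
-- (STRING_SPLIT does not guarantee the order of the rows).
SELECT SS.[value] AS Word
FROM STRING_SPLIT(@Stripped, ' ') SS
WHERE LEN(SS.[value]) > 0;
```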

That's perfectly fine. Thanks, everyone.
USE Sandbox;
GO
CREATE TABLE [Messages] (Detail varchar(5000));

INSERT INTO [Messages]
VALUES ('Personally, I would strip out almost all the special characters, and then use a splitter on the space character. Of your permitted characters, only `''` is going to appear in a word; anything else is going to be grammatical. You haven''t posted what version of SQL you''re using, so I''ve going to use SQL Server 2017 syntax. If you don''t have the latest version, you''ll need to replace `TRANSLATE` with a nested `REPLACE` (So `REPLACE(REPLACE(REPLACE(REPLACE(... REPLACE(M.Detail, ''¬'','' ''),...),''/'','' ''),''?'','' '')`, and find a string splitter (for example, Jeff Moden''s [DelimitedSplit8K](http://www.sqlservercentral.com/articles/Tally+Table/72993/)).'),
       ('As a note, this is going to perform **AWFULLY**. SQL Server is not designed for this type of work. I also imagine you''ll get some odd results and it''ll include numbers in there. Things like dates are going to get split out,, numbers like `9,000,000` would be treated as the words `9` and `000`, and hyperlinks will be separated.')
GO
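-- Replace each listed special character with a space (TRANSLATE requires SQL Server 2017 or later).
-- The apostrophe is deliberately left out of the list so contractions stay whole.
WITH Replacements AS(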
    SELECT TRANSLATE(Detail, '`¬!"£$%^&*()-_=+[{]};:@#~\|,<.>/?','                                 ') AS StrippedDetail
    FROM [Messages] M)
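-- Split each stripped message on spaces and count how often every word occurs.
SELECT SS.[value] AS Word, COUNT(*) AS WordCount
FROM Replacements R
     CROSS APPLY STRING_SPLIT(R.StrippedDetail,' ') SS
WHERE LEN(SS.[value]) > 0 -- discard the empty strings produced by consecutive spaces
GROUP BY SS.[value]
ORDER BY WordCount DESC;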
GO
DROP TABLE [Messages];
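
If the #Words temporary table from the question is still wanted, the same split can be written with SELECT ... INTO. The following is a minimal sketch, assuming SQL Server 2017 or later; #Messages and its two sample rows are made up for illustration, and only three special characters are translated to keep it short. Note that SUM cannot be applied to a varchar column, so SUM(Word) in the question's query raises an error; COUNT(*) is used here instead.

```sql
-- Hypothetical sample data standing in for the real Messages table.
CREATE TABLE #Messages (Detail varchar(5000));

INSERT INTO #Messages (Detail)
VALUES ('The cat sat on the mat.'),
       ('The dog, the cat and the bird!');

-- One row per word occurrence; only '.', ',' and '!' are stripped in this sketch.
SELECT SS.[value] AS Word
INTO #Words
FROM #Messages M
     CROSS APPLY STRING_SPLIT(TRANSLATE(M.Detail, '.,!', '   '), ' ') SS
WHERE LEN(SS.[value]) > 0;

-- The leaderboard query from the question, with COUNT(*) in place of SUM(Word).
SELECT Word, COUNT(*) AS WordCount
FROM #Words
GROUP BY Word
ORDER BY WordCount DESC;

DROP TABLE #Words;
DROP TABLE #Messages;
```

Under a case-insensitive collation, "The" and "the" are grouped as the same word.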