SQL: How do I count how many times each keyword in one table appears in a table of phrases?
Tags: sql, sql-server-2008
Suppose I have a table named PHRASES that contains some text strings:
+--+---------------+
|ID|PHRASE |
+--+---------------+
|0 |"HELLO BYE YES"|
+--+---------------+
|1 |"NO WHY NOT" |
+--+---------------+
|2 |"NO YES" |
+--+---------------+
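I want to record how many times each of a set of words (YES, NO, HELLO, CHEESE) occurs in these phrases, storing the count in an OCCURRENCE column of a second table, which we'll call KEYWORDS.
I now want to write a query that updates KEYWORDS to the following: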
+--------+----------+
|KEYWORD |OCCURRENCE|
+--------+----------+
|"YES" |2 |
+--------+----------+
|"NO" |2 |
+--------+----------+
|"HELLO" |1 |
+--------+----------+
|"CHEESE"|0 |
+--------+----------+
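Note that I already have a function named dbo.RegExIsMatch that handles the string matching: it returns 1 if the pattern in parameter 1 matches the string in parameter 2. This is what I have so far: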
UPDATE KEYWORDS SET OCCURRENCE =
(
SELECT SUM
(
-- the following returns 1 if the keyword exists in the phrase, or 0 otherwise
CASE WHEN dbo.RegExIsMatch('.*' + KEYWORDS.KEYWORD + '.*',PHRASES.PHRASE,1) = 1 THEN 1 ELSE 0 END
)
FROM PHRASES
CROSS JOIN KEYWORDS
)
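But this doesn't work; it just fills every row with the same number. I'm sure it's a simple problem, I'm just struggling to get my head around SQL.
Your query has three different tables, but the question only has two. Is this what you mean?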
UPDATE Keywords
SET OCCURRENCE = (SELECT SUM(CASE WHEN dbo.RegExIsMatch('.*' + KEYWORDS.KEYWORD + '.*',PHRASES.PHRASE,1) = 1
THEN 1 ELSE 0
END)
FROM PHRASES
);
Otherwise, if there really are three tables, you need to correlate the subquery with the outer table.
This seems to work:
MERGE INTO KEYWORDS masterList
USING (
    -- count the phrases that match each keyword
    -- (columns must be referenced through the alias keywordList, not KEYWORDS)
    SELECT COUNT(*) AS OCCURRENCE, keywordList.KEYWORD AS KEYWORD
    FROM KEYWORDS AS keywordList
    CROSS JOIN PHRASES AS phraseList
    WHERE (dbo.RegExIsMatch('.*' + keywordList.KEYWORD + '.*', phraseList.PHRASE, 1) = 1)
    GROUP BY keywordList.KEYWORD
) frequencyList
ON (masterList.KEYWORD = frequencyList.KEYWORD)
WHEN MATCHED THEN
    UPDATE SET masterList.OCCURRENCE = frequencyList.OCCURRENCE;
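A keyword that matches no phrase produces no row in the MERGE source above, because the WHERE filter removes every one of its pairings, so that keyword (CHEESE in the sample data) is never updated. One possible way around this, sketched below, is to LEFT JOIN from the keywords to the phrases and count only the matched side. The table variables @PHRASES and @KEYWORDS are stand-ins for the real tables, and LIKE is used as a substitute for dbo.RegExIsMatch, which is not available here; both are assumptions made for illustration.

```sql
-- Assumption: table variables stand in for the real PHRASES and KEYWORDS tables.
DECLARE @PHRASES TABLE (ID int, PHRASE varchar(100));
DECLARE @KEYWORDS TABLE (KEYWORD varchar(20), OCCURRENCE int);

INSERT INTO @PHRASES (ID, PHRASE)
SELECT 0, 'HELLO BYE YES' UNION ALL
SELECT 1, 'NO WHY NOT' UNION ALL
SELECT 2, 'NO YES';

INSERT INTO @KEYWORDS (KEYWORD, OCCURRENCE)
SELECT 'YES', NULL UNION ALL
SELECT 'NO', NULL UNION ALL
SELECT 'HELLO', NULL UNION ALL
SELECT 'CHEESE', NULL;

MERGE INTO @KEYWORDS AS masterList
USING (
    -- LEFT JOIN keeps keywords without any matching phrase;
    -- COUNT(p.ID) skips the NULLs from unmatched rows, so those keywords get 0.
    -- Assumption: LIKE is a stand-in for dbo.RegExIsMatch.
    SELECT k.KEYWORD, COUNT(p.ID) AS OCCURRENCE
    FROM @KEYWORDS AS k
    LEFT JOIN @PHRASES AS p
        ON p.PHRASE LIKE '%' + k.KEYWORD + '%'
    GROUP BY k.KEYWORD
) AS frequencyList
ON masterList.KEYWORD = frequencyList.KEYWORD
WHEN MATCHED THEN
    UPDATE SET masterList.OCCURRENCE = frequencyList.OCCURRENCE;

SELECT KEYWORD, OCCURRENCE FROM @KEYWORDS;
```

With the real function, the join condition would presumably become dbo.RegExIsMatch(...) = 1 with the same arguments as in the original query.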
Try it this way; it works on my end.
------- Table creation
declare @PHRASE table (ID int,PHRASE varchar(max))
insert into @PHRASE
select 0,'"Hello Bye Yes"'
union all
select 1,'"No Why Not"'
union all
select 2,'"No Yes"'
select * from @PHRASE
declare @Keywords table (KEYWORD varchar(10),OCCURANCE int)
insert into @Keywords
select 'YES',null
union all
select 'NO',null
union all
select 'HELLO',null
union all
select 'CHEESE',null
select * from @Keywords
----------Script for requirement
create table #table (name varchar(max))
-- varchar(max) matches the PHRASE column, so longer phrases are not truncated
DECLARE @str VARCHAR(max)
DECLARE curs_Fp CURSOR FOR
SELECT c.PHRASE FROM @PHRASE c
OPEN curs_Fp
FETCH NEXT FROM curs_Fp INTO @str
WHILE @@FETCH_STATUS = 0
BEGIN
while patindex('%["]%',@str) > 0
SET @str = REPLACE( @str, SUBSTRING( @str, patindex('%["]%',@str), 1 ),'')
set @str = @str+' '
WHILE CHARINDEX(' ', @str) > 0
BEGIN
DECLARE @tmpstr VARCHAR(50)
SET @tmpstr = SUBSTRING(@str, 1, ( CHARINDEX(' ', @str) - 1 ))
insert into #table (name) select @tmpstr
SET @str = SUBSTRING(@str, CHARINDEX(' ', @str) + 1, LEN(@str))
END
FETCH NEXT FROM curs_Fp INTO @str
END
CLOSE curs_Fp
DEALLOCATE curs_Fp
update y
set y.OCCURANCE = isnull(x.occurance,0)
from
@Keywords y
left join
--#table x on y.keyword = x.name
(select a.name,count(a.name) occurance from #table a group by a.name) x on y.KEYWORD = x.name
select * from @Keywords
drop table #table
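Since I don't have the dbo.RegExIsMatch function to test with, I've given this slightly different example that uses only what SQL Server provides out of the box.
You were probably getting a count of 1 everywhere because you were using SUM without GROUP BY.
Note that this is not 100% accurate, because I'm not using regular expressions, just plain string functions. If you modify your regex function so that it performs a regex replace, you can swap my calls for that function and it will give you correct results.
Another small change is that all keywords start at 0 rather than NULL.
Also note that I no longer do a CROSS JOIN; instead I join only to the phrases that contain the word, so the occurrence count doesn't get overwritten multiple times, which I think is also what happened in your case.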
INSERT INTO KEYWORDS (KEYWORD, OCCURRENCE)
SELECT 'YES', 0
UNION
SELECT 'NO', 0
UNION
SELECT 'HELLO', 0
UNION
SELECT 'CHEESE', 0;
UPDATE KEYWORDS SET KEYWORDS.OCCURRENCE = KEYWORDS.OCCURRENCE +
(LEN(PHRASES.PHRASE) - LEN(REPLACE(PHRASES.PHRASE, KEYWORDS.KEYWORD, ''))) / LEN(KEYWORDS.KEYWORD)
FROM KEYWORDS
INNER JOIN PHRASES ON CHARINDEX(KEYWORDS.KEYWORD, PHRASES.PHRASE) > 0;
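PS: The simple string-counting expression is slightly modified from code found elsewhere.
Yes, that was a mistake; I tried to simplify my original code for this question but missed that part.
You can simplify it by moving the condition into the WHERE clause and counting instead of summing: SET OCCURRENCE = (SELECT COUNT(*) FROM Phrases WHERE dbo.RegExIsMatch… = 1)
This doesn't set the occurrence count for the keyword CHEESE to 0.
I don't think he needs to count how many times a word appears within each phrase, i.e. YES should be counted once rather than 3 times, so a simple LIKE expression would do the job.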
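The two suggestions in the comments (count in a correlated subquery, and use a plain LIKE) can be combined roughly as follows. This is only a sketch: the table variables are stand-ins for the real tables, and padding both sides with spaces to get whole-word matches assumes phrases are stored without surrounding quotes and with single spaces between words. It also avoids a pitfall of the UPDATE ... FROM join above: when one keyword row joins to several phrase rows, SQL Server does not define which match is applied, so a running `OCCURRENCE + ...` may not accumulate across phrases.

```sql
-- Assumption: table variables stand in for the real PHRASES and KEYWORDS tables.
DECLARE @PHRASES TABLE (ID int, PHRASE varchar(100));
DECLARE @KEYWORDS TABLE (KEYWORD varchar(20), OCCURRENCE int);

INSERT INTO @PHRASES (ID, PHRASE)
SELECT 0, 'HELLO BYE YES' UNION ALL
SELECT 1, 'NO WHY NOT' UNION ALL
SELECT 2, 'NO YES';

INSERT INTO @KEYWORDS (KEYWORD, OCCURRENCE)
SELECT 'YES', NULL UNION ALL
SELECT 'NO', NULL UNION ALL
SELECT 'HELLO', NULL UNION ALL
SELECT 'CHEESE', NULL;

-- Each keyword row gets its own count because the subquery refers to k.KEYWORD.
-- COUNT(*) over zero matching rows yields 0, so unmatched keywords are set to 0.
UPDATE k
SET OCCURRENCE = (
    SELECT COUNT(*)
    FROM @PHRASES AS p
    -- Padding with spaces makes 'NO' match the word NO but not NOT.
    WHERE ' ' + p.PHRASE + ' ' LIKE '% ' + k.KEYWORD + ' %'
)
FROM @KEYWORDS AS k;

SELECT KEYWORD, OCCURRENCE FROM @KEYWORDS;
```

This counts each phrase at most once per keyword, which is what the expected result table in the question appears to ask for.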