MySQL: SQL query to count the number of matching characters in two text columns
I need to count how many characters are equal between two text columns (of the same size, in the same table). For example:
RowNum:  Template:        Answers:
-------  ---------------  ---------------
1        ABCDEABCDEABCDE  ABCDAABCDBABCDC
2        EDAEDAEDAEDAEDA  EDBEDBEDBEDBEDB
SELECT some_function(Template, Answers) should return:
RowNum:  Result:
-------  -------
1        12
2        10
The database is MySQL.
If you are running MySQL 8.0, you can use a recursive query to compare the strings character by character:
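with recursive chars as (
    -- anchor: one row per source row, starting at position 1 with 0 matches
    select rownum, template, answers, 1 idx, 0 res from mytable
    union all
    -- recursive step: compare the characters at position idx, then advance
    select
        rownum,
        template,
        answers,
        idx + 1,
        res + ( substr(template, idx, 1) = substr(answers, idx, 1) )
    from chars
    where idx <= least(char_length(template), char_length(answers))
)
-- res only ever grows, so its maximum per row is the final match count
select rownum, max(res) result from chars group by rownum order by rownum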
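Keep in mind that this query will not perform well on large datasets. There are somewhat more efficient solutions (typically using a numbers table instead of a recursive CTE), but as Gordon Linoff commented, if you need to run this kind of query, you basically need to fix your data structure. You should store each character in a separate row, along with its rownum and index. Materialize the proper data structure so that you do not have to generate it on the fly in every query.
Not exactly MySQL, but here is something that works in SQL Server. Maybe it can be translated.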
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp (
    [RowNum]   INT IDENTITY(1,1) PRIMARY KEY,
    [Template] NVARCHAR(20),
    [Answer]   NVARCHAR(20),
    [Result]   INT
);
INSERT INTO #tmp
VALUES ('ABCDEABCDEABCDE', 'ABCDAABCDBABCDC', NULL),
       ('EDAEDAEDAEDAEDA', 'EDBEDBEDBEDBEDB', NULL);
--SELECT * FROM #tmp
DECLARE @current_template NVARCHAR(50) -- Variable to hold the current template
      , @current_answer NVARCHAR(50)   -- Variable to hold the current answer
      , @template_char NCHAR(1)        -- Char for template letter
      , @answer_char NCHAR(1)          -- Char for answer letter
      , @word_index INT                -- Index (position) within each word
      , @match_counter INT             -- Match counter for each word
      , @max_iter INT = (SELECT MAX(RowNum) FROM #tmp) -- Highest RowNum: the last row to process
      , @row_idx INT = (SELECT MIN(RowNum) FROM #tmp); -- Lowest RowNum as the initial row index value
WHILE (@row_idx <= @max_iter)
BEGIN
    SET @match_counter = 0; -- Reset match counter for each row
    SET @word_index = 1;    -- Reset word index for each row
    SET @current_template = (SELECT [Template] FROM #tmp WHERE RowNum = @row_idx);
    SET @current_answer = (SELECT [Answer] FROM #tmp WHERE RowNum = @row_idx);
    WHILE (@word_index <= LEN(@current_template))
    BEGIN
        SET @template_char = SUBSTRING(@current_template, @word_index, 1);
        SET @answer_char = SUBSTRING(@current_answer, @word_index, 1);
        IF (@answer_char = @template_char)
        BEGIN
            SET @match_counter += 1;
        END
        SET @word_index += 1;
    END
    -- Store the match count for the row that was just processed
    UPDATE #tmp
    SET Result = @match_counter
    WHERE RowNum = @row_idx;
    SET @row_idx += 1;
END;
SELECT * FROM #tmp;
Output:
RowNum  Template         Answer           Result
1       ABCDEABCDEABCDE  ABCDAABCDBABCDC  12
2       EDAEDAEDAEDAEDA  EDBEDBEDBEDBEDB  10
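@峡谷 This is not a good use case for SQL. If you want to store multiple values like this, you should use a separate row for each character.
Maybe I should "normalize" it with a view.
If you are using MySQL 8+, GMB's solution is a good one.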
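The one-row-per-character layout recommended in the first answer and in the comments could be sketched roughly as follows for MySQL 8.0. This is only an illustration under stated assumptions: the table answer_chars, its column names, the helper CTE numbers, and the 50-character upper bound are hypothetical and not part of the original question, while mytable is the table name used in the recursive query.

```sql
-- Assumed source table, named mytable as in the recursive query.
CREATE TABLE mytable (
    rownum   INT PRIMARY KEY,
    template VARCHAR(50) NOT NULL,
    answers  VARCHAR(50) NOT NULL
);

INSERT INTO mytable (rownum, template, answers) VALUES
    (1, 'ABCDEABCDEABCDE', 'ABCDAABCDBABCDC'),
    (2, 'EDAEDAEDAEDAEDA', 'EDBEDBEDBEDBEDB');

-- Hypothetical normalized table: one row per (rownum, character position).
CREATE TABLE answer_chars (
    rownum        INT        NOT NULL,
    idx           INT        NOT NULL,
    template_char VARCHAR(1) NOT NULL,
    answer_char   VARCHAR(1) NOT NULL,
    PRIMARY KEY (rownum, idx)
);

-- Populate it once. The CTE generates the positions 1..50
-- (50 is the assumed maximum string length).
INSERT INTO answer_chars (rownum, idx, template_char, answer_char)
WITH RECURSIVE numbers (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 50
)
SELECT t.rownum,
       n.n,
       SUBSTR(t.template, n.n, 1),
       SUBSTR(t.answers, n.n, 1)
FROM mytable AS t
JOIN numbers AS n
  ON n.n <= LEAST(CHAR_LENGTH(t.template), CHAR_LENGTH(t.answers));

-- Counting matches is now a plain aggregate; a comparison yields 1 or 0.
SELECT rownum, SUM(template_char = answer_char) AS result
FROM answer_chars
GROUP BY rownum
ORDER BY rownum;
```

With the two sample rows from the question, the final query should return 12 for row 1 and 10 for row 2. The point of the design is that the string splitting happens once, when the data is loaded, so later queries do not have to regenerate the per-character rows.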