MySQL: a column holds multi-word text values; need a query to find the most frequently repeated words

I have a column that stores each user's bio/title. It is free-form text written by the user and can contain any number of words.

id title
1  Business Development Executive Cold Calling & Cold Emailing expert Entrepreneur
2  Director of Online Marketing and entrepreneur
3  Art Director and Entrepreneur 
4  Corporate Development at Yahoo!
5  Snr Program Manager, Yahoo

I am trying to come up with a MySQL query that shows word frequencies:

Entrepreneur 3
development  2
director     2

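I know that if I could return each word in the values as a separate row, I could then use a normal GROUP BY. I have looked around, but I could not find a function that splits text into words, one per row.

Can this be done?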

Try selecting all the titles and returning them as an array. Then do the following in PHP:

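<?php
// Titles as they would come back from the database query.
$array = array("Business Development Executive Cold Calling & Cold Emailing expert  Entrepreneur ", "Director of Online Marketing and entrepreneur", "Art Director and Entrepreneur", "Corporate Development at Yahoo!", "Snr Program Manager, Yahoo");
// Lowercase every title and join them all into one long string.
$words = "";
foreach ($array as $val) $words .= " " . strtolower($val);
// Split the string into words (format 1 returns an array), then count each word.
print_r(array_count_values(str_word_count($words, 1)));
?>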

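You can do this by joining to a manufactured series of numbers that is used to pick out the nth word. Unfortunately, MySQL has no built-in way of generating a series, so it's a bit ugly, but here it is: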

select
  substring_index(substring_index(title, ' ', num), ' ', -1) word,
  count(*) count
from job j
join (select 1 num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 10 union select 11 union select 12) n
on length(title) >= length(replace(title, ' ', '')) + num - 1
group by 1
order by 2 desc

See the example that uses your data and produces your expected output.

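Sadly, having to hard-code every value of the number series also limits how many words of the column will be processed (12 in this case). Having too many numbers in the series does no harm, and you can always add more to cover larger expected input text.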
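On newer servers the hard-coded list may be avoidable. The following is only a sketch, assuming MySQL 8.0 or later (which supports recursive common table expressions) and the same `job` table with a `title` column. The cap of 50 words per title is an arbitrary assumption, and the `cnt` alias is just an illustrative name:

```sql
-- Sketch only: requires MySQL 8.0+ for WITH RECURSIVE.
-- Generates the numbers 1..50 instead of listing them by hand.
with recursive n (num) as (
  select 1
  union all
  select num + 1 from n where num < 50  -- assumed upper bound on words per title
)
select
  -- take the first num words, then keep only the last of those: the num-th word
  substring_index(substring_index(title, ' ', num), ' ', -1) as word,
  count(*) as cnt
from job j
join n
  -- number of spaces in title >= num - 1, i.e. the title has at least num words
  on length(title) >= length(replace(title, ' ', '')) + num - 1
group by word
order by cnt desc;
```

As with the hard-coded version, words are split on single spaces only, so punctuation stays attached (`Yahoo!` and `Yahoo` are counted separately), and doubled or trailing spaces produce empty words.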

Not easily. SQL isn't well suited to this kind of task; you would be better off querying the bios and parsing them in (presumably server-side) code.

Your design is a bit flawed. You should have a category table, and your job title table should reference the category ID.

Could others please vote to reopen this? I came up with a solution for it, but it was closed before I could post it.

@Bohemian - you've got my vote! :)

We need to do this not row by row, but across all the data in the column, and then give the overall count. Please see the example I provided.

I believe that is exactly what my example does. I combine the array containing all the "bio/title" results into one long string with a foreach loop. Then I return an array of every word used across all the "bio/title" entries and count how many times each one occurs. If that is not what you want, please clarify what is wrong or missing.

I want to do this in MySQL, not PHP.