MySQL: find rows with duplicate/similar column values


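I want to select all rows in my users table whose fname value is similar to that of an earlier row. From this table I want to retrieve the rows with IDs 2, 5 and 7 (because the second "anna" comes after the first "anna", and "michaela" and "michaal" come after "michael").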

This is what I have so far:

select *, count(fname) cnt 
from users group by soundex(fname) 
having count(soundex(fname)) > 1;
But because I am grouping, the result is:

+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  1 | anna     | milski   |   2 |
|  3 | michael  | michaels |   3 |
+----+----------+----------+-----+
What I want is:

+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  2 | anna     | nyugen   |   2 |
|  5 | michaela | king     |   3 |
|  7 | michaal  | hardy    |   3 |
+----+----------+----------+-----+

What should I change in the query? I tried removing the GROUP BY, but that changed the result (I may be wrong; I did not test it extensively).
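
For anyone who wants to reproduce this, here is a minimal setup sketch. Only the five rows that appear in the two result tables above are known. The column types are assumptions, and any other rows the original table held (ids 4 and 6, for example) are not shown in the question, so they are left out.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the question.
CREATE TABLE users (
    id    INT PRIMARY KEY,
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL
);

-- Only the rows visible in the question's result tables.
INSERT INTO users (id, fname, lname) VALUES
    (1, 'anna',     'milski'),
    (2, 'anna',     'nyugen'),
    (3, 'michael',  'michaels'),
    (5, 'michaela', 'king'),
    (7, 'michaal',  'hardy');

-- Inspect which SOUNDEX code each first name maps to.
SELECT id, fname, SOUNDEX(fname) AS snd
FROM   users
ORDER  BY snd, id;
```

Note that with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), the query in the question is rejected outright, because it selects non-aggregated columns that do not appear in the GROUP BY.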

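You seem to be getting exactly what you asked for: SOUNDEX(fname) computes the SOUNDEX hash of the first name only, not of the whole string. Here are a couple of options you could look into: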

SELECT *, COUNT(SOUNDEX(CONCAT(fname, lname))) AS cnt FROM users GROUP BY SOUNDEX(CONCAT(fname, lname)) HAVING cnt > 1;
It depends on what you want to achieve: counting similar first names, similar last names, or some synthetic hash of both.
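
To see which rows end up in each group before deciding what to count, a per-group summary can help. This is only a sketch: it assumes the users table from the question and uses GROUP_CONCAT to list the members of every SOUNDEX group with more than one row. Because every selected column is either the grouping expression or an aggregate, it should also run with ONLY_FULL_GROUP_BY enabled.

```sql
-- One row per SOUNDEX group that has more than one member,
-- with the ids and first names that fall into it.
SELECT SOUNDEX(fname)                  AS snd,
       COUNT(*)                        AS cnt,
       GROUP_CONCAT(id ORDER BY id)    AS ids,
       GROUP_CONCAT(fname ORDER BY id) AS names
FROM   users
GROUP  BY SOUNDEX(fname)
HAVING COUNT(*) > 1;
```

Swapping SOUNDEX(fname) for SOUNDEX(lname) or SOUNDEX(CONCAT(fname, lname)) gives the other variants mentioned above.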

I re-read your original question and came up with the following solution:

SELECT *
FROM   users
WHERE  id IN
       (SELECT id
       FROM    users t4
               INNER JOIN
                       (SELECT  soundex(fname) AS snd,
                                COUNT(*)       AS cnt
                       FROM     users          AS t5
                       GROUP BY snd
                       HAVING   cnt > 1
                       )
                       AS t6
               ON      soundex(t4.fname)=snd
       )
AND    id NOT IN
       (SELECT  MIN(t2.id) AS wanted
       FROM     users t2
                INNER JOIN
                         (SELECT  soundex(fname) AS snd,
                                  COUNT(*)       AS cnt
                         FROM     users          AS t1
                         GROUP BY snd
                         HAVING   cnt > 1
                         )
                         AS t3
                ON       soundex(t2.fname)=snd
       GROUP BY snd
       );

It is a bit over-complicated, but it works and does exactly what you asked :)
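
On MySQL 8.0 or later, the same result can probably be had more compactly with window functions. The following is a sketch, not the answer's method: it assumes the users table from the question and numbers the rows inside each SOUNDEX group by id, then keeps everything except the first row of each group.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT id, fname, lname, cnt
FROM  (SELECT u.*,
              -- position of the row inside its SOUNDEX group, ordered by id
              ROW_NUMBER() OVER (PARTITION BY SOUNDEX(fname) ORDER BY id) AS rn,
              -- size of the SOUNDEX group the row belongs to
              COUNT(*)     OVER (PARTITION BY SOUNDEX(fname))             AS cnt
       FROM   users AS u
      ) AS ranked
WHERE  rn > 1
ORDER  BY id;
```

A row with rn > 1 necessarily belongs to a group with more than one member, so no separate cnt > 1 filter is needed.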

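This does not solve my problem; please re-read the question.

Are you sure SOUNDEX will match both michaela and michael? I suspect you will get only one of them. Never mind, though; if it bothers you, you can ignore it.

Whoa! That's it. You are awesome! Thank you so much. I really appreciate it; this will definitely save my $a at work :)

By the way, I tried this myself without success. I tried dropping the "AND id NOT IN" clause entirely and instead putting an "id > min(id)" condition somewhere in the query, so that it would return only the rows from the second one onward (which is what I want, and it looks less complicated). Do you know how to do that?

If you use GROUP BY, there is no way around getting exactly one (aggregated) row per group. With id > MIN(id) you would still get only one row per group, just the second one instead of the first.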
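
The "id > MIN(id)" idea from the comments can still be made to work if the grouping is moved into a derived table and the ungrouped rows are joined back to it. This is a sketch under the assumption that the table looks like the one in the question; first_id and g are names introduced here for illustration.

```sql
SELECT u.id, u.fname, u.lname, g.cnt
FROM   users AS u
       INNER JOIN
              -- one row per SOUNDEX group with more than one member
              (SELECT   SOUNDEX(fname) AS snd,
                        MIN(id)        AS first_id,
                        COUNT(*)       AS cnt
               FROM     users
               GROUP BY SOUNDEX(fname)
               HAVING   COUNT(*) > 1
              ) AS g
       ON     SOUNDEX(u.fname) = g.snd
-- keep every row of the group except the one with the lowest id
WHERE  u.id > g.first_id
ORDER  BY u.id;
```

The GROUP BY collapses rows only inside the derived table, so the outer query still sees every row in users and can compare each id against its group's minimum.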
SELECT *, COUNT(SOUNDEX(fname)) AS cnt1, COUNT(SOUNDEX(lname)) AS cnt2
FROM users
GROUP BY SOUNDEX(fname), SOUNDEX(lname)
HAVING cnt1 > 1 OR cnt2 > 1;