MySQL: delete duplicates based on one column, keeping the row that has a value in another column, or else the row with the lowest ID


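Using MySQL 5.7 on Google Cloud, I'm trying to deduplicate my data based on the EmailAddress column, but some rows have a value in the FullName column and some don't. I want to keep the rows that have a FullName value. If none of the rows for a given EmailAddress has a FullName value, I want to keep only the copy with the lowest ID (the first column, the primary key).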
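The keep rule above can be pinned down as a small pure-Python reference, which is handy for spot-checking a SQL version against exported rows. This is a sketch, and it rests on two assumptions. First, "has a value" means FullName is neither NULL nor blank after trimming. Second, when several rows for one address have a FullName, the lowest id among them is kept. `has_name` and `ids_to_keep` are hypothetical names. The sample rows come from the example table in this question.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, Optional[str], str]  # (id, FullName, EmailAddress)

# Sample rows from the example table in this question.
ROWS: List[Row] = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]


def has_name(full_name: Optional[str]) -> bool:
    """True when FullName is neither NULL nor blank/whitespace (assumed meaning)."""
    return full_name is not None and full_name.strip() != ""


def ids_to_keep(rows: Iterable[Row]) -> List[int]:
    """Per EmailAddress, keep the lowest id among rows with a FullName;
    if no row for that address has one, keep the lowest id overall."""
    best = {}  # email -> (rank, id); the smaller tuple wins
    for row_id, full_name, email in rows:
        # Rank 0 for named rows so they always beat unnamed ones (rank 1).
        candidate = (0 if has_name(full_name) else 1, row_id)
        if email not in best or candidate < best[email]:
            best[email] = candidate
    return sorted(row_id for _, row_id in best.values())


print(ids_to_keep(ROWS))  # [1, 3, 5, 6, 8]
```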

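I ended up splitting this into two separate queries. The first deletes a row that has no value in the FullName column if there is another duplicate row that does have a FullName value: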

DELETE
FROM customer_info
WHERE id IN
(
 SELECT * 
 FROM
    (   
     SELECT c1.id
     FROM customer_info c1
            INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id!=c2.id
     WHERE 
            (trim(c1.FullName)='' or c1.FullName is NULL)
            and c2.FullName is not NULL
            and length(trim(c2.FullName))!=0
    ) t
);
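Before running a destructive DELETE over millions of rows, you can run the inner SELECT on its own to preview which ids it would remove. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. It loads the sample rows from the example table in this question into an assumed table definition. The query is the inner SELECT above plus an ORDER BY. On the real table, wrapping the same SELECT in COUNT(*) would give the size of the delete up front.

```python
import sqlite3

# Sample rows from the question's example table: (id, FullName, EmailAddress).
ROWS = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: only the three columns the question shows.
conn.execute("CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)")
conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", ROWS)

# Inner SELECT of the first DELETE, run read-only. An id can repeat when
# several named duplicates exist; SELECT DISTINCT would collapse those.
PREVIEW_SQL = """
    SELECT c1.id
    FROM customer_info c1
    INNER JOIN customer_info c2
            ON c1.EmailAddress = c2.EmailAddress AND c1.id != c2.id
    WHERE (trim(c1.FullName) = '' OR c1.FullName IS NULL)
      AND c2.FullName IS NOT NULL
      AND length(trim(c2.FullName)) != 0
    ORDER BY c1.id
"""
would_delete = [row[0] for row in conn.execute(PREVIEW_SQL)]
print(would_delete)  # [2, 4]
```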
And another query to delete the higher-ID duplicates where no FullName value was found:

DELETE
FROM customer_info
WHERE id IN
(
 SELECT * 
 FROM
    (   
     SELECT c1.id
     FROM customer_info c1
            INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id>c2.id
    ) t
);
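You can confirm that the two statements together produce the expected result by running them nearly verbatim against the sample rows from the example table in this question. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, with an assumed in-memory table definition. MySQL needs the `SELECT * FROM (...) t` wrapper to avoid error 1093 ("You can't specify target table ... for update in FROM clause"). SQLite does not need it but accepts it, so only the trailing semicolons are dropped.

```python
import sqlite3

# Sample rows from the question's example table: (id, FullName, EmailAddress).
ROWS = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]

# Question's first statement: drop blank-name rows that have a named duplicate.
DELETE_UNNAMED = """
DELETE FROM customer_info
WHERE id IN (
    SELECT * FROM (
        SELECT c1.id
        FROM customer_info c1
        INNER JOIN customer_info c2
                ON c1.EmailAddress = c2.EmailAddress AND c1.id != c2.id
        WHERE (trim(c1.FullName) = '' OR c1.FullName IS NULL)
          AND c2.FullName IS NOT NULL
          AND length(trim(c2.FullName)) != 0
    ) t
)
"""

# Question's second statement: drop any row that has a lower-id duplicate.
DELETE_HIGHER_IDS = """
DELETE FROM customer_info
WHERE id IN (
    SELECT * FROM (
        SELECT c1.id
        FROM customer_info c1
        INNER JOIN customer_info c2
                ON c1.EmailAddress = c2.EmailAddress AND c1.id > c2.id
    ) t
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)")
conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", ROWS)

with conn:  # commits both deletes together, or rolls back on error
    conn.execute(DELETE_UNNAMED)
    conn.execute(DELETE_HIGHER_IDS)

remaining = [row[0] for row in conn.execute("SELECT id FROM customer_info ORDER BY id")]
print(remaining)  # [1, 3, 5, 6, 8]
```

The order matters: running the second statement first would keep id 2 (unnamed) and delete id 8 (named) for janedoe@box.com.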
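This works, but not really. Once, when I left it running overnight on a smaller chunk of the data, it worked: when I woke up there was an error, but when I checked the data, the job had completed.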

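Am I missing something in my queries that makes them very inefficient, or is this kind of query simply slow by nature, with no tangible improvement possible in my code? I've maxed out a Google Cloud SQL instance at the db-n1-highmem-32 tier, with 32 GB of memory and 1000 GB of storage, but after running for an hour it still chokes and throws error 2013 (Lost connection to MySQL server during query). I have more than 3 million rows to process in total.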
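Error 2013 means the client lost its connection mid-query, which fits a single statement running for an hour. Two common mitigations are sketched below; both rest on assumptions.

- If EmailAddress is not already indexed, index it. Otherwise each self-join lookup may have to scan the whole table.
- Delete in small batches, each committed separately. No single statement then runs long, and an interrupted run keeps the progress already made.

The sketch uses Python's sqlite3 as a stand-in for MySQL. The index name, `BATCH_SIZE`, and `delete_in_batches` are hypothetical. On MySQL the same loop could be driven from any client, and MySQL's single-table `DELETE ... LIMIT n` is another way to cap a batch.

```python
import sqlite3

# Sample rows from the question's example table: (id, FullName, EmailAddress).
ROWS = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]

# Hypothetical batch size: tiny for the demo; real data likely wants thousands.
BATCH_SIZE = 2

# Ids targeted by the question's first DELETE: blank-name rows with a named duplicate.
UNNAMED_IDS = """
    SELECT DISTINCT c1.id
    FROM customer_info c1
    JOIN customer_info c2
      ON c1.EmailAddress = c2.EmailAddress AND c1.id != c2.id
    WHERE (trim(c1.FullName) = '' OR c1.FullName IS NULL)
      AND c2.FullName IS NOT NULL
      AND length(trim(c2.FullName)) != 0
"""

# Ids targeted by the question's second DELETE: rows with a lower-id duplicate.
HIGHER_IDS = """
    SELECT DISTINCT c1.id
    FROM customer_info c1
    JOIN customer_info c2
      ON c1.EmailAddress = c2.EmailAddress AND c1.id > c2.id
"""


def delete_in_batches(conn, select_ids_sql, batch_size):
    """Delete the ids chosen by select_ids_sql, at most batch_size per
    transaction, until the query returns nothing; return rows deleted."""
    total = 0
    while True:
        ids = [r[0] for r in conn.execute(select_ids_sql + " LIMIT ?", (batch_size,))]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        with conn:  # each batch commits on its own, keeping statements short
            conn.execute(f"DELETE FROM customer_info WHERE id IN ({placeholders})", ids)
        total += len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)")
conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", ROWS)
conn.commit()

# Index the join column so self-join lookups can use it instead of a full scan.
conn.execute("CREATE INDEX idx_customer_email ON customer_info (EmailAddress)")

first_pass = delete_in_batches(conn, UNNAMED_IDS, BATCH_SIZE)   # removes ids 2 and 4
second_pass = delete_in_batches(conn, HIGHER_IDS, BATCH_SIZE)   # removes id 7
remaining = [r[0] for r in conn.execute("SELECT id FROM customer_info ORDER BY id")]
print(first_pass, second_pass, remaining)  # 2 1 [1, 3, 5, 6, 8]
```

Batching is safe for both passes here because neither pass deletes the row that makes other rows qualify. In the first pass that is the named row; in the second pass it is the lowest id. Deleting a subset therefore never changes which of the remaining rows qualify.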

For example, this:

id | FullName      | EmailAddress            |
----------------------------------------------
1  | John Doe      | john.doe@email.com      |
2  | null          | janedoe@box.com         |
3  | null          | billybob@bobby.com      |
4  | null          | john.doe@email.com      |
5  | John Lennon   | jlennon@yoohoo.com      |
6  | null          | james.smith@coolmail.com|
7  | null          | billybob@bobby.com      |
8  | Jane Doe      | janedoe@box.com         |
should result in:

id | FullName      | EmailAddress            |
----------------------------------------------
1  | John Doe      | john.doe@email.com      |
3  | null          | billybob@bobby.com      |
5  | John Lennon   | jlennon@yoohoo.com      |
6  | null          | james.smith@coolmail.com|
8  | Jane Doe      | janedoe@box.com         |
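One hedged alternative uses a different technique from the two passes above. It computes one keeper id per EmailAddress with GROUP BY and deletes every other row in a single statement. Because the inner query aggregates, it needs one grouped pass rather than a pairwise self-join, which may be cheaper on a large table. The derived-table wrapper `k` mirrors the question's `SELECT * FROM (...) t` trick for MySQL. The logic is only checked here in SQLite (Python's sqlite3 as a stand-in), not on MySQL 5.7, and `dedupe_single_pass` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question's example table: (id, FullName, EmailAddress).
ROWS = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]

# One keeper per EmailAddress: the lowest id whose FullName is non-blank,
# falling back to the lowest id overall when no row has a name.
# NULL EmailAddress rows are left alone, matching the question's self-join,
# where NULL = NULL never matches.
DELETE_NON_KEEPERS = """
DELETE FROM customer_info
WHERE EmailAddress IS NOT NULL
  AND id NOT IN (
      SELECT keeper_id FROM (
          SELECT COALESCE(
                     MIN(CASE WHEN trim(FullName) <> '' THEN id END),
                     MIN(id)
                 ) AS keeper_id
          FROM customer_info
          WHERE EmailAddress IS NOT NULL
          GROUP BY EmailAddress
      ) k
  )
"""


def dedupe_single_pass(rows):
    """Load rows into a scratch table, run the single DELETE, return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)")
    conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", rows)
    with conn:
        conn.execute(DELETE_NON_KEEPERS)
    ids = [r[0] for r in conn.execute("SELECT id FROM customer_info ORDER BY id")]
    conn.close()
    return ids


print(dedupe_single_pass(ROWS))  # [1, 3, 5, 6, 8]
```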
In this case, using EXISTS may be simpler:

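-- Step 1: remove rows without a FullName when another row with the same
-- EmailAddress has one.
-- Caveat: MySQL rejects a subquery that selects from the table being
-- deleted from (error 1093), and 5.7 does not accept an alias in a
-- single-table DELETE, so on MySQL this likely needs rewriting, e.g. as a
-- multi-table DELETE c FROM customer_info c JOIN customer_info i ...
delete
from customer_info c
where (trim(c.FullName)='' or c.FullName is null)
  and exists (   
     select 1
     from customer_info i
     where i.EmailAddress = c.EmailAddress 
      and trim(i.FullName)>'' 
  );

-- Step 2: among the remaining duplicates, keep only the lowest id.
delete
from customer_info c
where exists (
     select 1
     from customer_info i
     where i.EmailAddress = c.EmailAddress 
       and i.id < c.id
  );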
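You can run the answer's two statements against the sample rows to confirm they yield the expected result. This is a minimal sketch using Python's sqlite3 as a stand-in. It corrects `i.Email` to `i.EmailAddress`, since the question's table has no `Email` column. It also references the DELETE target by its table name instead of the alias `c`. `dedupe_with_exists` is a hypothetical helper. The sketch exercises only the logic; MySQL's restrictions on subqueries over the DELETE target are not tested here.

```python
import sqlite3

# Sample rows from the question's example table: (id, FullName, EmailAddress).
ROWS = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]

# Answer's step 1; customer_info.EmailAddress in the subquery is the outer row.
DELETE_UNNAMED = """
DELETE FROM customer_info
WHERE (trim(FullName) = '' OR FullName IS NULL)
  AND EXISTS (
      SELECT 1 FROM customer_info i
      WHERE i.EmailAddress = customer_info.EmailAddress
        AND trim(i.FullName) > ''
  )
"""

# Answer's step 2: delete a row if a lower id shares its EmailAddress.
DELETE_HIGHER_IDS = """
DELETE FROM customer_info
WHERE EXISTS (
      SELECT 1 FROM customer_info i
      WHERE i.EmailAddress = customer_info.EmailAddress
        AND i.id < customer_info.id
  )
"""


def dedupe_with_exists(rows):
    """Load rows into a scratch table, apply both deletes, return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)")
    conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", rows)
    with conn:
        conn.execute(DELETE_UNNAMED)
        conn.execute(DELETE_HIGHER_IDS)
    ids = [r[0] for r in conn.execute("SELECT id FROM customer_info ORDER BY id")]
    conn.close()
    return ids


print(dedupe_with_exists(ROWS))  # [1, 3, 5, 6, 8]
```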

Can you provide an example with sample data?
@Suraz I've added the example to the original post.