MySQL: remove duplicates based on one column, keeping the row that has a value in a different column, or otherwise the row with the lowest ID
Using MySQL 5.7 on Google Cloud, I am trying to deduplicate my data based on the EmailAddress column, but some rows have a value in the FullName column and some do not. I want to keep the rows that have a value in FullName, but if none of the rows for a given EmailAddress has a FullName value, then keep only the copy with the lowest ID (the first column, the primary key). I ended up breaking this into two separate queries. The first deletes rows with no value in FullName when another duplicate row does have a value in FullName:
DELETE
FROM customer_info
WHERE id IN
(
    SELECT *
    FROM
    (
        SELECT c1.id
        FROM customer_info c1
        INNER JOIN customer_info c2
            ON c1.EmailAddress = c2.EmailAddress AND c1.id != c2.id
        WHERE
            (TRIM(c1.FullName) = '' OR c1.FullName IS NULL)
            AND c2.FullName IS NOT NULL
            AND LENGTH(TRIM(c2.FullName)) != 0
    ) t
);
and another query that deletes the rows with the larger ID when no value was found in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
    SELECT *
    FROM
    (
        SELECT c1.id
        FROM customer_info c1
        INNER JOIN customer_info c2
            ON c1.EmailAddress = c2.EmailAddress AND c1.id > c2.id
    ) t
);
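This works, sort of. Once, when I let it run overnight on a smaller segment of the data, it did the job: when I woke up there was an error, but I checked the data and the deduplication had completed.
Am I missing something in my queries that makes them this inefficient, or is this kind of query simply slow by nature, with no tangible improvement to be made in my code? I have already maxed out a Google Cloud SQL instance at the db-n1-highmem-32 size with 32 GB of memory and 1000 GB of storage, but after running for an hour it still chokes and throws error 2013. I need to process more than 3 million rows in total.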
For example, this:
id | FullName    | EmailAddress             |
---------------------------------------------
1  | John Doe    | john.doe@email.com       |
2  | null        | janedoe@box.com          |
3  | null        | billybob@bobby.com       |
4  | null        | john.doe@email.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
7  | null        | billybob@bobby.com       |
8  | Jane Doe    | janedoe@box.com          |
would result in:
id | FullName    | EmailAddress             |
---------------------------------------------
1  | John Doe    | john.doe@email.com       |
3  | null        | billybob@bobby.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
8  | Jane Doe    | janedoe@box.com          |
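The rule described above can also be written as a read-only query, which may be useful for previewing which rows would survive before deleting anything. This is only a sketch under the assumption that exactly one row per EmailAddress should remain: the lowest id among the rows with a non-blank FullName, or the lowest id overall when no row has a name. The alias keep_id is made up for this example.

```sql
-- Preview only: nothing is deleted here.
-- keep_id is a hypothetical alias for the id of the surviving row.
SELECT c.*
FROM customer_info c
INNER JOIN (
    SELECT EmailAddress,
           COALESCE(
               -- lowest id among rows that have a non-blank FullName, if any
               MIN(CASE WHEN TRIM(FullName) > '' THEN id END),
               -- otherwise the lowest id for this address
               MIN(id)
           ) AS keep_id
    FROM customer_info
    GROUP BY EmailAddress
) k ON k.keep_id = c.id
ORDER BY c.id;
```

On the eight sample rows this selects ids 1, 3, 5, 6 and 8, which matches the expected result table.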
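In this case an existence check is probably simpler. MySQL does not allow a DELETE to select from its own target table in a subquery (error 1093), so the check is written here as a self-join in multi-table DELETE syntax rather than with EXISTS. First delete the rows without a name whenever another row with the same address has one:
DELETE c
FROM customer_info c
INNER JOIN customer_info i
    ON i.EmailAddress = c.EmailAddress
WHERE (TRIM(c.FullName) = '' OR c.FullName IS NULL)
  AND TRIM(i.FullName) > '';
Then, among the rows that remain, delete every row that has a duplicate with a lower id:
DELETE c
FROM customer_info c
INNER JOIN customer_info i
    ON i.EmailAddress = c.EmailAddress
   AND i.id < c.id;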
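Every query on this page pairs rows by EmailAddress, so the running time probably depends heavily on whether that column is indexed, and the question does not say. A minimal sketch, assuming there is no such index yet (the name idx_email is hypothetical), would add one and look at the join plan before deleting anything:

```sql
-- Assumption: customer_info has no index on EmailAddress yet.
-- idx_email is a hypothetical name chosen for this example.
ALTER TABLE customer_info ADD INDEX idx_email (EmailAddress);

-- Inspect the plan of the duplicate-matching join without modifying data.
-- Ideally the second table is reached through idx_email (type "ref")
-- rather than through a full table scan (type "ALL").
EXPLAIN
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2
    ON c1.EmailAddress = c2.EmailAddress
   AND c1.id > c2.id;
```

If the plan still shows a full scan on the joined table, the join is reading the whole table once per row, which on 3 million rows could explain a statement that runs for an hour and then loses its connection.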
Can you provide an example with sample data?
@Suraz I have added an example to the original post.