MySQL: deleting duplicates without using a primary key
I have this table in a MySQL database (columns: id, filename, firstname, lastname, dept, salary):
1 test1.csv Jan Thomas Sales 5000
2 test1.csv Jan Michael Sales 200
3 test1.csv Thomas John Technology 12900
4 test2.csv Robert James Technology 5500
5 test2.csv Robert Albertson Technology 6000
6 test2.csv Mark Jeffries Technology 900
7 test2.csv Ted James Technology 10000
8 test2.csv Mayla Arthurs Technology 7000
9 test2.csv Mayla Smith Technology 9500
10 test3.csv Mayla Anthony Technology 3000
11 test3.csv Mayla Mark Technology 3000
12 test4.csv Mayla Roberts Technology 8500
13 test4.csv Anthony Anderson Marketing 9500
14 test5.csv Anthony Smith Technology 6000
15 test5.csv Jan Thomas Sales 5000
16 test5.csv Jan Michael Sales 200
17 test5.csv Thomas John Technology 12900
18 test1.csv Jan Michael Sales 8000
19 test1.csv Thomas John Technology 1540
20 test2.csv Mayla Smith Technology 10500
21 test3.csv Mayla Anthony Technology 5600
22 test4.csv Anthony Anderson Marketing 2500
23 test5.csv Brian Earl HR 1200
24 test5.csv John Smith HR_Sales 2000
25 test6.csv Jan Thomas HR_Sales 12000
26 test6.csv Jan Michael Education 1500
27 test7.csv Thomas John HR_Sales 1000
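The SQL code to create the table is at the end of this post. Each record consists of a filename, first name, last name, department, and salary. Sometimes the same record exists in more than one file, and I cannot have these duplicate records.
As you can see, the rows with id = 15, 16, 17 are copies of the rows with id = 1, 2, 3 respectively.
I need to delete the duplicates where the record is identical but the filename differs.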
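Additional information:
I cannot simply run DELETE FROM employee WHERE id IN(15,16,17), because I do not know in advance which rows will be duplicated.
The *.csv files are updated constantly. This means that if I create a new index column, I can no longer append *.csv files that contain copies of records already in the database. Therefore I cannot use an index column or GROUP BY.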
You can use delete with a join in MySQL to remove the duplicates:
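-- For each distinct (firstname, lastname, dept, salary), keep the rows from the
-- alphabetically smallest filename and delete the copies found in any other file.
delete e
from employee e left join
     (select firstname, lastname, dept, salary, min(filename) as filename
      from employee
      group by firstname, lastname, dept, salary
     ) tokeep
     on e.firstname = tokeep.firstname and e.lastname = tokeep.lastname and
        e.dept = tokeep.dept and e.salary = tokeep.salary and
        tokeep.filename = e.filename
-- rows with no match in tokeep are the copies from the other files
where tokeep.filename is null;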
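Before running a destructive statement like this, it may help to preview what it would remove. The sketch below reuses the same join as a plain SELECT. It assumes the table and column names used in the delete above (employee, id, filename, firstname, lastname, dept, salary). With the sample data from the question it should list only the rows with id 15, 16, and 17.

```sql
-- Dry run: same join as the DELETE, but only lists the rows that would be removed.
SELECT e.id, e.filename, e.firstname, e.lastname, e.dept, e.salary
FROM employee e
LEFT JOIN (
    -- one row per distinct record, tagged with the filename that is kept
    SELECT firstname, lastname, dept, salary, MIN(filename) AS filename
    FROM employee
    GROUP BY firstname, lastname, dept, salary
) tokeep
    ON  e.firstname = tokeep.firstname
    AND e.lastname  = tokeep.lastname
    AND e.dept      = tokeep.dept
    AND e.salary    = tokeep.salary
    AND e.filename  = tokeep.filename
WHERE tokeep.filename IS NULL   -- no match means this row is a copy from another file
ORDER BY e.id;
```

Two limitations are worth keeping in mind: a record that appears twice within the same file is kept twice, and rows with NULL in any of the compared columns never match the join, so they would all be selected for deletion.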
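It sounds like you need a separate table that holds the mapping between employees and files, so that the same employee can appear in several files. You can then simply put a unique constraint on all fields except id and filename. When loading a CSV, you can tell MySQL to ignore duplicate-row errors and carry on.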
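A minimal sketch of the unique-constraint part of this suggestion follows. Several things in it are assumptions, not taken from the question: the index name uq_employee_record, the CSV layout (comma-separated, four fields per line, no header row), and column types that can be indexed in full. It also leaves out the separate employee-to-file mapping table.

```sql
-- Assumption: existing duplicates were removed first; otherwise adding the
-- unique key fails with a duplicate-entry error.
ALTER TABLE employee
    ADD UNIQUE KEY uq_employee_record (firstname, lastname, dept, salary);

-- Load one file. IGNORE makes MySQL skip rows that would violate the unique
-- key instead of aborting the load.
-- Assumption: each CSV line is "firstname,lastname,dept,salary".
LOAD DATA LOCAL INFILE 'test6.csv'
IGNORE
INTO TABLE employee
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(firstname, lastname, dept, salary)
SET filename = 'test6.csv';

-- The same idea for a single row: the insert is skipped without an error
-- if an identical record is already present.
INSERT IGNORE INTO employee (filename, firstname, lastname, dept, salary)
VALUES ('test6.csv', 'Jan', 'Thomas', 'Sales', 5000);
```

With this in place the filename column records only the first file a record was loaded from, which is why the answer pairs the constraint with a separate mapping table.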