MySQL/PHP: merge multiple rows into one based on duplicate columns
I have an email list with a lot of duplicate data, and I want to merge the rows that have duplicate data in certain columns. This is my table:
autoid,title,lastname,firstname,middlename,prefix,
fulladdress,address1,address2,
city,state,zip,country,county,phone1,phone2,email,id, ts
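I want to merge duplicate rows based on email and phone1. If two rows have the same values in those columns, I want to merge them, fill in any blanks, and then delete the second row. Data in the row with the lower autoid takes precedence over data in the row with the higher autoid.
It would be great if this could be done with a single MySQL query, but if we have to use PHP, that is fine too.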
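Use
GROUP BY email, phone1
If you add only this clause, you get an arbitrary one of the rows in each group (on MySQL 5.7.5 and later, the default ONLY_FULL_GROUP_BY mode rejects such a query unless the other columns are aggregated). If you want values to take precedence over NULL fields, you can use an aggregate function such as MIN:
SELECT MIN(title), MIN(lastname), …
FROM tableName
GROUP BY email, phone1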
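But this decides for each column separately which value to take, so the result can mix values from different rows without regard to autoid. A query that merges rows exactly the way you describe is fairly tricky in MySQL. You could write one query that lists all rows ordered by the matching columns and then use user variables to fill in the gaps. However, it would be hard to avoid filling gaps across rows that do not match, so a subquery per matching pair might work better. For both performance and readability, you are probably better off with a PHP solution overall.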
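In PHP, things should be fairly simple: use
ORDER BY email, phone1, autoid ASC
Then, on the PHP side, for each row read from the database, check whether it matches the previously read row in the two columns of interest. If it does, iterate over the columns and replace nulls as you go. I am not a PHP programmer these days, so someone else may be better placed to write a code snippet for this.
We don't really like handing out complete code solutions on StackOverflow; usually we help you write your own code. But I'm not sure I could explain all the steps without writing the code myself, so here is some starter code.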
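This is untested, raw code.
First make a copy of the existing table. Until we know this code will not corrupt or delete existing data, run everything against the copy; once it has finished and you have verified that it works, apply it to the real table.
Create the copy with the following commands: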
CREATE TABLE EmailListCopy LIKE EmailList;
INSERT EmailListCopy SELECT * FROM EmailList;
PHP code:
<?php
//This script will first query the table for ALL results, store them in arrays,
//then loop through the arrays to search the table for duplicates using individual
//sql queries and then compare the results, update the entry as needed and delete the
//duplicate row. THIS CODE IS NOT OPTIMIZED!! DO NOT RUN CONTINUOUSLY!! This should be
//used for OCCASIONAL merging ONLY!! i.e. Once-a-day or once-a-week etc...
$result="";
$duplicatesFound=0; //must be initialized before it is incremented below
//Setup arrays to hold the original query information
$autoidArray = array();
$titleArray = array();
$lastnameArray = array();
$firstnameArray = array();
$middlenameArray = array();
$prefixArray = array();
$fulladdressArray = array();
$address1Array = array();
$address2Array = array();
$cityArray = array();
$stateArray = array();
$zipArray = array();
$countryArray = array();
$countyArray = array();
$phone1Array = array();
$phone2Array = array();
$emailArray = array();
$idArray = array();
$tsArray = array();
$link=mysqli_connect($hostname,$username,$password,$dbname); //argument order is host, user, password, database
if(mysqli_connect_errno())
{
$result="Error connecting to database: ".mysqli_connect_error();
}
else
{
$stmt=mysqli_prepare($link,"SELECT autoid,title,lastname,firstname,middlename,prefix,fulladdress,address1,address2,city,state,zip,country,county,phone1,phone2,email,id,ts FROM " . $table);
mysqli_stmt_execute($stmt);
mysqli_stmt_bind_result($stmt, $autoid, $title, $lastname, $firstname, $middlename, $prefix, $fulladdress, $address1, $address2, $city, $state, $zip, $country, $county, $phone1, $phone2, $email, $id, $ts);
if(mysqli_stmt_errno($stmt))
{
$result="Error executing SQL statement: ".mysqli_stmt_error($stmt);
}
else
{
mysqli_stmt_store_result($stmt);
if(mysqli_stmt_num_rows($stmt)==0)
{
$result="0 rows returned (Empty table)";
}
else
{
while(mysqli_stmt_fetch($stmt))
{
//Load results into arrays
array_push($autoidArray, $autoid);
array_push($titleArray, $title);
array_push($lastnameArray, $lastname);
array_push($firstnameArray, $firstname);
array_push($middlenameArray, $middlename);
array_push($prefixArray, $prefix);
array_push($fulladdressArray, $fulladdress);
array_push($address1Array, $address1);
array_push($address2Array, $address2);
array_push($cityArray, $city);
array_push($stateArray, $state);
array_push($zipArray, $zip);
array_push($countryArray, $country);
array_push($countyArray, $county);
array_push($phone1Array, $phone1);
array_push($phone2Array, $phone2);
array_push($emailArray, $email);
array_push($idArray, $id);
array_push($tsArray, $ts);
}
}
mysqli_stmt_free_result($stmt);
}
for($i=0;$i<count($emailArray);$i++)
{
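//Only non-empty values may match; otherwise every row with a blank email (or blank phone1) counts as a duplicate of every other one
$duplicatestmt=mysqli_prepare($link,"SELECT autoid,title,lastname,firstname,middlename,prefix,fulladdress,address1,address2,city,state,zip,country,county,phone1,phone2,email,id,ts FROM " . $table . " WHERE (email<>'' AND email=?) OR (phone1<>'' AND phone1=?)");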
mysqli_stmt_bind_param($duplicatestmt, 'ss', $emailArray[$i], $phone1Array[$i]); //bind phone1 as a string so formatted numbers are not cast to integers
mysqli_stmt_execute($duplicatestmt);
mysqli_stmt_bind_result($duplicatestmt, $autoid, $title, $lastname, $firstname, $middlename, $prefix, $fulladdress, $address1, $address2, $city, $state, $zip, $country, $county, $phone1, $phone2, $email, $id, $ts);
if(mysqli_stmt_errno($duplicatestmt))
{
$result="Error executing SQL statement: ".mysqli_stmt_error($duplicatestmt);
}
else
{
mysqli_stmt_store_result($duplicatestmt);
if(mysqli_stmt_num_rows($duplicatestmt)==0)
{
//NO Duplicate entry found, loop again;
echo "<p>No Duplicate Found</p>";
}
else
{
while(mysqli_stmt_fetch($duplicatestmt))
{
//Found a duplicate
echo "<p>Duplicate Found</p>";
if($autoid > $autoidArray[$i])
{
if($email=="" && $phone1=="")
{
echo "<p>Both email and phone1 are empty. Skipping...</p>";
}
else
{
$duplicatesFound++;
//The autoid of the duplicate just found is greater than the autoid of the
//row used to find it (the older row). Therefore update the older row and remove the
//duplicate
//
//This checks each of the values and if the lower autoid one is blank, then will add the
//value to the table in the lower autoid row
//NOTE:** If having any problems with the queries below try removing the single quotes -> ' <- from any "autoid=" portion of the query
//Each UPDATE writes to its own column (the original wrote every value into title); values are escaped before being put into the SQL string
if($titleArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET title='".mysqli_real_escape_string($link,$title)."' WHERE autoid='".$autoidArray[$i]."'");}
if($lastnameArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET lastname='".mysqli_real_escape_string($link,$lastname)."' WHERE autoid='".$autoidArray[$i]."'");}
if($firstnameArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET firstname='".mysqli_real_escape_string($link,$firstname)."' WHERE autoid='".$autoidArray[$i]."'");}
if($middlenameArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET middlename='".mysqli_real_escape_string($link,$middlename)."' WHERE autoid='".$autoidArray[$i]."'");}
if($prefixArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET prefix='".mysqli_real_escape_string($link,$prefix)."' WHERE autoid='".$autoidArray[$i]."'");}
if($fulladdressArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET fulladdress='".mysqli_real_escape_string($link,$fulladdress)."' WHERE autoid='".$autoidArray[$i]."'");}
if($address1Array[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET address1='".mysqli_real_escape_string($link,$address1)."' WHERE autoid='".$autoidArray[$i]."'");}
if($address2Array[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET address2='".mysqli_real_escape_string($link,$address2)."' WHERE autoid='".$autoidArray[$i]."'");}
if($cityArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET city='".mysqli_real_escape_string($link,$city)."' WHERE autoid='".$autoidArray[$i]."'");}
if($stateArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET state='".mysqli_real_escape_string($link,$state)."' WHERE autoid='".$autoidArray[$i]."'");}
if($zipArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET zip='".mysqli_real_escape_string($link,$zip)."' WHERE autoid='".$autoidArray[$i]."'");}
if($countryArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET country='".mysqli_real_escape_string($link,$country)."' WHERE autoid='".$autoidArray[$i]."'");}
if($countyArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET county='".mysqli_real_escape_string($link,$county)."' WHERE autoid='".$autoidArray[$i]."'");}
if($phone1Array[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET phone1='".mysqli_real_escape_string($link,$phone1)."' WHERE autoid='".$autoidArray[$i]."'");}
if($phone2Array[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET phone2='".mysqli_real_escape_string($link,$phone2)."' WHERE autoid='".$autoidArray[$i]."'");}
if($emailArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET email='".mysqli_real_escape_string($link,$email)."' WHERE autoid='".$autoidArray[$i]."'");}
if($idArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET id='".mysqli_real_escape_string($link,$id)."' WHERE autoid='".$autoidArray[$i]."'");}
if($tsArray[$i]==""){mysqli_query($link,"UPDATE EmailListCopy SET ts='".mysqli_real_escape_string($link,$ts)."' WHERE autoid='".$autoidArray[$i]."'");}
//Now that it has been updated, delete the duplicate entry
mysqli_query($link, "DELETE FROM EmailListCopy WHERE autoid='".$autoid."'");
echo "<p>Duplicate to be updated DELETE FROM EmailListCopy WHERE autoid='".$autoid."'</p>";
}
}
else
{
//The autoid found is not higher than the one used for the query: it is either the row itself or an entry that was already handled but is still in the arrays.
//This is to be skipped.
echo "<p>Duplicate not to be updated</p>";
}
}
$result="Merged ".$duplicatesFound." rows.";
}
mysqli_stmt_free_result($duplicatestmt); //free the per-row duplicate lookup, not the outer statement (already freed above)
}
}
mysqli_stmt_close($duplicatestmt);
mysqli_stmt_close($stmt);
mysqli_close($link);
}
echo $result;
?>
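The single-pass approach from the first answer (sort by email, phone1, autoid, then compare each row with the previous one) can be sketched in memory as follows. This is only an illustration under stated assumptions: mergeSortedRows is a hypothetical helper that is not part of either answer, the rows are assumed to arrive as associative arrays already sorted by the ORDER BY clause, and writing the merged rows back and deleting the listed autoid values is left to the caller.

```php
<?php
// Hypothetical helper (not from the original answers): merges consecutive rows
// that share both email and phone1. Assumes $rows is already sorted by
// email, phone1, autoid ASC and that every row has the same keys.
function mergeSortedRows(array $rows): array
{
    $merged = [];     // surviving rows: the lowest autoid of each group
    $deleteIds = [];  // autoid values of rows folded into a survivor
    $current = null;  // survivor of the group being processed

    foreach ($rows as $row) {
        // Rows with neither email nor phone1 are never treated as duplicates.
        $bothEmpty = ($row['email'] ?? '') === '' && ($row['phone1'] ?? '') === '';
        $sameKey = $current !== null
            && !$bothEmpty
            && $row['email'] === $current['email']
            && $row['phone1'] === $current['phone1'];

        if (!$sameKey) {
            // A new group starts: store the finished survivor, if any.
            if ($current !== null) {
                $merged[] = $current;
            }
            $current = $row;
            continue;
        }

        // Fill only blanks, so data from the lower autoid keeps priority.
        foreach ($row as $col => $value) {
            if (($current[$col] ?? '') === '' && $value !== null && $value !== '') {
                $current[$col] = $value;
            }
        }
        $deleteIds[] = $row['autoid'];
    }

    if ($current !== null) {
        $merged[] = $current;
    }
    return ['merged' => $merged, 'delete' => $deleteIds];
}
```

Because the lowest autoid of each group is read first, filling only the blanks should keep its data in preference to later rows. Note that this variant requires both columns to match, as in the original question.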
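I'm not sure whether there is a MySQL query that will merge them automatically, but PHP is definitely a solution. I'd let the MySQL gods take a look first; if nothing comes of that, let me know and we'll get you a PHP solution.
OK SnareChops, let's write some PHP code that works.
@ToddWelch, when addressing someone, put an @ in front of their name. In this case SnareChops gets notified anyway, because so far he is the only one who has commented, but in general he might not be notified.
Got it. Thanks very much for the information, MvG. I think I'll try writing some PHP code with SnareChops, above.
After fixing some syntax errors I was able to get the script to run, but it deleted all but 7 of the 23,000 records. After thinking it over, I made a mistake in my original post. Some of my records have only an email or only a phone, and some may have neither. The script should check whether phone1 or email is the same and, if so, merge. If both are empty, leave the row alone.
@ToddWelch I added a few lines below the echo "Duplicate Found" that should check whether the email and phone1 fields are empty and, if so, skip the row. Make sure the closing } of that else comes right after the SQL DELETE command. Does the code behave as expected with that change?
You know you can use COUNT in SQL? That would eliminate almost all of the script.
@Dave Sounds good. I'm not very familiar with SQL, so since the OP asked for either a SQL or a PHP answer, I first let the SQL people have a go. Then, when the OP contacted me for a PHP solution, I did my best and answered the question the only way I knew how at my current level of knowledge. What would the SQL help eliminate?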
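The mass deletion reported in the comments is consistent with blank email or phone1 values being treated as matches, although the thread does not confirm the cause. Below is a minimal in-memory sketch of the revised rule (merge when a non-empty email or a non-empty phone1 is the same, leave rows with neither alone). The function mergeByEmailOrPhone is hypothetical, and it assumes the rows fit in memory as associative arrays. When a row could match two different survivors, one by email and one by phone1, it simply takes the email match; it does not join the two groups.

```php
<?php
// Hypothetical helper (not from the original thread): merges rows that share
// a non-empty email OR a non-empty phone1. Requires PHP 7.4+ (arrow function).
function mergeByEmailOrPhone(array $rows): array
{
    // Lower autoid first, so the earliest row of each group survives.
    usort($rows, fn($a, $b) => $a['autoid'] <=> $b['autoid']);

    $survivors = [];  // autoid => merged row
    $byEmail = [];    // non-empty email  => autoid of its survivor
    $byPhone = [];    // non-empty phone1 => autoid of its survivor
    $deleteIds = [];  // autoid values of rows folded into a survivor

    foreach ($rows as $row) {
        $email = (string)($row['email'] ?? '');
        $phone = (string)($row['phone1'] ?? '');

        // Empty values never count as a match.
        $target = null;
        if ($email !== '' && isset($byEmail[$email])) {
            $target = $byEmail[$email];
        } elseif ($phone !== '' && isset($byPhone[$phone])) {
            $target = $byPhone[$phone];
        }

        if ($target === null) {
            // No earlier row matches: this row becomes a survivor.
            $survivors[$row['autoid']] = $row;
            $target = $row['autoid'];
        } else {
            // Fill only blanks, so the lower autoid keeps priority.
            foreach ($row as $col => $value) {
                if (($survivors[$target][$col] ?? '') === '' && $value !== null && $value !== '') {
                    $survivors[$target][$col] = $value;
                }
            }
            $deleteIds[] = $row['autoid'];
        }

        // Remember this row's keys so later rows can match on either one.
        if ($email !== '' && !isset($byEmail[$email])) {
            $byEmail[$email] = $target;
        }
        if ($phone !== '' && !isset($byPhone[$phone])) {
            $byPhone[$phone] = $target;
        }
    }

    return ['merged' => array_values($survivors), 'delete' => $deleteIds];
}
```

As with the script above, it seems safest to try this against EmailListCopy first and compare row counts before touching the real table.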