PHP: Identifying and collecting potential duplicates in a multidimensional array
Tags: php, arrays, duplicates
I have an array like the one below, and I want to identify the duplicates in it:
$names = array(
    array("Name" => "John Smith", "ID" => 65),
    array("Name" => "Richard Johnson", "ID" => 96),
    array("Name" => "John Smith", "ID" => 1105),
    ...
);
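There are many similar questions, but most of them only deal with returning true/false when duplicates exist, or with simply removing the duplicates. My apologies if an identical question already exists; I looked, but could not find anything that fits my case.
I only want to identify pairs of arrays that contain the same "Name" value but possibly different "ID" values. I know this may produce repeated results for the reversed pairs, but I think I can sort that out myself. I do not want to remove the duplicate values, only identify them.
Ideally, it would return an array like the following (or something similar).
I can then process that into a more refined and user-friendly array.
I was thinking of using a recursive in_array function, or possibly a second working array. Any ideas?

Answer:
Why not loop over the array and build a new array that uses the name as the key?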
$names = array(
    array("Name" => 'John Smith', "ID" => 65),
    array("Name" => 'Richard Johnson', "ID" => 96),
    array("Name" => 'John Smith', "ID" => 1105)
);
// Collect every ID under its name; names with more than one ID are duplicates
$users = [];
foreach ($names as $usersArray) {
    $users[$usersArray['Name']][] = $usersArray['ID'];
}
print_r($users);
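A comment on this answer notes that one step remains: keeping only the names that have more than one ID. The following is a minimal sketch of that step, assuming the grouped array has the shape built above. The variable name `$duplicates` is hypothetical and not part of the original answer.

```php
<?php
// Sample data mirroring the question.
$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105],
];

// Group the IDs by name, as in the answer above.
$users = [];
foreach ($names as $usersArray) {
    $users[$usersArray['Name']][] = $usersArray['ID'];
}

// Keep only the names that occur more than once.
// $duplicates is a hypothetical name used for illustration.
$duplicates = array_filter($users, function ($ids) {
    return count($ids) > 1;
});

print_r($duplicates);
```

With the sample data this leaves a single entry, 'John Smith', holding the IDs 65 and 1105. Because `array_filter` preserves keys, the names stay available as the array keys of the result.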
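
Answer:
For a situation like this, you usually need to define some logic that creates a hash from the values in a record, in order to determine equality. Once that is defined, you can use a simple loop and an associative array to track which records have duplicates.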
<?php
/**
 * Define an algorithm for equality between records.
 *
 * @param array $record
 * @return string
 */
function generateHashForUserRecord($record)
{
    return sha1($record['Name']);
}

$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105]
];

// This map will be populated with all records, keyed by hash
$hashBuffer = [];
// Buffer for hashes that are associated with more than one record
$duplicateHashes = [];
// This will be populated with the duplicate records
$duplicateRecords = [];

// Iterate through all of the records
foreach ($names as $currRecord)
{
    // Generate a hash for the record
    $currHash = generateHashForUserRecord($currRecord);

    // If the hash is not in the hashtable yet, create an array to hold entries with this hash
    if (!array_key_exists($currHash, $hashBuffer))
    {
        $hashBuffer[$currHash] = [];
    }
    else // If this hash is already in the buffer, we have a duplicate - add it to the $duplicateHashes array
    {
        // Keyed by the hash itself, so each duplicate hash is stored only once
        $duplicateHashes[$currHash] = $currHash;
    }

    // Add the record to the hash buffer
    $hashBuffer[$currHash][] = $currRecord;
}

// Collect every record whose hash occurred more than once
foreach ($duplicateHashes as $currDuplicateHash)
{
    $duplicateRecords = array_merge($duplicateRecords, $hashBuffer[$currDuplicateHash]);
}

print_r($duplicateRecords);
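
Comments:
Have you tried a simple foreach loop first? That should always be your first solution; you can refactor to use built-in functions afterwards.
@El_Vanja I'm trying one right now, but the array has 16k+ entries, and my only idea is to compare every name against every other name, but that's like 16k^16k operations.
That is certainly not a good approach, but you don't need to compare them all against each other. Grouping can be used to achieve what you need (see the answers).
I don't understand the purpose of the hash. The names are directly comparable.
@El_Vanja It does go against the YAGNI principle, but on the other hand... you might need it. Yes, in this case you could do that, but usually you will have more logic: more fields, and so on. You would probably at least want to upper-case or lower-case all characters before hashing. You could even return an empty string from the function for empty names. The point is that you want a single, well-defined place that determines equality between records. Hashing to determine equality within a set is usually how that is done.
This seems to work really well! The only problem I ran into is that it tried to allocate too much memory, but that is reasonable when applying it to 16,000 rows at once. I'll just do one set of comparisons at a time. Thanks!
After this, all that is left to do is filter the result by the number of IDs under each name (it is somewhat implied, but better to be explicit for any future beginners).
This works really well too! It also doesn't cause any memory problems and seems to run very quickly. Thanks a lot.
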
<?php
$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105]
];

$duplicateRecords = UserRecordHelper::getDuplicateRecords($names);
print_r($duplicateRecords);

class UserRecordHelper
{
    public static function getDuplicateRecords($records)
    {
        // This map will be populated with all records, keyed by hash
        $hashBuffer = [];
        // Buffer for hashes that are associated with more than one record
        $duplicateHashes = [];
        // This will be populated with the duplicate records
        $duplicateRecords = [];

        // Iterate through all of the records
        foreach ($records as $currRecord)
        {
            // Generate a hash for the record
            $currHash = self::generateHashForUserRecord($currRecord);

            // If the hash is not in the hashtable yet, create an array to hold entries with this hash
            if (!array_key_exists($currHash, $hashBuffer))
            {
                $hashBuffer[$currHash] = [];
            }
            else // If this hash is already in the buffer, we have a duplicate - add it to the $duplicateHashes array
            {
                // Keyed by the hash itself, so each duplicate hash is stored only once
                $duplicateHashes[$currHash] = $currHash;
            }

            // Add the record to the hash buffer
            $hashBuffer[$currHash][] = $currRecord;
        }

        // Collect every record whose hash occurred more than once
        foreach ($duplicateHashes as $currDuplicateHash)
        {
            $duplicateRecords = array_merge($duplicateRecords, $hashBuffer[$currDuplicateHash]);
        }

        return $duplicateRecords;
    }

    public static function generateHashForUserRecord($record)
    {
        return sha1($record['Name']);
    }
}
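
The comments above suggest normalising the name (for example, its case) before hashing, so that equality between records is decided in one place. Here is a rough sketch of such a function under that assumption. The name `normalizedNameKey` is hypothetical, and the trimming and whitespace collapsing are additions for illustration rather than part of the original answer.

```php
<?php
/**
 * Hypothetical equality key: names that differ only in letter case or in
 * surrounding/repeated whitespace produce the same key.
 */
function normalizedNameKey(array $record): string
{
    // Trim the ends and collapse runs of whitespace into a single space.
    $name = preg_replace('/\s+/', ' ', trim($record['Name']));
    // Lower-case before hashing so 'John Smith' and 'john SMITH' match.
    return sha1(strtolower($name));
}

$a = ['Name' => 'John Smith', 'ID' => 65];
$b = ['Name' => '  john   SMITH ', 'ID' => 1105];
$c = ['Name' => 'Richard Johnson', 'ID' => 96];

var_dump(normalizedNameKey($a) === normalizedNameKey($b)); // bool(true)
var_dump(normalizedNameKey($a) === normalizedNameKey($c)); // bool(false)
```

A function like this could stand in for `generateHashForUserRecord` without changing the surrounding loop. Note that `strtolower` only handles ASCII letters reliably; for names with accented or non-Latin characters, `mb_strtolower` from the mbstring extension is probably the better choice.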