PHP: Identifying and collecting potential duplicates in a multidimensional array

php arrays

I have an array like the one below, and I want to identify the duplicates in it.

$names = array(
    array("Name" => "John Smith",      "ID" => 65), 
    array("Name" => "Richard Johnson", "ID" => 96), 
    array("Name" => "John Smith",      "ID" => 1105),
    ...
)

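There are plenty of similar questions, but most of them only deal with returning true/false when a duplicate exists, or with simply removing the duplicates. I apologise if an identical question already exists; I looked, but could not find anything that fits my situation.

I only want to identify pairs of arrays that contain the same "Name" value but possibly different "ID" values. I know this may produce repeated results with the values reversed, but I think I can sort that out myself. I don't want to remove the duplicate values, I just want to identify them.

Ideally, it would return an array similar to the following (or something along those lines)

I could then process that into a more refined, user-friendly array.

I am considering a recursive in_array function, or possibly a second working array. Any ideas?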

Why not loop over the array and build a new array that uses the name as its key? For example:

$names = array(
    array("Name" => 'John Smith', "ID" => 65), 
    array("Name" => 'Richard Johnson', "ID" => 96), 
    array("Name" => 'John Smith', "ID" => 1105)
);

$users = [];
foreach($names as $usersArray){
    
    $users[$usersArray['Name']][] = $usersArray['ID'];
    
}

print_r($users);
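
As a comment further down the thread points out, the grouped array still has to be filtered by the number of IDs under each name. A minimal sketch of that step follows, assuming `$users` has the shape produced by the loop above; the variable name `$duplicates` is only illustrative.

```php
<?php
// Assumed input: the result of the grouping loop above (name => list of IDs).
$users = [
    'John Smith'      => [65, 1105],
    'Richard Johnson' => [96],
];

// Keep only the names that appear with more than one ID.
// array_filter() preserves the string keys, so the names survive.
$duplicates = array_filter($users, function ($ids) {
    return count($ids) > 1;
});

print_r($duplicates);
```

Because the grouping is a single pass and the filter is another, the work grows linearly with the number of rows instead of comparing every name against every other name.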

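For a situation like this, you usually want to define some logic that creates a hash from the values in a record, and use that hash to determine equality. Once that is defined, a simple loop and an associative array are enough to track which records have duplicates.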

<?php
/**
 * Define an algorithm for equality between records.
 *
 * @param $record
 * @return string
 */
function generateHashForUserRecord($record)
{
    return sha1($record['Name']);
}

$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105]
];

// This map will be populated with all records, keyed by hash
$hashBuffer = [];

// Buffer for hashes that are associated with more than one record
$duplicateHashes = [];

// This will be populated with the duplicate records
$duplicateRecords = [];

// Iterate through all of the records
foreach($names as $currRecord)
{
    // Generate a hash for the record
    $currHash = generateHashForUserRecord($currRecord);

    // If the hash is not in the hashtable yet, create an array to hold entries with this hash
    if(!array_key_exists($currHash, $hashBuffer))
    {
        $hashBuffer[$currHash] = [];
    }
    else // If this hash is already in the buffer, we have a duplicate - add it to the  $duplicateHashes array
    {
        // Key by the hash itself so each duplicated hash is stored only once
        $duplicateHashes[$currHash] = $currHash;
    }

    // Add the record to the hash buffer
    $hashBuffer[$currHash][] = $currRecord;
}

foreach($duplicateHashes as $currDuplicateHash)
{
    $duplicateRecords = array_merge($duplicateRecords, $hashBuffer[$currDuplicateHash]);
}

print_r($duplicateRecords);
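
One reason to keep the equality rule in a single function is that it can be tightened later without touching the loop. The sketch below is only an illustration of that idea: `normalizedNameKey` is a hypothetical helper, the sample data has been altered to include a differently cased name, and `strtolower` is assumed to be adequate (it only handles ASCII letters reliably).

```php
<?php
// Hypothetical helper: two records count as equal when their names match
// after trimming surrounding whitespace and lowercasing.
function normalizedNameKey(array $record): string
{
    return strtolower(trim($record['Name']));
}

// Sample data altered for illustration: the third name differs only in
// case and leading whitespace.
$names = [
    ['Name' => 'John Smith',      'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => ' john smith',     'ID' => 1105],
];

// Group the full records by their normalized key.
$groups = [];
foreach ($names as $record) {
    $groups[normalizedNameKey($record)][] = $record;
}

// Keep only the groups that contain more than one record.
$duplicateGroups = array_filter($groups, function ($group) {
    return count($group) > 1;
});

print_r($duplicateGroups);
```

Here the normalized string is used directly as the array key; wrapping it in `sha1()` as in the answer above would work the same way and keeps the keys at a fixed length.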

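Have you tried a simple foreach loop first? That should always be your go-to solution; you can refactor to use built-in functions afterwards.

@El_Vanja I'm trying one right now, but there are 16k+ entries in the array, and my only idea is to compare each name against every other name, which would be something like 16k^16k operations.

That's certainly not a good approach, but you don't need to compare them all. Grouping can be used to achieve what you need (see the answers).

I don't understand the purpose of the hash. The names are directly comparable.

@El_Vanja It does go against the YAGNI principle, but on the other hand... you're probably going to need it. Yes, in this case you could do that, but usually you'll have more logic: more fields, and so on. You'd probably at least want to upper- or lowercase all characters before hashing. You might even return empty name strings from the function. The point is that you want a single, well-defined place that determines equality between records. Hashing to determine equality within a set is usually how that is done.

This seems to work very well! The only problem I ran into is that it tries to allocate too much memory, but that's reasonable when applying it to 16000 rows at once. I'm just doing one set of comparisons at a time. Thanks

After this, all that's left to do is filter the result by the number of IDs under each name (this is somewhat implicit, but it's best to be explicit for any future beginners).

This works great too! It doesn't cause any memory problems either, and it seems to run very fast. Thank you very much.
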
<?php

$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105]
];

$duplicateRecords = UserRecordHelper::getDuplicateRecords($names);

print_r($duplicateRecords);

class UserRecordHelper
{
    public static function getDuplicateRecords($records)
    {
        // This map will be populated with all records, keyed by hash
        $hashBuffer = [];

        // Buffer for hashes that are associated with more than one record
        $duplicateHashes = [];

        // This will be populated with the duplicate records
        $duplicateRecords = [];


        // Iterate through all of the records
        foreach ($records as $currRecord)
        {
            // Generate a hash for the record
            $currHash = self::generateHashForUserRecord($currRecord);

            // If the hash is not in the hashtable yet, create an array to hold entries with this hash
            if (!array_key_exists($currHash, $hashBuffer))
            {
                $hashBuffer[$currHash] = [];
            }
            else // If this hash is already in the buffer, we have a duplicate - add it to the  $duplicateHashes array
            {
                // Key by the hash itself so each duplicated hash is stored only once
                $duplicateHashes[$currHash] = $currHash;
            }

            // Add the record to the hash buffer
            $hashBuffer[$currHash][] = $currRecord;
        }

        foreach ($duplicateHashes as $currDuplicateHash)
        {
            $duplicateRecords = array_merge($duplicateRecords, $hashBuffer[$currDuplicateHash]);
        }

        return $duplicateRecords;
    }

    public static function generateHashForUserRecord($record)
    {
        return sha1($record['Name']);
    }
}
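
The question asks for pairs of records that share a name. Once the IDs are grouped by name, the pairs can be derived from each group, so records with different names are never compared. This is a rough sketch only: `duplicateIdPairs` and the `'IDs'` key are hypothetical names, and the output shape is an assumption, since the question does not show the desired result.

```php
<?php
// Hypothetical helper: returns every unordered pair of IDs that share a name.
function duplicateIdPairs(array $records): array
{
    // Group IDs by name first.
    $idsByName = [];
    foreach ($records as $record) {
        $idsByName[$record['Name']][] = $record['ID'];
    }

    // Within each group, emit each pair once (i < j avoids reversed repeats).
    $pairs = [];
    foreach ($idsByName as $name => $ids) {
        $count = count($ids);
        for ($i = 0; $i < $count - 1; $i++) {
            for ($j = $i + 1; $j < $count; $j++) {
                $pairs[] = ['Name' => $name, 'IDs' => [$ids[$i], $ids[$j]]];
            }
        }
    }

    return $pairs;
}

$names = [
    ['Name' => 'John Smith',      'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith',      'ID' => 1105],
];

print_r(duplicateIdPairs($names));
```

Names that occur only once produce no pairs, so no separate filtering step is needed in this variant.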