Performance: Compare two CSV files with PowerShell and return matching values faster

performance powershell csv

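I use this code to match two CSV files and pull out the column I need. The code compares the Matricule, Nom (last name) and Prenom (first name) fields, and when all three match I retrieve the "IGG" column.

But it is very slow (20 minutes for 18 rows).

Can anyone help?

Here is my code: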

foreach ($item in $fileContentIMM) 
{
    try
    {
        $Matricule = $item.'Matricule'
        $name = $item.'Nom'
        $firstname = $item.'Prenom'

        # find all rows in $fileContentMagic that match exactly on all three fields
        $objMatch = $fileContentMagic | where { $_.'Matricule' -eq $Matricule -and $_.'NOM' -eq $name -and $_.'PRENOM' -eq $firstname}


        ##### check if any match found 
        if ($objMatch -eq $null)
        {
            $item  | ForEach-Object {
                $filechecktrue += [pscustomobject]@{
                    'MATRICULE' = $item.'Matricule'
                    'IGG' = 'noSet'
                    'NAME'  = $item.'Nom'
                    'FIRSTNAME' = $item.'Prenom'
                    'SERVICE' = $item.'Service'
                    'Immeuble'= $item.'Immeuble' 
                    'Niveau' = $item.'Niveau'
                    'Loc.' = $item.'Loc.'
                    'PDT' = $item.'PDT'
                    'Occ.' = $item.'Occ.'
                    'Site' = $item.'Site'
                }
            }
        }
        else
        {
            $item  | ForEach-Object {
                $filechecktrue += [pscustomobject]@{
                    'MATRICULE' = $item.'Matricule'
                    'IGG' = ($objMatch.'IGG' -join '/')
                    'NAME'  = $item.'Nom'
                    'FIRSTNAME' = $item.'Prenom'
                    'SERVICE' = $item.'Service'
                    'Immeuble'= $item.'Immeuble' 
                    'Niveau' = $item.'Niveau'
                    'Loc.' = $item.'Loc.'
                    'PDT' = $item.'PDT'
                    'Occ.' = $item.'Occ.'
                    'Site' = $item.'Site'
                }
            }

        }
    }
    catch
    {
        "ERROR: Problem reading line - skipping :" | Out-File $LogFile -Append -Force
        $item.nom + $item.prenom + $item.service| Out-File $LogFile -Append -Force
    }
}

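I would simplify this, but the changes shouldn't affect the processing time much. The only optimization I made was changing $filechecktrue to a list (an ArrayList), which is more memory efficient.

I'm not sure this is actually the slow part of your script. For it to be, $fileContentMagic would have to be a very large array.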

$filechecktrue = New-Object System.Collections.ArrayList

foreach ($item in $fileContentIMM) 
{
    try
    {
        $Matricule = $item.'Matricule'
        $name = $item.'Nom'
        $firstname = $item.'Prenom'

        # find all rows in $fileContentMagic that match exactly on all three fields
        $objMatch = $fileContentMagic | Where-Object { $_.'Matricule' -eq $Matricule -and $_.'NOM' -eq $name -and $_.'PRENOM' -eq $firstname}

        # Create the result object with the common properties (plain assignment, not +=)
        $o = [pscustomobject]@{
            'MATRICULE' = $item.'Matricule'
            'IGG' = 'noSet'
            'NAME'  = $item.'Nom'
            'FIRSTNAME' = $item.'Prenom'
            'SERVICE' = $item.'Service'
            'Immeuble'= $item.'Immeuble' 
            'Niveau' = $item.'Niveau'
            'Loc.' = $item.'Loc.'
            'PDT' = $item.'PDT'
            'Occ.' = $item.'Occ.'
            'Site' = $item.'Site'
        }

        ##### check if any match found 
        if ($objMatch)
        {
            #if not null, set IGG value. No need for foreach as $item is already a "foreach-value".
            $o.IGG = ($objMatch.'IGG' -join '/')
        }

        # Add the result to the ArrayList; [void] suppresses the index that Add() returns
        [void]$filechecktrue.Add($o)
    }
    catch
    {
        "ERROR: Problem reading line - skipping :" | Out-File $LogFile -Append -Force
        $item.nom + $item.prenom + $item.service| Out-File $LogFile -Append -Force
    }
}
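
Since it is not certain which stage is slow, each step can be timed on its own with Measure-Command. The following is only a rough sketch under stated assumptions: the data is made up in memory rather than read from the real CSV files, and the names $magic, $imm and $lookupTime are hypothetical.

```powershell
# Hypothetical sample data standing in for the two CSV files.
$magic = 1..2000 | ForEach-Object {
    [pscustomobject]@{ Matricule = "M$_"; NOM = "Nom$_"; PRENOM = "Prenom$_"; IGG = "J$_" }
}
$imm = 1..18 | ForEach-Object {
    [pscustomobject]@{ Matricule = "M$_"; Nom = "Nom$_"; Prenom = "Prenom$_" }
}

# Time only the lookup stage: one full scan of $magic per row of $imm.
$lookupTime = Measure-Command {
    foreach ($item in $imm) {
        $null = $magic | Where-Object {
            $_.Matricule -eq $item.Matricule -and
            $_.NOM -eq $item.Nom -and
            $_.PRENOM -eq $item.Prenom
        }
    }
}

"Lookup stage took {0:N0} ms" -f $lookupTime.TotalMilliseconds
```

Wrapping the Import-Csv calls in their own Measure-Command blocks in the same way would show whether reading the files or the matching loop takes most of the time.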

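Your first foreach already returns one $item object per iteration, so running ForEach-Object on $item again inside the loop body (twice) is pointless. That redundancy can simply be removed.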

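I would read the file you use for lookups and build a hash table from it. Hash tables are very efficient for lookups.

If FileContentMagic does not contain any duplicates, try something like this: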

# Use any character here which is guaranteed not to be present in the Matricule, Nom,
# or Prenom fields
$Delimiter = '|'

# Read the FileContent Magic into a HashTable for fast lookups
# The key is Matricule|Nom|Prenom
# The value is IGG joined with a forward slash
$FileContentMagic = @{}
Import-Csv -Path $FileContentMagicFileName | ForEach-Object {
    # Here we build our lookup key. The Trim() is just in case there's any leading or trailing
    # whitespace. You can leave it out if you know you don't need it.
    $Key = $_.Matricule.Trim(), $_.Nom.Trim(), $_.Prenom.Trim() -join $Delimiter

    # Since we only need the IGG value joined with a /, we'll just keep that
    $Value = $_.IGG -join '/'
    $FileContentMagic.Add($Key, $Value)
}

$FileContentIMM = Import-Csv -Path $FileContentIMMFileName

$FileCheckTrue = foreach ($item in $FileContentIMM) {
    # Inside a foreach statement the current row is $item; $_ is not set here
    $Key = $item.Matricule.Trim(), $item.Nom.Trim(), $item.Prenom.Trim() -join $Delimiter

    [PSCustomObject]@{
        'MATRICULE' = $item.'Matricule'
        'IGG'       = if ($FileContentMagic.ContainsKey($Key)) { $FileContentMagic[$Key] } else { 'noSet' }
        'NAME'      = $item.'Nom'
        'FIRSTNAME' = $item.'Prenom'
        'SERVICE'   = $item.'Service'
        'Immeuble'  = $item.'Immeuble' 
        'Niveau'    = $item.'Niveau'
        'Loc.'      = $item.'Loc.'
        'PDT'       = $item.'PDT'
        'Occ.'      = $item.'Occ.'
        'Site'      = $item.'Site'
    }
}
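
The hash table approach above can be tried on a few in-memory rows before pointing it at the real files. This is a minimal sketch, not the answer's exact code: the sample rows and the names $magicRows, $immRows, $lookup and $result are made up for illustration.

```powershell
# Hypothetical sample rows standing in for the two CSV files.
$magicRows = @'
Matricule,NOM,PRENOM,IGG
1001,Durand,Alice,J0001
1002,Martin,Bob,J0002
'@ | ConvertFrom-Csv

$immRows = @'
Matricule,Nom,Prenom,Service
1001,Durand,Alice,IT
1003,Petit,Chloe,HR
'@ | ConvertFrom-Csv

$Delimiter = '|'

# Build the lookup table once: composite key -> IGG.
$lookup = @{}
foreach ($row in $magicRows) {
    $key = $row.Matricule.Trim(), $row.NOM.Trim(), $row.PRENOM.Trim() -join $Delimiter
    $lookup[$key] = $row.IGG
}

# One hash lookup per row instead of one scan of the whole lookup file.
$result = foreach ($item in $immRows) {
    $key = $item.Matricule.Trim(), $item.Nom.Trim(), $item.Prenom.Trim() -join $Delimiter
    [pscustomobject]@{
        MATRICULE = $item.Matricule
        IGG       = if ($lookup.ContainsKey($key)) { $lookup[$key] } else { 'noSet' }
        NAME      = $item.Nom
    }
}

$result | Format-Table -AutoSize
```

A hash table created with @{} compares string keys case-insensitively, which matches the behaviour of the -eq comparisons in the original loop.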

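Also, any time you use += to append to an array, you pay a significant performance penalty. It is worth avoiding, because each assignment creates a new array, copies the entire old array into it along with the new item, and then discards the old array. That is very inefficient.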
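
The cost of += can be seen with a quick timing comparison. This is only an illustrative sketch: the element count is an arbitrary assumption, the variable names are hypothetical, and the absolute numbers will vary by machine and PowerShell version.

```powershell
$count = 20000   # hypothetical size, large enough to make the difference visible

# Slow pattern: every += allocates a new array and copies all existing elements.
$slow = Measure-Command {
    $array = @()
    foreach ($i in 1..$count) { $array += $i }
}

# Faster pattern: let the foreach statement emit values and capture them in one assignment.
$fast = Measure-Command {
    $collected = foreach ($i in 1..$count) { $i }
}

# Another option: a generic list grows in place, so Add() does not copy on every call.
$list = [System.Collections.Generic.List[int]]::new()
foreach ($i in 1..$count) { $list.Add($i) }

"+= : {0:N0} ms; captured output: {1:N0} ms" -f $slow.TotalMilliseconds, $fast.TotalMilliseconds
```

Capturing the output of the foreach statement is the pattern used for $FileCheckTrue in the code above.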

If $FileContentMagic contains duplicate keys, change the way the hash table is loaded to:

$FileContentMagic = @{}
Import-Csv -Path $FileContentMagicFileName | ForEach-Object {
    $Key = $_.Matricule.Trim(), $_.Nom.Trim(), $_.Prenom.Trim() -join $Delimiter
    if (!$FileContentMagic.ContainsKey($Key)) {
        $Value = $_.IGG -join '/'
        $FileContentMagic.Add($Key, $Value)
    }
    else {
        $FileContentMagic[$Key] += '/' + ($_.IGG -join '/')
    }
}
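
The ContainsKey check matters because calling Add() with a key that is already present throws an error. The duplicate handling above can be checked on sample data. In this sketch the rows and the names $magicRows and $lookup are made up, and two rows deliberately share the same key.

```powershell
# Hypothetical lookup rows; the first two share the same Matricule|Nom|Prenom key.
$magicRows = @'
Matricule,NOM,PRENOM,IGG
1001,Durand,Alice,J0001
1001,Durand,Alice,J0009
1002,Martin,Bob,J0002
'@ | ConvertFrom-Csv

$Delimiter = '|'
$lookup = @{}
foreach ($row in $magicRows) {
    $key = $row.Matricule.Trim(), $row.NOM.Trim(), $row.PRENOM.Trim() -join $Delimiter
    if ($lookup.ContainsKey($key)) {
        # Key seen before: append this IGG to the existing value.
        $lookup[$key] += '/' + $row.IGG
    }
    else {
        $lookup[$key] = $row.IGG
    }
}

$lookup['1001|Durand|Alice']   # J0001/J0009
$lookup['1002|Martin|Bob']     # J0002
```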

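20 minutes to find 18 rows, out of how many in total? Have you looked at Compare-Object? Are you sure this is the slow part? How big are the two CSV files? Have you measured, or used e.g. Write-Host "import done", to determine whether reading the files is what's slow?

$fileContentIMM contains 18 rows and $fileContentMagic contains 45,000.

Maybe that is what I should check. Thanks, EP.

This won't affect the processing time much.

Thanks ;)

Thanks, I tried this and it cut the time to a third! But what do I do if I need more values than "IGG"?

Change it to $Value = $_ and update the rest so that the IGG property is accessed from the object returned by the hash table. It's a simple change that you should be able to work out yourself. I don't recommend running code you don't understand.

Yes, I only saved a few seconds, but you're right, thanks :)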
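
One way to read the suggestion about keeping more than "IGG" is to store the whole row as the hash table value. The following is a hedged sketch of that idea rather than code from the thread: the Email column, the sample data and the names $magicRows, $lookup, $item, $match and $result are all hypothetical.

```powershell
# Hypothetical lookup rows with one extra column (Email) besides IGG.
$magicRows = @'
Matricule,NOM,PRENOM,IGG,Email
1001,Durand,Alice,J0001,alice@example.com
'@ | ConvertFrom-Csv

$Delimiter = '|'
$lookup = @{}
foreach ($row in $magicRows) {
    $key = $row.Matricule.Trim(), $row.NOM.Trim(), $row.PRENOM.Trim() -join $Delimiter
    $lookup[$key] = $row      # keep the whole row, not just IGG
}

# Hypothetical row from the other file.
$item = [pscustomobject]@{ Matricule = '1001'; Nom = 'Durand'; Prenom = 'Alice' }
$key = $item.Matricule.Trim(), $item.Nom.Trim(), $item.Prenom.Trim() -join $Delimiter

# Indexing with a missing key returns $null, so $match doubles as the "found" flag.
$match = $lookup[$key]

$result = [pscustomobject]@{
    MATRICULE = $item.Matricule
    IGG       = if ($match) { $match.IGG } else { 'noSet' }
    EMAIL     = if ($match) { $match.Email } else { 'noSet' }
}
$result
```

Note that this variant keeps only one row per key, so it fits the case without duplicate keys; with duplicates, the value would need to be a collection of rows instead.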