Multithreading: Copy-Item using Start-ThreadJob in PowerShell

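Following on from an earlier post, I have the following:

@mklement0's approach (copied from, and modified from, that earlier post) works, but it is incredibly slow because it creates a thread per file, and on my test system, with roughly 14,000 files, it used more than 4 GB of memory: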

# This works but is INCREDIBLY SLOW because it creates a thread per file
# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList

# How many threads at most to run concurrently.
$NumCopyThreads = 8

Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow

# Import the CSV data and transform it to [pscustomobject] instances
# with only .SrcFileName and .DestFileName properties - they take
# the place of your original [fileToCopy] instances.
$jobs = Import-Csv $FileList | Select-Object SrcFileName, DestFileName | 
  ForEach-Object {
    # Start the thread job for the file pair at hand.
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList $_ { 
        param($f) 
        [System.IO.Fileinfo]$DestinationFilePath = $f.DestFileName
        [String]$DestinationDir = $DestinationFilePath.DirectoryName
        if (-not (Test-path([Management.Automation.WildcardPattern]::Escape($DestinationDir)))) {
            new-item -Path $DestinationDir -ItemType Directory #-Verbose
        }
        copy-item -path $f.srcFileName -Destination $f.destFilename
        "Copied $($f.SrcFileName) to $($f.DestFileName)"
    }
  }

Write-Host "Waiting for $($jobs.Count) jobs to complete..."

# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"

# Clean up the temp. file
Remove-Item $FileList

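Further reading (particularly the section on PowerShell jobs) gave me the idea of splitting the full list into batches of 1,000 files. When I run it against my test case I get 15 threads (because I have 14,500 files), but each thread only processes the first file in its "chunk" and then stops: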

<#
.SYNOPSIS
<Brief description>
For examples type:
Get-Help .\<filename>.ps1 -examples
.DESCRIPTION
Copies files from one path to another
.PARAMETER FileList
e.g. C:\path\to\list\of\files\to\copy.txt
.PARAMETER NumCopyThreads
default is 8 (but can be 100 if you want to stress the machine to maximum!)
.PARAMETER LogName
default is output.csv located in the same path as the Filelist
.EXAMPLE
to run using defaults just call this file:
.\CopyFilesToBackup
to run using anything else use this syntax:
.\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt -NumCopyThreads 20 -LogName C:\temp\backup.log
.\CopyFilesToBackup -FileList .\copytest.csv -NumCopyThreads 30 -Verbose
.NOTES
#>

[CmdletBinding()] 
Param( 
    [String] $FileList = "C:\temp\copytest.csv", 
    [int] $NumCopyThreads = 8,
    [String] $LogName
) 

$filesPerBatch = 1000

$files = Import-Csv $FileList | Select-Object SrcFileName, DestFileName

$i = 0
$j = $filesPerBatch - 1
$batch = 1

Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow

$jobs = while ($i -lt $files.Count) {
    $fileBatch = $files[$i..$j]

    $jobName = "Batch$batch"
    Start-ThreadJob -Name $jobName -ThrottleLimit $NumCopyThreads -ArgumentList ($fileBatch) -ScriptBlock {
        param($filesInBatch)
        foreach ($f in $filesInBatch) {
            [System.IO.Fileinfo]$DestinationFilePath = $f.DestFileName
            [String]$DestinationDir = $DestinationFilePath.DirectoryName
            if (-not (Test-path([Management.Automation.WildcardPattern]::Escape($DestinationDir)))) {
                new-item -Path $DestinationDir -ItemType Directory -Verbose
            }
            copy-item -path $f.srcFileName -Destination $f.DestFileName -Verbose
        }
    } 

    $batch += 1
    $i = $j + 1
    $j += $filesPerBatch

    if ($i -gt $files.Count) {$i = $files.Count}
    if ($j -gt $files.Count) {$j = $files.Count}
}

Write-Host "Waiting for $($jobs.Count) jobs to complete..."

Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"

I feel like I'm missing something obvious, but I don't know what.

Can anyone help?

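Change:

Start-ThreadJob -Name $jobName -ThrottleLimit $NumCopyThreads -ArgumentList ($fileBatch) -ScriptBlock {

to:

Start-ThreadJob -Name $jobName -ThrottleLimit $NumCopyThreads -ArgumentList (,$fileBatch) -ScriptBlock {

Note the comma before $fileBatch in the argument list.

This fixes it because ArgumentList expects an array and hands each element to a separate parameter. You are trying to pass the whole array to the first parameter, which means you have to wrap the array inside another array.

Apparently (this was news to me), PowerShell will happily treat a single item as a one-element array in the foreach loop, which is why only the first item in each batch was processed.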
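
The effect of the comma can be reproduced without any jobs at all. The following is a minimal sketch, assuming that Invoke-Command binds -ArgumentList to script block parameters the same way Start-ThreadJob does; the sample batch and the $countItems script block are made up for illustration.

```powershell
# A three-element batch standing in for $fileBatch (hypothetical sample data).
$batch = @(
    [pscustomobject]@{ SrcFileName = 'a'; DestFileName = 'x' }
    [pscustomobject]@{ SrcFileName = 'b'; DestFileName = 'y' }
    [pscustomobject]@{ SrcFileName = 'c'; DestFileName = 'z' }
)

# Counts how many items the first (and only) declared parameter received.
$countItems = { param($filesInBatch) @($filesInBatch).Count }

# Without the comma the array is spread across positional parameters,
# so $filesInBatch receives only the first element.
$spread = Invoke-Command -ScriptBlock $countItems -ArgumentList $batch

# With the unary comma the batch is wrapped in a one-element array,
# so $filesInBatch receives the whole batch.
$wrapped = Invoke-Command -ScriptBlock $countItems -ArgumentList (,$batch)

"Without comma: $spread item(s); with comma: $wrapped item(s)"
```

In the first call the two remaining elements are not lost; they end up in the automatic $args variable, which the script block never reads.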

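Thanks for finding this. I need to get a T-shirt made that says "Yes, I forgot a comma". In my comp sci 101 class at university, my instructor told me he once spent hours on a problem that turned out to be a misplaced comma. It made me question why I was there ;) I'm curious how you started tracking this down. I've been using VSCode lately and its debugging is very handy, but for this kind of thing, once it reaches the threaded section, it's vapourware; I realise threads are hard to debug, but ¯\_(ツ)_/¯ - I've been toying with getting a licence for PowerShell Pro Tools but am not sure it's worth it: any advice you can give would be great!

I noticed that when I tried logging, the code running in the job didn't print anything, so I used the Add-Content function to append data to logs named Batch1, Batch2, etc. (i.e. the job names). My logs showed that the loop only ran once and that the input array for the batch also had a length of 1. At that point I looked at the input parameter, which was fine right up to -ArgumentList, so I focused on -ArgumentList. I didn't really know anything... I'm just a process-of-elimination monkey ;)

@AlexFielder Add-Content "output-file1.txt" "Line 1". Each time it is called it appends a line to the output file. I used the job name as the output file name, so each thread/job gets its own file.

So, after a week of trial and error, I have finally got to this point and, overall, I'm very happy with the results. The script I share below handles the ~3 steps in processing the files I'm working with:

  • Creates the folders
  • Copies the files to the new folders
  • Verifies that the files were copied correctly

It does all of this at a frantic pace.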
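
The copy-then-verify idea can be shown in isolation for a single file. This is only a sketch under stated assumptions: Copy-AndVerify is a hypothetical helper name, the temp-directory paths are throwaway examples, and SHA1 is used simply because the full script uses it.

```powershell
# Hypothetical helper: copy one file and report whether the hashes match.
function Copy-AndVerify {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )

    # Step 1: create the destination folder (-Force makes this a no-op if it exists).
    $destDir = Split-Path -Path $Destination -Parent
    New-Item -Path $destDir -ItemType Directory -Force | Out-Null

    # Step 2: copy the file.
    Copy-Item -LiteralPath $Source -Destination $Destination

    # Step 3: compare SHA1 hashes of source and destination.
    $srcHash  = (Get-FileHash -LiteralPath $Source -Algorithm SHA1).Hash
    $destHash = (Get-FileHash -LiteralPath $Destination -Algorithm SHA1).Hash

    [pscustomobject]@{
        Source      = $Source
        Destination = $Destination
        Verified    = ($srcHash -eq $destHash)
    }
}

# Usage with throwaway files in the temp directory.
$root = Join-Path ([IO.Path]::GetTempPath()) "copyverify.$PID"
New-Item -Path $root -ItemType Directory -Force | Out-Null
$src = Join-Path $root 'source.txt'
Set-Content -LiteralPath $src -Value 'sample data'

$dest = Join-Path (Join-Path $root 'backup') 'source.txt'
$result = Copy-AndVerify -Source $src -Destination $dest
$result

# Clean up the throwaway files.
Remove-Item -LiteralPath $root -Recurse -Force
```

Returning an object rather than a string keeps the result easy to filter, for example to list only the files whose Verified property is false.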
    <#
    .SYNOPSIS
    <Brief description>
    For examples type:
    Get-Help .\<filename>.ps1 -examples
    .DESCRIPTION
    Copies files from one path to another
    .PARAMETER FileList
    e.g. C:\path\to\list\of\files\to\copy.txt
    .PARAMETER NumCopyThreads
    default is 8 (but can be 100 if you want to stress the machine to maximum!)
    .PARAMETER FilesPerBatch
    default is 1000 this can be tweaked if performance becomes an issue because the Threading will HAMMER any network you run it on.
    .PARAMETER LogName
    Desired log file output. Must include full or relative (.\blah) path. If blank, location of FileList is used.
    .PARAMETER DryRun
    Boolean value denoting whether we're testing this thing or not. (Default is $false)
    .PARAMETER DryRunNum
    The number of files to Dry Run. (Default is 100)
    .EXAMPLE
    to run using defaults just call this file:
    .\CopyFilesToBackup
    to run using anything else use this syntax:
    .\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt -NumCopyThreads 20 -LogName C:\temp\backup.log
    .\CopyFilesToBackup -FileList .\copytest.csv -NumCopyThreads 30 -Verbose
    .NOTES
    #>
    
    [CmdletBinding()] 
    Param( 
        [String] $FileList = "C:\temp\copytest.csv", 
        [int] $NumCopyThreads =75,
        [String] $JobName,
        [int] $FilesPerBatch = 1000,
        [String] $LogName,
        [Boolean] $DryRun = $false, #$true,
        [int] $DryRunNum = 100
    ) 
    
    
    
    Write-Host 'Creating log file if it does not exist...'
    
    function CreateFile([string]$filepath) {
        # Create the file if it is missing, then report whether it exists.
        if (-not (Test-Path ([Management.Automation.WildcardPattern]::Escape($filepath)))) {
            New-Item -Path $filepath -ItemType File | Out-Null
        }
        return (Test-Path ([Management.Automation.WildcardPattern]::Escape($filepath)))
    }
    
    $dtStart = [datetime]::UtcNow
    
    if ($LogName -eq "") {
        # Default to <FileList base name>.log next to the file list.
        [System.IO.FileInfo]$CsvPath = $FileList
        $LogName = Join-Path $CsvPath.DirectoryName ($CsvPath.BaseName + ".log")
    }
    if (-not (CreateFile $LogName)) {
        Write-Host "Unable to create log, exiting now!"
        # 'return' ends the script; 'break' outside a loop can also end the caller.
        return
    }
    
    Add-Content -Path $LogName -Value "[INFO],[Src Filename],[Src Hash],[Dest Filename],[Dest Hash]"
    
    Write-Host 'Loading CSV data into memory...'
    
    $files = Import-Csv $FileList | Select-Object SrcFileName, DestFileName
    
    Write-Host 'CSV Data loaded...'
    
    Write-Host 'Collecting unique Directory Names...'
    
    $allFolders = New-Object "System.Collections.Generic.List[string]"
    
    ForEach ($f in $files) {
        [System.IO.Fileinfo]$DestinationFilePath = $f.DestFileName
        [String]$DestinationDir = $DestinationFilePath.DirectoryName
        $allFolders.add($DestinationDir)
    }
    
    # Get-Unique only removes adjacent duplicates, so sort and de-duplicate in one step.
    $folders = $allFolders | Sort-Object -Unique
    
    Write-Host 'Creating Directories...'
    foreach($DestinationDir in $folders) {
        if (-not (Test-path([Management.Automation.WildcardPattern]::Escape($DestinationDir)))) {
            new-item -Path $DestinationDir -ItemType Directory | Out-Null #-Verbose
        }
    }
    Write-Host 'Finished Creating Directories...'
    $scriptBlock = {
        param(
            [PSCustomObject]$filesInBatch, 
            [String]$LogFileName)
            function ProcessFileAndHashToLog {
                param( [String]$LogFileName, [PSCustomObject]$FileColl)
                # One handle to the named mutex per job; it serialises writes to the shared log.
                $mutex = New-Object -TypeName 'Threading.Mutex' -ArgumentList $false, 'MyInterProcMutex'
                foreach ($f in $FileColl) {
                    # Destination directories are created up front, before the jobs start.
                    Copy-Item -Path $f.SrcFileName -Destination $f.DestFileName | Out-Null #-Verbose
    
                    $srcHash = (Get-FileHash -Path $f.SrcFileName -Algorithm SHA1).Hash # could also use MD5 here but it needs testing
                    if (Test-Path ([Management.Automation.WildcardPattern]::Escape($f.DestFileName))) {
                        $destHash = (Get-FileHash -Path $f.DestFileName -Algorithm SHA1).Hash
                    } else {
                        $destHash = $f.DestFileName + " not found at location."
                    }
                    # Build the log line on every iteration so a failed hash never reuses stale text.
                    $info = $f.SrcFileName + "," + $srcHash + "," + $f.DestFileName + "," + $destHash
                    $mutex.WaitOne() | Out-Null
                    try {
                        $DateTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss:fff"
                        if ($DryRun) { Write-Host "Writing to log file: $LogFileName..." }
                        Add-Content -Path $LogFileName -Value "$DateTime,$info"
                    } finally {
                        # Always release, even if the log write fails.
                        $mutex.ReleaseMutex()
                    }
                }
                $mutex.Dispose()
            }
            ProcessFileAndHashToLog -LogFileName $LogFileName -FileColl $filesInBatch
    }
    
    $i = 0
    $j = $filesPerBatch - 1
    $batch = 1
    Write-Host 'Creating jobs...'
    if (-not ($DryRun)) {
        $jobs = while ($i -lt $files.Count) {
            $fileBatch = $files[$i..$j]
            # Name each job after its batch; use $JobName as a prefix when one was supplied.
            $name = if ($JobName) { "$JobName$batch" } else { "Batch$batch" }
            # "$fileBatch, $LogName" builds a two-element argument array, so the batch stays intact.
            Start-ThreadJob -Name $name -ArgumentList $fileBatch, $LogName -ScriptBlock $scriptBlock #-ThrottleLimit $NumCopyThreads
            $batch += 1
            $i = $j + 1
            $j += $filesPerBatch
            if ($i -gt $files.Count) {$i = $files.Count}
            if ($j -gt $files.Count) {$j = $files.Count}
        }
        Write-Host "Waiting for $($jobs.Count) jobs to complete..."
        Receive-Job -Job $jobs -Wait -AutoRemoveJob
    } else {
        Write-Host 'Going in Dry...'
        # Take exactly $DryRunNum files (indexes 0 to $DryRunNum - 1).
        $DummyFileBatch = $files[0..($DryRunNum - 1)]
        & $scriptBlock -filesInBatch $DummyFileBatch -LogFileName $LogName
        Write-Host 'That wasn''t so bad was it..?'
    }
    
    Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"
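
The $i/$j bookkeeping in the batching loop could also be written as a small helper. The sketch below is one possible alternative, not part of the script above: Split-IntoBatches is a hypothetical function name, and the integers 1..2500 stand in for the imported CSV rows.

```powershell
# Hypothetical helper: split a collection into batches of at most $Size items.
function Split-IntoBatches {
    param(
        [Parameter(Mandatory)] [object[]] $Items,
        [ValidateRange(1, 2147483647)] [int] $Size = 1000
    )

    for ($start = 0; $start -lt $Items.Count; $start += $Size) {
        # Clamp the end index so the last batch never runs past the array.
        $end = [Math]::Min($start + $Size, $Items.Count) - 1

        # The unary comma emits each batch as a single array object
        # instead of letting the pipeline unroll it.
        , $Items[$start..$end]
    }
}

# Usage: 2,500 sample items split into batches of up to 1,000.
$batches = @(Split-IntoBatches -Items (1..2500) -Size 1000)
$batches | ForEach-Object { "Batch of $($_.Count) items" }
```

Each element of $batches is itself an array, so it still has to be passed as -ArgumentList (,$batch), or alongside other arguments as in the script above, to arrive in the job as one parameter.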