Search Powershell - Optimizing search and replace in very, very large CSV and text files

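I have a directory of about 3,000 text files that I periodically search and replace in whenever I move a program to a new server.

Each text file has about 3,000 lines on average, and I need to search for 300-1,000 terms at a time.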

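I'm replacing the server prefix that goes with each search string. So for every CSV entry I look for both the bare "Search_String" and "\\Old_Server\Search_String", and make sure that when the program finishes, the result is "\\New_Server\Search_String".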
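One way to express this rule for a single term, as a sketch under assumptions: the helper name Convert-Qualifier and its parameters are hypothetical, and a bare term that is already preceded by a backslash is assumed to be qualified and is left alone. [regex]::Escape keeps characters such as . or \ in the term and qualifier from being read as regex syntax. A single pattern covers both the prefixed and the bare form, so no cleanup pass for doubled prefixes is needed.

```powershell
# Hypothetical helper: requalify one search term in an array of lines.
function Convert-Qualifier {
    param(
        [string[]]$Text,   # lines of a file, e.g. from Get-Content -ReadCount 0
        [string]$Term,     # bare search string, e.g. 'Search_String'
        [string]$Old,      # old server qualifier, e.g. 'Old_Server'
        [string]$New       # new server qualifier, e.g. 'New_Server'
    )
    # Escape user-supplied text so '.', '\', '(' etc. are matched literally.
    $termRx = [regex]::Escape($Term)
    $oldRx  = [regex]::Escape("\\$Old\")
    # Either "\\Old\Term", or a bare "Term" not already preceded by a backslash
    # (assumption: such an occurrence is already qualified and is left alone).
    $pattern = "(?:$oldRx|(?<!\\))$termRx"
    # In a .NET replacement string '$' is special, so double it to keep it literal.
    $replacement = "\\$New\$Term".Replace('$', '$$')
    $Text -replace $pattern, $replacement
}
```

For example, Convert-Qualifier -Text (Get-Content .\file.ctx -ReadCount 0) -Term 'Search_String' -Old 'Old_Server' -New 'New_Server' returns the rewritten lines without touching the file.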

I cobbled together a PowerShell script, and it works. But it's so slow that I've never seen it finish.

Any suggestions for speeding it up?

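Edit 1: I changed Get-Content as suggested, but searching two files (about 8,000 lines) for 9 separate search terms still takes 3 minutes. I must still be doing something wrong; doing the search and replace manually 9 times in Notepad++ would still be much faster.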

I'm not sure how to remove the first Get-Content, because I want to make a backup copy of each file before making any changes to it.
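A possible way to keep the backup without a separate search pass, in line with the comment suggesting to do the replacement first and then check whether anything changed: read the file once, transform it in memory, and copy the original only if the result differs. Update-FileWithBackup and its parameters are hypothetical names; the sketch assumes PowerShell 5.0 or later (Get-Content -Raw needs 3.0, Set-Content -NoNewline needs 5.0).

```powershell
# Hypothetical helper (assumes PowerShell 5.0+ for Set-Content -NoNewline).
function Update-FileWithBackup {
    param(
        [string]$Path,           # file to process
        [string]$BackupDir,      # existing directory that receives the untouched copy
        [scriptblock]$Transform  # takes the file text, returns the rewritten text
    )
    # Single read of the whole file as one string; [string] turns an empty file ($null) into ''.
    [string]$before = Get-Content -LiteralPath $Path -Raw
    [string]$after  = & $Transform $before
    # -cne is case-sensitive, so a case-only replacement still counts as a change.
    if ($after -cne $before) {
        # The file on disk is still the original, so back it up before overwriting it.
        Copy-Item -LiteralPath $Path -Destination $BackupDir
        Set-Content -LiteralPath $Path -Value $after -NoNewline
        return $true
    }
    return $false
}
```

The transform is passed as a script block, e.g. -Transform { param($t) $t -replace $OldQualifiedPoint, $NewQualifiedPoint }. Set-Content writes in its default encoding, so add -Encoding if the files need a specific one.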

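Edit 2: This is an order of magnitude faster; it can probably search a file in under 10 seconds. But now it doesn't write the changes to the file, and it only searches the first file in the directory! I didn't change that code, so I don't know why it broke.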

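Edit 3: Success! I adapted one of the solutions posted below to make it much, much faster. It now searches each file in a few seconds. I may reverse the loop order so that it loads each file into an array and then searches and replaces every entry from the CSV, instead of the other way around. If I can get that to work, I'll post it.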
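The loop inversion described in Edit 3 can be sketched as follows. With the CSV loop outside, each of up to 1,000 terms runs Select-String over all ~3,000 files, which is up to 1,000 × 3,000 = 3,000,000 file scans; with the file loop outside, each file is read once and written at most once. Update-CtxFiles and its parameters are hypothetical names. The sketch assumes a header-less CSV with one term per row (as SearchAndReplace.csv is read in the script) and PowerShell 5.0 or later, and it uses one escaped pattern per term instead of the script's three chained replacements.

```powershell
# Hypothetical helper: files in the outer loop, CSV terms in the inner loop.
function Update-CtxFiles {
    param(
        [string]$Directory,   # folder containing the *.ctx files
        [string]$CsvPath,     # one search term per row, no header
        [string]$Old,         # old server qualifier
        [string]$New,         # new server qualifier
        [string]$BackupDir    # receives an untouched copy of every file that changes
    )
    New-Item -ItemType Directory -Path $BackupDir -Force | Out-Null

    # Build each term's pattern and replacement once, not once per file.
    $oldRx = [regex]::Escape("\\$Old\")
    $rules = foreach ($row in Import-Csv -LiteralPath $CsvPath -Header find) {
        [pscustomobject]@{
            # "\\Old\term", or a bare term not already preceded by a backslash
            Pattern     = "(?:$oldRx|(?<!\\))" + [regex]::Escape($row.find)
            # '$' is special in .NET replacement strings, so double it
            Replacement = ("\\$New\" + $row.find).Replace('$', '$$')
        }
    }

    foreach ($file in Get-ChildItem -LiteralPath $Directory -Filter *.ctx -File) {
        # One read per file; [string] turns an empty file ($null) into ''.
        [string]$before = Get-Content -LiteralPath $file.FullName -Raw
        $text = $before
        # Apply every term to the in-memory text; no Select-String pre-check.
        foreach ($rule in $rules) {
            $text = $text -replace $rule.Pattern, $rule.Replacement
        }
        # Back up and write only files whose content actually changed.
        if ($text -cne $before) {
            Copy-Item -LiteralPath $file.FullName -Destination $BackupDir
            Set-Content -LiteralPath $file.FullName -Value $text -NoNewline
            Write-Host "Updated $($file.Name)"
        }
    }
}
```

Called as, for example, Update-CtxFiles -Directory . -CsvPath .\SearchAndReplace.csv -Old 'F24' -New 'CB3' -BackupDir (Get-Date -Format 'yyyy_MM_dd_HH_mm'). Because the backup is taken before the first write, it always holds the original file.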

The final script is below, for reference:

#get input from the user
$old = Read-Host 'Enter the old cimplicity qualifier (F24, IRF3 etc)'
$new = Read-Host 'Enter the new cimplicity qualifier (CB3, F24_2 etc)'
$DirName = Get-Date -format "yyyy_MM_dd_hh_mm"

New-Item -ItemType directory -Path $DirName -force
New-Item "$DirName\log.txt" -ItemType file -force -Value "`nMatched CTX files on $dirname`n"
$logfile = "$DirName\log.txt"

$VerbosePreference = "SilentlyContinue"


$points = import-csv SearchAndReplace.csv -header find #Import CSV File
#$ctxfiles = Get-ChildItem . -include *.ctx | select -expand fullname #Import local directory of CTX Files

$points | foreach-object { #For each row of points in the CSV file
    $findvar = $_.find #Store column 1 as string to search for  

    $OldQualifiedPoint = "\\\\"+$old+"\\" + $findvar #Double each backslash so it's matched literally, not read as regex
    $NewQualifiedPoint = "\\"+$new+"\" + $findvar #escape slashes are NOT required on the new string
    $DuplicateNew = "\\\\" + $new + "\\" + "\\\\" + $new + "\\"
    $QualifiedNew = "\\" + $new + "\"

    dir . *.ctx | #Grab all CTX Files 
     select -expand fullname | #grab all of those file names and...
      foreach {#iterate through each file
                $DateTime = Get-Date -Format "hh:mm:ss"
                $FileName = $_
                Write-Host "$DateTime - $FindVar - Checking $FileName"
                $BackupPath = Join-Path $DirName (Split-Path $FileName -Leaf)
                #Check file contents, and copy matching files to newly created directory
                If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
                    #Back up only on the first match, before any term has modified the file
                    If (!(Test-Path $BackupPath)) {
                        Copy $FileName -Destination $DirName
                    }
                    Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
                    Write-Host "$DateTime - Found $Findvar in $filename"

                    $FileContent = Get-Content $Filename -ReadCount 0
                    $FileContent =
                    $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
                    $FileContent | Set-Content $FileName
                }
           }
    }

Yes, you can make it much faster by not using Get-Content... use a StreamReader instead:

$file = New-Object System.IO.StreamReader -Arg "test.txt"
while (($line = $file.ReadLine()) -ne $null) {
    # $line has your line
}
$file.dispose()
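For files too large to hold in memory, such as a multi-gigabyte CSV, the StreamReader idea can be extended into a streaming read-replace-write loop. This is a minimal sketch: Copy-WithReplace is a hypothetical name, the output goes to a separate file, and StreamReader detects the input encoding (defaulting to UTF-8) while StreamWriter writes UTF-8 without a BOM, which may not match the original files.

```powershell
# Hypothetical helper: stream a large file line by line into a new file.
function Copy-WithReplace {
    param(
        [string]$InputPath,
        [string]$OutputPath,
        [string]$Pattern,       # regex, as with -replace
        [string]$Replacement
    )
    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so resolve both paths first.
    $in  = (Resolve-Path -LiteralPath $InputPath).ProviderPath
    $out = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)
    $reader = New-Object System.IO.StreamReader -ArgumentList $in
    $writer = New-Object System.IO.StreamWriter -ArgumentList $out
    try {
        # Compare with $null: an empty line is '' and would end a plain while ($line = ...) loop.
        while ($null -ne ($line = $reader.ReadLine())) {
            $writer.WriteLine(($line -replace $Pattern, $Replacement))
        }
    }
    finally {
        # Dispose flushes the writer and releases both file handles even if an error occurs.
        $reader.Dispose()
        $writer.Dispose()
    }
}
```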

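If I'm reading this right, you should be able to read a 3,000-line file into memory and perform the replacements as a single array operation, so there's no need to iterate over each line. You can also chain the replace operations into a single command: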

dir . *.ctx | #Grab all CTX Files 
     select -expand fullname | #grab all of those file names and...
      foreach {#iterate through each file
                $DateTime = Get-Date -Format "hh:mm:ss"
                $FileName = $_
                Write-Host "$DateTime - $FindVar - Checking $FileName"
                #Check file contents, and copy matching files to newly created directory
                If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
                    Copy $FileName -Destination $DirName
                    Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
                    Write-Host "$DateTime - Found $Findvar in $filename"

                    $FileContent = Get-Content $Filename -ReadCount 0
                    $FileContent =
                      $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
                     $FileContent | Set-Content $FileName
                }
           }

On another note, Select-String takes the file path as a parameter, so you don't need to run Get-Content and then pipe the result to Select-String.
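Select-String can also scan a whole set of files for a whole list of terms in one call. A sketch, with the hypothetical helper name Get-MatchingFile: -SimpleMatch matches the terms literally instead of as regular expressions, and -List reports only the first match in each file, so each matching file appears once.

```powershell
# Hypothetical helper: list the files that contain at least one of the terms.
function Get-MatchingFile {
    param(
        [string[]]$Path,   # files or wildcards, e.g. '.\*.ctx'
        [string[]]$Term    # the search terms from the CSV
    )
    # -SimpleMatch: literal matching, no regex escaping needed.
    # -List: stop reading a file at its first match, so each file is reported once.
    Select-String -Path $Path -Pattern $Term -SimpleMatch -List |
        Select-Object -ExpandProperty Path
}
```

Used as a pre-filter, e.g. Get-MatchingFile -Path .\*.ctx -Term (Import-Csv SearchAndReplace.csv -Header find).find, it limits the rewrite step to files that contain at least one term.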

I wanted to use PowerShell for this and created a script like the following:

$filepath = "input.csv"
$newfilepath = "input_fixed.csv"

filter num2x { $_ -replace "aaa","bbb" }
measure-command {
    Get-Content -ReadCount 1000 $filepath | num2x | add-content $newfilepath
}

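It took 19 minutes on my laptop to process a 6.5 GB file. The code above reads the file in batches (using ReadCount) and uses a filter, which should help performance.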

But then I tried another approach, and it did the same job in 3 minutes! A world of difference.

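while ($line = $file.ReadLine())

will stop at the first empty line. Comparing with $null is better.

Yes; most of the files have stray blank lines :P

Using .ReadLine() still processes one line at a time. The files are only 3,000 lines, so you should be able to read each whole file into memory and do the replacements on the array, e.g. (Get-Content $FileName -r 0) -replace

@mjolinor Idk what that has to do with my post... Maybe consider commenting on the original question, or even posting your own answer.

I think each file could be loaded into memory quickly; none of them is over two megs. But as I said, it's unbearably slow. I don't know much about PowerShell, but someone suggested I read line by line.

You're still using Get-Content in the conditional check, so it's still going to take a long time. Just do the replacement, then check whether you changed anything, and output "XX found".

I see what you mean. I didn't know about the ReadCount parameter, and -r 0 didn't mean much to me. Still, it makes a world of difference.

Cool, cool, awesome! That works, and it's fast. Since I'm now loading the whole file into an array, I think changing the loop order would be faster at this point. Currently I take one CSV entry and then search all the files; opening one file and then searching for all the CSV entries might be faster. Thanks.

In that case, I would drop the Select-String test entirely and just run each file through the CSV collection. That will probably be faster than going back and running another Select-String on every iteration of the CSV loop.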
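The comment about while ($line = $file.ReadLine()) can be checked directly: an empty line comes back as '', and '' is falsy in PowerShell, while end of input comes back as $null. This sketch reads the same three lines, the middle one empty, with both loop conditions; System.IO.StringReader stands in for a file-backed StreamReader so the example needs no file. The first loop stops after 'first'; the second returns all three lines.

```powershell
# Three lines, the middle one empty.
$text = "first`n`nthird"

# Condition is the assignment itself: stops at the empty second line.
$reader = New-Object System.IO.StringReader -ArgumentList $text
$truthy = @(while ($line = $reader.ReadLine()) { $line })
$reader.Dispose()

# Condition compares with $null: only end of input stops the loop.
$reader = New-Object System.IO.StringReader -ArgumentList $text
$nullChecked = @(while ($null -ne ($line = $reader.ReadLine())) { $line })
$reader.Dispose()

"truthy loop read $($truthy.Count) line(s); null-checked loop read $($nullChecked.Count)"
```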