Improving the performance of checking file delimiters
After spending some time looking for the clearest way to check whether the body of a file has the same number of delimiters as its header, I came up with the following code:
Param # user enters the directory path and the delimiter they are checking for
(
    [string]$source,
    [string]$delim
)
#try {
$lineNum = 1
$thisOK = 0
$badLine = 0
$noDelim = 0
$archive = ("*archive*","*Archive*","*ARCHIVE*");
# The directory may have subfolders; as a temporary workaround, exclude any folder with "archive" in its name
foreach ($files in Get-ChildItem $source -Exclude $archive)
{
    $read2 = New-Object System.IO.StreamReader($files.FullName)
    $DataLine = (Get-Content $files.FullName)[0]
    $validCount = ([char[]]$DataLine -eq $delim).count # count of delimiters in the header
    $lineNum = 1 # used to tell the host which line in the file is bad
    $thisOK = 0  # set when a line's delimiter count matches the header
    $badLine = 0 # set on error so the "file is ok" message is not written afterwards
    while (!$read2.EndOfStream)
    {
        $line = $read2.ReadLine()
        $total = $line.Split($delim).Length - 1;
        if ($total -eq $validCount)
        {
            $thisOK = 1
        }
        else
        {
            Write-Output "Error on line $lineNum for file $files. Line number $lineNum has $total delimiters and the header has $validCount"
            $thisOK = 0
            $badLine = 1
            break; # stop at the first bad line, otherwise every bad line is reported
        }
        $lineNum++
    }
    # Compare with -eq; the original "$thisOK = 1" was an assignment, not a test
    if ($thisOK -eq 1 -and $badLine -eq 0 -and $validCount -ne 0)
    {
        Write-Output "$files is ok"
    }
    if ($validCount -eq 0)
    {
        Write-Output "$files does not contain entered delimiter: $delim"
    }
    $read2.Close()
    $read2.Dispose()
} # end foreach loop
#} catch {
#    $ErrorMessage = $_.Exception.Message
#    $FailedItem = $_.Exception.ItemName
#}
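It works for what I have tested so far. However, it takes quite a long time on larger files. I would like to know what I can do or change in this code to make it process these text/CSV files faster.
Also, my try..catch statements are commented out because the script did not seem to run when I included them: there was no error, just a new command prompt. As a further idea, I would like to add a simple GUI so that other users can double-check files.
Sample file: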
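On the try..catch symptom, one possible explanation (an assumption, since the failure is not reproduced here) is that the commented-out catch block only assigns the error to variables and never writes it anywhere, so any error would be swallowed without a trace. The sketch below uses a hypothetical helper, Get-FirstLine, to show a catch block that reports the message and a finally block that always releases the reader. Note also that cmdlets such as Get-ChildItem raise non-terminating errors by default, which a catch block only sees when -ErrorAction Stop is added.

```powershell
# Hypothetical helper for illustration; it is not part of the original script.
function Get-FirstLine {
    param([string]$Path)

    $reader = $null
    try {
        # A failing .NET constructor raises a terminating error, so catch sees it.
        $reader = New-Object System.IO.StreamReader($Path)
        $reader.ReadLine()
    }
    catch {
        # Write the message out; storing it in a variable alone prints nothing.
        Write-Output "Failed to read '$Path': $($_.Exception.Message)"
    }
    finally {
        # Runs whether or not an error occurred.
        if ($null -ne $reader) { $reader.Dispose() }
    }
}
```

Called with a full path to a missing file, the function returns a line starting with "Failed to read" instead of ending silently.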
HeaderA|HeaderB|HeaderC|HeaderD //header line
DataLnA|DataLnBBDataLnC|DataLnD|DataLnE //bad line
DataLnA|DataLnB|DataLnC|DataLnD| //bad line
DataLnA|DataLnB|DataLnC|DataLnD //good line
Now that I look at it, I think there may be a problem when the number of delimiters is correct but the columns do not line up, like this:
HeaderA|HeaderB|HeaderC|HeaderD
DataLnA|DataLnBDataLnC|DataLnD|

The main problem I see is that you are reading the file twice: once with the call to Get-Content, which reads the entire file into memory, and a second time in the while loop. You can roughly double the speed of the process by replacing this line:
$DataLine = (Get-Content $files.FullName)[0] #inefficient
with this:
$DataLine = Get-Content $files.FullName -First 1 #efficient
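Going one step further, the header can be taken from the StreamReader that is already open, so each file is opened and read only once. The following is a minimal sketch under stated assumptions rather than a drop-in replacement: the function name Test-DelimiterCount, its parameters, and the empty-file message are hypothetical, and the delimiter is assumed to be a single character.

```powershell
# Hypothetical function; the name, parameters and messages are illustrative.
function Test-DelimiterCount {
    param(
        [string]$Path,
        [char]$Delimiter   # assumes a single-character delimiter such as | or ,
    )

    $reader = New-Object System.IO.StreamReader($Path)
    try {
        # Read the header from the same reader instead of calling Get-Content.
        $header = $reader.ReadLine()
        if ($null -eq $header) { return "$Path is empty" }

        # Number of fields minus one equals the number of delimiters.
        $expected = $header.Split($Delimiter).Length - 1
        if ($expected -eq 0) {
            return "$Path does not contain entered delimiter: $Delimiter"
        }

        $lineNum = 1   # the header is line 1
        while ($null -ne ($line = $reader.ReadLine())) {
            $lineNum++
            $total = $line.Split($Delimiter).Length - 1
            if ($total -ne $expected) {
                # Stop at the first bad line, as the original script does.
                return "Error on line $lineNum for file $Path. Line has $total delimiters and the header has $expected"
            }
        }
        "$Path is ok"
    }
    finally {
        # Runs even when the function returns early.
        $reader.Dispose()
    }
}
```

It could be called once per file inside the existing foreach loop, for example `Test-DelimiterCount -Path $files.FullName -Delimiter $delim`. Like the original, it only compares delimiter counts, so a line with the right count but misaligned columns still passes.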
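
An example of a file with "delimiters" would help clarify your question; at the moment I am having a hard time understanding what you are trying to achieve. It is also considered good question etiquette to fully indent your code, since that makes it easier for potential answerers to understand. In general, reading the help section on how to write a good question is strongly recommended.
You already have a StreamReader. Why use Get-Content to read the first line?
Sorry, the delimiters are characters like "|" or ",". Basically, we receive files whose header has the proper number of delimiters, but the actual body of the file may be missing some delimiters, and when running ETL
@AnsgarWiechers I used it to get the number of delimiters in the header line so that I could compare it with the rest of the file.
Thanks, I guess I was being lazy when I wrote that part. It worked, so I will stick with it.
Glad I could help!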