Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/powershell/13.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
使用powershell从大文件中提取文本_Powershell - Fatal编程技术网

使用powershell从大文件中提取文本

使用powershell从大文件中提取文本,powershell,Powershell,我们有一个生成许多大型日志文件的应用程序,我希望使用PowerShell解析这些文件,并以CSV或带分隔符“|”的文本形式获得输出。我尝试使用select字符串,但无法获得预期结果。下面我已经发布了日志格式和预期结果 日志文件数据: 如何使用PowerShell实现上述结果 谢谢正如LotPings建议的那样,您需要将日志文件内容分解为单独的块。 然后使用regex,您可以捕获所需的值并将其存储在对象中,然后将其导出到CSV文件 大概是这样的: $log = @" ---------------

我们有一个生成许多大型日志文件的应用程序,我希望使用PowerShell解析这些文件,并以CSV或带分隔符“|”的文本形式获得输出。我尝试使用select字符串,但无法获得预期结果。下面我已经发布了日志格式和预期结果

日志文件数据:

如何使用PowerShell实现上述结果


谢谢

正如LotPings建议的那样,您需要将日志文件内容分解为单独的块。 然后使用regex,您可以捕获所需的值并将其存储在对象中,然后将其导出到CSV文件

大概是这样的:

$log = @"
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4322
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File2
Dest File         => c\temp2
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4321
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File1
Dest File         => c\temp1
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4323
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File3
Dest File         => c\temp3
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4324
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File4
Dest File         => c\temp4
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
"@

# first break the log into 'Record Id' blocks
$blocks = @()
$regex = [regex] '(?m)(Record Id[^-]+)'
$match = $regex.Match($log)
while ($match.Success) {
    $blocks += $match.Value
    $match = $match.NextMatch()
} 

# next, parse out the required values for each block and create objects to export
$blocks | ForEach-Object {
    if ($_ -match '(?s)Submitter Id\s+=>\s+(?<submitter>[^\s]+).+Start Time\s+=>\s+(?<starttime>[^\s]+)\s+Start Date\s+=>\s+(?<startdate>[^\s]+).+Message Text\s+=>\s+(?<messagetext>[\w ,.;-_]+).+Src File\s+=>\s+(?<sourcefile>[\w ,.;-_]+).+Dest File\s+=>\s+(?<destinationfile>[\w ,.;-_]+)') {
        [PSCustomObject]@{
            'Submitter Id' = $matches['submitter']
            'Start Time'   = $matches['starttime']
            'Start Date'   = $matches['startdate']
            'Message Text' = $matches['messagetext']
            'Src File'     = $matches['sourcefile']
            'Dest File'    = $matches['destinationfile']
        }
    }
} | Export-Csv -Path '<PATH_TO_YOUR_OUTPUT_CSV>' -Delimiter '|' -NoTypeInformation

正如LotPings所建议的,您需要将日志文件内容分解为单独的块。 然后使用regex,您可以捕获所需的值并将其存储在对象中,然后将其导出到CSV文件

大概是这样的:

$log = @"
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4322
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File2
Dest File         => c\temp2
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4321
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File1
Dest File         => c\temp1
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4323
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File3
Dest File         => c\temp3
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4324
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File4
Dest File         => c\temp4
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
"@

# first break the log into 'Record Id' blocks
$blocks = @()
$regex = [regex] '(?m)(Record Id[^-]+)'
$match = $regex.Match($log)
while ($match.Success) {
    $blocks += $match.Value
    $match = $match.NextMatch()
} 

# next, parse out the required values for each block and create objects to export
$blocks | ForEach-Object {
    if ($_ -match '(?s)Submitter Id\s+=>\s+(?<submitter>[^\s]+).+Start Time\s+=>\s+(?<starttime>[^\s]+)\s+Start Date\s+=>\s+(?<startdate>[^\s]+).+Message Text\s+=>\s+(?<messagetext>[\w ,.;-_]+).+Src File\s+=>\s+(?<sourcefile>[\w ,.;-_]+).+Dest File\s+=>\s+(?<destinationfile>[\w ,.;-_]+)') {
        [PSCustomObject]@{
            'Submitter Id' = $matches['submitter']
            'Start Time'   = $matches['starttime']
            'Start Date'   = $matches['startdate']
            'Message Text' = $matches['messagetext']
            'Src File'     = $matches['sourcefile']
            'Dest File'    = $matches['destinationfile']
        }
    }
} | Export-Csv -Path '<PATH_TO_YOUR_OUTPUT_CSV>' -Delimiter '|' -NoTypeInformation

正如我在评论中提到的,您需要分离记录,并尝试使用复杂的正则表达式匹配数据

看RegEx直播 研究该链接右上角每个元素的解释

此脚本:

## Q:\Test\2018\11\29\SO_53541952.ps1

$LogFile = '.\SO_53541952.log'
$CsvFile = '.\SO_53541952.csv'
$ExcelFile='.\SO_53541952.xlsx'

## see the regex live <https://regex101.com/r/1TWm7i/1>
$RE = [RegEx]"(?sm)^Submitter Id +=> (?<SubmitterID>.*?$).*?^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9\/]{10}).*?^Message Text +=> (?<MessageText>.*?$).*?^Src File +=> (?<SrcFile>.*?$).*?^Dest File +=> (?<DestFile>.*?$)"


$Data = (Get-Content $LogFile -raw) -split "(?sm)(?=^Record Id)" | ForEach-Object {
    If ($_ -match $RE){
        [PSCustomObject]@{
            'Submitter Id' = $Matches.SubmitterId
            'Start Time'   = $Matches.StartTime
            'Start Date'   = $Matches.StartDate
            'Message Text' = $Matches.MessageText
            'Src File'     = $Matches.SrcFile
            'Dest File'    = $Matches.DestFile
        }
    }
}
$Data | Format-Table -Auto
$Data | Export-Csv $CsvFile  -NoTypeInformation -Delimiter '|'

#$Data | Out-Gridview
## with the ImportExcel module you can directly generate an excel file
$Data | Export-Excel $ExcelFile -AutoSize # -Show
安装后,您将直接获得一个
.xlsx
文件:


正如我在评论中提到的,您需要分离记录,并尝试使用复杂的正则表达式匹配数据

看RegEx直播 研究该链接右上角每个元素的解释

此脚本:

## Q:\Test\2018\11\29\SO_53541952.ps1

$LogFile = '.\SO_53541952.log'
$CsvFile = '.\SO_53541952.csv'
$ExcelFile='.\SO_53541952.xlsx'

## see the regex live <https://regex101.com/r/1TWm7i/1>
$RE = [RegEx]"(?sm)^Submitter Id +=> (?<SubmitterID>.*?$).*?^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9\/]{10}).*?^Message Text +=> (?<MessageText>.*?$).*?^Src File +=> (?<SrcFile>.*?$).*?^Dest File +=> (?<DestFile>.*?$)"


$Data = (Get-Content $LogFile -raw) -split "(?sm)(?=^Record Id)" | ForEach-Object {
    If ($_ -match $RE){
        [PSCustomObject]@{
            'Submitter Id' = $Matches.SubmitterId
            'Start Time'   = $Matches.StartTime
            'Start Date'   = $Matches.StartDate
            'Message Text' = $Matches.MessageText
            'Src File'     = $Matches.SrcFile
            'Dest File'    = $Matches.DestFile
        }
    }
}
$Data | Format-Table -Auto
$Data | Export-Csv $CsvFile  -NoTypeInformation -Delimiter '|'

#$Data | Out-Gridview
## with the ImportExcel module you can directly generate an excel file
$Data | Export-Excel $ExcelFile -AutoSize # -Show
安装后,您将直接获得一个
.xlsx
文件:


您想要的输出格式有一个标题用于
开始日期
,但有两列用于时间和日期?一般来说,我会使用一个正则表达式将日志拆分为记录,使用另一个相当复杂的正则表达式通过(命名的)捕获组获取数据。谢谢,我已经纠正了这个问题,包括时间powershell内置的任何东西都不会读取文件,神奇地确定相关行的块并为您重新格式化。您当然可以使用Get content来读取文件和所有可用的cmdlet,但这只是一个提取转换加载问题。如果您的文件很大(GB),用您喜欢的编程语言解决这个问题可能会更有效。您想要的输出格式有一个标题用于
Start date
,但有两列用于time和date?一般来说,我会使用一个正则表达式将日志拆分为记录,使用另一个相当复杂的正则表达式通过(命名的)捕获组获取数据。谢谢,我已经纠正了这个问题,包括时间powershell内置的任何东西都不会读取文件,神奇地确定相关行的块并为您重新格式化。您当然可以使用Get content来读取文件和所有可用的cmdlet,但这只是一个提取转换加载问题。如果你的文件很大(GB),用你最喜欢的编程语言解决这个问题可能会更有效。谢谢你的帮助。你是如何使用正则表达式来解决这个问题的,这很有趣。我想学习使用正则表达式,你能告诉我应该从哪里开始吗?好的,请在上查找理论,然后实践,或者感谢你的帮助。你是如何使用正则表达式来解决这个问题的,这很有趣。我想学习使用正则表达式,你能告诉我从哪里开始吗?好的,先查阅理论,然后再实践