使用powershell从大文件中提取文本
我们有一个生成许多大型日志文件的应用程序,我希望使用PowerShell解析这些文件,并以CSV或带分隔符“|”的文本形式获得输出。我尝试使用select字符串,但无法获得预期结果。下面我已经发布了日志格式和预期结果 日志文件数据: 如何使用PowerShell实现上述结果使用powershell从大文件中提取文本,powershell,Powershell,我们有一个生成许多大型日志文件的应用程序,我希望使用PowerShell解析这些文件,并以CSV或带分隔符“|”的文本形式获得输出。我尝试使用select字符串,但无法获得预期结果。下面我已经发布了日志格式和预期结果 日志文件数据: 如何使用PowerShell实现上述结果 谢谢正如LotPings建议的那样,您需要将日志文件内容分解为单独的块。 然后使用regex,您可以捕获所需的值并将其存储在对象中,然后将其导出到CSV文件 大概是这样的: $log = @" ---------------
谢谢正如LotPings建议的那样,您需要将日志文件内容分解为单独的块。 然后使用regex,您可以捕获所需的值并将其存储在对象中,然后将其导出到CSV文件 大概是这样的:
$log = @"
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4322
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File2
Dest File => c\temp2
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4321
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File1
Dest File => c\temp1
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4323
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File3
Dest File => c\temp3
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4324
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File4
Dest File => c\temp4
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
"@
# first break the log into 'Record Id' blocks
$blocks = @()
$regex = [regex] '(?m)(Record Id[^-]+)'
$match = $regex.Match($log)
while ($match.Success) {
$blocks += $match.Value
$match = $match.NextMatch()
}
# next, parse out the required values for each block and create objects to export
$blocks | ForEach-Object {
if ($_ -match '(?s)Submitter Id\s+=>\s+(?<submitter>[^\s]+).+Start Time\s+=>\s+(?<starttime>[^\s]+)\s+Start Date\s+=>\s+(?<startdate>[^\s]+).+Message Text\s+=>\s+(?<messagetext>[\w ,.;-_]+).+Src File\s+=>\s+(?<sourcefile>[\w ,.;-_]+).+Dest File\s+=>\s+(?<destinationfile>[\w ,.;-_]+)') {
[PSCustomObject]@{
'Submitter Id' = $matches['submitter']
'Start Time' = $matches['starttime']
'Start Date' = $matches['startdate']
'Message Text' = $matches['messagetext']
'Src File' = $matches['sourcefile']
'Dest File' = $matches['destinationfile']
}
}
} | Export-Csv -Path '<PATH_TO_YOUR_OUTPUT_CSV>' -Delimiter '|' -NoTypeInformation
正如LotPings所建议的,您需要将日志文件内容分解为单独的块。 然后使用regex,您可以捕获所需的值并将其存储在对象中,然后将其导出到CSV文件 大概是这样的:
$log = @"
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4322
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File2
Dest File => c\temp2
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4321
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File1
Dest File => c\temp1
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4323
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File3
Dest File => c\temp3
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
Record Id => STM
Process Name => STMDA Stat Log Time => 00:02:59
Process Number => 51657 Stat Log Date => 11/29/2018
Submitter Id => STMDA@4324
SNode User Id => de34fc5
Start Time => 00:02:59 Start Date => 11/29/2018
Stop Time => 00:02:59 Stop Date => 11/29/2018
SNODE => dfdvrvbsdfgg
Completion Code => 0
Message Id => ncpa
Message Text => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y
FASP=> N
From Node => P
Src File => File4
Dest File => c\temp4
Src CCode => 0 Dest CCode => 0
Src Msgid => ncpa Dest Msgid => ncpa
Bytes Read => 4000 Bytes Written => 4010
Records Read => 5 Records Written => 5
Bytes Sent => 4010 Bytes Received => 4010
RUs Sent => 0 RUs Received => 1
------------------------------------------------------------------------------
"@
# first break the log into 'Record Id' blocks
$blocks = @()
$regex = [regex] '(?m)(Record Id[^-]+)'
$match = $regex.Match($log)
while ($match.Success) {
$blocks += $match.Value
$match = $match.NextMatch()
}
# next, parse out the required values for each block and create objects to export
$blocks | ForEach-Object {
if ($_ -match '(?s)Submitter Id\s+=>\s+(?<submitter>[^\s]+).+Start Time\s+=>\s+(?<starttime>[^\s]+)\s+Start Date\s+=>\s+(?<startdate>[^\s]+).+Message Text\s+=>\s+(?<messagetext>[\w ,.;-_]+).+Src File\s+=>\s+(?<sourcefile>[\w ,.;-_]+).+Dest File\s+=>\s+(?<destinationfile>[\w ,.;-_]+)') {
[PSCustomObject]@{
'Submitter Id' = $matches['submitter']
'Start Time' = $matches['starttime']
'Start Date' = $matches['startdate']
'Message Text' = $matches['messagetext']
'Src File' = $matches['sourcefile']
'Dest File' = $matches['destinationfile']
}
}
} | Export-Csv -Path '<PATH_TO_YOUR_OUTPUT_CSV>' -Delimiter '|' -NoTypeInformation
正如我在评论中提到的,您需要分离记录,并尝试使用复杂的正则表达式匹配数据 看RegEx直播 研究该链接右上角每个元素的解释 此脚本:
## Q:\Test\2018\11\29\SO_53541952.ps1
$LogFile = '.\SO_53541952.log'
$CsvFile = '.\SO_53541952.csv'
$ExcelFile='.\SO_53541952.xlsx'
## see the regex live <https://regex101.com/r/1TWm7i/1>
$RE = [RegEx]"(?sm)^Submitter Id +=> (?<SubmitterID>.*?$).*?^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9\/]{10}).*?^Message Text +=> (?<MessageText>.*?$).*?^Src File +=> (?<SrcFile>.*?$).*?^Dest File +=> (?<DestFile>.*?$)"
$Data = (Get-Content $LogFile -raw) -split "(?sm)(?=^Record Id)" | ForEach-Object {
If ($_ -match $RE){
[PSCustomObject]@{
'Submitter Id' = $Matches.SubmitterId
'Start Time' = $Matches.StartTime
'Start Date' = $Matches.StartDate
'Message Text' = $Matches.MessageText
'Src File' = $Matches.SrcFile
'Dest File' = $Matches.DestFile
}
}
}
$Data | Format-Table -Auto
$Data | Export-Csv $CsvFile -NoTypeInformation -Delimiter '|'
#$Data | Out-Gridview
## with the ImportExcel module you can directly generate an excel file
$Data | Export-Excel $ExcelFile -AutoSize # -Show
安装后,您将直接获得一个.xlsx
文件:
正如我在评论中提到的,您需要分离记录,并尝试使用复杂的正则表达式匹配数据 看RegEx直播 研究该链接右上角每个元素的解释 此脚本:
## Q:\Test\2018\11\29\SO_53541952.ps1
$LogFile = '.\SO_53541952.log'
$CsvFile = '.\SO_53541952.csv'
$ExcelFile='.\SO_53541952.xlsx'
## see the regex live <https://regex101.com/r/1TWm7i/1>
$RE = [RegEx]"(?sm)^Submitter Id +=> (?<SubmitterID>.*?$).*?^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9\/]{10}).*?^Message Text +=> (?<MessageText>.*?$).*?^Src File +=> (?<SrcFile>.*?$).*?^Dest File +=> (?<DestFile>.*?$)"
$Data = (Get-Content $LogFile -raw) -split "(?sm)(?=^Record Id)" | ForEach-Object {
If ($_ -match $RE){
[PSCustomObject]@{
'Submitter Id' = $Matches.SubmitterId
'Start Time' = $Matches.StartTime
'Start Date' = $Matches.StartDate
'Message Text' = $Matches.MessageText
'Src File' = $Matches.SrcFile
'Dest File' = $Matches.DestFile
}
}
}
$Data | Format-Table -Auto
$Data | Export-Csv $CsvFile -NoTypeInformation -Delimiter '|'
#$Data | Out-Gridview
## with the ImportExcel module you can directly generate an excel file
$Data | Export-Excel $ExcelFile -AutoSize # -Show
安装后,您将直接获得一个.xlsx
文件:
您想要的输出格式有一个标题用于
开始日期
,但有两列用于时间和日期?一般来说,我会使用一个正则表达式将日志拆分为记录,使用另一个相当复杂的正则表达式通过(命名的)捕获组获取数据。谢谢,我已经纠正了这个问题,包括时间powershell内置的任何东西都不会读取文件,神奇地确定相关行的块并为您重新格式化。您当然可以使用Get content来读取文件和所有可用的cmdlet,但这只是一个提取转换加载问题。如果您的文件很大(GB),用您喜欢的编程语言解决这个问题可能会更有效。您想要的输出格式有一个标题用于Start date
,但有两列用于time和date?一般来说,我会使用一个正则表达式将日志拆分为记录,使用另一个相当复杂的正则表达式通过(命名的)捕获组获取数据。谢谢,我已经纠正了这个问题,包括时间powershell内置的任何东西都不会读取文件,神奇地确定相关行的块并为您重新格式化。您当然可以使用Get content来读取文件和所有可用的cmdlet,但这只是一个提取转换加载问题。如果你的文件很大(GB),用你最喜欢的编程语言解决这个问题可能会更有效。谢谢你的帮助。你是如何使用正则表达式来解决这个问题的,这很有趣。我想学习使用正则表达式,你能告诉我应该从哪里开始吗?好的,请在上查找理论,然后实践,或者感谢你的帮助。你是如何使用正则表达式来解决这个问题的,这很有趣。我想学习使用正则表达式,你能告诉我从哪里开始吗?好的,先查阅理论,然后再实践