Powershell ConvertFrom字符串模板问题与不规则数据集
我有一个数据集,我正试图将其规范化为Powershell ConvertFrom字符串模板问题与不规则数据集,powershell,machine-learning,powershell-5.0,Powershell,Machine Learning,Powershell 5.0,我有一个数据集,我正试图将其规范化为PsCustomObject。我一直在尝试使用convertfromstring的机器学习模板功能,但部分成功。一个问题是,我能找到的所有示例都具有相同结构的数据集。我的不全一样 我相信一个天才可以直接从原始数据中做出来,但我已经对它进行了一些操作,以达到我的目的 原始样本数据: 使用以下脚本: $lines = $testText.Split("`n") #$testText is the above data wrapped in a here-strin
PsCustomObject
。我一直在尝试使用convertfromstring
的机器学习模板功能,但部分成功。一个问题是,我能找到的所有示例都具有相同结构的数据集。我的不全一样
我相信一个天才可以直接从原始数据中做出来,但我已经对它进行了一些操作,以达到我的目的
原始样本数据:
使用以下脚本:
$lines = $testText.Split("`n") #$testText is the above data wrapped in a here-string
$NewLines = @()
foreach($line in $lines)
{
[regex]$regex = '-'
$HyphenCount = $regex.Matches($line).count
#$HyphenCount
switch ($HyphenCount)
{
1{
$newLines += $line -replace "-",","
}
2{
$split = $line.Split("-",2)
$newlines += $split -join ","
}
3{
if($line.Contains("mode-"))
{
#$line
$split = $line.Split("-",4)
$newlines += $split -join ","
}
else
{
$split = $line.Split("-",3)
$newlines += $split -join ","
}
}
4{
$split = $line.Split("-",3) #this assumes the fourth hyphen is part of description
$newlines += $split -join ","
}
5{
$split = $line.Split("-",4)
$newlines += $split -join ","
}
}
}
操纵数据集:
我得到的原始数据如下所示:
IDE00001,ENG99061,Production mode,Access control
IDE00001,ENG115730,Production mode,Aussenbeleuchtung
IDE00001,ENG112304,Production mode,Heckwischer
IDE00001,ENG98647,Production mode,Interior lighting
IDE00001,ENG115729,Production mode,Scheinwerferreinigung
IDE00001,ENG115731,Production mode,Virtuel_pedal
IDE00002,Transport mode
IDE00820,Activating and deactivating all development messages
IDE01550,Service position
IDE02152,Characteristics in production mode
IDE02269,MAS04382,Acknowledgement signals-Optical feedback during locking
IDE02332,Deactivate production mode
IDE02488,DWA Interior monitoring
IDE02711,ENG116690,Rear Window Wiper-Automatisches Heckwischen
IDE99999,Test-two hyphens
IDE99999,ENG123456,Test-four-Hyphens
IDE99999,ENG123456,Production mode,test-five-hyphens
通过以下模板传递上述数据已使我尽可能接近所需,但仍存在一些问题:
$template = @'
{object*:{ide:IDE00001},{code?:ENG99061},{mode?:Production mode},{description?:Access control}}
{object*:{ide:IDE00001},{code?:ENG115730},{mode?:Dev mode},{description?:Aussenbeleuchtung}}
{object*:{ide:IDE00001},{code?:ENG115731},{mode?:Production mode},{description?:Virtuel_pedal}}
{object*:{ide:IDE02711},{code?:ENG116690},{description?:Rear Window Wiper-Automatisches Heckwischen}}
{object*:{ide:IDE00820},{description?:{!mode?:{!code?:Activating and deactivating all development messages}}}}
{object*:{ide:IDE01550},{description?:{!mode?:{!code?:Service position}}}}
{object*:{ide:IDE02488},{description?:{!mode?:{!code?:DWA Interior monitoring}}}}
{object*:{ide:IDE00002},{mode?:Transport mode}}
'@
$testText | ConvertFrom-String -TemplateContent $template -OutVariable out | Out-Null
$out.object
迄今为止的结果:
结果如下:
ide code mode description
--- ---- ---- -----------
IDE00001 ENG99061 Production mode Access control
IDE00001 ENG115730 Production mode Aussenbeleuchtung
IDE00001 ENG112304 Production mode Heckwischer
IDE00001 ENG98647 Production mode Interior lighting
IDE00001 ENG115729 Production mode Scheinwerferreinigung
IDE00001 ENG115731 Production mode Virtuel_pedal
IDE00002 Transport mode Transport mode
IDE00820 Activating and deactivating all development messages
IDE01550 Service position
IDE02152 production mode Characteristics in production mode
IDE02269 MAS04382 Acknowledgement signals-Optical feedback during locking
IDE02332 production mode Deactivate production mode
IDE02488 DWA Interior monitoring
IDE02711 ENG116690 Rear Window Wiper-Automatisches Heckwischen
IDE99999 Test-two hyphens
IDE99999 ENG123456 Test-four-Hyphens
问题领域:
传输模式
不应出现在说明
列中生产模式
不应在模式
列中。它不知怎么地从描述中找到了这一点
我就是想不出来。因此,如果有人有任何想法 另一种选择是,如果您的输入数据足够系统化,您可以使用正则表达式对其进行解析:
$inputText = @"
IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen
"@ -split "`n"
$pattern = '^((?<ide>[IDE0-9]+)-)((?<code>[A-Z0-9]+)-)?((?<mode>Production mode|Transport mode)-?)?(?<description>.*?)$'
foreach ($line in $inputText)
{
$isMatch = $line -match $pattern
if (-not $isMatch)
{
Write-Warning "Cannot parse expression: $line"
continue
}
New-Object psobject -Property ([ordered]@{
'Ide' = $Matches.ide
'Code' = $Matches.code
'Mode' = $Matches.mode
'Description' = $Matches.description
})
}
$inputText=@”
IDE00001-ENG99061-生产模式访问控制
IDE00001-ENG115730-生产模式Aussenbeleuchtung
IDE00001-ENG11204-生产模式Heckwischer
IDE00001-ENG98647-生产模式内部照明
IDE00001-ENG115729-生产模式Scheinwerferreinigung
IDE00001-ENG115731-生产模式-Virtuel_踏板
IDE00002传输模式
IDE00820激活和停用所有开发消息
IDE01550维修位置
IDE02152生产模式中的特性
IDE02269-MAS04382-锁定期间的确认信号光反馈
IDE02332停用生产模式
IDE02488-DWA内部监控
IDE02711-ENG116690-后窗雨刮器自动系统Heckwischen
“@-split”`n”
$pattern='^((?[IDE0-9]+)-(((?[A-Z0-9]+)-)(((?)生产模式|运输模式)-)(?*?)
foreach($inputText中的行)
{
$isMatch=$line-匹配$pattern
如果(-not$isMatch)
{
写入警告“无法分析表达式:$line”
持续
}
新对象psobject-属性([ordered]@{
“Ide”=$Matches.Ide
“Code”=$Matches.Code
“Mode”=$Matches.Mode
“Description”=$Matches.Description
})
}
您说过您的数据不是以相同的方式构造的。也许你的正则表达式需要比上面给出的要复杂得多。或者,如果可以识别可能出现的所有不同结构,则可以使用不同的正则表达式多次运行解析
IDE00002 Transport mode Transport mode
IDE02152 production mode Characteristics in production mode
IDE02332 production mode Deactivate production mode
$inputText = @"
IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen
"@ -split "`n"
$pattern = '^((?<ide>[IDE0-9]+)-)((?<code>[A-Z0-9]+)-)?((?<mode>Production mode|Transport mode)-?)?(?<description>.*?)$'
foreach ($line in $inputText)
{
$isMatch = $line -match $pattern
if (-not $isMatch)
{
Write-Warning "Cannot parse expression: $line"
continue
}
New-Object psobject -Property ([ordered]@{
'Ide' = $Matches.ide
'Code' = $Matches.code
'Mode' = $Matches.mode
'Description' = $Matches.description
})
}