Multi-line data: Using PowerShell to remove LF (but not CRLF) from a CSV

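I have some CSV data that needs to be cleaned up by removing inline line feeds and special characters such as typographic quotes. I feel I could do this with Python or Unix utilities, but I'm stuck on a fairly vanilla Windows 2012 box, so I'm giving PowerShell v5 a try despite my lack of experience with it.

Here is what I'm hoping to achieve: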

$InputFile:

"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"{CRLF}
"00020306","John","Davis","Employee was not dressed appropriately."{CRLF}
"00020307","Brad","Miller","Employee told customer, ""Go shop somewhere else!"""{CRLF}
"00020308","Ted","Jones","Employee told supervisor, “That’s not my job”"{CRLF}
"00020309","Bob","Meyers","Employee did the following:{LF}
• Showed up late{LF}
• Did not complete assignments{LF}
• Left work early"{CRLF}
"00020310","John","Davis","Employee was not dressed appropriately."{CRLF}

$OutputFile:

"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"{CRLF}
"00020307","Brad","Miller","Employee told customer, ""Go shop somewhere else!"""{CRLF}
"00020308","Ted","Jones","Employee told supervisor, ""That's not my job"""{CRLF}
"00020309","Bob","Meyers","Employee did the following: * Showed up late * Did not complete assignments * Left work early"{CRLF}
"00020310","John","Davis","Employee was not dressed appropriately."{CRLF}

The following code works:

(Get-Content $InputFile -Raw) `
    -replace '(?<!\x0d)\x0a',' ' `
    -replace "[‘’´]","'" `
    -replace '[“”]','""' `
    -replace "\xa0"," " `
    -replace '[•·]','*' | Set-Content $OutputFile -Encoding ASCII

But my objects don't seem to have a notes property:

Exception setting "notes": "The property 'notes' cannot be found on this object. Verify that the property exists and can be set."
At C:\convert.ps1:53 char:5
+     $_.notes= $_.notes -replace '(?<!\x0d)\x0a',' '
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], SetValueInvocationException
    + FullyQualifiedErrorId : ExceptionWhenSetting
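
Note:

  • For a robust solution, see my other answer.

  • The answer below may still be of interest as a well-performing, general line-by-line processing solution, although it always treats LF-only instances as line separators too. (It has been updated to use the same regex for distinguishing row-start lines from row-continuation lines as the AutoIt solution that was added to the question.)


Given the size of your file, I suggest sticking with plain-text processing for performance reasons:

  • The statement supports fast line-by-line processing; it recognizes both CRLF and LF as newlines, as PowerShell generally does. Note, however, that because each line is returned with its trailing newline stripped, you cannot tell whether an input line ended in CRLF or in LF only.

  • Using a .NET type directly bypasses the pipeline and enables fast writing to the output file.

  • For general PowerShell performance tips, see

Note: The above assumes that no row-continuation line also matches the regex pattern ^"[^",] which is hopefully robust enough (you have deemed it so, given that your AutoIt solution is based on it).

This simple distinction between row-start lines and row-continuation lines avoids the lower-level file I/O that would otherwise be needed to tell CRLF and LF newlines apart, which my other answer requires.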
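
A minimal sketch of the approach described above, not the answer's original code. It makes several assumptions: the function name Convert-CsvLineBreaks and its parameters are hypothetical, the input is taken to be UTF-8, [System.IO.File]::ReadLines stands in for the line-reading statement, and the substitutions from the question are written as \u escapes so the script itself stays ASCII-only.

```powershell
# Sketch only: Convert-CsvLineBreaks and its parameter names are hypothetical.
function Convert-CsvLineBreaks {
    param(
        # Pass full paths: .NET methods do not follow PowerShell's current location.
        [Parameter(Mandatory)] [string] $InputPath,
        [Parameter(Mandatory)] [string] $OutputPath
    )

    # ASCII output as in the question; any leftover non-ASCII character becomes '?'.
    $writer = [System.IO.StreamWriter]::new($OutputPath, $false, [System.Text.Encoding]::ASCII)
    try {
        $isFirst = $true
        # ReadLines streams the file lazily; assumption: the input is UTF-8.
        foreach ($line in [System.IO.File]::ReadLines($InputPath)) {
            if ($isFirst) {
                $isFirst = $false
            }
            elseif ($line -match '^"[^",]') {
                # Row-start line: terminate the previous row with CRLF.
                $writer.Write("`r`n")
            }
            else {
                # Row-continuation line: join it to the current row with a space.
                $writer.Write(' ')
            }

            # Same character substitutions as in the question's code.
            $clean = $line `
                -replace '[\u2018\u2019\u00B4]', "'" `
                -replace '[\u201C\u201D]', '""' `
                -replace '\u00A0', ' ' `
                -replace '[\u2022\u00B7]', '*'
            $writer.Write($clean)
        }
        # Terminate the last row, unless the input was empty.
        if (-not $isFirst) { $writer.Write("`r`n") }
    }
    finally {
        $writer.Dispose()
    }
}

# Example call (hypothetical paths):
# Convert-CsvLineBreaks -InputPath 'C:\data\in.csv' -OutputPath 'C:\data\out.csv'
```

Because only one line is held in memory at a time, this should scale to large files, but it inherits the caveat above: a continuation line that happens to match the row-start pattern would be split into a new row.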

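The first answer is probably better than this one, since I'm not sure whether PS has to load everything into memory this way (though I think it does), but starting from what you had above, I was thinking along these lines: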

# Import CSV into a variable
$InputFile = Import-Csv $InputFilePath

# Gets all field names, stores in $Fields
$InputFile | Get-Member -MemberType NoteProperty | 
Select-Object Name | Set-Variable Fields

# Updates each field entry
$InputFile | ForEach-Object {
    $thisLine = $_
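    $Fields | ForEach-Object {
            # Import-Csv has already unescaped each field and Export-Csv doubles
            # embedded quotes again, so typographic quotes map to a single " here.
            ($thisLine).($_.Name) = ($thisLine).($_.Name) `
                -replace '(?<!\x0d)\x0a',' ' `
                -replace "[‘’´]","'" `
                -replace '[“”]','"' `
                -replace "\xa0"," " `
                -replace '[•·]','*'
            }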
    $thisLine | Export-Csv $OutputFile -NoTypeInformation -Encoding ASCII -Append
} 
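
Here is another "line-by-line" attempt, somewhat similar to mklement0's answer. It assumes that no "row continuation" line starts with ". Hopefully it performs better.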

# Clear contents of file (Not sure if you need/want this...)
if (Test-Path -type leaf $OutputFile) { Clear-Content $OutputFile }

# Flag for first entry, since no data manipulation needed there
$firstEntry = $true

foreach($line in [System.IO.File]::ReadLines($InputFile)) {
    if ($firstEntry) {
        Add-Content -Path $OutputFile -Value $line -NoNewline
        $firstEntry = $false
    }
    else {
        if ($line[0] -eq '"') { Add-Content -Path $OutputFile "`r`n" -NoNewline}
        else { Add-Content -Path $OutputFile " " -NoNewline}
        $sanitizedLine = $line -replace '(?<!\x0d)\x0a',' ' `
                               -replace "[‘’´]","'" `
                               -replace '[“”]','""' `
                               -replace "\xa0"," " `
                               -replace '[•·]','*'
        Add-Content -Path $OutputFile -Value $sanitizedLine -NoNewline
    }
}