Multi-line data: Removing LF (but not CRLF) from a CSV with PowerShell


powershell csv


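I have some CSV data that I need to clean up by removing inline line feeds and special characters such as typographic quotes. I feel I could do this with Python or Unix utilities, but I'm stuck on a fairly vanilla Windows 2012 box. So I'm giving PowerShell v5 a try, despite my lack of experience with it.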

Here's what I'm hoping to achieve:

$InputFile:

"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"{CRLF}
"00020306","John","Davis","Employee was not dressed appropriately."{CRLF}
"00020307","Brad","Miller","Employee told customer, ""Go shop somewhere else!"""{CRLF}
"00020308","Ted","Jones","Employee told supervisor, “That’s not my job”"{CRLF}
"00020309","Bob","Meyers","Employee did the following:{LF}
• Showed up late{LF}
• Did not complete assignments{LF}
• Left work early"{CRLF}
"00020310","John","Davis","Employee was not dressed appropriately."{CRLF}

$OutputFile:

"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"{CRLF}
"00020307","Brad","Miller","Employee told customer, ""Go shop somewhere else!"""{CRLF}
"00020308","Ted","Jones","Employee told supervisor, ""That's not my job"""{CRLF}
"00020309","Bob","Meyers","Employee did the following: * Showed up late * Did not complete assignments * Left work early"{CRLF}
"00020310","John","Davis","Employee was not dressed appropriately."{CRLF}

The following code works:

(Get-Content $InputFile -Raw) `
    -replace '(?<!\x0d)\x0a',' ' `
    -replace "[‘’´]","'" `
    -replace '[“”]','""' `
    -replace "\xa0"," " `
    -replace '[•·]','*' | Set-Content $OutputFile -Encoding ASCII
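To see what each replacement does on its own, here is a minimal sketch that applies the same chain to an in-memory string instead of a file. `ConvertTo-CleanCsvText` is a hypothetical name. The typographic characters are written as regex escapes (`\u2018` and so on), so the example does not depend on the encoding the script file is saved in. The key piece is the negative lookbehind `(?<!\x0d)\x0a`: it matches an LF only when the character before it is not a CR, so CRLF record terminators survive.

```powershell
# Hypothetical helper: the same -replace chain as the script above, applied
# to a string. \uXXXX escapes stand in for the literal typographic characters.
function ConvertTo-CleanCsvText {
    param([string] $Text)

    $Text `
        -replace '(?<!\x0d)\x0a', ' ' `
        -replace '[\u2018\u2019\u00b4]', "'" `
        -replace '[\u201c\u201d]', '""' `
        -replace '\xa0', ' ' `
        -replace '[\u2022\u00b7]', '*'
}

# A CRLF-terminated line followed by a field containing an LF-only break:
# the CRLFs are kept, the lone LF becomes a space.
$sample = "header`r`nfirst part`nsecond part`r`n"
ConvertTo-CleanCsvText $sample
```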

But my objects don't seem to have a notes property:

Exception setting "notes": "The property 'notes' cannot be found on this object. Verify that the property exists and can be set."
At C:\convert.ps1:53 char:5
+     $_.notes= $_.notes -replace '(?<!\x0d)\x0a',' '
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], SetValueInvocationException
    + FullyQualifiedErrorId : ExceptionWhenSetting
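The error means the objects in the pipeline have no property called `notes`. The failing script isn't shown, so assume the objects come from `Import-Csv`. In that case their properties are named after the CSV header row: for the sample data above, `INCIDENT_NUMBER`, `FIRST_NAME`, `LAST_NAME` and `DESCRIPTION`. The minimal sketch below lists the actual property names and then cleans every field without hard-coding a column name.

```powershell
# Sample modeled on the question's data: CRLF between records and an
# LF-only line break inside the quoted DESCRIPTION field.
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path,
    '"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"' + "`r`n" +
    '"00020309","Bob","Meyers","Employee did the following:' + "`n" +
    'Left work early"' + "`r`n")

$rows = @(Import-Csv -LiteralPath $path)

# Property names come from the header row; there is no 'notes' among them.
$names = $rows[0].PSObject.Properties.Name
$names -join ', '

# After Import-Csv has split the records, any line break still inside a
# value is an embedded one, so CRLF and LF can be treated alike.
foreach ($row in $rows) {
    foreach ($prop in $row.PSObject.Properties) {
        $prop.Value = $prop.Value -replace '\r?\n', ' '
    }
}
$rows[0].DESCRIPTION
Remove-Item -LiteralPath $path
```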

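Note:

For a robust solution, see my other answer. The answer below may still be of interest as a well-performing, general line-by-line processing solution, although it always treats LF-only instances as line separators too. (It has been updated to use the same regex as the AutoIt solution added to the question to distinguish the first line of a row from its continuation lines.)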

Given the size of your file, I suggest sticking with plain-text processing for performance reasons:

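The … statement enables fast line-by-line processing; like PowerShell in general, it recognizes both CRLF and LF as newlines. Note, however, that because each line it returns has its trailing newline stripped, you cannot tell whether an input line ended in CRLF or just LF.
Using .NET types directly bypasses the pipeline and allows fast writing to the output file.
For general PowerShell performance tips, see …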

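Note: The above assumes that no continuation line also matches the regex pattern
^"[^",]
which is hopefully robust enough (you consider it to be, given that your AutoIt solution is based on it).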
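To make that assumption concrete: ^"[^",] matches a line that starts with a double quote followed by any character other than another double quote or a comma. That is how each record's first field (e.g. "00020309") begins. The first three sample lines below come from the question's input. The fourth is a hypothetical continuation line, not from the question, that would be misclassified as the start of a new record.

```powershell
$rowStart = '^"[^",]'

$samples = @(
    '"00020309","Bob","Meyers","Employee did the following:'   # record start
    ([string][char]0x2022 + ' Showed up late')                 # continuation
    ([string][char]0x2022 + ' Left work early"')               # continuation
    '"Stop," she said.'                                         # hypothetical continuation
)

# Show which lines the pattern would treat as the start of a record.
foreach ($s in $samples) {
    '{0,-5}  {1}' -f ($s -match $rowStart), $s
}
```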

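This simple distinction between the start of a row and a continuation line avoids the lower-level file I/O needed to tell CRLF newlines from LF newlines, which my other answer has to use.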
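This answer's own code is not reproduced in this copy, so the following is only a minimal sketch of the approach described. It assumes `switch -Regex -File` for reading and a `System.IO.StreamWriter` for writing; `Join-CsvContinuationLine` and `Repair-Text` are hypothetical names. Every line matching ^"[^",] (the header included) starts a new record, and any other line is appended to the current record after a space. Because `switch -File` strips the line break from each line, the output is re-terminated with CRLF throughout.

```powershell
# Hypothetical helper: the question's character replacements. The LF
# lookbehind is unnecessary here, since switch -File strips line breaks.
function Repair-Text {
    param([string] $Line)
    $Line `
        -replace '[\u2018\u2019\u00b4]', "'" `
        -replace '[\u201c\u201d]', '""' `
        -replace '\xa0', ' ' `
        -replace '[\u2022\u00b7]', '*'
}

# Hypothetical function name; a sketch of the described approach.
function Join-CsvContinuationLine {
    param([string] $InputPath, [string] $OutputPath)

    # StreamWriter is .NET, which resolves relative paths against the process's
    # working directory rather than PowerShell's current location.
    $outPath  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)
    $writer   = [System.IO.StreamWriter]::new($outPath, $false, [System.Text.Encoding]::ASCII)
    $inRecord = $false
    try {
        switch -Regex -File $InputPath {
            '^"[^",]' {
                # Start of a record (the header included): end the previous one.
                if ($inRecord) { $writer.Write("`r`n") }
                $inRecord = $true
                $writer.Write((Repair-Text $_))
            }
            default {
                # Continuation line: the stripped line break becomes a space.
                $writer.Write(' ' + (Repair-Text $_))
            }
        }
        if ($inRecord) { $writer.Write("`r`n") }   # terminate the last record
    }
    finally {
        $writer.Dispose()
    }
}

# Usage with a small temp file modeled on the question's input (UTF-8 with BOM).
$in     = [System.IO.Path]::GetTempFileName()
$out    = [System.IO.Path]::GetTempFileName()
$bullet = [string][char]0x2022
[System.IO.File]::WriteAllText($in,
    '"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"' + "`r`n" +
    '"00020309","Bob","Meyers","Employee did the following:' + "`n" +
    $bullet + ' Showed up late' + "`n" +
    $bullet + ' Left work early"' + "`r`n" +
    '"00020310","John","Davis","Employee was not dressed appropriately."' + "`r`n",
    [System.Text.UTF8Encoding]::new($true))

Join-CsvContinuationLine -InputPath $in -OutputPath $out
$result = [System.IO.File]::ReadAllText($out)
Remove-Item -LiteralPath $in, $out
$result
```

The trade-off is the one described above: an LF-only line that happens to match the pattern would still be treated as a new record.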

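The first answer is probably better than this, since I'm not sure whether PowerShell has to load everything into memory this way (though I think it does). But starting from what you had above, I've been thinking along these lines: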

# Import CSV into a variable
$InputFile = Import-Csv $InputFilePath

# Gets all field names, stores in $Fields
$InputFile | Get-Member -MemberType NoteProperty | 
Select-Object Name | Set-Variable Fields

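# Updates each field entry
# (Export-Csv escapes embedded quotes by doubling them, so curly quotes are
#  replaced with a single " here; '""' would end up as """" in the output)
$InputFile | ForEach-Object {
    $thisLine = $_
    $Fields | ForEach-Object {
            ($thisLine).($_.Name) = ($thisLine).($_.Name) `
                -replace '(?<!\x0d)\x0a',' ' `
                -replace "[‘’´]","'" `
                -replace '[“”]','"' `
                -replace "\xa0"," " `
                -replace '[•·]','*'
            }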
    $thisLine | Export-Csv $OutputFile -NoTypeInformation -Encoding ASCII -Append
}
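A variation on the same `Import-Csv` idea, sketched as a single pipeline: rows stream from `Import-Csv` through `ForEach-Object` into one `Export-Csv` call. The rows are not collected in a variable first, and the output file is not reopened for every row. `Repair-CsvFile` is a hypothetical name. `Export-Csv` doubles embedded quotes itself, so curly quotes are replaced with a single `"` here. And because `Import-Csv` has already separated the records, any line break left inside a value is an embedded one, so the pattern `\r?\n` suffices.

```powershell
# Hypothetical helper: streams rows from Import-Csv to a single Export-Csv
# call, cleaning every field of every row on the way.
function Repair-CsvFile {
    param([string] $InputPath, [string] $OutputPath)

    Import-Csv -LiteralPath $InputPath |
        ForEach-Object {
            foreach ($prop in $_.PSObject.Properties) {
                $prop.Value = $prop.Value `
                    -replace '\r?\n', ' ' `
                    -replace '[\u2018\u2019\u00b4]', "'" `
                    -replace '[\u201c\u201d]', '"' `
                    -replace '\xa0', ' ' `
                    -replace '[\u2022\u00b7]', '*'
            }
            $_   # pass the modified row on
        } |
        Export-Csv -LiteralPath $OutputPath -NoTypeInformation -Encoding ASCII
}

# Usage with a small temp file (UTF-8 with BOM) modeled on the question's input.
$in  = [System.IO.Path]::GetTempFileName()
$out = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($in,
    '"INCIDENT_NUMBER","DESCRIPTION"' + "`r`n" +
    '"00020308","Employee told supervisor, ' + [char]0x201C + 'That' + [char]0x2019 + 's not my job' + [char]0x201D + '"' + "`r`n" +
    '"00020309","Employee did the following:' + "`n" + [char]0x2022 + ' Left work early"' + "`r`n",
    [System.Text.UTF8Encoding]::new($true))

Repair-CsvFile -InputPath $in -OutputPath $out
$rows = @(Import-Csv -LiteralPath $out)
$raw  = [System.IO.File]::ReadAllText($out)
Remove-Item -LiteralPath $in, $out
$rows | Format-Table
```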

Here's another line-by-line attempt, somewhat similar to mklement0's answer. It assumes that no continuation line starts with ". Hopefully it performs better.
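# Clear contents of file (Not sure if you need/want this...)
if (Test-Path -PathType Leaf $OutputFile) { Clear-Content $OutputFile }

# Flag for first entry, since no data manipulation needed there
$firstEntry = $true

# .NET methods resolve relative paths against the process's working directory,
# not PowerShell's current location, so pass ReadLines() a full path
foreach($line in [System.IO.File]::ReadLines((Convert-Path $InputFile))) {
    if ($firstEntry) {
        Add-Content -Path $OutputFile -Value $line -NoNewline
        $firstEntry = $false
    }
    else {
        # A leading " starts a new record (CRLF); anything else continues the previous one (space)
        if ($line[0] -eq '"') { Add-Content -Path $OutputFile "`r`n" -NoNewline}
        else { Add-Content -Path $OutputFile " " -NoNewline}
        # ReadLines() already strips all line breaks, so only the character replacements are needed
        $sanitizedLine = $line -replace "[‘’´]","'" `
                               -replace '[“”]','""' `
                               -replace "\xa0"," " `
                               -replace '[•·]','*'
        Add-Content -Path $OutputFile -Value $sanitizedLine -NoNewline
    }
}
# Terminate the last record with CRLF, as in the desired output
if (-not $firstEntry) { Add-Content -Path $OutputFile "`r`n" -NoNewline }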
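Each `Add-Content` call opens and closes the output file, so with a large file those calls are likely to dominate the run time. Below is a sketch of the same first-character logic that keeps a single `System.IO.StreamWriter` open for the whole run. Like the answer, it assumes no continuation line starts with a double quote. `Join-CsvLineFast` is a hypothetical function name.

```powershell
# Hypothetical function name: the answer's logic with one StreamWriter kept
# open for the whole run instead of one Add-Content call per line.
function Join-CsvLineFast {
    param([string] $InputFile, [string] $OutputFile)

    # .NET resolves relative paths against the process's working directory,
    # not PowerShell's current location, so resolve both paths first.
    $paths   = $ExecutionContext.SessionState.Path
    $inPath  = $paths.GetUnresolvedProviderPathFromPSPath($InputFile)
    $outPath = $paths.GetUnresolvedProviderPathFromPSPath($OutputFile)

    $writer = [System.IO.StreamWriter]::new($outPath, $false, [System.Text.Encoding]::ASCII)
    try {
        $firstEntry = $true
        foreach ($line in [System.IO.File]::ReadLines($inPath)) {
            if ($firstEntry) {
                $writer.Write($line)   # header: copied unchanged
                $firstEntry = $false
                continue
            }
            # A leading " starts a new record; anything else continues the previous one.
            if ($line.StartsWith('"')) { $writer.Write("`r`n") } else { $writer.Write(' ') }
            $sanitizedLine = $line -replace '[\u2018\u2019\u00b4]', "'" `
                                   -replace '[\u201c\u201d]', '""' `
                                   -replace '\xa0', ' ' `
                                   -replace '[\u2022\u00b7]', '*'
            $writer.Write($sanitizedLine)
        }
        if (-not $firstEntry) { $writer.Write("`r`n") }   # terminate the last record
    }
    finally {
        $writer.Dispose()
    }
}

# Usage with a small temp file (UTF-8 with BOM) modeled on the question's input.
$in  = [System.IO.Path]::GetTempFileName()
$out = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($in,
    '"INCIDENT_NUMBER","DESCRIPTION"' + "`r`n" +
    '"00020309","Employee did the following:' + "`n" +
    [char]0x2022 + ' Showed up late"' + "`r`n" +
    '"00020310","Employee was not dressed appropriately."' + "`r`n",
    [System.Text.UTF8Encoding]::new($true))

Join-CsvLineFast -InputFile $in -OutputFile $out
$result = [System.IO.File]::ReadAllText($out)
Remove-Item -LiteralPath $in, $out
$result
```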
