How to remove a multi-line block of text stored in $pattern in PowerShell

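I'm getting the contents of a text file that is partly created by gsutil, and I'm trying to put its contents into $body, but I want to omit a block of text that contains special characters. The problem is that I can't match this block of text in order to remove it, so when I print $body it still contains all the text I'm trying to omit.

Here's part of my code: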

$pattern = @"
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this you and any
users that download such composite files will need to have a compiled
crcmod installed (see "gsutil help crcmod").
"@

$pattern = ([regex]::Escape($pattern))

$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch
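So basically I need it to display everything in the text file except the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)), but it won't remove the entire block of text.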

It has to be because of the special characters, probably the " , . ( ), because if I make the pattern simple, such as:

$pattern = @"
NOTE: You are uploading one or more
"@
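then it works and that part of the text is removed from $body.

It would be nice if everything inside $pattern between the @" and "@ were treated literally. I'd like the simplest solution, without functions and so on. I'd appreciate any help with this.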

A simple way to handle this task (without regex) is to use the -notin operator, since Get-Content returns the file content as a string[]:

#requires -Version 4

$set = @('==> NOTE: You are uploading one or more large file(s), which would run'
'significantly faster if you enable parallel composite uploads. This'
'feature can be enabled by editing the'
'"parallel_composite_upload_threshold" value in your .boto'
'configuration file. However, note that if you do this you and any'
'users that download such composite files will need to have a compiled'
'crcmod installed (see "gsutil help crcmod").')

$filteredContent = @(Get-Content -Path $path).
    Where({ $_.Trim() -notin $set }) # trim added for misc whitespace

A v2-compatible solution:

@(Get-Content -Path $path) |
    Where-Object { $set -notcontains $_.Trim() }
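A comment further down objects to quoting every line by hand. One way around that, offered here as a sketch rather than as part of the original answer, is to paste the block into a single-quoted here-string and split it into lines. The temp file, `$sample`, and `$block` are made up for illustration, and the block is shortened to two lines.

```powershell
# Hypothetical sample file standing in for the gsutil output (CRLF line endings).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gsutil-sample.txt'
$sample = "Copying file...`r`n==> NOTE: You are uploading one or more large file(s), which would run`r`nsignificantly faster if you enable parallel composite uploads. This`r`nOperation completed.`r`n"
[System.IO.File]::WriteAllText($path, $sample)

# Single-quoted here-string: nothing inside is expanded and nothing needs escaping.
$block = @'
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
'@

# Split on LF or CRLF, trim, and drop empty entries so blank lines in the file survive.
$set = $block -split '\r?\n' | ForEach-Object { $_.Trim() } | Where-Object { $_ }

# Same v2-compatible filter as above.
$filteredContent = @(Get-Content -Path $path) | Where-Object { $set -notcontains $_.Trim() }
$filteredContent
```

Note that this filter works line by line: any line equal to one of the lines in `$set` is dropped wherever it occurs, not only when the lines appear together as one block.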

The full text of the question is stored in the file .\SO_55538262.txt.

This script uses a manually escaped pattern:

$pattern = '(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.'

$body = (Get-Content .\SO_55538262.txt -raw) -replace $pattern
$body
It returns:

I'm getting the contents of a text file which is partly created by gsutil and I'm trying to put its contents in $body but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order for it to be removed. So when I print out $body it still contains all the text that I'm trying to omit.

Here's a part of my code:

$pattern = @"

"@

$pattern = ([regex]::Escape($pattern))

$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch

So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)) but it won't remove that entire block of text.

It has to be because of the special characters, probably the " , . () because if I make the pattern simple such as:

$pattern = @" NOTE: You are uploading one or more "@

then it works and this part of text is removed from $body.

It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution without functions, etc.
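For readers who want the here-string treated literally, as the question asks, the following is a minimal sketch and not code from either answer. It relies on two points: Select-String only selects whole input strings, so with -Raw it sees the file as a single string and cannot cut a block out of it; and the line breaks inside a here-string may be LF while the file uses CRLF. The `$text` value is a hypothetical stand-in for the file content, and the block is shortened to two lines.

```powershell
# Hypothetical stand-in for the file content (CRLF line endings).
$text = "Copying file...`r`n==> NOTE: You are uploading one or more large file(s), which would run`r`ncrcmod installed (see `"gsutil help crcmod`").`r`nOperation completed.`r`n"

# Single-quoted here-string holding the block exactly as it appears in the file.
$block = @'
==> NOTE: You are uploading one or more large file(s), which would run
crcmod installed (see "gsutil help crcmod").
'@

# Escape every regex metacharacter, then let each escaped line break match LF or CRLF.
# Assumes the block itself contains no literal backslash followed by "n".
$pattern = [regex]::Escape($block) -replace '(\\r)?\\n', '\r?\n'

# Remove the block, plus the line break that follows it, if any.
$body = $text -replace ($pattern + '(\r?\n)?')
$body
```

With a real file, `$text` would come from `Get-Content -Raw`, as in the answer above.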
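Comments:

I'm not sure, as that part is written by the gsutil command. It appears exactly as you see it in $pattern. Thanks.

If I open the file in Notepad++ and enable View -> Show End of Line, then I do see that every line ends with CR LF.

Unfortunately that doesn't work. I can still see the full block of text. Also, I can't manually add a quote after every line. The text file is what it is; I can't modify it. I'm using PS v5.1.

@yorkman I'm not modifying the text file at all. And it works exactly as you describe in the question.

OK, but why do I see every line starting and ending with a single quote?

I see what you mean now. Copying and pasting the text with single quotes at the start and end of every line works... That works for my specific case, but I'd rather not take a large block of text and manually add a quote twice on every line. Thanks anyway.

I do have a second block of text to remove from the same text file... not sure how to specify two patterns to remove.

Sorry, one more thing. If I want to remove multiple blocks of text, how do I remove them all? That is, after the first pattern has been removed and the result put into $body, how do I search $body for another pattern and remove it?

Great job! Works perfectly! Can you explain how it works, so that I can use it with other blocks of text if needed? I'm guessing you put ==> NOTE in as the starting point and gsutil help crcmod as the end point, but there are also all the odd characters like (?sm), ^, .*? and \.

@yorkman Basic regex, apart from the first bit.

I'm having a hard time understanding the regex link and putting it into practice when trying to match another set of lines. But for now, assuming I can figure that out, how do I match and remove multiple patterns from $body? Maybe something like: $body = (Get-Content .\SO_55538262.txt -Raw) -replace $pattern, $pattern2, $pattern3?

You can combine multiple patterns by joining them with a vertical bar |, which acts as a logical OR between them.

Unfortunately I can't put that into practice, since I don't have a grasp of regex yet. It would be much easier if I only had to match each line, even if I have to specify the pattern with single quotes, as InCorrigible1 shows below. See, I want to search one text file for multiple blocks and remove them. I thought I could figure it out from your one sample pattern, but so far I haven't been able to.

Explanation of the regular expression:
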
(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.

(?sm) match the remainder of the pattern with the following effective flags: gms  
s modifier: single line. Dot matches newline characters  
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)   
^ asserts position at start of a line  
== matches the characters == literally (case sensitive)  
\> matches the character > literally (case sensitive)  
 NOTE: You matches the characters  NOTE: You literally (case sensitive)
.*?  
. matches any character 
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)  
"gsutil help crcmod" matches the characters "gsutil help crcmod" literally (case sensitive)  
\) matches the character ) literally (case sensitive)  
\. matches the character . literally (case sensitive)
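The comments ask how to remove more than one block. The two options mentioned there, running the replacements one after another or joining the patterns with |, can be sketched as follows. The sample `$text`, the second block, and `$pattern2` are invented for illustration; the trailing `\r?\n` is an addition so that the line break after each block is removed as well.

```powershell
# Hypothetical text with two unwanted blocks (CRLF line endings).
$text = "keep 1`r`n==> NOTE: You are uploading`r`n(see `"gsutil help crcmod`").`r`nkeep 2`r`nWARNING: start`r`nend of warning.`r`nkeep 3`r`n"

# First pattern follows the answer above; the second one is made up.
$pattern1 = '(?sm)^==> NOTE: You .*?"gsutil help crcmod"\)\.\r?\n'
$pattern2 = '(?sm)^WARNING: .*?end of warning\.\r?\n'

# Option 1: chain -replace, applying one pattern after the other.
$chained = $text -replace $pattern1 -replace $pattern2

# Option 2: join the patterns with | (alternation, a logical OR).
# Each pattern is wrapped in a non-capturing group to keep its inline flags local.
$combined = $text -replace ('(?:{0})|(?:{1})' -f $pattern1, $pattern2)

$chained
```

Both variants should leave only the three "keep" lines. Chaining is probably easier to read and debug when each pattern is long.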