Delete the first block of text that matches a regex with sed

regex bash awk sed

I have a text like this:

# This configuration was generated by
# `rubocop --auto-gen-config`

# Offense count: 1
# Configuration parameters: Include.
# Include: **/Gemfile, **/gems.rb
Bundler/DuplicatedGem:
  Exclude:
    - 'Gemfile'

# Offense count: 24
# Cop supports --auto-correct.
# Configuration parameters: Include, TreatCommentsAsGroupSeparators.
# Include: **/Gemfile, **/gems.rb
Bundler/OrderedGems:
  Exclude:
    - 'Gemfile'

# Offense count: 1
# Cop supports --auto-correct.
Layout/MultilineBlockLayout:
  Exclude:
    - 'test/unit/github_fetcher/issue_comments_test.rb'

# Offense count: 1
# Cop supports --auto-correct.
# Configuration parameters: EnforcedStyle, SupportedStyles.
# SupportedStyles: symmetrical, new_line, same_line
Layout/MultilineHashBraceLayout:
  Exclude:
    - 'config/environments/production.rb'

I want to delete only the first block of text that starts with # Offense count. I have this regex: /^# Offense([\s\S]+?)\n\n/m

If I use it with sed, I get the following error:
$ sed -e '/^# Offense([\s\S]+?)\n\n\/d' .rubocop_todo.yml
sed: 1: "/^# Offense([\s\S]+?)\n ...": unterminated regular expression

If I pass an empty string as the first argument, it does nothing:
$ sed -e '' '/^# Offense([\s\S]+?)\n\n\/d' .rubocop_todo.yml

Why does it fail? What can I do?
I'm using awk version 20070501 on OS X,
or GNU awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
Using awk:
awk 'BEGIN{RS=ORS="\n\n"}!/^# Offense/||a++' file

Details:
BEGIN {                # before starting to read the records
    RS = ORS = "\n\n"  # set the record separator (RS) and the output
                       # record separator (ORS) to "\n\n" (an empty line)
}

# a pattern with no action: when it's true, the record is printed
!/^# Offense/ ||       # the record doesn't start with "# Offense", OR
a++                    # "a" is true: at the first block that starts with
                       # "# Offense", "a" is undefined and evaluates as
                       # false; it is then incremented, so it evaluates
                       # as true for every later block.
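
One portability caveat: POSIX leaves the behavior of a multi-character RS unspecified. GNU awk (and mawk) treat "\n\n" as the separator, but older BWK awk builds, such as the macOS awk 20070501 mentioned in the question, may use only its first character. A portable sketch, not part of the answer above, uses paragraph mode (RS set to the empty string), where records are separated by one or more empty lines. The function name drop_first_offense_paragraph is hypothetical. Paragraph mode also collapses runs of several empty lines into one, and the last block is followed by an extra empty line on output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE without its first "# Offense" block,
# using awk paragraph mode so it does not rely on a multi-character RS.
drop_first_offense_paragraph() {
  awk '
    BEGIN { RS = ""; ORS = "\n\n" }  # records = blocks separated by empty lines
    !/^# Offense/ || seen++          # print other blocks, and Offense blocks after the first
  ' "$1"
}
```

Usage: drop_first_offense_paragraph .rubocop_todo.yml > new.yml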

Here is an awk expression that prints everything except the first block that starts with # Offense and ends with an empty line:
awk '/# Offense/ {n++}  n!=1 {print}  n&&/^$/ {n++}' file

Breakdown:

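There are three expressions here, each of the form condition {command}. The condition can be a complex (logical) expression built from regexes matched against the current line, variable tests, and so on.
n is a block counter, incremented at the start and at the end of a block. Initially, n=0.

/# Offense/ {n++} - after matching the first # Offense, we increment it to n=1.

n&&/^$/ {n++} - once that block is finished (we match an empty line, but only after the first block has been detected, i.e. n>0), we increment it again to n=2.
n!=1 {print} - meanwhile, whenever we are outside the first block, we print every line verbatim.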
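
The same logic can be written with an explicit state variable instead of a counter: 0 before the first block, 1 inside it, 2 after it. This is a sketch, not the answer's code; the function name print_without_first_block and the variable state are hypothetical, and the pattern is anchored with ^ so only lines that begin with "# Offense" open a block. As in the counter version, an empty line before the first block does not end anything, because the end test only runs while state == 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE without the first block that starts with
# a "# Offense" line and ends at the next empty line (or at end of file).
print_without_first_block() {
  awk '
    state == 0 && /^# Offense/ { state = 1 }  # the first block starts here
    state != 1                 { print }      # print everything outside it
    state == 1 && /^$/         { state = 2 }  # its first empty line ends it
  ' "$1"
}
```

Usage: print_without_first_block .rubocop_todo.yml
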
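Sed says "unterminated regular expression" because the last slash is preceded by a backslash: \/
escapes the final slash, so the regular expression is never closed and the string is invalid.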
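
If you want to stay with sed, you have to work line by line: the range address /^# Offense/,/^$/ selects each block, and the hold space can serve as a "done" flag so that only the first block is deleted. This is a minimal sketch, not part of the original answer; the function name delete_first_offense_block is hypothetical. It uses only the POSIX commands x, h, s, b and d, so it should not depend on GNU extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE without its first "# Offense" block,
# i.e. from the first line starting with "# Offense" up to and including
# the next empty line. Inside each /^# Offense/,/^$/ range:
#   x, test for "done": if the first block is already gone, swap back (x)
#                       and branch to the end (b) so the line is printed;
#   otherwise swap back; if this is the empty line that closes the first
#   block, store "done" in the hold space (h); then delete the line (d).
delete_first_offense_block() {
  sed '
/^# Offense/,/^$/{
  x
  /^done$/{
    x
    b
  }
  x
  /^$/{
    s/^$/done/
    h
  }
  d
}' "$1"
}
```

Usage: delete_first_offense_block .rubocop_todo.yml
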
I think you can do it with the following Perl one-liner:
perl -0pe 's/# Offense.*?\n\n//s' test.yml

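Where: -0 sets the input record separator to the null character, which effectively reads the whole file into one string (as long as it contains no NUL bytes); -p prints the result (add -i to edit the file in place, i.e. perl -i -0pe …); and -e treats the next argument as the Perl code to run. The .*? makes the match non-greedy, so only the first section is matched, and the /s modifier makes the dot match newlines as well.
Output: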
# This configuration was generated by

# Offense count: 24
# Cop supports --auto-correct.
# Configuration parameters: Include, TreatCommentsAsGroupSeparators.
# Include: **/Gemfile, **/gems.rb
Bundler/OrderedGems:
  Exclude:
    - 'Gemfile'

sed works line by line and has no non-greedy quantifiers (and no \s character class). In other words, your "working" regex is of no use here, and you have to find another way. Start by reading more about sed and how to use labels and the pattern space. Also read about BRE and ERE (the two regex flavors available in sed). But in my opinion sed is not a good tool for this; try awk or perl.
Thanks, I'll take a look with awk or perl.
With awk, if you define the record separator as \n\n, you can easily read the file block by block. Then all you have to do is check whether the current block starts with "Offense".
Yes, we don't want to increment n before the first block.
In that case, write /^$/&&n++. n is then tested and incremented only when /^$/ succeeds.
@CasimiretHippolyte That actually won't work, because n must not be incremented on every empty-line match, only after the first block has started.
@Mi