Regex: search for multi-line text that matches string 1 and is not separated by string 2

I have a file like this:

abc|100|test|line|with|multiple|information|||in|different||fields
abc|100|another|test|line|with|multiple|information|in||different|fields|
abc|110|different|looking|line|with|some|supplementary|information
abc|100|test|line|with|multiple|information|||in|different||fields
abc|110|different|looking|line|with|some|other|supplementary|information
abc|110|different|looking|line|with|additional||information
abc|100|another|test|line|with|multiple|information|in||different|fields|
abc|110|different|looking|line|with|supplementary|information

I'm looking for a regexp to use with sed/awk/(e)grep (any of them would be fine for me) that finds the following in the text above:

abc|100|test|line|with|multiple|information|||in|different||fields
abc|110|different|looking|line|with|some|other|supplementary|information
abc|110|different|looking|line|with|additional||information

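I want to return a 100 line if it is followed by at least two 110 lines before another 100 line occurs. The result should contain the initial |100| line and all the |110| lines that follow it, but not the next |100| line.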

sed -ne '/|100|/,/|110|/p'

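gives me a list of all |100| lines that are followed by at least one |110| line. But it does not check whether the |110| line is repeated more than once, so I get results I am not expecting.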

sed -ne '/|100|/,/|100|/p'

returns a list of all |100| lines together with everything up to and including the next |100| line.

sed -ne '/|100|/,/|110|/p'

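Working out the boundaries between search patterns has always been a nightmare for me. I spent hours of trial and error on a similar problem and it finally worked, but I never understood why. I hope SO can spare me the headache this time and maybe explain how this kind of pattern works. I'm pretty sure I will face this kind of problem again, and then I can finally solve it myself.

Thanks for your help with this.

Regards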

Manuel

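In AWK, the field separator is set to the pipe character, and the second field of each line is compared against 100 and 110. $0 represents one line of the input file.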

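BEGIN { FS = "|" }
{
    # A 100 line starts a new candidate block: remember it and reset the counter.
    if ($2 == 100) {
        one_hundred = 1
        one_hundred_one = 0    # number of 110 lines seen since the last 100 line
        var0 = $0
    }

    if ($2 == 110) {
        one_hundred_one += 1
        # Compare with ==; a single = would assign 1 to one_hundred instead.
        if (one_hundred_one == 1 && one_hundred == 1) var1 = $0
        if (one_hundred_one == 2 && one_hundred == 1) var2 = $0
    }

    # Print once, on the 110 line that completes the pattern.
    # Note: a third or later 110 line in the same run is not printed.
    if ($2 == 110 && one_hundred == 1 && one_hundred_one == 2) {
        print var0
        print var1
        print var2
    }
}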

awk -f foo.awk input.txt

abc|100|test|line|with|multiple|information|||in|different||fields
abc|110|different|looking|line|with|some|other|supplementary|information
abc|110|different|looking|line|with|additional||information
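
As a quick check of what the script above compares, here is a minimal sketch (plain POSIX awk run from a shell; the variable names line, second and hit are made up for illustration) that splits one line from the question on the pipe character and tests the second field with an exact comparison rather than a pattern match.

```shell
# One sample line taken from the question.
line='abc|100|test|line|with|multiple|information|||in|different||fields'

# With FS set to "|", $2 is the second pipe-delimited field.
second=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "|" } { print $2 }')

# An exact comparison on $2 ignores a 100 that shows up in any other field.
# The input line below is hypothetical, not taken from the question.
hit=$(printf '%s\n' 'abc|110|x|100|y' |
  awk 'BEGIN { FS = "|" } { print ($2 == 100 ? "yes" : "no") }')

printf 'second field: %s, field-2 test on the other line: %s\n' "$second" "$hit"
```

A pattern such as /|100|/ would accept the second line as well, which is one reason to test $2 directly.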

Here is a GNU awk-specific answer: use |100| as the record separator and |110| as the field separator, and look for records that contain at least 3 fields.

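gawk '
    BEGIN {
        # a newline, the first pipe-delimited column, then the "100" value
        RS="(\n[^|]+[|]100[|])"
        FS="[|]110[|]"
    }
    # RT is the text that matched RS at the END of the current record, so the
    # prefix that belongs to this record is the RT saved from the previous one.
    NF >= 3 {print prefix $0}
    {prefix = substr(RT, 2)}     # drop the leading newline
' file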

I would do this in awk:

awk -F'|' '$2==100&&c>2{print b} $2==100{c=1;b=$0;next} $2==110&&c{c++;b=b RS $0;next} {c=0}' file

Broken out for easier reading:

awk -F'|' '

  # If we're starting a new section and conditions have been met, print buffer
  $2==100 && c>2 {print b}

  # Start a section with a new count and a new buffer...
  $2==100 {c=1;b=$0;next}

  # Add to buffer
  $2==110 && c {c++;b=b RS $0;next}

  # Finally, zero everything if we encounter lines that don't fit the pattern
  {c=0;b=""}

' file

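This does not use a regular expression; it steps through the file using the specified field separator. Once it sees the "start" condition, it begins keeping a buffer. As subsequent lines match the "continue" condition, the buffer grows. Once we see the start of a new section, we print the buffer if the counter is large enough.

Works for me on your sample data.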
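
To try the buffering approach described above, here is a self-contained sketch that feeds the question's sample lines to the same awk program through a pipe instead of a file. The shell variable name result is only there for illustration.

```shell
# Sample data from the question, piped straight into awk.
result=$(printf '%s\n' \
  'abc|100|test|line|with|multiple|information|||in|different||fields' \
  'abc|100|another|test|line|with|multiple|information|in||different|fields|' \
  'abc|110|different|looking|line|with|some|supplementary|information' \
  'abc|100|test|line|with|multiple|information|||in|different||fields' \
  'abc|110|different|looking|line|with|some|other|supplementary|information' \
  'abc|110|different|looking|line|with|additional||information' \
  'abc|100|another|test|line|with|multiple|information|in||different|fields|' \
  'abc|110|different|looking|line|with|supplementary|information' \
  | awk -F'|' '
      $2==100 && c>2 {print b}              # new section: flush a qualifying buffer
      $2==100 {c=1; b=$0; next}             # start counting and buffering
      $2==110 && c {c++; b=b RS $0; next}   # extend the buffer
      {c=0; b=""}                           # any other line resets everything
    ')

printf '%s\n' "$result"
```

The counter c includes the 100 line itself, so c>2 means at least two 110 lines followed it.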

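Welcome to StackOverflow. StackOverflow is not a free code-writing service. If you tried to write something and ran into difficulties, show what you did and ask a specific question. If you just want work done for free, StackOverflow is not the place for that kind of request. At least try to use the rich documentation available on the internet. Or pay someone to write it for you.

I figured that presenting my basic knowledge of this would be of no use. Question edited.

Thanks for adding your attempt. At least it's something. :) Close vote retracted.

You want $2==100 to avoid matching values like "41004".

@glennjackman yea. From the question I wasn't quite sure whether he wanted a regex match or a literal value. I've updated my answer per your suggestion; I guess he will have to decide in the end what he wants.

The value searched for is exactly 100; the information in that field is limited to numeric values between 100 and 900. I have checked, and this solution works as well. Thanks for your help!

Glad to help! One caveat that occurs to me: if the line pattern (a 100 followed by two or more 110s) occurs at the end of the file, it will not be printed, because printing only happens when the script sees $2==100. If needed, an END section containing an if() can take care of that.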
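
A minimal sketch of the END fix mentioned in the last comment, assuming the buffering awk program from the answer above (counter c, buffer b). The input is made up so that the qualifying block sits at the very end of the file; the shell variable names are illustrative only.

```shell
# Hypothetical input: the only qualifying block is the last three lines.
input='abc|100|first
abc|110|a
abc|100|second
abc|110|b
abc|110|c'

# Without an END block, nothing triggers the print for the final section.
without_end=$(printf '%s\n' "$input" | awk -F'|' '
  $2==100 && c>2 {print b}
  $2==100 {c=1; b=$0; next}
  $2==110 && c {c++; b=b RS $0; next}
  {c=0; b=""}
')

# With an END block, the same test runs once more after the last line.
with_end=$(printf '%s\n' "$input" | awk -F'|' '
  $2==100 && c>2 {print b}
  $2==100 {c=1; b=$0; next}
  $2==110 && c {c++; b=b RS $0; next}
  {c=0; b=""}
  END { if (c>2) print b }
')

printf 'without END:\n%s\nwith END:\n%s\n' "$without_end" "$with_end"
```

The END rule repeats the c>2 check because no further 100 line arrives to trigger the usual print.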