Regex: getting grep context with a multiline regular expression


regex bash grep


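Do grep's -Pz and -C options work together? I'm trying to match a phrase that spans adjacent lines and print its context. The extended regex and the context options each work on their own, but not together (the whole file gets printed):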

Contents of file.txt:

line 1
line 2
line 3
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7
line 8
line 9
line 10

Expected result:

line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7

No, you can't: -Pz and -C don't get along. Don't worry, though, there is a way to do what you want:

grep -Pzo '.*\n.*\n.*word.*\n.*phrase.*\n.*\n.*' file.txt

Or you can parameterize it:

BEFORE=2
AFTER=2
grep -Pzo "(.*\n){$BEFORE}.*word.*\n.*phrase.*(\n.*){$AFTER}" file.txt
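Both commands can be checked end to end with a small self-contained script. This is only a sketch: it assumes GNU grep built with PCRE support (on macOS, the Homebrew ggrep), it recreates file.txt in a temporary directory, and it strips the NUL byte that -z appends after each -o match so the result compares as plain text.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU grep with -P (PCRE) support.
tmp=$(mktemp -d)
{
  printf 'line %s\n' 1 2 3 4 5
  printf '%s\n' 'word ...other text' 'phrase ...yet another text'
  printf 'line %s\n' 6 7 8 9 10
} > "$tmp/file.txt"

# Hand-written padding: two ".*\n" before the match, two "\n.*" after it.
by_hand=$(grep -Pzo '.*\n.*\n.*word.*\n.*phrase.*\n.*\n.*' "$tmp/file.txt" | tr -d '\0')

# The same padding expressed with {n} quantifiers.
BEFORE=2
AFTER=2
param=$(grep -Pzo "(.*\n){$BEFORE}.*word.*\n.*phrase.*(\n.*){$AFTER}" "$tmp/file.txt" | tr -d '\0')

printf '%s\n' "$param"
rm -r "$tmp"
```

Both variables should hold the six lines of the expected result, from line 4 through line 7.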

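With -Pzo, grep prints only the part of the input that matches the pattern, so the context lines are included by padding the pattern with .*\n, one per line of context.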
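The padding rule can be written down as a tiny helper. build_pad_pattern is a hypothetical name used only for this sketch: it wraps a core pattern with B copies of .*\n in front and A copies of \n.* behind, mimicking what -B and -A would do if -C worked together with -z.

```shell
#!/usr/bin/env bash
# build_pad_pattern BEFORE AFTER CORE  (hypothetical helper, for illustration only)
# Prints a PCRE pattern for grep -Pzo: each ".*\n" in front captures one line of
# leading context, each "\n.*" behind captures one line of trailing context.
build_pad_pattern() {
  local before=$1 after=$2 core=$3
  # Inside double quotes, \n stays a literal backslash-n for PCRE to interpret.
  printf '%s\n' "(.*\n){$before}${core}(\n.*){$after}"
}

build_pad_pattern 2 2 '.*word.*\n.*phrase.*'
# prints: (.*\n){2}.*word.*\n.*phrase.*(\n.*){2}
```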

You might find this bash function useful:

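function pad_grep() (
  usage() { echo "Usage: pad_grep [-A N] [-B N] [-C N] EXPR [FILE]" 1>&2; exit 1; }
  OPTIND=1   # reset getopts state in case the caller has already used getopts
  A=0
  B=0
  while getopts "A:B:C:" flag; do
    case "$flag" in
      A) A=$OPTARG;;
      B) B=$OPTARG;;
      C) A=$OPTARG; B=$OPTARG;;
      *) usage;;
    esac
  done
  EXPR=${@:$OPTIND:1}
  FILE=${@:$OPTIND+1:1}
  # error checking
  [[ -z ${EXPR} ]] && usage
  [[ -n ${FILE} && ! -f ${FILE} ]] && usage
  # pad EXPR with B lines of leading and A lines of trailing context;
  # with no FILE, "-" makes grep read stdin
  grep -Pzo "(.*\n){$B}${EXPR}(\n.*){$A}" "${FILE:--}"
)
# doing it by hand
grep -Pzo '.*\n.*\n.*\n.*word.*\n.*phrase.*\n.*\n.*' file.txt
# using the function
pad_grep -B 3 -A 2 '.*word.*\n.*phrase.*' file.txt
pad_grep -C 2 '.*word.*\n.*phrase.*' file.txt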

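Your problem is most likely GNU grep's -z flag, which changes the definition of a line to one terminated by \0 (NUL) instead of a newline. That is easy to demonstrate. Given: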

$ echo "$txt"
line 1
line 2
line 3
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7
line 8
line 9
line 10

You can do:

$ echo "$txt"  | ggrep --context=2  -Pz "word|phrase"
# prints all the lines

Or you can demonstrate how -z is meant to be used by actually giving grep NUL-terminated lines:

$ echo "$txt" | tr '\n' '\0' | ggrep --context=2  -Pz "word|phrase" | tr '\0' '\n'
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7
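Counting records makes the new line definition concrete. This sketch assumes GNU grep (ggrep on macOS): -c reports how many "lines" matched, and under -z a text without any NUL byte is one single line, which is why -C has no neighbouring lines to show. After the tr trick above, grep sees ten separate records again.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU grep, where -z (--null-data) is available.
txt=$(printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10)

# Newline-terminated input under -z: the whole text is ONE NUL-terminated record.
whole=$(printf '%s\n' "$txt" | grep -zc 'line')

# NUL-terminated input under -z: each original line is its own record again.
split=$(printf '%s\n' "$txt" | tr '\n' '\0' | grep -zc 'line')

echo "records matched without tr: $whole"   # 1
echo "records matched with tr:    $split"   # 10
```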

For Perl regexes combined with before/after context and multiline logic, you are better off just using perl.

Given:

$ cat file
line 1
line 2
line 3
line 4
line 5
word ...other text
betweener 1, line 7
betweener 2, line 8
phrase ...yet another text
line 10
line 11
line 12
line 13
line 14

You can do:

# $a=2 is equivalent to grep -B 2, i.e. lines before (size of the @a buffer)
# $b=2 is equivalent to grep -A 2, i.e. lines after (the $c countdown)
$ perl -lne 'BEGIN{$b=2; $a=2;}
             print join("\n", @a) if (/word/);
             print if (/word/../phrase/) || ($c && $c--);
             $c=$b if (/phrase/);
             shift @a if push(@a, $_)>$a;' file

Or, you can use POSIX or GNU awk:

$ awk 'BEGIN{b=2; a=2}                  # b: lines before, a: lines after
   /word/ { for (i=FNR-b;i<=FNR-1;i++)
                 if (i>0) print arr[i]   # print the lines before the first match
            f=1}                # flag we are in the match
    f || (c && c--)             # print either if in the match or tail context
    /phrase/ {f=0; c=a}          # end match, start tail
    {delete arr[FNR-b]          # rolling line buffer: drop the line that left the window
     arr[FNR]=$0}                # save current line
' file

This also works when there are no "betweener" lines between word and phrase.
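To check that claim, the sketch below runs the awk logic on both sample inputs from this page, with and without the betweener lines. It assumes any POSIX awk; ctx_awk is a hypothetical wrapper name, and the rolling buffer is trimmed with one delete arr[FNR-b] per line.

```shell
#!/usr/bin/env bash
# ctx_awk: hypothetical wrapper around the awk context program (b before, a after).
ctx_awk() {
  awk 'BEGIN { b = 2; a = 2 }                     # b: lines before, a: lines after
    /word/ { for (i = FNR - b; i <= FNR - 1; i++)
               if (i > 0) print arr[i]            # leading context from the buffer
             f = 1 }                              # inside the word..phrase block
    f || (c && c--)                               # print block lines and trailing context
    /phrase/ { f = 0; c = a }                     # block ends; start trailing context
    { delete arr[FNR - b]                         # forget the line that left the window
      arr[FNR] = $0 }                             # remember the current line
  '
}

adjacent=$({ printf 'line %s\n' 1 2 3 4 5
             printf '%s\n' 'word ...other text' 'phrase ...yet another text'
             printf 'line %s\n' 6 7 8 9 10; } | ctx_awk)

betweener=$({ printf 'line %s\n' 1 2 3 4 5
              printf '%s\n' 'word ...other text' 'betweener 1, line 7' \
                            'betweener 2, line 8' 'phrase ...yet another text'
              printf 'line %s\n' 10 11 12 13 14; } | ctx_awk)

printf '%s\n\n%s\n' "$adjacent" "$betweener"
```

The first result should match the expected result from the question; the second adds the two betweener lines and ends at line 11.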

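Provide sample input from file.txt. The text file you provided seems to work; what is your expected result? I've updated the question. Try it with a lower -C argument.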

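"The extended regex and the context options work separately": the OP is clearly asking for regex help, so I'm restoring the regex tag. I'm not familiar with ggrep, but thanks, I'll look into it.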

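ggrep is just GNU grep on my system (macOS/BSD). If you're on Linux, that's your grep too! Then I think the regex needs to change. If I'm not mistaken, the pipe in the regex means "or", which is not what I need, because it picks up either keyword (I need consecutive lines). I can't get \0 into the regex: none of \0, \x0, \x00 work. I only used "word|phrase" to demonstrate that -z does not work with \n-terminated lines. A regex that actually works is probably a new question. @OndrejSotolar: I updated the answer with a Perl version that works for you. This works great for small files, but on my real file I get a "grep: memory exhausted" error :) I believe that is the basis for a new question. Going by some workarounds that have worked, your memory problem might be solved by adding the --mmap option, or by trying truncate

Output of the perl and awk commands above on the betweener file: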

line 4
line 5
word ...other text
betweener 1, line 7
betweener 2, line 8
phrase ...yet another text
line 10
line 11