How to remove a block of text (a pattern) from files using sed/awk

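I have imported thousands of text files, and they contain a block of text that I would like to remove.

It is not just a fixed block of text but a pattern: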

<!--
# Translator(s):
#
# username1 <email1>
# username2 <email2>
# usernameN <emailN>
#
-->

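If the block appears, it lists one or more users together with their email addresses.

For this task you need look-ahead, which is normally done by a parser.

Another solution (though not a very efficient one) is: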

sed "s/-->/&\n/;s/<!--/\n&/" file |  awk 'BEGIN {RS = "";FS = "\n"}/username/{print}'

If I understood your question correctly, here is my solution. Save the following to a file named remove_blocks.awk:
# See the beginning of the block, mark it
/<!--/ {
    state = "block_started" 
}

# At the end of the block, if the block does not contain email, print
# out the whole block.
/^-->/ {
    if (!block_contains_user_email) {
        for (i = 0; i < count; i++) {
            print saved_line[i];
        }
        print
    }

    count = 0
    block_contains_user_email = 0
    state = ""
    next
}

# Encounter a block: save the lines and wait until the end of the block
# to decide if we should print it out
state == "block_started" {
    saved_line[count++] = $0
    if (NF>=3 && $3 ~ /@/) {
        block_contains_user_email = 1
    }
    next
}

# For everything else, print the line
1

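Running the script, for example with awk -f remove_blocks.awk yourfile, prints everything in the text file except the blocks that contain user email addresses.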
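
Because awk writes its result to standard output, applying a filter like this to thousands of files needs a small wrapper. The following is a minimal sketch and not part of the original answer: the function name strip_in_place is hypothetical, and it assumes the awk script (remove_blocks.awk or any other filter) is passed as the first argument. Each file is rewritten through a temporary file, and the original is kept next to it with a .orig suffix.

```shell
# Hypothetical helper (not from the original answer): run an awk filter
# script over many files and keep a backup of each original as FILE.orig.
# Usage: strip_in_place SCRIPT.awk FILE...
strip_in_place() {
    script=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Overwrite the file only if awk finished successfully.
        if awk -f "$script" "$f" > "$tmp"; then
            cp -p "$f" "$f.orig" && cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}
```

A call such as strip_in_place remove_blocks.awk *.txt would then process every matching file in the current directory. Writing back with cat rather than mv keeps the permissions of the original file.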
perl -i.orig -00 -pe 's/<!--\s+#\s*Translator.*?\s-->//gs' file1 file2 file3

This sed solution might work:
 sed '/^<!--/,/^-->/{/^<!--/{h;d};H;/^-->/{x;/^<!--\n# Translator(s):\n#\(\n# [^<]*<email[0-9]\+>\)\+\n#\n-->$/!p};d}' file

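An alternative (perhaps a better solution?):
sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' file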

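This gathers up the lines that start with <!-- and end with -->, then pattern-matches on the collection: the second line is # Translator(s):, the third line is #, the fourth and perhaps more lines have the form # username <email address>, the penultimate line is #, and the last line is -->. If the match succeeds, the entire collection is deleted; otherwise it is printed as normal.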
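
The structure described above can also be checked line by line instead of with one long regular expression. The following is a minimal sketch in awk wrapped in a shell function; it is not taken from any of the answers. The names strip_translator_blocks and is_translator_block are hypothetical, and the sketch assumes that <!-- and --> each start their own line, as in the question.

```shell
# Hypothetical sketch: buffer each <!-- ... --> block and drop it only if
# every line has the expected "Translator(s)" structure.
# Usage: strip_translator_blocks [FILE...]
strip_translator_blocks() {
    awk '
    # Return 1 if buf[1..n] is exactly a translator block, else 0.
    function is_translator_block(n,    i) {
        if (n < 6) { return 0 }                      # shortest valid block has 6 lines
        if (buf[1] != "<!--") { return 0 }
        if (buf[2] != "# Translator(s):") { return 0 }
        if (buf[3] != "#") { return 0 }
        for (i = 4; i <= n - 2; i++) {               # one or more "# name <email>" lines
            if (buf[i] !~ /^# [^<]+ <[^>]+>$/) { return 0 }
        }
        return (buf[n - 1] == "#" && buf[n] ~ /^-->/)
    }

    # A line starting with <!-- opens a new buffered block.
    /^<!--/ && !inblock { inblock = 1; n = 0 }

    inblock {
        buf[++n] = $0
        if (/^-->/) {                                # block closed: decide its fate
            if (!is_translator_block(n)) {
                for (i = 1; i <= n; i++) { print buf[i] }
            }
            inblock = 0
        }
        next
    }

    # Lines outside any block pass through unchanged.
    { print }

    # An unterminated block at end of input is printed rather than lost.
    END { if (inblock) { for (i = 1; i <= n; i++) { print buf[i] } } }
    ' "$@"
}
```

Other comment blocks, including ones that merely begin with # Translator but do not follow the full layout, are printed untouched.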
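I have another small awk program that accomplishes the task very quickly in just a few lines of code. It can be used to remove patterns of text from a file. The start and stop regexps can be set.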
# This rule uses a range pattern: it matches every line from the start
# marker '<!--' through the end marker '-->' (both included). The rule
# runs once per line in the range; $0 holds only the current line.
# awk -f remove_email.awk yourfile

# The if statement is not needed to accomplish the task, but may be useful.
# If uncommented, it prints the string "Found an email..." for each line
# in the range that contains a '@'.

# The command 'next' discards the current line, reads the next one, and
# restarts the awk program from its first rule.


/<!--/, /-->/ {
    #if( $0 ~ /@/ ){
        # print "Found an email and removed that!"
    #}
next
}

# This line prints the body of the file to standard output - if not captured in
# the block above.
1 {
    print
}

Save the code in "remove_email.awk" and run it with:
awk -f remove_email.awk yourfile
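
Since the start and stop regexps are the only parts that change from one use to another, they can be passed in as variables. This is a minimal sketch under that assumption, not the author's own script: the wrapper name strip_range and the awk variables start and stop are hypothetical. Note that awk interprets backslash escapes in -v values, so a regexp containing backslashes needs them doubled.

```shell
# Hypothetical wrapper: delete every line from one matching START through
# the next line matching STOP (both are awk regular expressions).
# Usage: strip_range START STOP [FILE...]
strip_range() {
    start=$1
    stop=$2
    shift 2
    awk -v start="$start" -v stop="$stop" '
        $0 ~ start, $0 ~ stop { next }   # inside the range: skip the line
        { print }                        # everything else: pass through
    ' "$@"
}
```

For the block in the question, strip_range '^<!--' '^-->' yourfile behaves like remove_email.awk with both markers anchored at the start of the line.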
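Why do you think this is inefficient? Today the scarce resource is programmer time, not computer efficiency. Two simple, understandable statements on one line look quite efficient to me; what would the parser solution look like ;-)? Good luck!
You are right. This solution would be my first attempt. But for my many thousands of files it may not be efficient enough.
To be fair, I overlooked the requirement for thousands of files. I would say that if this is a one-time need for thousands of files, then your solution is still good enough (in a for loop). If there are thousands of files per day, then a parser solution might be useful. @armenzg: how important is run time to you? P.S. Chris: I upvoted your answer, but I don't see the +1 (I do see a yellow arrow). It may show up later. Good luck to all.
-1: this will delete any comment block that starts with # Translator.
@Dogbane: why would it? Isn't that the exact task that was requested? What is your complaint?
First, it should match "Translator(s):". Second, your solution does not take the usernames and their email addresses into account.
@Dogbane: so what? The problem description does not say this has to be done. I did what was asked. And you are not the OP either.
@Dogbane: apparently you and I understand these things differently. It says that if the block is there, it will have at least one user. So you do not have to check for them. You only have to check for the block, which is what I did. If you want to build a parser for the users, fine, but there is no BNF that describes the content precisely.
This solution works for me, but it is hard to parse. Can you explain how it works?
In my opinion this task is better done with awk, as shown in user2178077's answer.
@marbu the sed equivalent is sed '/^<!--/,/^-->/d' file
Ah, OK. That said, adjusting the awk script would be much easier than adjusting the sed solution.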
sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' file