Bash: delete all lines that start with the same string except the last one

bash shell unix awk

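I am using awk to process a file, filtering it down to the specific lines of interest. In the resulting output, whenever several lines start with the same string, I want to remove all of them except the last one.

Here is an example of what is produced: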

this is a line
duplicate remove me
duplicate this should go too
another unrelated line
duplicate but keep me
example remove this line
example but keep this one
more unrelated text

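Lines 2 and 3 should be removed because they start with "duplicate", as does line 5. Line 5 should therefore be kept, since it is the last line that starts with "duplicate".

Line 6 should be removed as well, because it starts with "example", as does line 7. Line 7 should therefore be kept, since it is the last line that starts with "example".

Given the example above, I would like to produce the following output: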
this is a line
another unrelated line
duplicate but keep me
example but keep this one
more unrelated text

How can I achieve this?
I tried the following, but it does not work correctly:
awk -f initialProcessing.awk largeFile | awk '{currentMatch=$1; line=$0; getline; nextMatch=$1; if (currentMatch != nextMatch) {print line}}' - 

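Why not read the file from the end to the beginning and print only the first line containing "duplicate"? That way you do not have to keep track of what has already been printed.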
tac file | awk '/duplicate/ {if (f) next; f=1}1' | tac

This sets the flag f the first time "duplicate" is seen. From the second time on, the flag causes the line to be skipped.
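Note that the pattern /duplicate/ matches the word anywhere in the line, not only at its start. A minimal sketch of the same flag idea that compares only the first field is shown here; the function name keep_last_with_prefix and the awk variable key are hypothetical names chosen for illustration, and tac is assumed to be available (it ships with GNU coreutils).

```shell
# Hypothetical helper: keep only the last line whose first field equals $1.
# Reads stdin, writes stdout. Assumes tac (GNU coreutils) is installed.
keep_last_with_prefix() {
    tac | awk -v key="$1" '
        $1 == key {          # the line starts with the key word
            if (f) next      # one such line was already kept: skip this one
            f = 1            # first hit while reading backwards: remember it
        }
        1                    # print every line that was not skipped
    ' | tac
}

# Small demonstration on a few lines from the sample
printf '%s\n' 'this is a line' 'duplicate remove me' \
              'duplicate but keep me' 'more unrelated text' |
    keep_last_with_prefix duplicate
```

With this input the sketch should leave "this is a line", "duplicate but keep me" and "more unrelated text", in that order.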
To make this generic, so that for every first word only its last line is printed, use the array approach:
tac file | awk '!seen[$1]++' | tac

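This keeps track of the first words seen so far. They are stored in the array seen[], so the expression !seen[$1]++ is true the first time a given $1 appears; from the second occurrence on it evaluates to false and the line is not printed. Because the input is reversed by tac, the first occurrence awk sees is the last one in the original file.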
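To see the idiom at work, here is a sketch that spells out !seen[$1]++ as an explicit test and runs it on the sample from the question. The function name keep_last_by_first_word is hypothetical, and tac is again assumed to be available.

```shell
# Hypothetical helper: for every distinct first word, keep only the last line.
# Behaves like: tac | awk '!seen[$1]++' | tac
keep_last_by_first_word() {
    tac | awk '
        {
            if (seen[$1] == 0) print   # first time this word shows up (reading backwards)
            seen[$1]++                 # count it so earlier lines with this word are skipped
        }
    ' | tac
}

# Run it on the sample input from the question
keep_last_by_first_word <<'EOF'
this is a line
duplicate remove me
duplicate this should go too
another unrelated line
duplicate but keep me
example remove this line
example but keep this one
more unrelated text
EOF
```

On this input the sketch should print the five lines requested in the question, in their original order.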
You can use an (associative) array to always keep the last occurrence:
awk '{last[$1]=$0;} END{for (i in last) print last[i];}' file
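Since awk does not guarantee any particular order when iterating with for (i in last), the lines printed by the command above may not follow the order of the input. One possible way to stay with a single pass and still print in the original order is to remember the line number of each word's last occurrence. This is only a sketch: the function name keep_last_in_order and the array names line, key and last are hypothetical, and the whole input is held in memory.

```shell
# Hypothetical helper: single awk pass, no tac, output in input order.
keep_last_in_order() {
    awk '
        {
            line[NR] = $0      # remember every line by its number
            key[NR]  = $1      # remember the first word of that line
            last[$1] = NR      # the latest line number seen for this word
        }
        END {
            for (n = 1; n <= NR; n++)
                if (last[key[n]] == n)   # this is the last line for its first word
                    print line[n]
        }
    '
}

# Demonstration on a short input
printf '%s\n' 'duplicate remove me' 'another unrelated line' \
              'duplicate but keep me' |
    keep_last_in_order
```

For very large inputs the tac-based pipeline is likely the lighter choice, because it only stores the set of first words rather than every line.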

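Your example is unclear: it is not only "duplicate" but any repeated string, see the example.
@bkmoney Oh, thanks for the comment. Fixed it by making it more generic.
This is exactly what I was looking for. Thanks.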