Remove duplicate lines within each paragraph of a file using sed or awk


awk sed


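I want to remove duplicate lines from a file made up of paragraphs that each start with a "SET CURRENT" line. Lines count as duplicates only when they are identical and belong to paragraphs that share the same first line. Identical lines that belong to paragraphs with a different first line must be kept. For example:

If I have the following file: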

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file3 FOR 1001.file3 ;
CREATE SYN file3 FOR 1001.file3 ;

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file7 FOR 1000.file7 ;

SET CURRENT = 'bbb' ;
CREATE SYN file5 FOR 1002.file5 ;
CREATE SYN file6 FOR 1003.file6 ;

SET CURRENT = 'bbb' ;  
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file8 FOR 1002.file8 ;
CREATE SYN file6 FOR 1003.file6 ;

the result should be:

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file3 FOR 1001.file3 ;

SET CURRENT = 'aaa' ;
CREATE SYN file7 FOR 1000.file7 ;

SET CURRENT = 'bbb' ;
CREATE SYN file5 FOR 1002.file5 ;
CREATE SYN file6 FOR 1003.file6 ;

SET CURRENT = 'bbb' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file8 FOR 1002.file8 ;

With awk, you can do the following:

awk 'NF==0{print;next};/^SET CURRENT/{c=$4;print;next}!seen[c,$0]++' file

With some comments added to make it more readable:

awk ' NF == 0 {       # If we find an empty line
          print       # print the line
          next        # and skip to the next record
      }
      /^SET CURRENT/{ # If we find a line beginning with "SET CURRENT"
          c = $4      # Store the value in the 4th field
          print       # Print the current line
          next        # and skip to the next record  
      }
      !seen[c,$0]++   # Print if the combination of the "c" value
                      # and the current line has not been stored
                      # in array "seen", and then store the
                      # combination in the array
                      # (so that later duplicates are not printed)
      ' file
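To check the command against the sample data, the following sketch writes the question's input to a temporary file and runs the one-liner on it. The variable names input and result and the use of mktemp are illustrative assumptions, not part of the original answer. The output should match the expected result shown in the question.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file3 FOR 1001.file3 ;
CREATE SYN file3 FOR 1001.file3 ;

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file7 FOR 1000.file7 ;

SET CURRENT = 'bbb' ;
CREATE SYN file5 FOR 1002.file5 ;
CREATE SYN file6 FOR 1003.file6 ;

SET CURRENT = 'bbb' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file8 FOR 1002.file8 ;
CREATE SYN file6 FOR 1003.file6 ;
EOF

# Same program as in the answer, run against the sample file.
result=$(awk 'NF==0{print;next};/^SET CURRENT/{c=$4;print;next}!seen[c,$0]++' "$input")
printf '%s\n' "$result"

rm -f "$input"
```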

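!seen[c,$0]++ works like this: when a comma is used in an array index, the two tokens are combined into a single string, joined by the SUBSEP character. In this case the index is the combination of the c value and the current line ($0), because that combination is what must be unique after filtering. With !seen[c,$0] we check whether the combination has already been stored as an index of the array. If it has not, the element is still empty, the expression evaluates to true, and the line is printed. If it has, the expression evaluates to false and the line is not printed. The postfix increment operator counts the occurrences of the index, so the line is printed only on its first occurrence and later matches are skipped.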
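A small sketch may make the SUBSEP behaviour easier to see. It stores one element under a two-part index, splits the stored key back apart on SUBSEP, and then tests membership for the same line under two different block values. The variable name subsep_demo and the sample strings are made up for illustration.

```shell
subsep_demo=$(awk 'BEGIN {
    # Stored under the single string "aaa" SUBSEP "line one".
    seen["aaa", "line one"] = 1

    # The key is one string; splitting on SUBSEP recovers both parts.
    for (key in seen) {
        n = split(key, parts, SUBSEP)
        print n ": " parts[1] " | " parts[2]
    }

    # A multi-part membership test wraps the index list in parentheses.
    if (("aaa", "line one") in seen)    print "same block: present"
    if (!(("bbb", "line one") in seen)) print "other block: absent"
}')
printf '%s\n' "$subsep_demo"
```

Because the block value is part of the key, the same line text under a different "SET CURRENT" value is a different array element.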

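The fifth line of the sample input is missing a semicolon (;). Is that intentional?

No, it is not intentional. I edited the input file.

Can you explain it please? Thanks

@Plouff: Sure, I added some comments to the code. Let me know if it is still unclear.

I understand the first two pattern-action pairs. But !seen[c,$0]++ is still a bit unclear. I don't see how it compares against the strings in seen, adds a new string (c,$0) to seen if it is not already there, and finally prints, all with those few characters in the command!! Thanks for the details (by the way :)

@Plouff: !seen[$0]++ is a common idiom for preventing duplicate lines from being printed. In this case we use c,$0 instead of $0 because we want lines to be unique only within the same block.

OK! So you just have to know it. Thanks again :).

Same here: can you explain it please? Thanks :)!!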
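For readers who find the idiom too terse, it can be spelled out as an explicit test followed by an increment. This is only a sketch of an equivalent long form, not the answerer's code. The function name dedupe_per_block and the variable key are assumptions introduced here.

```shell
# Long-form equivalent of: !seen[c,$0]++
dedupe_per_block() {
    awk '
    NF == 0        { print; next }           # keep blank separator lines
    /^SET CURRENT/ { c = $4; print; next }   # remember the block value
    {
        key = c SUBSEP $0        # same string that seen[c,$0] uses as index
        if (!(key in seen)) {    # first time this line appears for this value
            print
        }
        seen[key]++              # record it so later copies are skipped
    }' "$@"
}
```

It could be called as dedupe_per_block file, or fed from a pipe when no file name is given.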

awk '/^SET/{s=$4; print; next} !a[s,$5]++' file
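Two points about this shorter variant are worth checking before using it. It keys on the fifth field (the target after FOR) rather than on the whole line, so lines that differ only in the synonym name would be treated as duplicates. It also has no rule for blank lines, so a blank line is keyed as (s, "") and only the first separator seen for each SET value is printed. The sketch below assumes every blank separator should be kept and adds the NF==0 rule from the first answer. The function name dedupe_by_target is hypothetical.

```shell
# Variant keyed on the fifth field, with blank lines always passed through.
dedupe_by_target() {
    awk 'NF==0 {print; next} /^SET/ {s=$4; print; next} !a[s,$5]++' "$@"
}
```

It would be called as dedupe_by_target file.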