Linux: get one line for each shared field value in a file using a shell script


linux shell sed awk


I have a file with the following content:

onelab2.warsaw.rd.tp.pl    5
onelab3.warsaw.rd.tp.pl    5
lefthand.eecs.harvard.edu  7
righthand.eecs.harvard.edu 7
planetlab2.netlab.uky.edu  8
planet1.scs.cs.nyu.edu     9
planetx.scs.cs.nyu.edu     9

So every line ends with a number, and I want the first line for each number. For the content above, I want to get:

onelab2.warsaw.rd.tp.pl    5
lefthand.eecs.harvard.edu  7
planetlab2.netlab.uky.edu  8
planet1.scs.cs.nyu.edu     9

How can I do this? I would like to use a shell script with awk, sed, and so on.

Use the awk command:

awk '{if(!a[$2]){a[$2]=1; print}}' file.dat

Explanation:

{
  # 'a' is a lookup table (array) holding every number that has been
  # printed so far. awk creates it as an empty array on first use, so
  # it needs no explicit initialization.
  # $2 is the second 'column' in the line -> the number
  if(!a[$2]) 
  {
    # set index in the lookup table. This way the if statement will 
    # fail for the next line with the same number at the end
    a[$2]=1;
    # print the whole current line
    print
  }
}
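
To try the command end to end, here is a minimal sketch that recreates the sample file from the question and runs the awk one-liner on it. The scratch directory created with mktemp and the variable names are assumptions made for illustration; file.dat is the name used in the command above.

```shell
#!/bin/sh
# Recreate the sample data in a scratch directory (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/file.dat" <<'EOF'
onelab2.warsaw.rd.tp.pl    5
onelab3.warsaw.rd.tp.pl    5
lefthand.eecs.harvard.edu  7
righthand.eecs.harvard.edu 7
planetlab2.netlab.uky.edu  8
planet1.scs.cs.nyu.edu     9
planetx.scs.cs.nyu.edu     9
EOF

# Print a line only the first time its second field is seen.
first_lines=$(awk '{ if (!a[$2]) { a[$2] = 1; print } }' "$workdir/file.dat")
printf '%s\n' "$first_lines"

rm -rf "$workdir"
```

Because awk reads the file once from top to bottom, the lines come out in their original order and the input does not need to be sorted.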

This might work for you (GNU sort):

Sort on the second field (-k2), numerically (-n), keeping the original order (-s), and removing duplicates (-u).
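
The four options described above can be combined into a single sort invocation (the comments further down write it as sort -nsuk2). The sketch below spells the flags out separately and runs them on the sample data; it assumes GNU sort, and the scratch directory, the file name file.dat and the variable name are chosen here for illustration.

```shell
#!/bin/sh
# Scratch copy of the sample data (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/file.dat" <<'EOF'
onelab2.warsaw.rd.tp.pl    5
onelab3.warsaw.rd.tp.pl    5
lefthand.eecs.harvard.edu  7
righthand.eecs.harvard.edu 7
planetlab2.netlab.uky.edu  8
planet1.scs.cs.nyu.edu     9
planetx.scs.cs.nyu.edu     9
EOF

# -k2: sort key starts at the second field
# -n : compare the key numerically
# -s : stable sort, so lines with equal keys keep their file order
# -u : output only the first line of each run of equal keys
sorted_unique=$(sort -n -s -u -k2 "$workdir/file.dat")
printf '%s\n' "$sorted_unique"

rm -rf "$workdir"
```

Unlike the awk approach, the output is ordered by the number rather than by position in the file; for the sample data the two orders happen to coincide.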

Using sort and uniq:

sort -n -k2 input | uniq -f1
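
Two details of this pipeline are worth checking on your own data. Without -s, sort breaks ties on the number by comparing whole lines, so the line kept for each number is the one that sorts first, not necessarily the one that appears first in the file. Also, uniq -f1 compares everything after the first field, including the blanks in front of the number, so column-aligned input with varying padding may not collapse as expected. The sketch below illustrates the first point on single-space-separated data; the host names under example.org and the variable names are hypothetical.

```shell
#!/bin/sh
# Scratch input: the first line for 5 is deliberately not the
# alphabetically smallest one (host names are made up).
workdir=$(mktemp -d)
cat > "$workdir/input" <<'EOF'
zeta.example.org 5
alpha.example.org 5
beta.example.org 7
EOF

# As written above: equal numbers are ordered by whole-line comparison,
# so alpha sorts ahead of zeta and is the line uniq keeps.
plain=$(sort -n -k2 "$workdir/input" | uniq -f1)

# With -s (stable) added, equal numbers keep their file order,
# so the first line from the file survives.
stable=$(sort -s -n -k2 "$workdir/input" | uniq -f1)

printf 'without -s:\n%s\n' "$plain"
printf 'with -s:\n%s\n' "$stable"

rm -rf "$workdir"
```

If "first" is meant as first in the file, the -s variant or the awk approach is the safer choice.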

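Where is your effort?

I got this far and got stuck, using

Much neater than mine; it avoids the END block and prints the lines in the order they appear in the file.

Thanks! I am amazed again and again at how much awk can do with a short one-liner. :)

@JonathanLeffler Check out the solution from (at)potong :)

You can shorten it by relying on some implicit rules. Every block can be preceded by a condition, so you can use:

!a[$2]{a[$2]=1; print}

or

!a[$2]{a[$2]++; print}

If you move the increment outside the block:

!a[$2]++{print}

{print} is the default block when the condition is true, so you can omit it:

!a[$2]++

Almost shorter than the sort solution :)

Clarity is more important than brevity. This is awk, not APL!

Hehe :)

This really can be done even shorter than with awk (and, I expect, faster) using sort.

I had not thought of that. Good answer, +1.

You can make it shorter still, at least with GNU sort: when -u is used, the last-resort comparison is implicitly disabled. That is,

sort -nuk2

is equivalent to

sort -nsuk2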
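
The shortened awk form from the comments above relies on two awk rules: a bare pattern with no block prints the line when the pattern is true, and a[$2]++ yields the old count (0 the first time a value is seen) before incrementing it. A small sketch that checks the long and short forms give the same result; the scratch file and variable names are assumptions for illustration.

```shell
#!/bin/sh
# Scratch copy of part of the sample data (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/file.dat" <<'EOF'
onelab2.warsaw.rd.tp.pl    5
onelab3.warsaw.rd.tp.pl    5
lefthand.eecs.harvard.edu  7
righthand.eecs.harvard.edu 7
planetlab2.netlab.uky.edu  8
EOF

# Explicit form: test, mark as seen, print.
long_form=$(awk '{ if (!a[$2]) { a[$2] = 1; print } }' "$workdir/file.dat")

# Implicit form: the pattern is true only on the first occurrence,
# and the default action for a true pattern is to print the line.
short_form=$(awk '!a[$2]++' "$workdir/file.dat")

if [ "$long_form" = "$short_form" ]; then
    echo "both forms print the same lines"
fi

rm -rf "$workdir"
```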


perl -ane 'print unless $a{$F[1]}++' file