Unix: keep the row with the higher value in one field when rows are otherwise identical

I have a file that looks like this:

cat f1.csv:

col1,col2,col3
AK136742,BC051226,996
AK161599,Gm15417,4490
AK161599,Gm15417,6915
AK161599,Zbtb7b,1339
AK161599,Zbtb7b,1475
AK161599,Zbtb7b,1514

What I want to do is keep only one of the duplicated rows: when col1 and col2 are the same, keep the row with the largest number in col3.

So the desired output is:

col1,col2,col3
AK136742,BC051226,996
AK161599,Gm15417,6915
AK161599,Zbtb7b,1514

I tried the command below, but it does not solve the problem:

cat f1.csv | sort -rnk3 | awk '!x[$3]++'

Any help is appreciated. Thanks.

With the samples you have shown, please try the following:

awk '
BEGIN{
  FS=OFS=","
}
{ ind = $1 FS $2 }
FNR==1{
  print
  next
}
{
  arr[ind]=(arr[ind]>$NF?arr[ind]:$NF)
}
END{
  for(i in arr){
     print i,arr[i]
  }
}
' Input_file

Explanation: a detailed walkthrough of the above.

awk '                        ## Start of the awk program.
BEGIN{                       ## The BEGIN block runs before any input is read.
  FS=OFS=","                 ## Use a comma as both the input and output field separator.
}
{ ind = $1 FS $2 }           ## Build the key from the 1st and 2nd fields.
FNR==1{                      ## If this is the first line (the header),
  print                      ## print it as is
  next                       ## and skip the remaining rules for this line.
}
{
  arr[ind]=(arr[ind]>$NF?arr[ind]:$NF)  ## For each key, keep the larger of the stored value and the last field.
}
END{                         ## The END block runs after all input is read.
  for(i in arr){             ## Loop over every key (in unspecified order).
     print i,arr[i]          ## Print the key and its maximum value.
  }
}
' Input_file                 ## Name of the input file.
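
Two details of the program above may matter on other inputs: `for(i in arr)` visits keys in an unspecified order, and comparing against an array element that has not been set yet treats it as 0, so a group containing only negative values would come out empty. A minimal sketch of a variant that avoids both is below. The function name `keep_max` and the arrays `max` and `order` are hypothetical names chosen for illustration, and the sketch assumes col3 always holds a number.

```shell
# keep_max is a hypothetical helper name, not part of the original answer.
# It keeps, for each (col1, col2) pair, the row with the largest col3.
keep_max() {
  awk '
    BEGIN { FS = OFS = "," }
    FNR == 1 { print; next }                # pass the header through unchanged
    {
      key = $1 FS $2
      if (!(key in max)) {                  # first time this key is seen
        order[++n] = key                    # remember first-seen order
        max[key] = $3
      } else if ($3 + 0 > max[key] + 0) {   # force a numeric comparison
        max[key] = $3
      }
    }
    END {
      for (i = 1; i <= n; i++)              # print keys in first-seen order
        print order[i], max[order[i]]
    }
  ' "$@"
}

# Usage with the sample data from the question, fed on standard input.
keep_max <<'EOF'
col1,col2,col3
AK136742,BC051226,996
AK161599,Gm15417,4490
AK161599,Gm15417,6915
AK161599,Zbtb7b,1339
AK161599,Zbtb7b,1475
AK161599,Zbtb7b,1514
EOF
```

Testing `key in max` instead of relying on the unset value is what makes negative numbers work, and the `order` array trades a little memory for output in the same order as the input.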

With sort, you need:

sort -t, -k3,3nr file.csv | sort -t, -su -k1,2

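The first sort orders the input numerically by the third column, in descending order. The second sort is stable (-s, which not every sort implementation supports) and makes the output unique on the first two columns, so the row with the largest value is kept for each combination.

I ignored the header line.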
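
To keep the header in place with this approach, it can be split off before sorting. The following is only a sketch: the function name `dedupe_with_sort` is hypothetical, it assumes a sort that supports -s (GNU sort does), and it assumes the file has exactly one header line.

```shell
# dedupe_with_sort is a hypothetical helper name used for illustration.
# Assumes GNU sort (for -s) and a file with a single header line.
dedupe_with_sort() {
  file=$1
  head -n 1 "$file"              # emit the header untouched
  tail -n +2 "$file" |
    sort -t, -k3,3nr |           # largest col3 first
    sort -t, -s -u -k1,2         # stable and unique on col1,col2: first row per key survives
}

# Usage with the sample data from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
col1,col2,col3
AK136742,BC051226,996
AK161599,Gm15417,4490
AK161599,Gm15417,6915
AK161599,Zbtb7b,1339
AK161599,Zbtb7b,1475
AK161599,Zbtb7b,1514
EOF
dedupe_with_sort "$tmp"
rm -f "$tmp"
```

Because the second sort is stable, rows with equal col1,col2 stay in the order produced by the first sort, so the first row of each group is the one with the largest col3.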

$ head -n 1 f1.csv; { tail -n +2 f1.csv | sort -t, -k1,2 -k3rn | awk -F, '!seen[$1,$2]++'; }
col1,col2,col3
AK136742,BC051226,996
AK161599,Gm15417,6915
AK161599,Zbtb7b,1514

Or, to avoid naming the input file twice (useful, for example, when the input comes from a pipe):
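
One possible way to do that is sketched below, assuming the input has a single header line. The function name `dedupe_stdin` is hypothetical. It uses the shell's `read` to consume only the header, then hands the rest of standard input to the same sort and awk pipeline as above.

```shell
# dedupe_stdin is a hypothetical helper name used for illustration.
# Reads CSV with a single header line from standard input.
dedupe_stdin() {
  IFS= read -r header || return 1   # consume only the header line
  printf '%s\n' "$header"
  sort -t, -k1,2 -k3rn |            # group by col1,col2; largest col3 first in each group
    awk -F, '!seen[$1,$2]++'        # print only the first row of each group
}

# Usage with the sample data from the question, fed on standard input.
dedupe_stdin <<'EOF'
col1,col2,col3
AK136742,BC051226,996
AK161599,Gm15417,4490
AK161599,Gm15417,6915
AK161599,Zbtb7b,1339
AK161599,Zbtb7b,1475
AK161599,Zbtb7b,1514
EOF
```

The input is read exactly once, so the function can sit at the end of any pipeline, for example `some_command | dedupe_stdin`.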

I find the answers provided a bit complicated. Here is one that is completely correct:

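#! /usr/bin/awk -f

# Split fields on commas; without this, $1 would be the whole line.
BEGIN {
    FS = ","
}

# Save the header so it can be printed first.
NR == 1 {
    heading = $0
    next
}

# Track the largest col3 seen for each (col1, col2) pair.
{
    key = $1 "," $2
    if( !(key in values) || values[key] + 0 < $3 + 0 ) {
        values[key] = $3
    }
}

END {
    print heading
    for( k in values ) {
        print k "," values[k] | "sort -t, -k1,2"
    }
}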

You should mention that -s requires GNU sort, and that the header needs to be separated from the rest before the rest is sorted.