AWK to sort fields based on the same multiple fields

I have a file like the following:

scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 1.025 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
For each unique combination of all the other fields, I want to print the line with the highest value in field 6.

Expected output:

scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD

Is there a smart way to do this in awk?

If you do not need the fields to stay in their original order in the output:

awk '{if(uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] < $6) { uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] = $6} }END{for(i in uniqueSet){print i" "uniqueSet[i]} }' <input_file_name>
If you want to keep the original field order:

awk '{if(uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] < $6) { uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] = $6} }END{for(i in uniqueSet){ split(i, ar, " "); print ar[1]" "ar[2]" "ar[3]" "ar[4]" "ar[5]" "uniqueSet[i]" "ar[6]} }' <input_file_name>
In GNU awk:

$ gawk ' {
    t=$6                                 # put $6 to temp
    $6="MARK"                            # replace it with a marker, use $0 as key
    if(!($0 in v) || t>v[$0]) {          # if $0 not in value hash or t>previous value
        a[$0]=NR                         # in a goes the record number for ordering
        v[$0]=t
    }
}
END {                                    # in the end
    PROCINFO["sorted_in"]="@val_num_asc" # traverse a in growing order of NRs stored
    for(i in a) {
        sub(/MARK/,v[i],i)               # replace mark with value
        print i                          # and output
    }
}' file
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
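
PROCINFO["sorted_in"] is a gawk extension. Where only a POSIX awk is available, a similar single-pass result can probably be had by recording the order in which each group first appears. The following is a minimal sketch under that assumption, not the answer's own method; the array names order, max and line are illustrative, and groups are printed in first-appearance order.

```shell
# Sample input from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 1.025 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
EOF

result=$(awk '
{
    # Group key: every field except field 6.
    key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 SUBSEP $7
    if (!(key in max)) {
        order[++n] = key                  # remember first-appearance order
        max[key] = $6; line[key] = $0
    } else if ($6 + 0 > max[key] + 0) {   # "+ 0" forces a numeric comparison
        max[key] = $6; line[key] = $0
    }
}
END {
    # Print the winning line of each group, in first-appearance order.
    for (i = 1; i <= n; i++) print line[order[i]]
}' "$file")

printf '%s\n' "$result"
rm -f "$file"
```

Because the key is tested with `in` before any comparison, this variant does not depend on the values being positive.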

The sensible way is to use sort+awk:

$ sort -k6,6nr file | awk '!seen[$1,$2,$3,$4,$5,$7]++'
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
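
The pipeline above prints the groups in descending order of field 6 rather than in input order. If input order matters, one possible approach is to prefix each line with its line number, deduplicate as before, then sort back on that number and strip it. This is only a sketch of that decorate/undecorate idea, not part of the original answer; note that every field number shifts by one while the prefix is present.

```shell
# Sample input from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 1.025 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
EOF

result=$(
    awk '{ print NR, $0 }' "$file" |          # decorate: line number becomes field 1
    sort -k7,7nr |                            # original field 6 is now field 7
    awk '!seen[$2,$3,$4,$5,$6,$8]++' |        # keep the first (largest) row per group
    sort -k1,1n |                             # restore input order
    cut -d' ' -f2-                            # undecorate: drop the line number
)

printf '%s\n' "$result"
rm -f "$file"
```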
But if you want to use awk alone, you could:

$ awk '
    { orig=$0; $6=""; key=$0; $0=orig }
    NR==FNR{ if ( !(key in max) || $6 > max[key] ) { max[key]=$6; nr[key]=NR } next }
    nr[key]==FNR
' file file
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD

A short alternative using GNU datamash and cut:

datamash -Wf -g1,2,3,4,5,7 max 6 <file | cut -f1-7 --output-delimiter=' '
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD

Comments:

Awesome: would you mind adding an explanation as well?

The script iterates over each line and puts the line's contents into a map. The key is the set of fields the line is grouped by, in this case every field except field 6. The value for that key is the maximum of field 6 within the group. Once the iteration is finished, it simply prints the map's [key-value] pairs.

Note that this only works when all the values being compared are positive.

I believe this is based on the concept of associative arrays. Any pointers for negative numbers as well?

You could make it negative-friendly by adding a test for an empty uniqueSet entry, couldn't you?

Hi, could you explain your first answer?

You're welcome; see how it goes from here.
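
The limitation raised in the comments comes from comparing against an unset array element, which awk treats as 0, so a group whose values are all zero or negative is never stored. The membership test suggested there might look like the sketch below. The three input rows are made-up illustrative data, not taken from the question, and the final sort is only there to make the otherwise unspecified `for (k in ...)` order deterministic.

```shell
result=$(printf '%s\n' \
    'a b c LOW H2L -0.5 SETUP' \
    'a b c LOW H2L -0.2 SETUP' \
    'a b c HIGH H2L -3 HOLD' |
awk '{
    key = $1" "$2" "$3" "$4" "$5" "$7
    # Store the value if the group is new, or if it beats the stored maximum.
    if (!(key in uniqueSet) || $6 + 0 > uniqueSet[key] + 0)
        uniqueSet[key] = $6
}
END {
    for (k in uniqueSet) {
        split(k, ar, " ")
        # Put the maximum back in position 6 to keep the original field order.
        print ar[1], ar[2], ar[3], ar[4], ar[5], uniqueSet[k], ar[6]
    }
}' | sort)

printf '%s\n' "$result"
# a b c HIGH H2L -3 HOLD
# a b c LOW H2L -0.2 SETUP
```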