Merge two files while retaining the lines with the greater value in a given column, using awk

bash shell awk grep

I have two tab-delimited files, with rows such as:

A 500 50
A 600 30
B 300 100
C 600 40

I want to merge these two files and, for rows that match on columns 1 and 2, keep the one with the larger value in column 3.

So the output would be:

A 500 70
A 600 30
B 300 100
C 600 40

These are samples of the real values:

==> cut125_beng_jointvcf_varcal_geno6.txt <==
scaffold_3015                   5910            44.88210969
scaffold_3015                   5912            67.86783682
scaffold_3015                   5916            79.02675660
scaffold_3015                   5926            18.41190163
scaffold_3015                   5930            42.07625795
scaffold_3015                   5931            52.63549142
scaffold_3015                   5954            37.34609103
scaffold_3015                   5983            47.36974946
scaffold_3015                   5991            41.45881125

==> cut125_wbm_jointvcf_varcal_geno6.txt <==
scaffold_3015                   5910            50.79731830
scaffold_3015                   5916            146.20529658
scaffold_3015                   5926            184.50309487
scaffold_3015                   5930            160.27435340
scaffold_3015                   5931            172.71907060
scaffold_3015                   5954            161.39740159
scaffold_3015                   5968            146.54839149
scaffold_3015                   5983            97.01874773
scaffold_3015                   5991            73.54761456

Could you please try the following:
awk '
FNR==NR{
   a[$1,$2]=$3
   next
}
($1,$2) in a{
   $3=(a[$1,$2]>$3?a[$1,$2]:$3)
   b[$1,$2]
}
1
END{
   for(i in a){
      if(!(i in b)){
        print i,a[i]
      }
   }
}' SUBSEP=" "  Input_file1  Input_file2

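This also takes care of elements that are not common to both Input_files: if an element is present in Input_file1 but not in Input_file2, or vice versa, it is printed as well.
Explanation: adding an explanation for the above code too.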
awk '
FNR==NR{                        ##FNR==NR is TRUE only while the first file, Input_file1, is being read.
   a[$1,$2]=$3                  ##Store $3 of the current line in array a, indexed by $1,$2.
   next                         ##next skips all further statements for the current line.
}
($1,$2) in a{                   ##For Input_file2 lines: check whether $1,$2 of the current line is an index in array a.
   $3=(a[$1,$2]>$3?a[$1,$2]:$3) ##Set $3 to a[$1,$2] if that value is greater than $3, else keep $3.
   b[$1,$2]                     ##Record $1,$2 in array b to track the lines common to Input_file1 and Input_file2.
}
1                               ##1 prints the current line (with $3 edited or not).
END{                            ##END block, run after both files have been read.
   for(i in a){                 ##Loop over every index of array a (in unspecified order).
      if(!(i in b)){            ##If index i is NOT in array b, the line exists only in Input_file1 and has not been printed yet.
        print i,a[i]            ##Print index i and its value a[i].
      }
   }
}' SUBSEP=" " Input_file1  Input_file2      ##Set SUBSEP to a space and pass Input_file1 and Input_file2.
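
Since the question says the files are tab-delimited, here is a minimal sketch of a variant that keeps tabs in the output and forces a numeric comparison with +0. It is not the answer's code: the scratch file names, the key and seen names, and the choice to pass the wbm rows as the first file are assumptions made for illustration, and the rows are a few lines taken from the real sample in the question.

```shell
# Hypothetical scratch files holding a few rows from the question's real sample.
tmpdir=$(mktemp -d)
printf 'scaffold_3015\t5910\t50.79731830\nscaffold_3015\t5916\t146.20529658\nscaffold_3015\t5968\t146.54839149\n' > "$tmpdir/wbm.txt"
printf 'scaffold_3015\t5910\t44.88210969\nscaffold_3015\t5912\t67.86783682\nscaffold_3015\t5916\t79.02675660\n' > "$tmpdir/beng.txt"

merged=$(awk '
BEGIN { FS = OFS = "\t" }              # read and write tab-separated fields
FNR == NR {                            # first file: remember column 3 per (col1, col2)
    a[$1 FS $2] = $3
    next
}
{
    key = $1 FS $2
    if (key in a) {                    # pair present in both files
        if (a[key] + 0 > $3 + 0)       # +0 forces a numeric comparison
            $3 = a[key]
        seen[key]                      # mark the pair as already printed
    }
    print                              # second-file row, column 3 possibly replaced
}
END {
    for (key in a)                     # rows that exist only in the first file
        if (!(key in seen))
            print key, a[key]
}' "$tmpdir/wbm.txt" "$tmpdir/beng.txt")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

With these rows, 5910 and 5916 come out with the larger values 50.79731830 and 146.20529658, 5912 passes through unchanged, and 5968 (present only in the first file) is appended at the end.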



EDIT: As per the OP, the order of the output lines should be the same as in Input_file2 and Input_file1, so adding the following solution.
awk '
FNR==NR{                        ##FNR==NR is TRUE only while the first file, Input_file1, is being read.
   a[$1,$2]=$3                  ##Store $3 of the current line in array a, indexed by $1,$2.
   if(!b[$1,$2]++){             ##If $1,$2 has not been seen before in array b then do the following.
     d[++count]=$1 OFS $2}      ##Store $1 OFS $2 in array d under the increasing index count, to remember the input order.
   next                         ##next skips all further statements for the current line.
}
($1,$2) in a{                   ##For Input_file2 lines: check whether $1,$2 of the current line is an index in array a.
   $3=(a[$1,$2]>$3?a[$1,$2]:$3) ##Set $3 to a[$1,$2] if that value is greater than $3, else keep $3.
   c[$1,$2]                     ##Record $1,$2 in array c to track the lines common to Input_file1 and Input_file2.
}
1                               ##1 prints the current line (with $3 edited or not).
END{                            ##END block, run after both files have been read.
   for(i=1;i<=count;i++){       ##Loop from 1 to count, i.e. through array d in Input_file1 order.
      if(!(d[i] in c)){         ##If d[i] is NOT in array c, the line exists only in Input_file1 and has not been printed yet.
        print d[i],a[d[i]]      ##Print the key d[i] and its value a[d[i]].
      }
   }
}' SUBSEP=" " Input_file1  Input_file2      ##Set SUBSEP to a space (matching the default OFS used in d) and pass Input_file1 and Input_file2.
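
The EDIT solution works because d[i] (built with OFS) is the same string as the key in a (built with SUBSEP), both being a single space. A minimal sketch that builds the key by hand, so it does not depend on SUBSEP at all, could look like the following. The file names, the way the question's toy rows are split across the two files, and the key, order and seen names are assumptions for illustration only.

```shell
# Hypothetical demo inputs: how the toy rows are split across files is an assumption.
tmpdir=$(mktemp -d)
printf 'B 300 100\nA 600 30\nA 500 50\n' > "$tmpdir/first.txt"
printf 'A 500 70\nC 600 40\n' > "$tmpdir/second.txt"

ordered=$(awk '
FNR == NR {
    key = $1 OFS $2                    # build the key by hand so it prints as-is later
    if (!(key in a))
        order[++count] = key           # remember first-seen order of first-file keys
    a[key] = $3
    next
}
{
    key = $1 OFS $2
    if (key in a) {
        if (a[key] + 0 > $3 + 0)       # keep the larger value, compared numerically
            $3 = a[key]
        seen[key]                      # this pair has now been printed
    }
    print                              # second-file rows keep their own order
}
END {
    for (i = 1; i <= count; i++)       # first-file-only rows, in first-file order
        if (!(order[i] in seen))
            print order[i], a[order[i]]
}' "$tmpdir/first.txt" "$tmpdir/second.txt")

printf '%s\n' "$ordered"
rm -rf "$tmpdir"
```

Here the second file's rows (A 500 70, C 600 40) are printed first in their own order, followed by B 300 100 and A 600 30 in the order they appeared in the first file, which a plain for(i in a) loop would not guarantee.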

Please edit your question to show what you have tried so far to solve the problem, and tell us where you need help to make progress.

@Ed Morton, sir, could you please let me know whether my code alignment has improved and looks fine now? I would be grateful.

No doubt about it: that structure is now the one commonly used in all Algol-derived languages, it is what any C beautifier would output, and it is very easy to read. Thanks for fixing it and for asking! Now if we could just get you to parenthesize your ternary expressions... :-)

@EdMorton, cool, thanks for the feedback, and sorry about the parentheses. I sometimes forget them, but I am starting to use them. Honestly, your guidance always helps, you rock.

By the way, I haven't voted yet because I'm waiting for the OP to add their attempt to the question before I read it, so I don't know whether your script works, since I don't know what it is supposed to do! I'm not even going to think about it until I see some effort in the question. I have actually noticed that you are getting better at parenthesizing ternaries. Nice work!