Bash: merge two files, keeping the line with the larger value in a given column (awk)
Tags: bash, shell, awk, grep


I have two tab-delimited files.

A 500 50
A 600 30
B 300 100
C 600 40

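I want to merge these two files and, for lines that match on columns 1 and 2, keep the one with the larger value in column 3.

So the output would be: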

A 500 70
A 600 30
B 300 100
C 600 40

These are samples of the real values:

==> cut125_beng_jointvcf_varcal_geno6.txt <==
scaffold_3015                   5910            44.88210969
scaffold_3015                   5912            67.86783682
scaffold_3015                   5916            79.02675660
scaffold_3015                   5926            18.41190163
scaffold_3015                   5930            42.07625795
scaffold_3015                   5931            52.63549142
scaffold_3015                   5954            37.34609103
scaffold_3015                   5983            47.36974946
scaffold_3015                   5991            41.45881125

==> cut125_wbm_jointvcf_varcal_geno6.txt <==
scaffold_3015                   5910            50.79731830
scaffold_3015                   5916            146.20529658
scaffold_3015                   5926            184.50309487
scaffold_3015                   5930            160.27435340
scaffold_3015                   5931            172.71907060
scaffold_3015                   5954            161.39740159
scaffold_3015                   5968            146.54839149
scaffold_3015                   5983            97.01874773
scaffold_3015                   5991            73.54761456

Could you please try the following:

awk '
FNR==NR{
   a[$1,$2]=$3
   next
}
($1,$2) in a{
   $3=(a[$1,$2]>$3?a[$1,$2]:$3)
   b[$1,$2]
}
1
END{
   for(i in a){
      if(!(i in b)){
        print i,a[i]
      }
   }
}' SUBSEP=" "  Input_file1  Input_file2
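This also takes care of entries that are not common to both Input_files: a line whose columns 1 and 2 appear only in Input_file1, or only in Input_file2, is still printed.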
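The behavior described above can be checked on a small case. The sketch below is a variant of the script, not the answer's exact code: it assumes strictly tab-delimited input, sets FS, OFS and SUBSEP to a tab so that every output line stays tab-separated, and adds `+0` to force a numeric comparison. The sample files and the function name `merge_max` are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# Hypothetical sample inputs (assumed for illustration, not taken from the question).
tmp=$(mktemp -d)
printf 'A\t500\t50\nA\t600\t30\nB\t300\t100\n' > "$tmp/file1.tsv"
printf 'A\t500\t70\nC\t600\t40\n'              > "$tmp/file2.tsv"

# merge_max FILE1 FILE2: print the union of both files, keeping the larger
# column 3 whenever columns 1 and 2 match.
merge_max() {
  awk '
  BEGIN { FS = OFS = SUBSEP = "\t" }   # tabs for input, output and rebuilt keys
  FNR == NR {                          # first file: remember column 3 per key
    a[$1, $2] = $3
    next
  }
  ($1, $2) in a {                      # second file: key also seen in the first file
    if (a[$1, $2] + 0 > $3 + 0) $3 = a[$1, $2]   # +0 forces a numeric comparison
    b[$1, $2]                          # mark the key as already printed
  }
  1                                    # print the (possibly updated) line
  END {
    for (k in a) if (!(k in b)) print k, a[k]    # lines found only in the first file
  }' "$1" "$2"
}

# The END loop visits keys in an unspecified order, so sort for a stable result.
merge_max "$tmp/file1.tsv" "$tmp/file2.tsv" | LC_ALL=C sort
rm -rf "$tmp"
```

With these inputs the pipeline prints four tab-separated lines: A 500 70, A 600 30, B 300 100 and C 600 40. The lines for A 600 and B 300 exist only in the first file and the line for C 600 only in the second, yet all of them appear in the output.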

Explanation: here is the same script again with a comment on every line.

awk '
FNR==NR{                        ##FNR==NR is TRUE only while the first file, Input_file1, is being read.
   a[$1,$2]=$3                  ##Creating array a whose index is $1,$2 and whose value is $3 of the current line.
   next                         ##next skips all further statements for the current line.
}
($1,$2) in a{                   ##Checking whether $1,$2 of the current Input_file2 line is present in array a.
   $3=(a[$1,$2]>$3?a[$1,$2]:$3)   ##Resetting $3: if a[$1,$2] is greater than $3 then use a[$1,$2], else keep $3.
   b[$1,$2]                     ##Marking index $1,$2 in array b to track lines common to Input_file1 and Input_file2.
}
1                               ##1 prints the current line (with $3 edited or not).
END{                            ##Starting END block of the awk code here.
   for(i in a){                 ##Looping over every index of array a.
      if(!(i in b)){            ##If index i is NOT in array b, the line exists only in Input_file1 and has not been printed yet.
        print i,a[i]            ##Printing index i and its value a[i].
      }
   }
}' SUBSEP=" " Input_file1  Input_file2      ##Setting SUBSEP to a space and naming Input_file1 and Input_file2.
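The FNR==NR test is the part of this explanation that most often causes confusion, so here is a minimal sketch of how the two counters behave when awk reads two files. The file names and the helper `show_counters` are hypothetical. NR counts lines across all inputs while FNR restarts at 1 for each file, so the two are equal only while the first file is being read.

```shell
#!/bin/sh
# Two tiny hypothetical files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\n' > "$tmp/two.txt"

# show_counters FILE...: print file name, NR, FNR and whether FNR==NR holds.
show_counters() {
  awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first" : "later") }' "$@"
}

(cd "$tmp" && show_counters one.txt two.txt)
# one.txt 1 1 first
# one.txt 2 2 first
# two.txt 3 1 later
# two.txt 4 2 later
rm -rf "$tmp"
```

One caveat of this idiom: if the first file is empty, FNR==NR is also true for the second file, so the script would treat the second file as the first.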


EDIT: As per the OP, the output lines should follow the order of Input_file2 and then Input_file1, so I am adding the following solution.

awk '
FNR==NR{                        ##FNR==NR is TRUE only while the first file, Input_file1, is being read.
   a[$1,$2]=$3                  ##Creating array a whose index is $1,$2 and whose value is $3 of the current line.
   if(!b[$1,$2]++){             ##Checking whether this $1,$2 pair is being seen for the first time (not yet in array b).
     d[++count]=$1 OFS $2}      ##Storing $1 OFS $2 in array d under an increasing counter, to remember the order of Input_file1.
   next                         ##next skips all further statements for the current line.
}
($1,$2) in a{                   ##Checking whether $1,$2 of the current Input_file2 line is present in array a.
   $3=(a[$1,$2]>$3?a[$1,$2]:$3)   ##Resetting $3: if a[$1,$2] is greater than $3 then use a[$1,$2], else keep $3.
   c[$1,$2]                     ##Marking index $1,$2 in array c to track lines common to Input_file1 and Input_file2.
}
1                               ##1 prints the current line (with $3 edited or not).
END{                            ##Starting END block of the awk code here.
   for(i=1;i<=count;i++){       ##Looping from 1 to count, that is, over the keys of Input_file1 in their original order.
      if(!(d[i] in c)){         ##If the key stored in d[i] is NOT in array c, the line exists only in Input_file1 and has not been printed yet.
        print d[i],a[d[i]]      ##Printing the key d[i] and its value a[d[i]].
      }
   }
}' SUBSEP=" " Input_file1  Input_file2      ##Setting SUBSEP to a space and naming Input_file1 and Input_file2.
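If the original line order does not need to be preserved, a different technique can give the same set of lines: sort both files by key with column 3 in descending order, then keep only the first line seen for each key. This is a sketch of that alternative and not part of the answer above. It assumes tab-delimited input and a numeric second column, and the function name `merge_max_sorted` is hypothetical.

```shell
#!/bin/sh
# merge_max_sorted FILE...: for each (column 1, column 2) key, keep the line
# with the largest column 3. Output is sorted by key, not in input order.
merge_max_sorted() {
  tab=$(printf '\t')
  # Sort by column 1, then column 2 numerically, then column 3 numerically
  # in reverse, so the largest value comes first within each key.
  LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3nr "$@" |
    awk -F '\t' '!seen[$1, $2]++'    # print only the first line of each key
}

# Hypothetical sample inputs, assumed for illustration.
tmp=$(mktemp -d)
printf 'A\t500\t50\nA\t600\t30\nB\t300\t100\n' > "$tmp/file1.tsv"
printf 'A\t500\t70\nC\t600\t40\n'              > "$tmp/file2.tsv"
merge_max_sorted "$tmp/file1.tsv" "$tmp/file2.tsv"
rm -rf "$tmp"
```

The trade-off is that sort reorders the data and has to read all of it before printing anything, whereas the awk-only script keeps the Input_file2 order and holds only Input_file1 in memory.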
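Please edit your question to show what you have tried so far to solve the problem, and tell us where you need help to make progress.

@Ed Morton, sir, could you please tell me whether my code alignment has improved and looks fine now? I would be grateful.

Absolutely. That structure is now the one commonly used across all Algol-derived languages, it is what any C beautifier would output, and it is very easy to read. Thanks for fixing it and for asking! Now, if we could just get you to parenthesize your ternary expressions... :-)

@EdMorton, cool, thanks for the feedback, and my bad about the parentheses. I sometimes forget them, but I have started using them. Honestly, your guidance is always helpful, you rock.

By the way, I haven't voted yet because I am waiting for the OP to add their attempt to the question before I read it, so I don't know whether your script works, since I don't know what it is supposed to do! I am not even going to think about it until I see some effort in the question.

I have actually noticed that you are getting better at parenthesizing ternaries, nice work!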