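Linux: combine two files with awk by matching on the first two columns, then output the line from the first file plus the third column of the second file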


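I am trying to match two files on their first two columns. That is, columns 1 and 2 of file 1 should be identical to columns 1 and 2 of file 2. Where they match, I want to output columns 1 and 2, followed by column 3 of file 1 and column 3 of file 2.

Similar questions have been asked before, but I have spent hours trying to work out the code, which certainly reflects how little I really understand awk.

I have two files like the ones below, in which the first two columns are the same and the third column differs. In one file the key values are repeated, while in the other they are not. File 1: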

 
TE_00000002DNA/DTC  SRR6323060.1    0.04
TE_00000161DNA/DTC  SRR6323074.1    152.38
File 2:

TE_00000002DNA/DTC  SRR6323074.1    4
TE_00000002DNA/DTC  SRR6323074.1    4
TE_00000002DNA/DTC  SRR6323074.1    4
TE_00000002DNA/DTC  SRR6323074.1    5
TE_00000002DNA/DTC  SRR6323074.1    6.5
TE_00000002DNA/DTC  SRR6323074.1    9
TE_00000161DNA/DTC  SRR6323074.1    24.16666667
TE_00000161DNA/DTC  SRR6323074.1    24.16666667
TE_00000161DNA/DTC  SRR6323074.1    29.2
TE_00000161DNA/DTC  SRR6323074.1    29.2
TE_00000161DNA/DTC  SRR6323074.1    29.2
TE_00000161DNA/DTC  SRR6323074.1    3.081081081
TE_00000161DNA/DTC  SRR6323074.1    3.194444444
TE_00000161DNA/DTC  SRR6323074.1    36.75
TE_00000161DNA/DTC  SRR6323074.1    5.565217391
TE_00000161DNA/DTC  SRR6323074.1    6.55
TE_00000161DNA/DTC  SRR6323074.1    7.882352941
TE_00000161DNA/DTC  SRR6323074.1    74.5
TE_00000161DNA/DTC  SRR6323074.1    9.066666667
TE_00000161DNA/DTC  SRR6323074.1    9.066666667
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA
TE_00000161DNA/DTC  SRR6323074.1    NA

The output should look like this, for example:

TE_00000002DNA/DTC  SRR6323074.1    4 0.04
TE_00000002DNA/DTC  SRR6323074.1    4 0.04
TE_00000002DNA/DTC  SRR6323074.1    4 0.04
TE_00000002DNA/DTC  SRR6323074.1    5 0.04
TE_00000002DNA/DTC  SRR6323074.1    6.5 0.04
TE_00000002DNA/DTC  SRR6323074.1    9 0.04
TE_00000161DNA/DTC  SRR6323074.1 24.16666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 24.16666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 29.2 152.38
TE_00000161DNA/DTC  SRR6323074.1 29.2 152.38
TE_00000161DNA/DTC  SRR6323074.1 29.2 152.38
TE_00000161DNA/DTC  SRR6323074.1 3.081081081 152.38
TE_00000161DNA/DTC  SRR6323074.1 3.194444444 152.38
TE_00000161DNA/DTC  SRR6323074.1 36.75 152.38
TE_00000161DNA/DTC  SRR6323074.1 5.565217391 152.38
TE_00000161DNA/DTC  SRR6323074.1 6.55 152.38
TE_00000161DNA/DTC  SRR6323074.1 7.882352941 152.38
TE_00000161DNA/DTC  SRR6323074.1 74.5 152.38
TE_00000161DNA/DTC  SRR6323074.1 9.066666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 9.066666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38

I have tried the following:

awk 'NR==FNR{a[$1 FS $2] = $1 FS $2; next} {ind = $1 FS $2} ind in a {print a[$0], $0}'

awk 'NR==FNR{c[$1$2]++;next};c[$1$2] > 0 {print c[$0],$0}'

awk 'FNR==NR{a[$1]=$1;a[$2]=$2;next}($1==a[$1])&&($2==a[$2]){print a[$0],$0}'

awk 'FNR==NR{a[$1]=$1;b[$2]=$2;next} {print a[$0],$0}'
among other variations on this theme. Any help would be much appreciated.

This should do the job (unless you only want it to show lines with $3==N/A, which your own attempts do not suggest):
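
As a minimal sketch of one way to do such a lookup in awk (not necessarily the exact command this answer used): read file 1 first and store its third column under a key built from columns 1 and 2, then print each line of file 2 whose key was stored, followed by the stored value. The sample rows, the temporary directory and the array name val are made up for illustration; only the file names file1 and file2 come from the question.

```shell
# Hypothetical miniature versions of file1 and file2 (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' 'TE_A SRR1 0.04' 'TE_B SRR2 152.38' > "$tmp/file1"
printf '%s\n' 'TE_A SRR1 4' 'TE_A SRR1 5' 'TE_B SRR2 NA' 'TE_C SRR3 7' > "$tmp/file2"

# While reading the first file (NR==FNR), store column 3 under the key (col1, col2).
# While reading the second file, print the line plus the stored value if the key exists.
out=$(awk 'NR==FNR { val[$1, $2] = $3; next }
           ($1, $2) in val { print $0, val[$1, $2] }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
# TE_A SRR1 4 0.04
# TE_A SRR1 5 0.04
# TE_B SRR2 NA 152.38
rm -rf "$tmp"
```

The key has to be built from the same fields in both passes. Looking the array up with $0, as in the attempts in the question, never finds anything because no whole line was ever stored as a key.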


If the files are sorted as shown, you can do:

$ join <(sed -E 's/ +/~/' file2) <(sed -E 's/ +/~/' file1) | sed 's/~/  /'

TE_00000161DNA/DTC  SRR6323074.1 24.16666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 24.16666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 29.2 152.38
TE_00000161DNA/DTC  SRR6323074.1 29.2 152.38
TE_00000161DNA/DTC  SRR6323074.1 29.2 152.38
TE_00000161DNA/DTC  SRR6323074.1 3.081081081 152.38
TE_00000161DNA/DTC  SRR6323074.1 3.194444444 152.38
TE_00000161DNA/DTC  SRR6323074.1 36.75 152.38
TE_00000161DNA/DTC  SRR6323074.1 5.565217391 152.38
TE_00000161DNA/DTC  SRR6323074.1 6.55 152.38
TE_00000161DNA/DTC  SRR6323074.1 7.882352941 152.38
TE_00000161DNA/DTC  SRR6323074.1 74.5 152.38
TE_00000161DNA/DTC  SRR6323074.1 9.066666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 9.066666667 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
TE_00000161DNA/DTC  SRR6323074.1 NA 152.38
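
The logic behind the join approach, roughly: join matches on a single field, so the first sed glues columns 1 and 2 into one field by replacing only the first run of spaces with ~; join then pairs every line of file 2 with the file 1 line that has the same glued key; the last sed turns ~ back into a gap. The sketch below shows this on made-up rows. It uses temporary files instead of process substitution and plain sed syntax instead of -E so that it also runs in a plain POSIX shell; those changes, the sample data and the names k1 and k2 are assumptions for illustration.

```shell
# Hypothetical miniature inputs, already sorted on the first two columns.
tmp=$(mktemp -d)
printf '%s\n' 'TE_A  SRR1  0.04' 'TE_B  SRR2  152.38' > "$tmp/file1"
printf '%s\n' 'TE_A  SRR1  4' 'TE_A  SRR1  5' 'TE_B  SRR2  NA' > "$tmp/file2"

# Step 1: glue columns 1 and 2 into a single field (first run of spaces -> ~).
sed 's/  */~/' "$tmp/file1" > "$tmp/k1"
sed 's/  */~/' "$tmp/file2" > "$tmp/k2"

# Step 2: join on that glued field; output is key, rest of k2, rest of k1.
# Step 3: restore the gap between the two key columns.
out=$(join "$tmp/k2" "$tmp/k1" | sed 's/~/  /')

printf '%s\n' "$out"
# TE_A  SRR1 4 0.04
# TE_A  SRR1 5 0.04
# TE_B  SRR2 NA 152.38
rm -rf "$tmp"
```

join expects both inputs to be sorted on the join field, which is why this only works if the files are sorted as shown. A key that appears in only one of the files produces no output line.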

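What is the underlying logic of the join? Why are you only picking the NA values from file 2 rather than the numeric values, and why only 5 times?

This is just a short example of the output. I had to write it by hand, so I did not do it for all the values.

If so, please provide a consistent input with fewer records. At the moment there is no clue about the criteria.

@user2472414 If the expected output you posted is not the expected output for the sample input you posted, then we cannot test a potential solution to see whether it works. Please edit your question to provide concise, testable sample input and the output you expect given that input, so we can help you. Hint: any time we need a scroll bar to see something in your question, that something (input, output or code) is too big. Please see.

OK, I have tried to improve the question by completing the input and output.

This does not match on both the first and second columns, but it is very close! The output is: TE_00000004DNA/Helitron SRR6322978.1 1 1.59 SRR6322966.1 NA TE_00000004DNA/Helitron sr6322978.1.59 sr6322966.1 NA TE_00000004DNA/Helitron sr6322978.1.59 sr6322966.1 NA

Please clarify what does not match. Where does '/Heliton' come from? Comments are not suitable for formatted data. Please update your question with sample input and matching output. Otherwise this is trial and error, which is not very productive.

I am trying to match on the first two columns of both files. So column 1 and column 2 of file 1 need to match column 1 and column 2 of file 2. This appears to match on column 1 only, because SRR6322978.1 should be matched with the same value, not with SRR6322966.1.

With my example, using your sample snippet with the repeated rows as file 2, it does exactly what you asked for. If your files differ from the sample data, please put the correct data in the question.

You are right, I ran your code with the file names reversed. It works fine. Thank you.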
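
The order of the file arguments matters for this kind of two-file lookup. As an illustration only (the exact command that was run with the names reversed is not shown here), the sketch below applies a two-column awk lookup in both orders to made-up data. The helper name lookup, the sample rows and the temporary directory are hypothetical.

```shell
# Hypothetical miniature inputs (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' 'TE_A SRR1 0.04' 'TE_B SRR2 152.38' > "$tmp/file1"
printf '%s\n' 'TE_A SRR1 4' 'TE_A SRR1 5' 'TE_B SRR2 NA' > "$tmp/file2"

# lookup FIRST SECOND: store column 3 of FIRST by (col1, col2),
# then print each matching line of SECOND followed by the stored value.
lookup() {
  awk 'NR==FNR { val[$1, $2] = $3; next }
       ($1, $2) in val { print $0, val[$1, $2] }' "$1" "$2"
}

intended=$(lookup "$tmp/file1" "$tmp/file2")  # one output line per file2 row
reversed=$(lookup "$tmp/file2" "$tmp/file1")  # one output line per file1 row

printf 'file1 file2:\n%s\n\nfile2 file1:\n%s\n' "$intended" "$reversed"
rm -rf "$tmp"
```

In the reversed call the file with repeated keys is the one loaded into the array, so each key keeps only the last value seen and the repeated rows are lost.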