Bash: how to compare two columns in two different files, and how to append many consecutive columns from file2 to file1 (Bash, Unix, Awk)


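I'm relatively new to coding and to awk, so I apologise if this is a silly question! I need to compare $3 in file1 with $3 in file2 and, if they match, print the line from file1 followed by the corresponding $10 entry from file2. I have a command that does this:

awk 'NR==FNR{a[$3]=$10;next} a[$3]{print $0 "\t" a[$3]}' file2 file1

However, file2 has columns $10-647, and I need to do the above for all 637 columns. Is there a way to loop over this?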

Example file1:

 1  715348  rs3131984   T   G   100 PASS    AC=5008;AF=1;AN=5008;NS=2504;DP=16986;EAS_AF=1;AMR_AF=1;AFR_AF=1;EUR_AF=1;SAS_AF=1;AA=.|||;VT=SNP   GT  1|1 1|1 1|1
 1  723798  rs34882115  CAG C   100 PASS    AC=4012;AF=0.801118;AN=5008;NS=2504;DP=24752;EAS_AF=0.7946;AMR_AF=0.8775;AFR_AF=0.5416;EUR_AF=0.9602;SAS_AF=0.9407;VT=INDEL GT  1|1 1|1 1|1
 1  723891  rs2977670   G   C   100 PASS    AC=3906;AF=0.779952;AN=5008;NS=2504;DP=22718;EAS_AF=0.7917;AMR_AF=0.8689;AFR_AF=0.4849;EUR_AF=0.9483;SAS_AF=0.9305;AA=.|||;VT=SNP   GT  1|1 1|1 1|1
 1  729679  rs4951859   C   G   100 PASS    AC=3205;AF=0.639976;AN=5008;NS=2504;DP=18762;EAS_AF=0.6875;AMR_AF=0.7536;AFR_AF=0.2905;EUR_AF=0.841;SAS_AF=0.7761;AA=.|||;VT=SNP    GT  1|0 1|1 1|0
 1  752566  rs3094315   G   A   100 PASS    AC=3597;AF=0.718251;AN=5008;NS=2504;DP=21293;EAS_AF=0.8839;AMR_AF=0.804;AFR_AF=0.3873;EUR_AF=0.84;SAS_AF=0.8088;AA=.|||;VT=SNP  GT  0|1 1|1 0|1
 1  752721  rs3131972   A   G   100 PASS    AC=3272;AF=0.653355;AN=5008;NS=2504;DP=22729;EAS_AF=0.7659;AMR_AF=0.7363;AFR_AF=0.2905;EUR_AF=0.839;SAS_AF=0.7781;AA=.|||;VT=SNP    GT  0|1 1|1 0|1
 1  754182  rs3131969   A   G   100 PASS    AC=3398;AF=0.678514;AN=5008;NS=2504;DP=16315;EAS_AF=0.7331;AMR_AF=0.7565;AFR_AF=0.3525;EUR_AF=0.8718;SAS_AF=0.8088;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
 1  754192  rs3131968   A   G   100 PASS    AC=3398;AF=0.678514;AN=5008;NS=2504;DP=16981;EAS_AF=0.7331;AMR_AF=0.7565;AFR_AF=0.3525;EUR_AF=0.8718;SAS_AF=0.8088;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
 1  754334  rs3131967   T   C   100 PASS    AC=3427;AF=0.684305;AN=5008;NS=2504;DP=21917;EAS_AF=0.7629;AMR_AF=0.755;AFR_AF=0.3525;EUR_AF=0.8718;SAS_AF=0.8088;AA=.|||;VT=SNP    GT  0|1 1|1 0|1
 1  754503  rs3115859   G   A   100 PASS    AC=3325;AF=0.663938;AN=5008;NS=2504;DP=19944;EAS_AF=0.7629;AMR_AF=0.7378;AFR_AF=0.3374;EUR_AF=0.839;SAS_AF=0.771;AA=.|||;VT=SNP GT  0|1 1|1 0|1
 1  754964  rs3131966   C   T   100 PASS    AC=3322;AF=0.663339;AN=5008;NS=2504;DP=19476;EAS_AF=0.7629;AMR_AF=0.7378;AFR_AF=0.3366;EUR_AF=0.837;SAS_AF=0.771;AA=.|||;VT=SNP GT  0|1 1|1 0|1
 1  755887  rs3131964   C   G   100 PASS    AC=4905;AF=0.979433;AN=5008;NS=2504;DP=22796;EAS_AF=1;AMR_AF=0.9914;AFR_AF=0.9304;EUR_AF=0.995;SAS_AF=1;AA=.|||;VT=SNP  GT  1|1 1|1 1|1
 1  755890  rs3115858   A   T   100 PASS    AC=3763;AF=0.751398;AN=5008;NS=2504;DP=23185;EAS_AF=0.8839;AMR_AF=0.8242;AFR_AF=0.4539;EUR_AF=0.8728;SAS_AF=0.8405;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
 1  756604  rs3131962   A   G   100 PASS    AC=3746;AF=0.748003;AN=5008;NS=2504;DP=28270;EAS_AF=0.8829;AMR_AF=0.8242;AFR_AF=0.4501;EUR_AF=0.8698;SAS_AF=0.8323;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
Example file2:

1   742429  rs3094315   A   G   .   .   .   GT  0/0 0/0
1   1011278 rs3737728   G   A   .   .   .   GT  0/0 0/1
1   1077546 rs9442380   C   T   .   .   .   GT  0/0 0/0
1   1084601 rs4970362   G   A   .   .   .   GT  0/0 0/1
1   1089205 rs9660710   C   A   .   .   .   GT  0/0 0/0
1   1300787 rs2765033   C   T   .   .   .   GT  0/0 0/1
1   756604  rs3131962   A   G   100 PASS    AC=3746;AF=0.748003;AN=5008;NS=2504;DP=28270;EAS_AF=0.8829;AMR_AF=0.8242;AFR_AF=0.4501;EUR_AF=0.8698;SAS_AF=0.8323;AA=.|||;VT=SNP   GT  0|1 1|1
1   1303878 rs2649588   T   C   .   .   .   GT  0/0 0/1
1   1695996 rs6603811   C   T   .   .   .   GT  0/0 0/0
1   1782971 rs10907192  G   A   .   .   .   GT  0/0 0/0
1   1878053 rs3820011   C   A   .   .   .   GT  0/1 0/1
1   1882185 rs2803291   C   T   .   .   .   GT  0/0 0/0
Is awk the best approach? I really don't know how to write any kind of loop. Many thanks for any help and explanation.

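I would do:

$ column_file1=$(awk '{print NF}' file1 | tail -1)
$ paste file1 file2 | awk -v c1="$column_file1" '{if($3==$(3+c1)){for(i=1;i<=NF;i++)if(i<=c1 || i>=c1+10){printf "%s ", $i}; printf "\n"}}'

Note that paste pairs the two files line by line, so this only finds a match when the same ID sits on the same row in both files.

Then run it (with the awk program saved in a file):

$ awk -f <name_of_file> file2 file1

If you run into problems, let me know.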

Edit:

In the first example I had forgotten a `tail -1`. This works on the example you provided.
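
The contents of the script file passed to `awk -f <name_of_file> file2 file1` are not shown in this answer. As a rough sketch only, and not necessarily what the answerer had in mind, a script along these lines would loop over columns 10 to the last field of file2 and append them to the matching file1 row. The file name `join_cols.awk`, the variable name `extra_cols`, and the shortened sample rows are assumptions made for illustration; input is split on whitespace and the appended columns are joined with tabs.

```shell
# Sketch: append file2's columns 10..NF to file1 rows whose column 3 matches.
# All file names and the tiny sample rows below are made up for illustration.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '1\t752566\trs3094315\tG\tA\t100\tPASS\tINFO\tGT\t0|1\t1|1\t0|1\n' >  file1
printf '1\t715348\trs3131984\tT\tG\t100\tPASS\tINFO\tGT\t1|1\t1|1\t1|1\n' >> file1
printf '1\t742429\trs3094315\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\n'               >  file2

cat > join_cols.awk <<'EOF'
# First file on the command line (file2): remember columns 10..NF per ID.
NR == FNR {
    extra = ""
    for (i = 10; i <= NF; i++)
        extra = extra "\t" $i
    extra_cols[$3] = extra
    next
}
# Second file (file1): print the row plus the stored columns when the ID matches.
$3 in extra_cols { print $0 extra_cols[$3] }
EOF

result=$(awk -f join_cols.awk file2 file1)
printf '%s\n' "$result"
```

Because the loop runs to `NF` rather than a fixed 647, the same script should handle any number of trailing columns, including rows of different lengths.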

Try this -

File1.txt

    #cat file1.txt
    1 715348  rs3131984   T   G   100 PASS    AC=5008;AF=1;AN=5008;NS=2504;DP=16986;EAS_AF=1;AMR_AF=1;AFR_AF=1;EUR_AF=1;SAS_AF=1;AA=.|||;VT=SNP   GT  1|1 1|1 1|1
    1 723798  rs34882115  CAG C   100 PASS    AC=4012;AF=0.801118;AN=5008;NS=2504;DP=24752;EAS_AF=0.7946;AMR_AF=0.8775;AFR_AF=0.5416;EUR_AF=0.9602;SAS_AF=0.9407;VT=INDEL GT  1|1 1|1 1|1
    1 723891  rs2977670   G   C   100 PASS    AC=3906;AF=0.779952;AN=5008;NS=2504;DP=22718;EAS_AF=0.7917;AMR_AF=0.8689;AFR_AF=0.4849;EUR_AF=0.9483;SAS_AF=0.9305;AA=.|||;VT=SNP   GT  1|1 1|1 1|1
    1 729679  rs4951859   C   G   100 PASS    AC=3205;AF=0.639976;AN=5008;NS=2504;DP=18762;EAS_AF=0.6875;AMR_AF=0.7536;AFR_AF=0.2905;EUR_AF=0.841;SAS_AF=0.7761;AA=.|||;VT=SNP    GT  1|0 1|1 1|0
    1 752566  rs3094315   G   A   100 PASS    AC=3597;AF=0.718251;AN=5008;NS=2504;DP=21293;EAS_AF=0.8839;AMR_AF=0.804;AFR_AF=0.3873;EUR_AF=0.84;SAS_AF=0.8088;AA=.|||;VT=SNP  GT  0|1 1|1 0|1
    1 752721  rs3131972   A   G   100 PASS    AC=3272;AF=0.653355;AN=5008;NS=2504;DP=22729;EAS_AF=0.7659;AMR_AF=0.7363;AFR_AF=0.2905;EUR_AF=0.839;SAS_AF=0.7781;AA=.|||;VT=SNP    GT  0|1 1|1 0|1
    1 754182  rs3131969   A   G   100 PASS    AC=3398;AF=0.678514;AN=5008;NS=2504;DP=16315;EAS_AF=0.7331;AMR_AF=0.7565;AFR_AF=0.3525;EUR_AF=0.8718;SAS_AF=0.8088;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
    1 754192  rs3131968   A   G   100 PASS    AC=3398;AF=0.678514;AN=5008;NS=2504;DP=16981;EAS_AF=0.7331;AMR_AF=0.7565;AFR_AF=0.3525;EUR_AF=0.8718;SAS_AF=0.8088;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
    1 754334  rs3131967   T   C   100 PASS    AC=3427;AF=0.684305;AN=5008;NS=2504;DP=21917;EAS_AF=0.7629;AMR_AF=0.755;AFR_AF=0.3525;EUR_AF=0.8718;SAS_AF=0.8088;AA=.|||;VT=SNP    GT  0|1 1|1 0|1
    1 754503  rs3115859   G   A   100 PASS    AC=3325;AF=0.663938;AN=5008;NS=2504;DP=19944;EAS_AF=0.7629;AMR_AF=0.7378;AFR_AF=0.3374;EUR_AF=0.839;SAS_AF=0.771;AA=.|||;VT=SNP GT  0|1 1|1 0|1
    1 754964  rs3131966   C   T   100 PASS    AC=3322;AF=0.663339;AN=5008;NS=2504;DP=19476;EAS_AF=0.7629;AMR_AF=0.7378;AFR_AF=0.3366;EUR_AF=0.837;SAS_AF=0.771;AA=.|||;VT=SNP GT  0|1 1|1 0|1
    1 755887  rs3131964   C   G   100 PASS    AC=4905;AF=0.979433;AN=5008;NS=2504;DP=22796;EAS_AF=1;AMR_AF=0.9914;AFR_AF=0.9304;EUR_AF=0.995;SAS_AF=1;AA=.|||;VT=SNP  GT  1|1 1|1 1|1
    1 755890  rs3115858   A   T   100 PASS    AC=3763;AF=0.751398;AN=5008;NS=2504;DP=23185;EAS_AF=0.8839;AMR_AF=0.8242;AFR_AF=0.4539;EUR_AF=0.8728;SAS_AF=0.8405;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
    1 756604  rs3131962   A   G   100 PASS    AC=3746;AF=0.748003;AN=5008;NS=2504;DP=28270;EAS_AF=0.8829;AMR_AF=0.8242;AFR_AF=0.4501;EUR_AF=0.8698;SAS_AF=0.8323;AA=.|||;VT=SNP   GT  0|1 1|1 0|1
file2.txt

#cat file2.txt
1   742429  rs3094315   A   G   .   .   .   GT  0/0 0/0
1   1011278 rs3737728   G   A   .   .   .   GT  0/0 0/1
1   1077546 rs9442380   C   T   .   .   .   GT  0/0 0/0
1   1084601 rs4970362   G   A   .   .   .   GT  0/0 0/1
1   1089205 rs9660710   C   A   .   .   .   GT  0/0 0/0
1   1300787 rs2765033   C   T   .   .   .   GT  0/0 0/1
1   756604  rs3131962   A   G   100 PASS    AC=3746;AF=0.748003;AN=5008;NS=2504;DP=28270;EAS_AF=0.8829;AMR_AF=0.8242;AFR_AF=0.4501;EUR_AF=0.8698;SAS_AF=0.8323;AA=.|||;VT=SNP   GT  0|1 1|1
1   1303878 rs2649588   T   C   .   .   .   GT  0/0 0/1
1   1695996 rs6603811   C   T   .   .   .   GT  0/0 0/0
1   1782971 rs10907192  G   A   .   .   .   GT  0/0 0/0
1   1878053 rs3820011   C   A   .   .   .   GT  0/1 0/1
1   1882185 rs2803291   C   T   .   .   .   GT  0/0 0/0
Join -

#awk 'NR==FNR {val[$3]=$10;next;} $3 in val {print $0,val[$3]}' file2.txt file1.txt
1 752566  rs3094315   G   A   100 PASS    AC=3597;AF=0.718251;AN=5008;NS=2504;DP=21293;EAS_AF=0.8839;AMR_AF=0.804;AFR_AF=0.3873;EUR_AF=0.84;SAS_AF=0.8088;AA=.|||;VT=SNP  GT  0|1 1|1 0|1 0/0
1 756604  rs3131962   A   G   100 PASS    AC=3746;AF=0.748003;AN=5008;NS=2504;DP=28270;EAS_AF=0.8829;AMR_AF=0.8242;AFR_AF=0.4501;EUR_AF=0.8698;SAS_AF=0.8323;AA=.|||;VT=SNP   GT  0|1 1|1 0|1 0|1
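
The command above tests `$3 in val`, whereas the command in the question tests `a[$3]`. For genotype strings such as `0/0` both forms select the same rows, but they differ when the stored value is empty or a numeric zero. The following sketch illustrates that difference; the file names `f1` and `f2` and their contents are made up for the purpose.

```shell
# Sketch with made-up inputs: column 10 of f2 holds "0", a value that awk
# treats as false when it is used as a pattern.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf 'x x id1 x x x x x x 0\n' > f2
printf 'a b id1\nc d id2\n'      > f1

# Truthiness test: id1 is skipped because its stored value "0" counts as false.
by_value=$(awk 'NR==FNR {val[$3]=$10; next} val[$3] {print $3}' f2 f1)

# Membership test: id1 matches regardless of what value was stored.
by_key=$(awk 'NR==FNR {val[$3]=$10; next} $3 in val {print $3}' f2 f1)

printf 'by value: [%s]\nby key:   [%s]\n' "$by_value" "$by_key"
```

The membership test also avoids creating an empty array entry for every file1 ID that is looked up but not found.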

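So you want to print 637 lines for each matching row? One per field?

I need to print 637 columns, based on a row match on a column in both files.

@Hannah6746576 - have you checked this answer? It would be nice to get some feedback, or to have an answer marked as accepted…

Sorry for the delay. I ran the awk-only command and it ran, but it did not separate the file. Thank you very much for your help and patience!

To be clear, should these commands be run as two consecutive commands, or as one "script"? I keep getting "command not found", and I'm not sure whether that is because I'm running it as a single command…

Are you using bash? It may be a typo. These are two commands: the first sets the number of columns in the first file, and the second performs the join... Are the files you provided real examples?

I'm using bash, and these are real example files. I keep getting a "command not found" error.

Hmm, I'll look into it! Can you report the whole error? (Note that spaces around `=` produce an error.) By the way, try using the awk script... that should be more robust. Is it normal that the fields in file2 do not exceed 12?

@forumAdvisor - could you please explain why the answer above was downvoted? The output of the accepted answer and of my answer is the same.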
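
One comment above notes that spaces around `=` produce an error, which is a plausible source of the "command not found" message. A small sketch of that behaviour follows; the variable name is borrowed from the first answer, the value 12 is arbitrary, and the exact wording of the error message depends on the shell.

```shell
# No spaces around "=": this is a variable assignment.
column_file1=12
echo "column_file1 is $column_file1"

# With spaces, the shell treats "column_file1" as a command name and passes
# "=" and "12" as its arguments, so the lookup of that command fails.
# The subshell and the stderr capture keep the failure visible but harmless.
err=$( (column_file1 = 12) 2>&1 ) || status=$?
echo "exit status: ${status:-0}"
echo "message: $err"
```

An exit status of 127 is the conventional code for a command that could not be found.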