Unix: combine two columns from different files by a common string


I have two tab-delimited files.

file-1

NODE_1_length_59711_cov_84.026979_g0_i0_1	K02377
NODE_1_length_59711_cov_84.026979_g0_i0_2
NODE_2_length_39753_cov_84.026979_g0_i0_1	K02377
NODE_2_length_49771_cov_84.026979_g0_i0_2	K16554

This might work for you (GNU join):

N.B. Some shells may not accept $'\t', in which case a literal tab can be used instead; it can be entered in the terminal with the Ctrl-V Tab key sequence.
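The join command itself is not preserved in this copy of the answer. As a rough sketch only, and not necessarily what the answerer wrote, a GNU join call for two tab-delimited files keyed on the first column might look like the following. The shortened keys, the file-2 values, and the temporary file names are all made up for illustration.

```shell
# Hypothetical sample data: shortened keys, and invented file-2 values.
tmp=$(mktemp -d)
tab=$(printf '\t')     # literal tab, for shells without $'\t'
printf 'NODE_1_i0_1\tK02377\nNODE_1_i0_2\t\nNODE_2_i0_1\tK02377\n' > "$tmp/file-1"
printf 'NODE_1_i0_1\tgeneA\nNODE_2_i0_1\tgeneB\n' > "$tmp/file-2"

# join expects both inputs sorted on the join field, in the same locale.
LC_ALL=C sort -t "$tab" -k1,1 "$tmp/file-1" > "$tmp/file-1.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$tmp/file-2" > "$tmp/file-2.sorted"

# -t sets the field separator; -a 1 also prints file-1 lines with no match.
joined=$(LC_ALL=C join -t "$tab" -a 1 "$tmp/file-1.sorted" "$tmp/file-2.sorted")
printf '%s\n' "$joined"
```

Without -a 1, join would print only the keys that occur in both files.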

Using GNU awk:

awk 'NR==FNR { map[$1]=$2;next } { map1[$1]=$2 } END { PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { print i"\t"map[i]"\t"map1[i] } }' file-1 file2

Explanation:

awk 'NR==FNR {
               map[$1]=$2;                                  # First file only: build an array called map, indexed by the first whitespace-separated field, with the second field as the value
               next
             }
             {
               map1[$1]=$2                                  # Second file: build a second array called map1, again indexed by the first field, with the second field as the value
             }
         END {
               PROCINFO["sorted_in"]="@ind_str_asc";         # gawk only: traverse array indices in ascending string order
               for (i in map) {
                 print i"\t"map[i]"\t"map1[i]                # Loop through map and print each key, its value from map, and the matching value from map1
               }
              }' file-1 file2
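PROCINFO["sorted_in"] is a gawk extension, so other awk implementations will print the keys in an unspecified order. A portable sketch of the same two-array idea, assuming it is acceptable to sort afterwards with sort, could look like this. The keys k1 and k2, the values x and y, and the file names a.tsv and b.tsv are invented for the example.

```shell
# Hypothetical two-column, tab-delimited inputs.
tmp=$(mktemp -d)
printf 'k2\tK16554\nk1\tK02377\n' > "$tmp/a.tsv"
printf 'k1\tx\nk2\ty\n' > "$tmp/b.tsv"

merged=$(awk -F '\t' -v OFS='\t' '
  NR == FNR { map[$1] = $2; next }   # first file: key -> column 2
  { map1[$1] = $2 }                  # second file: key -> column 2
  END { for (i in map) print i, map[i], map1[i] }
' "$tmp/a.tsv" "$tmp/b.tsv" | LC_ALL=C sort -t "$(printf '\t')" -k1,1)
printf '%s\n' "$merged"
```

Setting -F '\t' splits on tabs only, so a value that contains spaces stays in one field.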

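Thanks, this worked well. Thank you for the explanation.

Hi, what if my initial file-2 has 3 columns (the output file of the command above)? I want to merge it with another file (file-4) that has two columns (the first column is the same in both files). The final output should have 1 common column, followed by the matching rows of columns 2 and 3 of file-2 and the matching rows of column 2 of file-4.

It only works for rows that have strings in both columns. My file has some blank rows, and the output of this command does not take the blanks into account.

@Shail Every line in file1 and file2 should contain a tab. If not, the files need to be fixed before joining. Use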

sed -i '/\t/!s/$/\t/' file1 file2

followed by the solution above.
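To see what that sed expression does, here is a small sketch on made-up rows. It holds a literal tab in a shell variable instead of writing \t, since \t inside a sed expression is a GNU extension, and it writes to standard output rather than editing files in place.

```shell
tab=$(printf '\t')     # literal tab character
# Hypothetical rows; the second one has a key but no tab.
fixed=$(printf 'k1\tK02377\nk2\nk3\tK16554\n' |
  sed '/'"$tab"'/!s/$/'"$tab"'/')
# Every line should now contain a tab, so this prints 3.
printf '%s\n' "$fixed" | grep -c "$tab"
```

Lines that already contain a tab are left untouched; only tab-free lines get one appended, which gives them an empty second field.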

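For the follow-up question about merging a three-column file with a two-column file (file-4), one possible sketch is shown below. It is not from the original answers and has not been run against the asker's real data; the rows are invented. It reads file-4 first, then prints each file-2 row with the matching file-4 value appended.

```shell
tmp=$(mktemp -d)
printf 'k1\tK02377\tx\nk2\t\ty\n' > "$tmp/file-2"   # key, col2, col3 (hypothetical)
printf 'k1\tp\nk2\tq\n' > "$tmp/file-4"             # key, col2 (hypothetical)

out=$(awk -F '\t' -v OFS='\t' '
  NR == FNR { extra[$1] = $2; next }   # first argument (file-4): key -> column 2
  { print $1, $2, $3, extra[$1] }      # file-2 rows, in their original order
' "$tmp/file-4" "$tmp/file-2")
printf '%s\n' "$out"
```

Splitting on tabs with -F '\t' keeps a blank middle column as an empty field, so the row for k2 still has four columns instead of shifting left.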