Unix: combine two columns from different files by a common string
I have two tab-delimited files.

File-1
NODE_1_length_59711_cov_84.026979_g0_i0_1	K02377
NODE_1_length_59711_cov_84.026979_g0_i0_2
NODE_2_length_39753_cov_84.026979_g0_i0_1	K02377
NODE_2_length_49771_cov_84.026979_g0_i0_2	K16554

This might work for you (GNU join).

Note: some shells may not accept $'\t', in which case a literal tab can be used instead; in a terminal it can be typed with the Ctrl-V Tab key sequence.

Using GNU awk:
awk 'NR==FNR { map[$1]=$2;next } { map1[$1]=$2 } END { PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { print i"\t"map[i]"\t"map1[i] } }' file-1 file2
Explanation:
awk 'NR==FNR {
    map[$1]=$2;   # First file only: store the second field in the array map, indexed by the first (whitespace-separated) field
    next
}
{
    map1[$1]=$2   # Second file: store the second field in a second array, map1, indexed by the first field
}
END {
    PROCINFO["sorted_in"]="@ind_str_asc";   # GNU awk only: traverse array indices in ascending string order
    for (i in map) {
        print i"\t"map[i]"\t"map1[i]   # For each index in map, print it with its values from map and map1
    }
}' file-1 file2
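For readers without GNU awk, here is a minimal sketch of the same two-array approach that sorts with sort(1) instead of PROCINFO["sorted_in"]. The shortened keys, the contents of file2, and the function name merge_two are hypothetical and only serve to make the example runnable. It also sets the field separator to a tab explicitly, which is an assumption about the input rather than something the answer above does.

```shell
#!/bin/sh
# Hypothetical sample data; the IDs in file2 are made up for illustration.
tmp=$(mktemp -d)
printf 'NODE_1\tK02377\nNODE_2\t\nNODE_3\tK16554\n' > "$tmp/file-1"
printf 'NODE_1\tK00001\nNODE_3\tK00002\n' > "$tmp/file2"

# merge_two FILE1 FILE2: print key, FILE1 column 2, FILE2 column 2,
# for every key in FILE1, sorted by key.
merge_two() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { map[$1] = $2; next }   # first file: key -> column 2
    { map1[$1] = $2 }                  # second file: key -> column 2
    END { for (i in map) print i, map[i], map1[i] }
  ' "$1" "$2" | LC_ALL=C sort
}

merged=$(merge_two "$tmp/file-1" "$tmp/file2")
printf '%s\n' "$merged"
```

Keys that appear only in the second file are not printed, matching the behavior of the loop over map in the answer.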
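Thanks, this worked well. Thanks for the explanation.

Hi, what if my initial file-2 has 3 columns (the output file of the command above)? I want to merge it with another file (file-4) that has two columns (the first column is the same in both files). The final output should have 1 common column, followed by the matching rows from columns 2 and 3 of file 2 and the matching rows from column 2 of file-4.

It only works for rows that have strings in both columns. My files contain some blank rows, and the output of this command does not account for the blanks.

@Shail Each line in file1 and file2 should contain a tab. If not, you need to fix the files before joining. Use
sed -i '/\t/!s/$/\t/' file1 file2
followed by the solution above.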
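The sed command relies on GNU sed behavior (-i without a backup suffix, and \t inside the pattern). Where GNU sed is not available, the same padding can be sketched with awk. The function name pad_tabs and the sample file contents are hypothetical, and the result is written to a new file rather than edited in place.

```shell
#!/bin/sh
# pad_tabs FILE: append a tab to every line that does not contain one.
pad_tabs() {
  awk '{ if (index($0, "\t") == 0) $0 = $0 "\t"; print }' "$1"
}

tmp=$(mktemp -d)
# Hypothetical input: the second line has a key but no tab.
printf 'NODE_1\tK02377\nNODE_2\n' > "$tmp/file1"
pad_tabs "$tmp/file1" > "$tmp/file1.padded"

# Count the tabs in the padded file (one per line is expected).
padded_tabs=$(tr -cd '\t' < "$tmp/file1.padded" | wc -c | tr -d ' ')
echo "$padded_tabs"
```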
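For the follow-up in the comments (a three-column file merged with a two-column file-4), one possible sketch is shown below. It is not taken from the thread: the function name merge_three, the file name merged, and all sample values are hypothetical. It assumes every line is tab-separated, and it sets the field separator to a tab so that empty columns keep their position, which may be what goes wrong with blank cells under awk's default whitespace splitting.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical inputs: a three-column file (with one row of blank cells)
# and a two-column file-4 sharing the same first column.
printf 'NODE_1\tK02377\tK00001\nNODE_2\t\t\nNODE_3\tK16554\tK00002\n' > "$tmp/merged"
printf 'NODE_1\tX1\nNODE_2\tX2\n' > "$tmp/file-4"

# merge_three THREECOL TWOCOL: print each THREECOL row followed by the
# matching column 2 of TWOCOL (empty when there is no match).
merge_three() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { extra[$1] = $2; next }   # TWOCOL is read first: key -> column 2
    { print $1, $2, $3, extra[$1] }      # stream THREECOL and append the match
  ' "$2" "$1"
}

result=$(merge_three "$tmp/merged" "$tmp/file-4")
printf '%s\n' "$result"
```

Because the three-column file is streamed rather than stored, its original row order is preserved and no sorting step is needed.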