Correctly matching columns in awk using symbols?

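I have two separate files, Input_File1 and Input_File2, each with a different number of columns, which I merged based on the data in multiple columns ().

So far, a column has been added to Input_File1 to create a new file (file3), based on matching the data in columns 1, 2 and 3 of Input_File1 with columns 1, 2 and 3 of Input_File2. Overall, this works well. However, in some cases the data in columns 1, 2 and 3 is identical in Input_File1 and Input_File2, yet the output in file3 should differ. This depends on another feature of Input_File1 and Input_File2: the presence of "-" or "+".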

Input_File1

VMNF01000007.1  6294425 6294650 .   .   +   Focub_B2_mimp_2
VMNF01000008.1  1441418 1441616 .   .   -   Focub_II5_mimp_3
VMNF01000008.1  1441418 1441616 .   .   -   Focub_B2_mimp_1
VMNF01000008.1  1441418 1441616 .   .   +   Focub_B2_mimp_2

Input_File2

VMNF01000007.1  6294425-6294650(+)  tacagtggggggcaataagtatgaataccctttggtgtactgacacacacctctt
VMNF01000008.1  1441418-1441616(-)  gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata
VMNF01000008.1  1441418-1441616(-)  gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata
VMNF01000008.1  1441418-1441616(+)  tacagtggggggcaataagtatgaataccctttgatgtactgacacacacctctt

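As you can see, the data in the last two lines of Input_File2 is identical except for the (-) and (+), and therefore the sequences that follow are different.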

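When file3 is generated, the sequences in column 8 do not reflect this difference from Input_File2. This is because only the following data is considered when matching columns: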

VMNF01000008.1 1441418 1441616
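To illustrate one plausible way this happens, here is a minimal sketch of a join keyed only on chromosome, start and end. It is not the asker's code (which is not shown); the temporary directory and the file names `in1`/`in2` are hypothetical. Because both of the last two rows produce the same key, the later (+) sequence overwrites the (-) one, and both Input_File1 rows receive the same sequence.

```shell
# Hypothetical reproduction: key built from columns 1-3 only, strand ignored.
tmp=$(mktemp -d)

# Last two rows of Input_File2: chromosome, "start-end(strand)", sequence
printf '%s\n' \
  'VMNF01000008.1  1441418-1441616(-)  gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata' \
  'VMNF01000008.1  1441418-1441616(+)  tacagtggggggcaataagtatgaataccctttgatgtactgacacacacctctt' \
  > "$tmp/in2"

# Last two rows of Input_File1: strand is column 6
printf '%s\n' \
  'VMNF01000008.1  1441418 1441616 .   .   -   Focub_B2_mimp_1' \
  'VMNF01000008.1  1441418 1441616 .   .   +   Focub_B2_mimp_2' \
  > "$tmp/in1"

awk '
FNR==NR {
  split($2, c, "[-(]")                  # c[1]=start, c[2]=end
  seq[$1 OFS c[1] OFS c[2]] = $NF       # same key for (-) and (+): last one wins
  next
}
{
  k = $1 OFS $2 OFS $3
  if (k in seq) print $0, seq[k]        # both rows get the (+) sequence
}
' "$tmp/in2" "$tmp/in1" > "$tmp/naive"
cat "$tmp/naive"
```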

Current file3 (note the sequences and the + or - in the last two lines):

What file3 should actually look like (note the sequences and the + or - in the last two lines):

That is, as in Input_File2, the sequence differs depending on whether "-" or "+" is present.

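So it should work essentially the same way as the previous code, except that the "-" or "+" in Input_File1 and Input_File2 must also be matched, so that the correct sequence follows. How can I use the "-" or "+" to determine which sequence the previous code adds in column 8?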
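One possible approach, sketched here under stated assumptions rather than as a definitive fix, is to make the strand part of the key on both sides. It assumes the strand is always the second-to-last character of column 2 in Input_File2 (as in `1441418-1441616(-)`) and is column 6 in Input_File1. An occurrence counter is still needed, because the two (-) rows for VMNF01000008.1 are identical in both files and can only be paired in order. The temporary directory is a hypothetical stand-in for wherever the real files live.

```shell
tmp=$(mktemp -d)

printf '%s\n' \
  'VMNF01000007.1  6294425-6294650(+)  tacagtggggggcaataagtatgaataccctttggtgtactgacacacacctctt' \
  'VMNF01000008.1  1441418-1441616(-)  gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata' \
  'VMNF01000008.1  1441418-1441616(-)  gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata' \
  'VMNF01000008.1  1441418-1441616(+)  tacagtggggggcaataagtatgaataccctttgatgtactgacacacacctctt' \
  > "$tmp/Input_file2"

printf '%s\n' \
  'VMNF01000007.1  6294425 6294650 .   .   +   Focub_B2_mimp_2' \
  'VMNF01000008.1  1441418 1441616 .   .   -   Focub_II5_mimp_3' \
  'VMNF01000008.1  1441418 1441616 .   .   -   Focub_B2_mimp_1' \
  'VMNF01000008.1  1441418 1441616 .   .   +   Focub_B2_mimp_2' \
  > "$tmp/Input_file1"

awk '
FNR==NR {                                   # reading Input_file2
  split($2, c, "[-(]")                      # c[1]=start, c[2]=end
  strand = substr($2, length($2) - 1, 1)    # "(-)" -> "-", "(+)" -> "+"
  key = $1 OFS c[1] OFS c[2] OFS strand
  seq[key OFS (++n1[key])] = $NF            # Nth occurrence of this strand-aware key
  next
}
{                                           # reading Input_file1
  key = $1 OFS $2 OFS $3 OFS $6             # column 6 holds the strand
  k = key OFS (++n2[key])
  if (k in seq) print $0, seq[k]
}
' "$tmp/Input_file2" "$tmp/Input_file1" > "$tmp/file3"
cat "$tmp/file3"
```

Because the strand is in the key, a (+) row in Input_File1 can only pick up a (+) sequence, so the result no longer depends on (+) and (-) rows for the same coordinates appearing in the same relative order in both files.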

This is the code I am using ():

Any suggestions? Thanks.

Please try the following:

awk '
FNR==NR{
  split($2,array,"[-(]")
  key=$1 OFS array[1] OFS array[2]
  ++count1[key]
  mainarray[key OFS count1[key]]=$NF
  next
}
{
  key=$1 OFS $2 OFS $3
  ++count2[key]
}
((key OFS count2[key]) in mainarray){
  print $0,mainarray[key OFS count2[key]]
}
'  Input_file2  Input_file1

The output is as follows:

VMNF01000007.1  6294425 6294650 .   .   +   Focub_B2_mimp_2 tacagtggggggcaataagtatgaataccctttggtgtactgacacacacctctt
VMNF01000008.1  1441418 1441616 .   .   -   Focub_II5_mimp_3 gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata
VMNF01000008.1  1441418 1441616 .   .   -   Focub_B2_mimp_1 gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata
VMNF01000008.1  1441418 1441616 .   .   +   Focub_B2_mimp_2 tacagtggggggcaataagtatgaataccctttgatgtactgacacacacctctt

Explanation: the same code with detailed comments added.

awk '                                          ##Starting awk program from here.
FNR==NR{                                       ##Checking condition FNR==NR which will be TRUE when file2 is being read.
  split($2,array,"[-(]")                       ##Split the 2nd field (e.g. 1441418-1441616(-)) into array, using "-" or "(" as the separator.
  key=$1 OFS array[1] OFS array[2]             ##Build key from $1 plus the start (array[1]) and end (array[2]) coordinates.
  ++count1[key]                                ##Increment count1[key], i.e. count how many times this key has appeared in Input_file2 so far.
  mainarray[key OFS count1[key]]=$NF           ##Store the last field (the sequence) in mainarray, indexed by key plus its occurrence number.
  next                                         ##next will skip all further statements from here.
}
{
  key=$1 OFS $2 OFS $3                         ##Build key from the first, second and third fields of Input_file1.
  ++count2[key]                                ##Increment count2[key], i.e. count how many times this key has appeared in Input_file1 so far.
}
((key OFS count2[key]) in mainarray){          ##Check whether the same occurrence of this key was stored from Input_file2.
  print $0,mainarray[key OFS count2[key]]      ##Print the current line followed by the matching sequence from mainarray.
}
'  Input_file2  Input_file1                    ##Input file names; Input_file2 must come first for the FNR==NR test to work.
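To see what the `split($2,array,"[-(]")` call in this answer actually produces, the sketch below prints the array elements for the last two rows of Input_file2 (the temporary file path is hypothetical). The strand lands in different elements depending on its sign, and the key uses only `array[1]` and `array[2]`, so the strand is not part of the key at all: the answer pairs rows purely by the Nth occurrence of each key (`count1`/`count2`). That works on the sample data, assuming both files list the (-) and (+) rows for a given coordinate in the same order.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'VMNF01000008.1  1441418-1441616(-)  gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata' \
  'VMNF01000008.1  1441418-1441616(+)  tacagtggggggcaataagtatgaataccctttgatgtactgacacacacctctt' |
awk '{
  # Same split as in the answer: either "-" or "(" acts as a separator
  n = split($2, array, "[-(]")
  out = "line " NR ":"
  for (i = 1; i <= n; i++) out = out " [" i "]=\"" array[i] "\""
  print out
}' > "$tmp/parts"
cat "$tmp/parts"
# line 1: [1]="1441418" [2]="1441616" [3]="" [4]=")"
# line 2: [1]="1441418" [2]="1441616" [3]="+)"
```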

awk '
FNR==NR{
  split($2,array,"[-(]")
  key=$1 OFS array[1] OFS array[2]
  ++count1[key]
  mainarray[key OFS count1[key]]=$NF
  next
}
{
  key=$1 OFS $2 OFS $3
  ++count2[key]
}
((key OFS count2[key]) in mainarray){
  print $0,mainarray[key OFS count2[key]]
}
'  Input_file2  Input_file1

VMNF01000007.1  6294425 6294650 .   .   +   Focub_B2_mimp_2 tacagtggggggcaataagtatgaataccctttggtgtactgacacacacctctt
VMNF01000008.1  1441418 1441616 .   .   -   Focub_II5_mimp_3 gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata
VMNF01000008.1  1441418 1441616 .   .   -   Focub_B2_mimp_1 gggagtgtattgttttttctgccgctagcccattttaacatttagagtgtgcata
VMNF01000008.1  1441418 1441616 .   .   +   Focub_B2_mimp_2 tacagtggggggcaataagtatgaataccctttgatgtactgacacacacctctt

awk '                                          ##Starting awk program from here.
FNR==NR{                                       ##Checking condition FNR==NR which will be TRUE when file2 is being read.
  split($2,array,"[-(]")                       ##Splitting 2nd field into array named array with separator -( in it.
  key=$1 OFS array[1] OFS array[2]             ##Creating variable key whose value is $1 array 1st element and array 2nd element.
  ++count1[key]                                ##Creating array count1 with index key and keep increasing its value with 1 here.
  mainarray[key OFS count1[key]]=$NF           ##Creating array mainarray with index key OFS count1[key] value and its value is last column value.
  next                                         ##next will skip all further statements from here.
}
{
  key=$1 OFS $2 OFS $3                         ##Creating variable key with value of first, second and third field values.
  ++count2[key]                                ##Creating array count2 with index key and keepincreasing value with 1 here.
}
((key OFS count2[key]) in mainarray){          ##Checking condition if key OFS count2[key] is present in mainarray
  print $0,mainarray[key OFS count2[key]]      ##Printing current line and value of mainarray whose index is key OFS and value of count2  whose index is key.
}
'  Input_file2  Input_file1                    ##Mentioning Input_file names here.