awk to print a field using a conditional match, and a default value if no match, in two files

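I am trying to use awk to match the contents of each line in `file` with `$2` in `list`. Both files are tab-delimited, and the name being matched in `list` may contain a space or a special character; for example, in `file` the name is `BRCA1` but in `list` it is `BRCA 1`, or in `file` the name is `BCR` but in `list` it is `BCR/ABL`. If there is a match and `$4` of `list` contains `full gene sequence`, then `$2` and `$1` are printed, separated by a tab. If no match is found, the name that was not matched and `14` are printed, separated by a tab. The awk below does execute, but it produces no output. Thank you :)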
file
BRCA1
BCR
SCN1A
fbn1

list
List code   gene    gene name   methodology
81  DMD dystrophin  deletion analysis and duplication analysis
811 BRCA 1   BRCA2  full gene sequence and full deletion/duplication analysis
70  ABL1    ABL1    gene analysis variants in the kinse domane
71  BCR/ABL t(9;22) full gene sequence

awk
awk -F'\t' -v OFS="\t" 'FNR==NR{A[$1]=$0;next} ($2 in A){if($4=="full gene sequence"){print A[$2],$1}} ELSE {print A[$2],"14"}' file list
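Two details in the command above look like plausible reasons for the empty output; this is a hedged reading, not a confirmed diagnosis. First, awk keywords are lowercase, so `ELSE` is treated as an ordinary unset variable used as a pattern, and its action never runs. Second, `$2 in A` is an exact key lookup, so a `list` value such as `BRCA 1` does not find the key `BRCA1` read from `file`. The sketch below isolates both behaviours; the variable names `else_demo` and `lookup_demo` are only for illustration.

```shell
# 1) Uppercase ELSE is an unset variable used as a pattern, so its action never runs.
else_demo=$(printf 'x\n' | awk 'ELSE { print "never printed" } END { print "done" }')

# 2) The in operator compares whole keys exactly, so "BRCA 1" does not find "BRCA1".
lookup_demo=$(awk 'BEGIN { A["BRCA1"] = 1; print (("BRCA 1" in A) ? "found" : "missing") }')

printf '%s\n%s\n' "$else_demo" "$lookup_demo"
```

This prints `done` and then `missing`: the `ELSE` rule is silently skipped, and the exact lookup fails on the embedded space.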

desired output
BRCA1   811
BCR 71
SCN1A   14
fbn1     85

edit
List code   gene    gene name   methodology
85  fbn1    Fibrillin   full gene sequencing
95  FBN1    fibrillin   del/dup

result
85  fbn1    Fibrillin   full gene sequencing

Since only this line has `full gene sequencing` in it, only this line is printed.

Input
$ cat list 
List code   gene    gene name   methodology
81  DMD dystrophin  deletion analysis and duplication analysis
811 BRCA 1   BRCA2  full gene sequence and full deletion/duplication analysis
70  ABL1    ABL1    gene analysis variants in the kinse domane
71  BCR/ABL t(9;22) full gene sequence

$ cat file 
BRCA1
BCR
SCN1A

Output
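$ awk 'FNR==NR{
          a[$2]=$1;      # list: map field 2 (split on whitespace) to the code in field 1
          next
      }
      {
        for(i in a){
            # regex match in either direction: name against key, or key against name
            if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
        }
        print $1,14      # no key matched: print the default code
      }'  list file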
BRCA1 811
BCR 71
SCN1A 14

You can try
awk 'BEGIN{FS=OFS="\t"}
FNR==NR{
    if(NR>1){
        gsub(" ","",$2)       #removing white space
        n=split($2,v,"/")
        d[v[1]] = $1          #from split, first element as key
    } 
    next
}{print $1, ($1 in d?d[$1]:14)}' list file

You get
BRCA1   811
BCR 71
SCN1A   14
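The two answers above agree on the sample data, but they rest on different ideas of what a match is: the first uses regular-expression matching in either direction, the second an exact lookup on a cleaned-up key. As a hedged illustration, the sketch below runs both on a hypothetical name, `DM`, which is not in the question and is only a fragment of `DMD`. The temporary directory and the variable names `regex_result` and `key_result` are assumptions made for the example.

```shell
tmp=$(mktemp -d)
# Sample list from the question, rebuilt with real tabs.
printf 'List code\tgene\tgene name\tmethodology\n' > "$tmp/list"
printf '81\tDMD\tdystrophin\tdeletion analysis and duplication analysis\n' >> "$tmp/list"
printf '811\tBRCA 1\tBRCA2\tfull gene sequence and full deletion/duplication analysis\n' >> "$tmp/list"
printf '70\tABL1\tABL1\tgene analysis variants in the kinse domane\n' >> "$tmp/list"
printf '71\tBCR/ABL\tt(9;22)\tfull gene sequence\n' >> "$tmp/list"
# Hypothetical name that is not in the question: a fragment of "DMD".
printf 'DM\n' > "$tmp/file"

# First answer: regular-expression match in either direction.
regex_result=$(awk 'FNR==NR { a[$2] = $1; next }
{
    for (i in a)
        if ($1 ~ i || i ~ $1) { print $1, a[i]; next }
    print $1, 14
}' "$tmp/list" "$tmp/file")

# Second answer: exact lookup on the key with spaces removed and text after "/" dropped.
key_result=$(awk 'BEGIN { FS = OFS = "\t" }
FNR==NR {
    if (NR > 1) {
        gsub(" ", "", $2)
        n = split($2, v, "/")
        d[v[1]] = $1
    }
    next
}
{ print $1, ($1 in d ? d[$1] : 14) }' "$tmp/list" "$tmp/file")

printf '%s\n%s\n' "$regex_result" "$key_result"
rm -rf "$tmp"
```

The first command reports `DM 81`, because `DMD` contains `DM`, while the second reports `DM` and `14` separated by a tab. Which of the two is right depends entirely on how a match is defined for the real data.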
Define `match`: string or regexp? Partial or full? Case-sensitive or case-insensitive? Without that information you are likely to get a solution that works on some specific set of test input but then fails 6 months later on your real data. Right now you have two different solutions, each of which makes very different assumptions about what `match` means, and each of which will behave differently on different input even though they produce the same output for the sample input you provided.

`match` is a full string and is case-insensitive; that is, `BRCA1` is a match, but it may be written in a different letter case. Also, I just noticed that `$4`, or `full gene sequence`, was not included; because there can be multiple entries for the same match, that is what makes it unique. I have added an example to the post as well. Thank you :).

The name in `file` will match the string in `$2` of `list`. In `list` the name to match may be part of a string, but it is always the complete name in `file`. That is how the name `BCR` matches the `$2` string in `list`, `BCR/ABL`. Thank you :).
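Taking the clarifications in the comments at face value, one possible sketch is shown below. It is not either answerer's method. It assumes that both files really are tab-delimited, that a row counts only when `$4` contains the text `full gene sequenc` (so both `sequence` and `sequencing` qualify), that names compare case-insensitively, and that each `/`-separated part of `$2`, with spaces removed, is a complete name. The temporary directory and the array names `code` and `part` are illustrative only.

```shell
tmp=$(mktemp -d)
# Sample data from the question and its edit, rebuilt with real tabs.
printf 'List code\tgene\tgene name\tmethodology\n' > "$tmp/list"
printf '81\tDMD\tdystrophin\tdeletion analysis and duplication analysis\n' >> "$tmp/list"
printf '811\tBRCA 1\tBRCA2\tfull gene sequence and full deletion/duplication analysis\n' >> "$tmp/list"
printf '70\tABL1\tABL1\tgene analysis variants in the kinse domane\n' >> "$tmp/list"
printf '71\tBCR/ABL\tt(9;22)\tfull gene sequence\n' >> "$tmp/list"
printf '85\tfbn1\tFibrillin\tfull gene sequencing\n' >> "$tmp/list"
printf '95\tFBN1\tfibrillin\tdel/dup\n' >> "$tmp/list"
printf 'BRCA1\nBCR\nSCN1A\nfbn1\n' > "$tmp/file"

result=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == NR {
    # list: skip the header, keep rows whose methodology mentions full gene sequencing
    if (FNR > 1 && tolower($4) ~ /full gene sequenc/) {
        key = tolower($2)
        gsub(/ /, "", key)            # "BRCA 1" -> "brca1"
        n = split(key, part, "/")     # "bcr/abl" -> "bcr", "abl"
        for (k = 1; k <= n; k++)
            if (!(part[k] in code)) code[part[k]] = $1   # first qualifying row wins
    }
    next
}
{
    # file: case-insensitive exact lookup, default code 14 when nothing qualifies
    name = tolower($1)
    print $1, ((name in code) ? code[name] : 14)
}' "$tmp/list" "$tmp/file")

printf '%s\n' "$result"
rm -rf "$tmp"
```

On this sample it prints the four tab-separated pairs from the desired output: `BRCA1` with `811`, `BCR` with `71`, `SCN1A` with `14` and `fbn1` with `85`. The `FBN1` row with `del/dup` is ignored because its methodology does not qualify, which is how `$4` makes the entry unique.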