awk: print fields matched under a condition, plus a default value for non-matches, across two files

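I am trying to use awk to match the contents of each line in file against $2 in list. Both files are tab-delimited, and the name to match in list may contain spaces or special characters. For example, in file the name is BRCA1 but in list it is BRCA 1, or in file the name is BCR but in list it is BCR/ABL.

If there is a match and $4 of list contains full gene sequence, then $2 and $1 are printed, separated by a tab. If no match is found, the unmatched name and 14 are printed, separated by a tab. The awk below runs but produces no output. Thank you :)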

file

BRCA1
BCR
SCN1A
fbn1
list

List code   gene    gene name   methodology
81  DMD dystrophin  deletion analysis and duplication analysis
811 BRCA 1   BRCA2  full gene sequence and full deletion/duplication analysis
70  ABL1    ABL1    gene analysis variants in the kinse domane
71  BCR/ABL t(9;22) full gene sequence
awk

awk -F'\t' -v OFS="\t" 'FNR==NR{A[$1]=$0;next} ($2 in A){if($4=="full gene sequence"){print A[$2],$1}} ELSE {print A[$2],"14"}' file list
Desired output

BRCA1   811
BCR 71
SCN1A   14
fbn1     85
Edit

List code   gene    gene name   methodology
85  fbn1    Fibrillin   full gene sequencing
95  FBN1    fibrillin   del/dup
Result

85  fbn1    Fibrillin   full gene sequencing
Because only this line has full gene sequencing, only this line is printed.
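The condition in the edit can be tried on its own. The sketch below is an illustration under assumptions, not the asker's code: the helper names edit_rows and full_sequence_rows are made up here, and the pattern "full gene sequenc" is chosen so that both "full gene sequence" and "full gene sequencing" pass. An exact comparison such as $4=="full gene sequence" would reject the fbn1 row, and also the BRCA row, whose methodology text is longer.

```shell
# Sample rows from the question's edit, written as tab-delimited text.
edit_rows() {
    printf 'List code\tgene\tgene name\tmethodology\n'
    printf '85\tfbn1\tFibrillin\tfull gene sequencing\n'
    printf '95\tFBN1\tfibrillin\tdel/dup\n'
}

# Keep only data rows (NR > 1 skips the header) whose methodology field
# mentions full gene sequencing, ignoring case.
# The pattern is an assumption covering "sequence" and "sequencing".
full_sequence_rows() {
    awk -F'\t' 'NR > 1 && tolower($4) ~ /full gene sequenc/'
}

edit_rows | full_sequence_rows
```

With the two sample rows, only the row with code 85 should pass the filter.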

awk 'FNR==NR{
          a[$2]=$1;
          next
      }
     {
       for(i in a){ 
           if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
       } 
        print $1,14 
     }'  list file
Input

$ cat list 
List code   gene    gene name   methodology
81  DMD dystrophin  deletion analysis and duplication analysis
811 BRCA 1   BRCA2  full gene sequence and full deletion/duplication analysis
70  ABL1    ABL1    gene analysis variants in the kinse domane
71  BCR/ABL t(9;22) full gene sequence

$ cat file 
BRCA1
BCR
SCN1A
Output

BRCA1 811
BCR 71
SCN1A 14
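One thing to keep in mind with the test $1 ~ i || i ~ $1 is that awk treats the right-hand side as a regular expression, so any name that is a substring of a list entry counts as a match. The sketch below contrasts that with an exact comparison against the "/"-separated parts of the entry. The names BC and ABL and the function compare_matching are hypothetical, chosen only to show the difference.

```shell
# Compare regex-style matching with exact string matching for one list key.
# "BCR/ABL" comes from the question's list; the input names are made up.
compare_matching() {
    awk -v key='BCR/ABL' '{
        # Regex test, as in the answer above: partial matches succeed.
        regex = ($1 ~ key || key ~ $1) ? "yes" : "no"

        # Exact test: the name must equal one whole "/"-separated part.
        exact = "no"
        n = split(key, part, "/")
        for (i = 1; i <= n; i++)
            if (part[i] == $1) exact = "yes"

        print $1, "regex=" regex, "exact=" exact
    }'
}

printf 'BC\nABL\n' | compare_matching
```

Here BC is accepted by the regex test only because it is a substring of BCR/ABL, while the exact test accepts ABL and rejects BC. Which behaviour is wanted depends on how "match" is defined for the real data.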
You can try

awk 'BEGIN{FS=OFS="\t"}
FNR==NR{
    if(NR>1){
        gsub(" ","",$2)       #removing white space
        n=split($2,v,"/")
        d[v[1]] = $1          #from split, first element as key
    } 
    next
}{print $1, ($1 in d?d[$1]:14)}' list file
You get

BRCA1   811
BCR 71
SCN1A   14
Define match: string or regexp? Partial or full? Case-sensitive or case-insensitive? Without that information you may get a solution that works for some particular set of test input but fails on real data 6 months later. Right now you have two different solutions, each making very different assumptions about what match means, and each will behave differently on different input sets, even though they produce the same output for the sample input you provided.

match is a full, case-insensitive string. That is, BRCA1 is a match, but in list it could appear as BRCA 1. Also, I just noticed that the $4 or full gene sequence condition was not included; because there may be multiple entries for the same match, that condition is what makes an entry unique. I added an example to the post as well. Thank you :).

The name in file will match a string in $2 of list. In list the matching name may be only part of the string, but in file it is always the full name. That is why the name BCR matches the $2 string BCR/ABL in list. Thank you :).
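Putting the clarified requirements together (full-name, case-insensitive string match, spaces ignored, "/"-separated parts, and only rows whose $4 mentions full gene sequencing), one possible sketch follows. It is not taken from either answer. The function name lookup_codes, the temporary sample files, the "full gene sequenc" pattern, and the choice to index every "/"-separated part and keep the first code seen for a name are all assumptions made for illustration.

```shell
# Sample data from the question, written as tab-delimited files.
tmp="$(mktemp -d)"
{
    printf 'List code\tgene\tgene name\tmethodology\n'
    printf '81\tDMD\tdystrophin\tdeletion analysis and duplication analysis\n'
    printf '811\tBRCA 1\tBRCA2\tfull gene sequence and full deletion/duplication analysis\n'
    printf '70\tABL1\tABL1\tgene analysis variants in the kinse domane\n'
    printf '71\tBCR/ABL\tt(9;22)\tfull gene sequence\n'
    printf '85\tfbn1\tFibrillin\tfull gene sequencing\n'
    printf '95\tFBN1\tfibrillin\tdel/dup\n'
} > "$tmp/list"
printf 'BRCA1\nBCR\nSCN1A\nfbn1\n' > "$tmp/file"

# Usage: lookup_codes LIST FILE
# Prints each name from FILE with its code from LIST, or 14 when no
# qualifying row exists.
lookup_codes() {
    awk 'BEGIN { FS = OFS = "\t" }
    FNR == NR {
        # First file (list): skip the header and keep only rows whose
        # methodology mentions full gene sequencing (assumed pattern).
        if (FNR > 1 && tolower($4) ~ /full gene sequenc/) {
            key = tolower($2)          # case-insensitive comparison
            gsub(/ /, "", key)         # "BRCA 1" becomes "brca1"
            n = split(key, part, "/")  # "bcr/abl" gives "bcr" and "abl"
            for (i = 1; i <= n; i++)
                if (!(part[i] in code)) code[part[i]] = $1
        }
        next
    }
    {
        # Second file (file): exact lookup of the lower-cased full name.
        key = tolower($1)
        print $1, (key in code ? code[key] : 14)
    }' "$1" "$2"
}

lookup_codes "$tmp/list" "$tmp/file"
```

Because the lookup uses array membership rather than the ~ operator, characters such as "(" or "." in a name are compared literally instead of being interpreted as regular expression syntax.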