awk: print a field on a conditional match, and a default value for non-matches, across two files
I am trying to use awk to match the contents of each line in file with $2 in list. Both files are tab-delimited, and the name being matched in list may contain a space or a special character: for example, the name is BRCA1 in file but BRCA 1 in list, or BCR in file but BCR/ABL in list. If there is a match and $4 of list contains full gene sequence, then $2 and $1 are printed, separated by a tab. If no match is found, the unmatched name and 14 are printed, separated by a tab. The awk below does execute, but it produces no output. Thank you :)
file
BRCA1
BCR
SCN1A
fbn1
list
List code gene gene name methodology
81 DMD dystrophin deletion analysis and duplication analysis
811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis
70 ABL1 ABL1 gene analysis variants in the kinse domane
71 BCR/ABL t(9;22) full gene sequence
awk
awk -F'\t' -v OFS="\t" 'FNR==NR{A[$1]=$0;next} ($2 in A){if($4=="full gene sequence"){print A[$2],$1}} ELSE {print A[$2],"14"}' file list
Desired output
BRCA1 811
BCR 71
SCN1A 14
fbn1 85
Edit
List code gene gene name methodology
85 fbn1 Fibrillin full gene sequencing
95 FBN1 fibrillin del/dup
Result
85 fbn1 Fibrillin full gene sequencing
Only this line is printed, because it is the only one that contains full gene sequencing.
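The result above implies a substring test on $4 rather than an exact comparison: `$4=="full gene sequence"` would match neither "full gene sequencing" nor "full gene sequence and full deletion/duplication analysis". A minimal sketch of such a filter follows, assuming the rows are tab-separated as described in the question; the variable names `list_rows` and `full_seq_rows` are made up for illustration.

```shell
# Hypothetical sample: the header and the two fbn1 rows from the edit, tab-separated.
list_rows=$(printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '85' 'fbn1' 'Fibrillin' 'full gene sequencing' \
  '95' 'FBN1' 'fibrillin' 'del/dup')

# Skip the header and keep rows whose 4th field contains "full gene sequenc",
# so that both "sequence" and "sequencing" qualify (an assumption about intent).
full_seq_rows=$(printf '%s\n' "$list_rows" |
  awk -F'\t' 'FNR > 1 && index($4, "full gene sequenc")')

printf '%s\n' "$full_seq_rows"
```

`index()` does a plain substring search, so characters such as `/` or `(` in the data are never interpreted as regexp syntax.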
Input
$ cat list
List code gene gene name methodology
81 DMD dystrophin deletion analysis and duplication analysis
811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis
70 ABL1 ABL1 gene analysis variants in the kinse domane
71 BCR/ABL t(9;22) full gene sequence
$ cat file
BRCA1
BCR
SCN1A
Output
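$ awk 'FNR==NR{
    a[$2]=$1;    # first file (list): map field 2 to field 1 (the code)
    next
}
{
    for(i in a){
        # match in either direction, treating the other string as a regexp
        if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
    }
    print $1,14  # no match found: print the default code 14
}' list file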
BRCA1 811
BCR 71
SCN1A 14
You can try
awk 'BEGIN{FS=OFS="\t"}
FNR==NR{
    if(NR>1){
        gsub(" ","",$2)      #removing white space
        n=split($2,v,"/")
        d[v[1]] = $1         #from split, first element as key
    }
    next
}{print $1, ($1 in d?d[$1]:14)}' list file
You get
BRCA1 811
BCR 71
SCN1A 14
Define match: string or regexp? Partial or full? Case-sensitive or case-insensitive? Without that information you may get a solution that works for some specific set of test input but fails six months later on real data. Right now you have two different solutions that each make very different assumptions about what match means, and each will behave differently on different inputs, even though they produce the same output for the sample input you provided.

match is a full, case-insensitive string. That is, BRCA1 is the match, but it could appear as BRCA 1 or BRCA1'. Also, I just noticed that $4, or full gene sequence, was not included; it is what makes a match unique, since there may be multiple entries for the same match. I have added an example to the post as well. Thank you :)

The name in file will be matched against the string in $2 of list. In list, the name being matched may be only part of the string, but it will always be the full name from file. That is how the name BCR matches the $2 string BCR/ABL in list. Thank you :)
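Putting the clarifications from the comments together, one possible reading is: compare case-insensitively, ignore spaces in $2 of list, treat any other non-alphanumeric character in $2 as a separator between names, and only accept list rows whose $4 contains "full gene sequenc". The sketch below follows that reading. It is not taken from either answer. The splitting rule, the temporary directory `workdir`, the array names, and the merging of the fbn1 rows from the edit into list are all assumptions made for illustration.

```shell
workdir=$(mktemp -d)

# Tab-separated sample data: the list from the question plus the rows from the edit.
printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '81' 'DMD' 'dystrophin' 'deletion analysis and duplication analysis' \
  '811' 'BRCA 1' 'BRCA2' 'full gene sequence and full deletion/duplication analysis' \
  '70' 'ABL1' 'ABL1' 'gene analysis variants in the kinse domane' \
  '71' 'BCR/ABL' 't(9;22)' 'full gene sequence' \
  '85' 'fbn1' 'Fibrillin' 'full gene sequencing' \
  '95' 'FBN1' 'fibrillin' 'del/dup' > "$workdir/list"
printf '%s\n' BRCA1 BCR SCN1A fbn1 > "$workdir/file"

lookup=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == NR {
    # First file (list): skip the header, keep only full-gene-sequencing rows.
    if (FNR > 1 && index(tolower($4), "full gene sequenc")) {
        key = tolower($2)
        gsub(/ /, "", key)                  # "brca 1"  -> "brca1"
        n = split(key, part, /[^a-z0-9]+/)  # "bcr/abl" -> "bcr", "abl"
        for (j = 1; j <= n; j++)
            if (part[j] != "" && !(part[j] in code))
                code[part[j]] = $1          # first qualifying row wins
    }
    next
}
{
    # Second file (file): whole-name, case-insensitive lookup, default 14.
    name = tolower($1)
    print $1, (name in code ? code[name] : 14)
}' "$workdir/list" "$workdir/file")

printf '%s\n' "$lookup"
rm -rf "$workdir"
```

Because names are compared as whole strings rather than as regexps, BCR can match BCR/ABL without also matching a longer name that merely contains BCR. A side effect of the assumed splitting rule is that ABL would also map to 71; whether that is wanted depends on the real data.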