Awk: extract rows from file 1 based on a range of columns in file 2


I have two files.

File 1:

chrom   chromStart      chromEnd   clinSign    geneId  rcvAcc  hgvsCod hgvsProt
chr1    930187  930188      VUS  SNV      SAMD11   RCV001050361    NM_152486.3:c.106G>A    NP_689699.2:p.Ala36Thr
 chr1    939398  939446     Benign  deletion       SAMD11   RCV000948524    NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC
File 2:

CHROM   POS  REF  ALT  FILTER     GT     BD    
chr1    1609489 AAC     A       PASS    0/1     FP
chr1    930188 T       G       LowGQ  0/1     FP
chr1    939400 TGC     T       PASS    0/1     FP
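I am trying to look up each file 2 row by CHROM:POS (its first and second columns) in the ranges defined by the first three columns of file 1 (chrom:chromStart:chromEnd), to get this output: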

chrom   chromStart      chromEnd     clinSign         geneId  rcvAcc  hgvsCod hgvsProt  CHROM        POS  REF  ALT  FILTER      GT     BD     
chr1    930187  930188         VUS  SNV     SAMD11   RCV001050361    NM_152486.3:c.106G>A    NP_689699.2:p.Ala36Thr  chr1    930188 T       G       LowGQ  0/1     FP
chr1    939398  939446     Benign  deletion        SAMD11   RCV000948524    NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC  chr1    939400 TGC     T       PASS    0/1     FP
So far I have tried:

awk '
NR==FNR{ start[$1] = $2; end[$1] = $3; next }
(FNR==1) || ( ($1 in start) && ($2 >= start[$1]) && ($2 <= end[$1]) )
' file1 file2> test.txt 
and

awk '... $2 > low[$1] && $2 < high[$1] {print}' file1 file2 > test.txt
But both produce an empty file as output.

Thanks for any suggestions.

> cat test.awk    

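# First file on the command line (file2): remember each position per chrom.
FNR==NR {
    if (NR==1) { title = $0; next }       # save the file2 header
    positions[$1] = positions[$1]" "$2    # space-separated list of positions
    r[$1,$2] = $0                         # full row, keyed by chrom and position
    next
}

# Second file (file1): print the combined header once.
FNR==1 { print $0, title; next }

# Each range row: print it with a stored position that falls inside the range.
{
    split(positions[$1], p)
    for (i in p)
        if (p[i]>=$2 && p[i]<=$3) {
            print $0, r[$1,p[i]]
            next
        }
}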
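In the first step we store the positions in a pseudo-2D array, which is really a string of space-separated numbers indexed by the first field. In the second step we split that string back into numbers and check them against each row.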
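The string-as-list idea can be tried in isolation. The following is a minimal sketch with made-up input (two positions on chr1, one on chr2), not part of the answer's script: it appends each position to a per-chrom string and then splits the chr1 string back into an array.

```shell
# Hypothetical positions input: two positions on chr1, one on chr2.
out=$(printf '%s\n' 'chr1 930188' 'chr1 939400' 'chr2 500' | awk '
    { positions[$1] = positions[$1] " " $2 }   # append to the per-chrom string
    END {
        n = split(positions["chr1"], p)        # default split on whitespace
        for (i = 1; i <= n; i++) print i, p[i]
    }')
printf '%s\n' "$out"
```

The leading space in the stored string is harmless because the default split ignores leading whitespace. Looping from 1 to n, rather than with for (i in p), visits the positions in insertion order; the order of for (i in p) is unspecified in awk.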


Edit: thanks to Ed Morton for helping to fix it.

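Here we parse the files in the opposite order from Ed's answer. The final "next" statement can be removed if the same chrom can have multiple matches and we want to print all of them.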

$ cat tst.awk
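NR == 1 { hdr = $0 }                # header of file1
NR == FNR {                         # file1: store every range per chrom
    c = ++cnt[$1]
    begs[$1,c] = $2
    ends[$1,c] = $3
    vals[$1,c] = $0
    next
}
FNR == 1 {                          # file2: print the combined header
    print hdr, $0
    next
}
{                                   # file2 rows: test each range for this chrom
    for (c=1; c<=cnt[$1]; c++) {
        beg = begs[$1,c]
        end = ends[$1,c]
        if ( (beg <= $2) && ($2 <= end) ) {
            print vals[$1,c], $0
            next                    # stop at the first matching range
        }
    }
}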

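If a given chrom can have multiple matching ranges, just remove the final "next" statement. It is there only for efficiency, for the case where there is always at most 1 match.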
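To see the effect of dropping that statement, here is a sketch on made-up data (the file contents and the label column are assumptions for illustration): file1 holds two overlapping ranges on chr1, and position 160 in file2 falls inside both.

```shell
tmp=$(mktemp -d)

# Hypothetical data: two overlapping ranges on chr1, two positions to look up.
printf '%s\n' 'chrom chromStart chromEnd label' \
              'chr1 100 200 A' \
              'chr1 150 300 B' > "$tmp/file1"
printf '%s\n' 'CHROM POS' 'chr1 160' 'chr1 250' > "$tmp/file2"

all=$(awk '
NR == 1 { hdr = $0 }
NR == FNR {                        # first file: remember every range per chrom
    c = ++cnt[$1]
    begs[$1,c] = $2
    ends[$1,c] = $3
    vals[$1,c] = $0
    next
}
FNR == 1 { print hdr, $0; next }   # second file: combined header
{
    for (c = 1; c <= cnt[$1]; c++)
        if ( (begs[$1,c] <= $2) && ($2 <= ends[$1,c]) )
            print vals[$1,c], $0   # no next here, so every match prints
}' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$all"
rm -rf "$tmp"
```

With this input, position 160 is printed twice (once for range A and once for range B) and position 250 once, after the header line. With the "next" kept, 160 would be reported only for range A.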

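An array can hold only one entry for a given key. So the second line of file 1 overwrites the start and end array entries set by the first line. You need to store multiple ranges for each chrom value and test all of them.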
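The overwrite is easy to reproduce. This sketch uses only the two ranges from file 1 (without the header) and compares one slot per chrom with a counter-based compound key; the variable names are chosen for illustration.

```shell
# The two chr1 ranges from file 1, reduced to three columns.
ranges='chr1 930187 930188
chr1 939398 939446'

# One slot per key: the second row replaces the first.
single=$(printf '%s\n' "$ranges" | awk '
    { start[$1] = $2; end[$1] = $3 }
    END { print start["chr1"], end["chr1"] }')

# A per-chrom counter keeps every range under the compound key (chrom, n).
multi=$(printf '%s\n' "$ranges" | awk '
    { n = ++cnt[$1]; start[$1,n] = $2; end[$1,n] = $3 }
    END { for (i = 1; i <= cnt["chr1"]; i++) print start["chr1",i], end["chr1",i] }')

printf 'single: %s\n' "$single"
printf 'multi:\n%s\n' "$multi"
```

The first command is left with only 939398 939446, while the second still holds both ranges.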
> awk -f test.awk file2 file1

chrom   chromStart      chromEnd   clinSign    geneId  rcvAcc  hgvsCod hgvsProt CHROM   POS  REF  ALT  FILTER     GT     BD    
chr1    930187  930188      VUS  SNV      SAMD11   RCV001050361    NM_152486.3:c.106G>A    NP_689699.2:p.Ala36Thr chr1    930188 T       G       LowGQ  0/1     FP
 chr1    939398  939446     Benign  deletion       SAMD11   RCV000948524    NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC chr1    939400 TGC     T       PASS    0/1     FP
$ awk -f tst.awk file1 file2
chrom   chromStart      chromEnd   clinSign    geneId  rcvAcc  hgvsCod hgvsProt CHROM   POS  REF  ALT  FILTER     GT     BD
chr1    930187  930188      VUS  SNV      SAMD11   RCV001050361    NM_152486.3:c.106G>A    NP_689699.2:p.Ala36Thr chr1    930188 T       G       LowGQ  0/1     FP
 chr1    939398  939446     Benign  deletion       SAMD11   RCV000948524    NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC chr1    939400 TGC     T       PASS    0/1     FP