Awk: extract lines using a pivot column


Input

S   235 1365    *   0   *   *   *   15  1   c81 592
H   235 296 99.7    +   0   0   3I296M1066I 14  1   s15018  1
H   235 719 95.4    +   0   0   174D545M820I    15  1   c2664   10
H   235 764 99.1    +   0   0   55I764M546I 15  1   c6519   4
H   235 792 100 +   0   0   180I792M393I    14  1   c407    107
S   236 1365    *   0   *   *   *   15  1   c474    152
H   236 279 95  +   0   0   765I279M321I    10-1    1   s7689   1
H   236 301 99.7    -   0   0   908I301M156I    15  1   s8443   1
H   236 563 95.2    -   0   0   728I563M74I 17  1   c1725   12
H   236 97  97.9    -   0   0   732I97M536I 17  1   s11472  1
S   237 1365    *   0   *   *   *   15  1   c474    152
H   237 279 95  +   0   0   765I279M321I    15    1   s7689   1
S   238 1365    *   0   *   *   *   12  1   c474    152
H   238 279 95  +   0   0   765I279M321I    10-1    1   s7689   1
H   238 301 99.7    -   0   0   908I301M156I    15  1   s8443   1
H   238 563 95.2    -   0   0   728I563M74I 17  1   c1725   12
H   238 97  97.9    -   0   0   732I97M536I 17  1   s11472  1
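
Below are the outputs I want.

Example 1: specifying "10-1", "15" and "17" for the ninth column

Example 2: specifying "14" and "15" for the ninth column

Example 3: specifying "15" for the ninth column

So I want to extract sets of lines that share the same value in the second column. Of those, I only need the sets whose ninth column contains specific values. In this case, a set needs to contain "all of the specified values".

Set 238 has an unspecified value, "12", in its ninth column, so I do not want that set extracted.

This question is very similar to this one.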

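There are many possible approaches, but I think the most robust one, and the easiest to extend later, is to create a hash table of the desired values (goodVals[]) and then simply test whether the current $9 is a value that is not in that table: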

BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
!($9 in goodVals) { isBadSet=1 }
{ currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet() {
    if ( !isBadSet ) {
        printf "%s", currSet
    }
    currSet = ""
    isBadSet = 0
}
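
The list of wanted values is hard-coded in the script above. As a rough sketch of how it might be parameterized to cover the other examples in the question, the list could be passed in with -v. The variable name vals, the file names demo.txt and filter.awk, and the four shortened rows are assumptions made up for illustration; only $2 (the pivot) and $9 (the tested value) carry meaning in the sample.

```shell
# Hypothetical sample input: only $2 (pivot) and $9 (tested value) matter.
cat > demo.txt <<'EOF'
S 1 a b c d e f 15
H 1 a b c d e f 14
S 2 a b c d e f 15
H 2 a b c d e f 17
EOF

# Same logic as the script above, with the value list read from the
# made-up variable "vals" instead of a hard-coded string.
cat > filter.awk <<'EOF'
BEGIN { split(vals, tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }            # pivot changed: flush the previous set
!($9 in goodVals) { isBadSet = 1 }          # any unlisted $9 rejects the whole set
{ currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet() {
    if ( !isBadSet ) printf "%s", currSet
    currSet = ""
    isBadSet = 0
}
EOF

only_14_15=$(awk -v vals="14 15" -f filter.awk demo.txt)   # keeps set 1 only
only_15_17=$(awk -v vals="15 17" -f filter.awk demo.txt)   # keeps set 2 only
printf '%s\n' "$only_14_15"
```

With this approach a set is printed only if every $9 in it is on the list; it does not require that every listed value actually appears in the set.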

Given the new requirement in your comment, here is a solution for one possible interpretation of that requirement:

$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet(    val,allGoodPresent) {
    allGoodPresent = 1
    for (val in goodVals) {
        if ( !(val in seen) ) {
            allGoodPresent = 0
        }
    }
    if ( allGoodPresent ) {
        printf "%s", currSet
    }
    currSet = ""
    delete seen
}

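$ awk -f tst.awk file
S   236 1365    *   0   *   *   *   15  1   c474    152
H   236 279 95  +   0   0   765I279M321I    10-1    1   s7689   1
H   236 301 99.7    -   0   0   908I301M156I    15  1   s8443   1
H   236 563 95.2    -   0   0   728I563M74I 17  1   c1725   12
H   236 97  97.9    -   0   0   732I97M536I 17  1   s11472  1
S   238 1365    *   0   *   *   *   12  1   c474    152
H   238 279 95  +   0   0   765I279M321I    10-1    1   s7689   1
H   238 301 99.7    -   0   0   908I301M156I    15  1   s8443   1
H   238 563 95.2    -   0   0   728I563M74I 17  1   c1725   12
H   238 97  97.9    -   0   0   732I97M536I 17  1   s11472  1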
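
And here is another, which additionally rejects a set if any value other than the specified ones is present: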

$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet(    val,allGoodPresent,someBadPresent) {
    allGoodPresent = 1
    for (val in goodVals) {
        if ( !(val in seen) ) {
            allGoodPresent = 0
        }
        delete seen[val]
    }
    someBadPresent = length(seen)
    if ( allGoodPresent && !someBadPresent ) {
        printf "%s", currSet
    }
    currSet = ""
    delete seen
}

$ awk -f tst.awk file
S   236 1365    *   0   *   *   *   15  1   c474    152
H   236 279 95  +   0   0   765I279M321I    10-1    1   s7689   1
H   236 301 99.7    -   0   0   908I301M156I    15  1   s8443   1
H   236 563 95.2    -   0   0   728I563M74I 17  1   c1725   12
H   236 97  97.9    -   0   0   732I97M536I 17  1   s11472  1

Unfortunately, the sample input/output you posted was not sufficient to test the difference between the two.
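
To make the difference between the two interpretations concrete, here is a minimal sketch run against a made-up input. Everything in it apart from the awk logic is an assumption for illustration: the file names sample.txt and both.awk, the shortened rows, and the strict switch that folds both scripts into one. Set 1 has some good plus some bad values, set 2 has all good and no bad, and set 3 has all good plus one bad.

```shell
# Hypothetical input: only $2 (pivot) and $9 (tested value) carry meaning.
cat > sample.txt <<'EOF'
S 1 x x x x x x 15
H 1 x x x x x x 12
S 2 x x x x x x 10-1
H 2 x x x x x x 15
H 2 x x x x x x 17
S 3 x x x x x x 10-1
H 3 x x x x x x 15
H 3 x x x x x x 17
H 3 x x x x x x 12
EOF

# Both interpretations in one script; "strict" is a made-up switch.
cat > both.awk <<'EOF'
BEGIN { split("10-1 15 17", tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet(    val, allGoodPresent, someBadPresent) {
    allGoodPresent = 1
    for (val in goodVals) {
        if ( !(val in seen) ) allGoodPresent = 0
        delete seen[val]
    }
    # Whatever is still in seen[] is a value that was not asked for.
    someBadPresent = 0
    for (val in seen) someBadPresent = 1
    if ( allGoodPresent && !(strict && someBadPresent) ) printf "%s", currSet
    currSet = ""
    delete seen
}
EOF

out_loose=$(awk -v strict=0 -f both.awk sample.txt)    # sets 2 and 3
out_strict=$(awk -v strict=1 -f both.awk sample.txt)   # set 2 only
printf 'loose:\n%s\nstrict:\n%s\n' "$out_loose" "$out_strict"
```

Set 3 is the case that separates the two: it is printed when only "all specified values present" is required, and dropped once unspecified values are also disallowed.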

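Hi. Now I want to extract the sets of lines that contain all of "10-1", "15" and "17". For example, the last two lines of the inline sample above only have "15", so I do not want to extract them.

Then edit your question to clarify your requirements. Include in your example the case where all of the desired values are present but undesired values are present too, so we can see how you want that case handled. So you should have 3 input sets: 1) some good + some bad, 2) all good + no bad, 3) all good + some bad.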