Awk 使用轴列提取线
填充Awk 使用轴列提取线,awk,Awk,填充 S 235 1365 * 0 * * * 15 1 c81 592 H 235 296 99.7 + 0 0 3I296M1066I 14 1 s15018 1 H 235 719 95.4 + 0 0 174D545M820I 15 1 c2664 10 H 235 764 99.1 + 0 0 55I764M546I 15 1 c6519 4 H
S 235 1365 * 0 * * * 15 1 c81 592
H 235 296 99.7 + 0 0 3I296M1066I 14 1 s15018 1
H 235 719 95.4 + 0 0 174D545M820I 15 1 c2664 10
H 235 764 99.1 + 0 0 55I764M546I 15 1 c6519 4
H 235 792 100 + 0 0 180I792M393I 14 1 c407 107
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
S 237 1365 * 0 * * * 15 1 c474 152
H 237 279 95 + 0 0 765I279M321I 15 1 s7689 1
S 238 1365 * 0 * * * 12 1 c474 152
H 238 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 238 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 238 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 238 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
下面是我想要的文件
示例1通过指定第九列“10-1”、“15”和“17”
示例2指定第九列“14”和“15”
示例3指定第九列“15”
所以我想在第二列中提取一组具有相同值的行。此时,我只需要提取第9列中具有特定值的一组行。在这种情况下,行集合需要具有“所有指定值”
集合238在第九列中具有未指定的“12”。所以我不希望他们被提取出来
这个问题和这个问题很相似。
有许多可能的方法,但我认为最稳健、最容易在以后扩展的方法是创建一个所需值的哈希表(
goodVals[]
),然后只需测试当前的$9
是否是该表中没有的值:
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
!($9 in goodVals) { isBadSet=1 }
{ currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet() {
if ( !isBadSet ) {
printf "%s", currSet
}
currSet = ""
isBadSet = 0
}
鉴于您评论中的新要求,以下是对该要求的一种可能解释的解决方案:
$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet( val,allGoodPresent) {
allGoodPresent = 1
for (val in goodVals) {
if ( !(val in seen) ) {
allGoodPresent = 0
}
}
if ( allGoodPresent ) {
printf "%s", currSet
}
currSet = ""
delete seen
}
$ awk -f tst.awk file
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
还有一个:
$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet( val,allGoodPresent,someBadPresent) {
allGoodPresent = 1
for (val in goodVals) {
if ( !(val in seen) ) {
allGoodPresent = 0
}
delete seen[val]
}
someBadPresent = length(seen)
if ( allGoodPresent && !someBadPresent ) {
printf "%s", currSet
}
currSet = ""
delete seen
}
$ awk -f tst.awk file
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
不幸的是,您发布的示例输入/输出不足以测试差异。您好,现在,我想提取一组包含所有“10-1”、“15”和“17”的行。例如,上面内嵌中的最后两行只有“15”。所以我不想提取它们。然后编辑您的问题以澄清您的需求。在您的示例中包括所有期望值都存在但也存在不期望值的情况,以便我们可以看到您希望如何处理该情况。所以你应该有3个输入集-1)一些好的+一些坏的,2)所有好的+不坏的,3)所有好的+一些坏的。
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
!($9 in goodVals) { isBadSet=1 }
{ currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet() {
if ( !isBadSet ) {
printf "%s", currSet
}
currSet = ""
isBadSet = 0
}
$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet( val,allGoodPresent) {
allGoodPresent = 1
for (val in goodVals) {
if ( !(val in seen) ) {
allGoodPresent = 0
}
}
if ( allGoodPresent ) {
printf "%s", currSet
}
currSet = ""
delete seen
}
$ awk -f tst.awk file
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet( val,allGoodPresent,someBadPresent) {
allGoodPresent = 1
for (val in goodVals) {
if ( !(val in seen) ) {
allGoodPresent = 0
}
delete seen[val]
}
someBadPresent = length(seen)
if ( allGoodPresent && !someBadPresent ) {
printf "%s", currSet
}
currSet = ""
delete seen
}
$ awk -f tst.awk file
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1