Unix: filter a file by values that fall within ranges specified in another file

unix awk

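I want to filter file1 based on these two criteria:

(a) Only include records whose $1 matches a $1 in file2 (in many cases there will be multiple matches).

(b) Once a match is found, check $2 in file1 to make sure it falls within the range specified by $2 and $3 in file2.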

File 1:

seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01

File 2:

seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000

Expected output:

seq_100|rf001 298 01 11 01 11

You can try this awk one-liner:

awk 'NR==FNR{ if($1 in a) a[$1]=a[$1]","$2" "$3; else a[$1]=$2" "$3;next;} {n=split(a[$1],arr,",");for(i=1;i<=n;i++){split(arr[i],b," ");if( $2 > b[1] && $2 < b[2] ){ print $0;} }}' file2 file1

Test:

sat:~# awk -f sample.awk  file2 file1 
seq_100|rf001 298 01 11 01 11
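
The test run above loads the program from a file named sample.awk. A possible version of that file is sketched below, spread over several lines with comments. It follows the same idea as the one-liner: collect each ID's ranges from file2 as one comma-joined string, then split that string and test every range for each file1 record. The loop runs with i <= n so the last range of an ID is tested as well. Two small additions are not in the original answer: a ($1 in a) guard, so IDs that never appear in file2 are skipped without creating empty array entries, and a break, so a record is printed at most once even if it falls inside overlapping ranges. The temporary directory and the sample data setup exist only to make the sketch self-contained.

```shell
#!/bin/sh
# Illustrative harness: recreate the sample data from the question in a
# temporary directory, write the filter to sample.awk, and run it.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' \
  'seq_100|rf001 298 01 11 01 11' \
  'seq_0442|rf76 6000 01 11 10 00' \
  'seq_9999|rf54 5098 01 01 01 01' > file1

printf '%s\n' \
  'seq_100|rf001 0 679' \
  'seq_100|rf001 700 800' \
  'seq_100|rf001 19000 22000' \
  'seq_100|rf001 23000 23500' \
  'seq_9999|rf54 800 3000' \
  'seq_9999|rf54 7000 7800' \
  'seq_9999|rf54 8000 9000' > file2

cat > sample.awk <<'EOF'
# First file (file2): join every "low high" pair of an ID with commas,
# e.g. a["seq_100|rf001"] = "0 679,700 800,19000 22000,23000 23500"
NR == FNR {
    if ($1 in a) a[$1] = a[$1] "," $2 " " $3
    else         a[$1] = $2 " " $3
    next
}
# Second file (file1): only IDs that appeared in file2 have ranges to test
($1 in a) {
    n = split(a[$1], arr, ",")
    for (i = 1; i <= n; i++) {          # <= so the last range is tested too
        split(arr[i], b, " ")
        if ($2 > b[1] && $2 < b[2]) {   # strict bounds, as in the one-liner
            print
            break                       # print each record at most once
        }
    }
}
EOF

awk -f sample.awk file2 file1
```

With the question's sample data this prints seq_100|rf001 298 01 11 01 11, matching the expected output. The comparisons are strict, as in the one-liner, so a value exactly equal to a range endpoint is not selected.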

Here is another way to do it with awk:

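awk '
NR == FNR {                        # first file (file1): remember each line
  line[$1, $2] = $0                # keyed by ID and value
  next
}
{                                  # second file (file2): ID, low, high
  for (key in line) {
    split(key, tmp, SUBSEP)        # tmp[1] = ID, tmp[2] = file1 value
    if (tmp[1] == $1 && tmp[2] > $2 && tmp[2] < $3)
      print line[key]
  }
}' file1 file2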

Output:

seq_100|rf001 298 01 11 01 11

Explanation:

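We read file1 first and store each whole line in a two-dimensional array indexed by column 1 and column 2.
Once all of file1 is in memory, for each line of file2 we iterate over every key of the line array.
We split each key and check whether column 1 of file2 equals the first part of the key, and whether the second part of the key lies strictly between columns 2 and 3 of that file2 line.

If all of that holds, we print the stored line.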
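
The approach above loops over every stored file1 key for every file2 line, so its running time grows with the product of the two file sizes. Its output order also follows for (key in line), which awk leaves unspecified, and a file1 line that falls inside two overlapping ranges is printed twice. A hedged alternative sketch follows, assuming inclusive bounds (the question does not say whether endpoints count) and that each file1 record should be printed at most once: read file2 first, store each ID's ranges in numbered lo/hi arrays, then check each file1 record only against the ranges of its own ID, printing in file1 order. The helper name filter_ranges, the temporary directory, and the sample data setup are illustrative only.

```shell
#!/bin/sh
# Sketch: index the ranges from file2 by ID first, then test each file1
# record only against its own ranges. Inclusive bounds are an assumption.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' \
  'seq_100|rf001 298 01 11 01 11' \
  'seq_0442|rf76 6000 01 11 10 00' \
  'seq_9999|rf54 5098 01 01 01 01' > file1

printf '%s\n' \
  'seq_100|rf001 0 679' \
  'seq_100|rf001 700 800' \
  'seq_100|rf001 19000 22000' \
  'seq_100|rf001 23000 23500' \
  'seq_9999|rf54 800 3000' \
  'seq_9999|rf54 7000 7800' \
  'seq_9999|rf54 8000 9000' > file2

# filter_ranges is a hypothetical helper; call it as: filter_ranges file2 file1
filter_ranges() {
    awk '
    NR == FNR {                    # first file (file2): ID low high
        n = ++cnt[$1]              # ranges seen so far for this ID
        lo[$1, n] = $2 + 0         # "+ 0" forces a numeric value
        hi[$1, n] = $3 + 0
        next
    }
    !($1 in cnt) { next }          # second file (file1): ID not in file2
    {
        v = $2 + 0
        for (i = 1; i <= cnt[$1]; i++)
            if (v >= lo[$1, i] && v <= hi[$1, i]) {
                print              # record is inside a range
                next               # so stop after the first match
            }
    }' "$@"
}

filter_ranges file2 file1
```

With the sample data this also prints only seq_100|rf001 298 01 11 01 11. Like both answers, it relies on the NR == FNR idiom, which misfires if the first file (here file2) is empty, because NR == FNR then also holds for every line of file1.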
