Perl: matching values against ranges in two files

I have a large file with start and end positions; here is a fragment:

(A)   11897   11976           
(B)   17024   18924         
(C)   25687  25709

and another file that also has start and end positions (also a fragment):

(ii) 11649 18924 (A) 11897 11976      
(iii) 23145 31277 (C) 25687 25709

I want to know whether a range in file 2 contains both the start and the end position of a value in file 1.

The result file I want looks like this:

(ii) 11649 18924 (A) 11897 11976      
(iii) 23145 31277 (C) 25687 25709

I wrote this Perl code:

open my $firstfile, '<', $ARGV[0] or die "$!";
open my $secondfile, '<', $ARGV[1] or die "$!";

while (<$firstfile>) {
    @col=split /\s+/;
    $start=$col[1];
    $end= $col[2];

    while (<$secondfile>) {
        @seccol=split /\s+/;
        $begin=$seccol[1];
        $finish=$seccol[2];     

        print join ("\t", @col, @seccol), "\n" if ($start>=$begin and $end<=$finish);
    }
}

Any suggestions?

You need to rewind the second file on every pass of the outer loop, or (probably better, depending on its size) load it into an array.

#!/usr/bin/perl
use strict;
use warnings;

my ($start,$end,$begin,$finish);

open my $firstfile, '<', $ARGV[0] or die "$!";
open my $secondfile, '<', $ARGV[1] or die "$!";

while (<$firstfile>) {
        my @col=split /\s+/;
        $start=$col[1];
        $end= $col[2];

        seek($secondfile,0,0);
        while (<$secondfile>) {
           my @seccol=split /\s+/;
           $begin=$seccol[1];
           $finish=$seccol[2];
           print join ("\t", @col, @seccol), "\n" if ($start>=$begin and $end<=$finish);
        }
}
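
The other option mentioned above, loading the second file into an array, might look like the following minimal sketch. The helper names load_ranges and find_enclosing are illustrative and not from the original post, and the sketch assumes each line holds whitespace-separated columns in the order label, start, end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whitespace-separated file into an array of [label, start, end] rows.
# (load_ranges is a hypothetical helper name.)
sub load_ranges {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @rows;
    while (my $line = <$fh>) {
        my @col = split ' ', $line;
        next unless @col >= 3;    # skip blank or short lines
        push @rows, \@col;
    }
    close $fh;
    return @rows;
}

# Return every outer row whose range encloses the inner row's range.
sub find_enclosing {
    my ($inner, @outer) = @_;
    return grep { $inner->[1] >= $_->[1] && $inner->[2] <= $_->[2] } @outer;
}

if (@ARGV == 2) {
    my @first  = load_ranges($ARGV[0]);
    my @second = load_ranges($ARGV[1]);    # held in memory, so no seek is needed
    for my $row (@first) {
        # Same column order as the loop above: file 1 row, then file 2 row.
        print join("\t", @$row, @$_), "\n" for find_enclosing($row, @second);
    }
}
```

This trades memory for speed: the second file is read once instead of once per line of the first file, which only pays off if the second file fits comfortably in memory.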

Because you are using nested loops, the second file has been completely consumed after the first iteration of the outer loop. Instead of re-reading the file, you can build an array holding the elements of the first file and then compare them against the second file:

use strict;
use warnings;
use autodie;

open my $firstfile, '<', $ARGV[0];
open my $secondfile, '<', $ARGV[1];

my @range;

while (<$firstfile>) {
    push @range, [ split ];
}

while (<$secondfile>) {
    my @col = split;
    my @matches = grep {
        $$_[1] >= $col[1] && $$_[2] <= $col[2]
    } @range;

    if (@matches > 0) {
        for my $ref (@matches) {
            print join("\t", @$ref, @col), "\n";
        }
    }
}

Here is another Perl one-liner:

perl -lane '
BEGIN { 
    $x = pop;
    push @range, map[split], <>; 
    @ARGV = $x
}  
for (@range) {
    if ($F[1] <= $_->[1] && $F[2] >= $_->[2]) {
        print join " ", @F, @$_
    }
}' bigfile secondfile
(ii) 11649 13714 (A) 11897 11976
(iii) 23146 31227 (C) 25687 25709

Command-line options used:

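-l: removes the newline from each input line and adds it back when printing
-a: automatically splits each line into the array @F
-n: wraps the code in a while (<>) { ... } loop that processes each line
-e: executes the code given on the command line

In the BEGIN block, we read through the big file and build an array of its split rows. In the main body, we check whether the second and third columns of the current line enclose a stored range; if they do, we print the whole line followed by the contents of the matching row.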
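
For readers less used to -lane, the one-liner is roughly equivalent to the script below. This is a hand-written sketch of what the switches do, not the exact code Perl generates. The explicit open calls stand in for the @ARGV juggling in the BEGIN block, the helper name encloses is illustrative, and the guard that skips short lines is an addition to keep use warnings quiet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True when the range in @$outer (columns 1 and 2) encloses the one in @$inner.
# (encloses is an illustrative name, not part of the one-liner.)
sub encloses {
    my ($outer, $inner) = @_;
    return $outer->[1] <= $inner->[1] && $outer->[2] >= $inner->[2];
}

if (@ARGV == 2) {
    my ($bigfile, $secondfile) = @ARGV;

    # BEGIN block: read the big file into an array of split rows.
    open my $big, '<', $bigfile or die "Cannot open $bigfile: $!";
    my @range;
    while (my $line = <$big>) {
        my @row = split ' ', $line;
        push @range, \@row if @row >= 3;    # added guard: skip short lines
    }
    close $big;

    # -n: loop over every line of the second file.
    open my $in, '<', $secondfile or die "Cannot open $secondfile: $!";
    while (my $line = <$in>) {
        chomp $line;                  # -l: strip the newline on input
        my @F = split ' ', $line;     # -a: autosplit into @F
        next unless @F >= 3;          # added guard: skip short lines
        for my $row (@range) {
            # -l also appends "\n" to every print
            print join(" ", @F, @$row), "\n" if encloses(\@F, $row);
        }
    }
    close $in;
}
```

It is called the same way as the one-liner, with the big file first and the second file last.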
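I don't think this is the solution, but there is a typo: $begin=$secol[1] should be $seccol[1]. It is also a chance to tell someone to use strict; use warnings.

-a: the autosplit option creates the @F array for the second file. The split function is used in the BEGIN block to split the big file.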