How can I efficiently count the ranges that cover a given range in Perl?


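I have a database of about 30k ranges, each given as a pair of start and end points:

    [12,80],[34,60],[34,9000],[76,743],...

I would like to write a Perl subroutine that takes a range (not from the database) and returns the number of ranges in the database that fully "include" the given range.

For example, if the database held only these 4 ranges and the query range is [38,70], the subroutine should return 2, since the first and third ranges both fully contain the query range.

The problem: I want the queries to be as "cheap" as possible, and I don't mind doing lots of pre-processing if it helps.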

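A couple of notes:

  • I used the word "database" loosely. I don't mean an actual database (e.g. SQL); it's just a long list of ranges.

  • My world is circular. There is a given max_length (e.g. 9999), and ranges like [8541,6] are legal (you can think of one as a single range that is the union of [8541,9999] and [1,6]).

Thanks, Dave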

Update: this is my original code:

    use strict;
    use warnings;
    
    my $max_length = 200;
    my @ranges     = (
        { START => 10,   END => 100 },
        { START => 30,   END => 90 },
        { START => 50, END => 80 },
        { START => 180,  END => 30 }
    );
    
    sub n_covering_ranges($) {
        my ($query_h) = shift;
        my $start     = $query_h->{START};
        my $end       = $query_h->{END};
        my $count     = 0;
        if ( $end >= $start ) {
    
            # query range is normal
            foreach my $range_h (@ranges) {
                if (( $start >= $range_h->{START} and $end <= $range_h->{END} )
                    or (    $range_h->{END} <= $range_h->{START} and  $range_h->{START} <= $end )
                    or ( $range_h->{END} <= $range_h->{START} and  $range_h->{END} >= $end)
                    )
                {
                    $count++;
                }
            }
    
        }
    
        else {
    
            # query range is hanging over edge
            # only other hanging over edges can contain it
            foreach my $range_h (@ranges) {
                if ( $start >= $range_h->{START} and $end <= $range_h->{END} ) {
                    $count++;
                }
            }
    
        }
    
        return $count;
    }
    
    print n_covering_ranges( { START => 1, END => 10 } ), "\n";
    print n_covering_ranges( { START => 30, END => 70 } ), "\n";
    
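Update 2

Congratulations to Aristotle Pagaltzis! Your implementation is very fast! However, in order to use this solution, I would obviously like to do the pre-processing (creating the object) only once. Is it possible to store (nstore) this object after it has been created? I have never done this before. How should I retrieve it? Is there anything special about it? Hopefully the retrieval will be fast, so that it won't hurt the overall performance of this great data structure.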
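A minimal sketch of the store-and-retrieve round trip asked about above, using the core Storable module. The DemoRangeMap class name and the tiny array are hypothetical stand-ins for the real lookup object, and the temporary file is only there to keep the example self-contained.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real lookup object: a blessed array
# reference, similar in shape to what the preprocessing step builds.
my $object = bless [ undef, 'a', 'b' ], 'DemoRangeMap';

# Write the object to disk once, after preprocessing.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
close $fh;
nstore( $object, $file );

# Later, possibly in another program: load it back.
my $restored = retrieve($file);

print ref($restored), "\n";        # prints "DemoRangeMap"
print scalar(@$restored), "\n";    # prints "3"
```

The retrieved reference is blessed into the same class name, so a program that wants to call methods on it has to load that class itself before using the object.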

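Update 3

I tried a simple nstore and retrieve of the RangeMap object. This seems to work fine. The only problem is that the resulting file is around 1GB, and I will have about 1000 such files. I could live with a TB of storage for this, but I wonder whether there is a way to store it more efficiently without hurting retrieval performance too much. See also here: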

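Update 4 - RangeMap bug

Unfortunately, RangeMap has a bug. Thanks to BrowserUk from PerlMonks for pointing it out. For example, create an object with $max_length = 10 and a single range [6,2]. Then query for [7,8]. The answer should be 1, not 0.

I think this updated package should do the job: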

    use strict;
    use warnings;
    
    package FastRanges;
    
    sub new($$$) {
        my $class      = shift;
        my $max_length = shift;
        my $ranges_a   = shift;
        my @lookup;
        for ( @{$ranges_a} ) {
            my ( $start, $end ) = @$_;
            my @idx
                = $end >= $start
                ? $start .. $end
                : ( $start .. $max_length, 1 .. $end );
            for my $i (@idx) { $lookup[$i] .= pack 'L', $end }
        }
        bless \@lookup, $class;
    }
    
    sub num_ranges_containing($$$) {
        my $self = shift;
        my ( $start, $end ) = @_;    # query range coordinates
    
        return 0
            unless ( defined $self->[$start] )
            ;    # no ranges overlap the start position of the query
    
        if ( $end >= $start ) {
    
            # query range is simple
            # any inverted range in {LOOKUP}[$start] must contain it,
            # and so does any simple range which ends at or after $end
            return 0 + grep { $_ < $start or $end <= $_ } unpack 'L*',
                $self->[$start];
        }
        else {
    
            # query range is inverted
            # only inverted ranges in {LOOKUP}[$start] which also end
            # at or after $end contain it. simple ranges can't contain
            # the query range
            return 0 + grep { $_ < $start and $end <= $_ } unpack 'L*',
                $self->[$start];
        }
    }
    
    1;
    

Your comments are welcome.

Which part are you having trouble with? What have you tried so far? This is a fairly simple task:

      * Iterate through the ranges
      * Foreach range, check if the test range is in it.
      * Profile and benchmark
    
In code, that is quite short:

     my $test = [ $n, $m ];
     # One true/false value per range: true if that range contains $test
     my @contains = map { 
          $test->[0] >= $_->[0] 
             and 
          $test->[1] <= $_->[1]
          } @ranges;
    
    
For the wrap-around ranges, the trick is to decompose them into separate ranges before you look at them. It's brute-force work.
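The decomposition idea could be sketched roughly as follows. The helper names split_range, contains and count_containing are made up for this illustration, positions are assumed to run from 1 to $max_length, and a wrapped range whose start is exactly one past its end is treated as covering the whole circle.

```perl
use strict;
use warnings;

# Split a possibly wrapped range into one or two simple [start, end] pieces.
# Assumption: positions run from 1 to $max_length.
sub split_range {
    my ( $range, $max_length ) = @_;
    my ( $start, $end ) = @$range;
    return [ $start, $end ]   if $start <= $end;
    return [ 1, $max_length ] if $start == $end + 1;    # wraps the whole circle
    return ( [ $start, $max_length ], [ 1, $end ] );
}

# True if every piece of the query lies inside some piece of the candidate.
sub contains {
    my ( $candidate, $query, $max_length ) = @_;
    my @outer = split_range( $candidate, $max_length );
    for my $piece ( split_range( $query, $max_length ) ) {
        return 0
            unless grep { $_->[0] <= $piece->[0] && $piece->[1] <= $_->[1] } @outer;
    }
    return 1;
}

# Brute force: test every stored range against the query.
sub count_containing {
    my ( $ranges, $query, $max_length ) = @_;
    return scalar grep { contains( $_, $query, $max_length ) } @$ranges;
}

my @ranges = ( [ 12, 80 ], [ 34, 60 ], [ 34, 9000 ], [ 76, 743 ] );
print count_containing( \@ranges, [ 38, 70 ], 9999 ), "\n";    # prints 2
print count_containing( [ [ 6, 2 ] ], [ 7, 8 ], 10 ), "\n";    # prints 1
```

Every query still scans the whole list, so this is mainly useful as a reference to check faster implementations against.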


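Also, as a social note, you are asking questions at a fairly high rate: higher than I would expect from someone who is really trying to solve their own problems. I think you are running too fast, and rather than getting help, you are effectively outsourcing your work. That's really not very nice. We don't get paid at all, and we especially don't get paid to do the work that was assigned to you. It might be very different if you had at least tried an implementation of your problem, but many of your questions seem to indicate that you didn't even try.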

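I'm pretty sure there is a better way to do this, but here is a starting point:

Preprocessing:

  • Create two lists, one sorted by the start value of the ranges and one sorted by the end value

Once you get your query range:

  • Use a binary search to match its start in the start-sorted list
  • Use another binary search to match its end in the end-sorted list
  • Find the ranges that appear in both lists (@start[0..$start_index] and @end[$end_index..$#end])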
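A rough sketch of these steps for simple (non-wrapped) ranges only. The names last_at_most, first_at_least, build_index and count_containing are invented for the illustration, and wrapped ranges would need separate handling.

```perl
use strict;
use warnings;

# Index of the last element whose key is <= $value, or -1 if there is none.
# $list is sorted ascending by element $key (0 = start, 1 = end).
sub last_at_most {
    my ( $list, $key, $value ) = @_;
    my ( $lo, $hi ) = ( 0, $#$list );
    my $found = -1;
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if ( $list->[$mid][$key] <= $value ) { $found = $mid; $lo = $mid + 1 }
        else                                 { $hi = $mid - 1 }
    }
    return $found;
}

# Index of the first element whose key is >= $value, or the list size if none.
sub first_at_least {
    my ( $list, $key, $value ) = @_;
    my ( $lo, $hi ) = ( 0, $#$list );
    my $found = @$list;
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if ( $list->[$mid][$key] >= $value ) { $found = $mid; $hi = $mid - 1 }
        else                                 { $lo = $mid + 1 }
    }
    return $found;
}

# Preprocessing: the same range references, sorted two ways.
sub build_index {
    my @ranges = @_;
    return {
        by_start => [ sort { $a->[0] <=> $b->[0] } @ranges ],
        by_end   => [ sort { $a->[1] <=> $b->[1] } @ranges ],
    };
}

# Count simple ranges with start <= query start and end >= query end.
sub count_containing {
    my ( $index, $q_start, $q_end ) = @_;
    my $by_start    = $index->{by_start};
    my $by_end      = $index->{by_end};
    my $start_index = last_at_most( $by_start, 0, $q_start );
    my $end_index   = first_at_least( $by_end, 1, $q_end );

    # Ranges present in both slices; the references act as identity keys.
    my %starts_early;
    $starts_early{$_} = 1 for @{$by_start}[ 0 .. $start_index ];
    return scalar grep { $starts_early{$_} } @{$by_end}[ $end_index .. $#$by_end ];
}

my $index = build_index( [ 12, 80 ], [ 34, 60 ], [ 34, 9000 ], [ 76, 743 ] );
print count_containing( $index, 38, 70 ), "\n";    # prints 2
```

The two binary searches are logarithmic, but the final intersection is still linear in the size of the two slices, so this narrows the candidates rather than making the query constant-time.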

Here is one approach to the brute-force solution:

        use strict;
        use warnings;
        
        my @ranges = ([12,80],[34,60],[34,9000],[76,743]);
        
        # Split ranges between normal & wrapped:
        my (@normal, @wrapped);
        
        foreach my $r (@ranges) {
          if ($r->[0] <= $r->[1]) {
            push @normal, $r;
          } else {
            push @wrapped, $r;
          }
        }
        
        sub count_matches
        {
          my ($start, $end, $max_length, $normal, $wrapped) = @_;
        
          if ($start <= $end) {
            # This is a normal range
            return (grep { $_->[0] <= $start and $_->[1] >= $end } @$normal)
                +  (grep { $end <= $_->[1] or $_->[0] <= $start } @$wrapped);
          } else {
            # This is a wrapped range
            return (grep { $_->[0] <= $start and $_->[1] >= $end } @$wrapped)
                # This part should probably be calculated only once:
                +  (grep { $_->[0] == 1 and $_->[1] == $max_length } @$normal);
          }
        } # end count_matches
        
        print count_matches(38,70, 9999, \@normal, \@wrapped)."\n";
        
        
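        # Lookup-table approach: for each position, append the packed end point
        # of every range that covers that position.
        my $max_length = 9999;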
        my @range = ( [12,80],[34,60],[34,9000] );
        
        my @lookup;
        
        for ( @range ) {
            my ( $start, $end ) = @$_;
            my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 0 .. $end );
            for my $i ( @idx ) { $lookup[$i] .= pack "L", $end }
        }
        
        sub num_ranges_containing {
            my ( $start, $end ) = @_;
        
            return 0 unless defined $lookup[$start];
        
            # simple ranges can be contained in inverted ranges,
            # but inverted ranges can only be contained in inverted ranges
            my $counter = ( $start <= $end )
                ? sub { 0 + grep { $_ < $start or  $end <= $_ } @_ }
                : sub { 0 + grep { $_ < $start and $end <= $_ } @_ };
        
            return $counter->( unpack 'L*', $lookup[$start] );
        }
        
        # The same lookup table, wrapped in a class.
        package RangeMap;
        
        sub new {
            my $class = shift;
            my $max_length = shift;
            my @lookup;
            for ( @_ ) {
                my ( $start, $end ) = @$_;
                my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 0 .. $end );
                for my $i ( @idx ) { $lookup[$i] .= pack 'L', $end }
            }
            bless \@lookup, $class;
        }
        
        sub num_ranges_containing {
            my $self = shift;
            my ( $start, $end ) = @_;
        
            return 0 unless defined $self->[$start];
        
            # simple ranges can be contained in inverted ranges,
            # but inverted ranges can only be contained in inverted ranges
            my $counter = ( $start <= $end )
                ? sub { 0 + grep { $_ < $start or  $end <= $_ } @_ }
                : sub { 0 + grep { $_ < $start and $end <= $_ } @_ };
        
            return $counter->( unpack 'L*', $self->[$start] );
        }
        
        package main;
        my $rm = RangeMap->new( 9999, [12,80],[34,60],[34,9000] );
        
        # Alternative using the CPAN module Number::Interval.
        use Number::Interval;

        my @ranges     = (
            { START => 10,   END => 100 },
            { START => 30,   END => 90 },
            { START => 50, END => 80 },
            { START => 180,  END => 30 }
        );
        my @intervals;
        for my $range ( @ranges ) {
          my $int = Number::Interval->new( Min => $range->{START},
                                           Max => $range->{END} );
          push @intervals, $int;
        }
        
        # $min and $max are the end points of the query range (set elsewhere).
        # This counts intervals that intersect the query.
        my $num_overlap = 0;
        my $checkinterval = Number::Interval->new( Min => $min, Max => $max );
        for my $int ( @intervals ) {
          $num_overlap++ if $checkinterval->intersection( $int );
        }
        
        package SimpleRange;
        
        sub new {
            my $class = shift;
            my ($m, $n) = @_;
            bless { start => $m, end => $n }, $class;
        }
        
        sub start { shift->{start} }
        sub end   { shift->{end}   }
        
        sub covers {
            # Returns true if the range covers some other range.
            my ($self, $other) = @_;
            return 1 if $self->start <= $other->start
                    and $self->end   >= $other->end;
            return;
        }
        
        package WrappingRange;
        
        sub new {
            my $class = shift;
            my ($raw_range, $MIN, $MAX) = @_;
            my ($m, $n) = @$raw_range;
        
            # Handle special case: a range that wraps all the way around.
            ($m, $n) = ($MIN, $MAX) if $m == $n + 1;
        
            my $self = {min => $MIN, max => $MAX};
            if ($m <= $n){
                $self->{top}  = SimpleRange->new($m, $n);
                $self->{wrap} = undef;
            }
            else {
                $self->{top}  = SimpleRange->new($m, $MAX);
                $self->{wrap} = SimpleRange->new($MIN, $n);    
            }
            bless $self, $class;
        }
        
        sub top  { shift->{top}  }
        sub wrap { shift->{wrap} }
        sub is_simple { ! shift->{wrap} }
        
        sub simple_ranges {
            my $self = shift;
            return $self->is_simple ? $self->top : ($self->top, $self->wrap);
        }
        
        sub covers {
            my @selfR  = shift->simple_ranges;
            my @otherR = shift->simple_ranges;
            while (@selfR and @otherR){
                if ( $selfR[0]->covers($otherR[0]) ){
                    shift @otherR;
                }
                else {
                    shift @selfR;
                }
            }
            return if @otherR;
            return 1;
        }
        
        package main;
        main();
        
        sub main {
            my ($MIN, $MAX) = (0, 200);
        
            my @raw_ranges = (
                [10, 100], [30, 90], [50, 80], [$MIN, $MAX],
                [180, 30], 
                [$MAX, $MAX - 1], [$MAX, $MAX - 2],
                [50, 49], [50, 48],
            );
            my @wrapping_ranges = map WrappingRange->new($_, $MIN, $MAX), @raw_ranges;
        
            my @tests = ( [1, 10], [30, 70], [160, 10], [190, 5] );
            for my $t (@tests){
                $t = WrappingRange->new($t, $MIN, $MAX);
        
                my @covers = map $_->covers($t) ? 1 : 0, @wrapping_ranges;
        
                my $n;
                $n += $_ for @covers;
                print "@covers  N=$n\n";
            }
        }
        
        0 0 0 1 1 1 1 1 1  N=6
        1 1 0 1 0 1 1 1 0  N=6
        0 0 0 1 0 1 0 1 1  N=4
        0 0 0 1 1 1 0 1 1  N=5