How can I efficiently count the ranges that cover a given range in Perl?

I have a database of about 30k ranges, each given as a pair of start and end points:

[12,80],[34,60],[34,9000],[76,743],...

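I want to write a Perl subroutine that takes a range (not one from the database) and returns the number of ranges in the database that fully "contain" the given range.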

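For example, if the database contained only these 4 ranges and the query range were [38,70], the subroutine should return 2, because the first and third ranges both fully contain the query range.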

Problem: I want queries to be as "cheap" as possible. I don't mind doing a lot of preprocessing if it helps.

A few notes:

I used the word "database" loosely. I don't mean an actual database (e.g. SQL); it is just a long list of ranges.

My world is circular... There is a given max_length (e.g. 9999), and ranges like [8541,6] are legal. You can think of such a range as a single range that is the union of [8541,9999] and [1,6].
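The circular definition above can be made concrete with a brute-force reference count. It expands each range into the set of positions it covers, then checks that every query position is among them. This is a sketch under stated assumptions:

- The helper names `positions` and `count_containing_brute` are hypothetical.
- Positions are assumed to run from 1 to `$max_length`.
- "Contains" is taken to mean positional subset, so a range covering the whole circle contains every query.

It is far too slow for 30k ranges, but it is handy for checking a fast version.

```perl
use strict;
use warnings;

# Hypothetical helper: all positions (1..$max) covered by a possibly-wrapping range.
sub positions {
    my ( $max, $start, $end ) = @_;
    return $start <= $end ? ( $start .. $end ) : ( $start .. $max, 1 .. $end );
}

# Brute force: count ranges whose position set includes every query position.
sub count_containing_brute {
    my ( $max, $ranges, $query ) = @_;
    my @want  = positions( $max, @$query );
    my $count = 0;
    for my $r (@$ranges) {
        my %have = map { $_ => 1 } positions( $max, @$r );
        $count++ unless grep { !$have{$_} } @want;
    }
    return $count;
}

print count_containing_brute( 10, [ [ 6, 2 ] ], [ 7, 8 ] ), "\n";    # 1
print count_containing_brute( 9999, [ [12,80], [34,60], [34,9000], [76,743] ], [ 38, 70 ] ), "\n";    # 2
```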

Thanks, Dave

UPDATE: This is my original code:

use strict;
use warnings;

my $max_length = 200;
my @ranges     = (
    { START => 10,  END => 100 },
    { START => 30,  END => 90 },
    { START => 50,  END => 80 },
    { START => 180, END => 30 }
);

sub n_covering_ranges($) {
    my ($query_h) = shift;
    my $start     = $query_h->{START};
    my $end       = $query_h->{END};
    my $count     = 0;
    if ( $end >= $start ) {

        # query range is normal: it is contained in a normal range that
        # spans it, or in the upper part (START..max_length) or the
        # lower part (1..END) of a wrapped range
        foreach my $range_h (@ranges) {
            my $wrapped = $range_h->{END} < $range_h->{START};
            if (   ( $start >= $range_h->{START} and $end <= $range_h->{END} )
                or ( $wrapped and $range_h->{START} <= $start )
                or ( $wrapped and $range_h->{END} >= $end ) )
            {
                $count++;
            }
        }
    }
    else {

        # query range is hanging over the edge:
        # only other ranges hanging over the edge can contain it
        foreach my $range_h (@ranges) {
            if (    $range_h->{END} < $range_h->{START}
                and $start >= $range_h->{START}
                and $end <= $range_h->{END} )
            {
                $count++;
            }
        }
    }

    return $count;
}

print n_covering_ranges( { START => 1,  END => 10 } ), "\n";    # 1
print n_covering_ranges( { START => 30, END => 70 } ), "\n";    # 2

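Also:

Congratulations, Aristotle Pagaltzis! Your implementation is very fast! However, to use this solution I obviously want to do the preprocessing (creating the object) only once. Once the object has been created, can I store it (with nstore)? I have never done this before. How should I retrieve it? Is there anything special to watch out for? I hope retrieval will be fast, so that it doesn't hurt the overall performance of this great data structure.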
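Storable's `nstore` and `retrieve` handle blessed references. The object is saved together with its class name and comes back blessed into the same package. The main thing to remember is that the class code must be loaded before you call methods on the retrieved object.

Below is a minimal sketch that assumes the blessed-array layout of the `RangeMap` answer below; the cut-down class and the temporary file exist only for illustration. `nstore` writes Storable's own format in network byte order, but the `pack 'L'` strings inside the object are still in native byte order.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Cut-down stand-in for RangeMap (assumed layout: array of packed end points).
package RangeMap;

sub new {
    my ( $class, $max_length, @ranges ) = @_;
    my @lookup;
    for my $r (@ranges) {
        my ( $start, $end ) = @$r;
        my @idx = $end >= $start
            ? ( $start .. $end )
            : ( $start .. $max_length, 1 .. $end );
        $lookup[$_] .= pack 'L', $end for @idx;
    }
    return bless \@lookup, $class;
}

package main;

my ( undef, $file ) = tempfile( UNLINK => 1 );

# Preprocess once and write the object to disk.
nstore( RangeMap->new( 100, [ 12, 80 ], [ 34, 60 ] ), $file );

# Later (e.g. in another process): read it back; it is still a RangeMap.
my $rm   = retrieve($file);
my @ends = unpack 'L*', $rm->[40];
print ref($rm), "\n";         # RangeMap
print scalar(@ends), "\n";    # 2 (both ranges cover position 40)
```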

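UPDATE 3

I tried a simple nstore and retrieve of the RangeMap object. This seems to work well. The only problem is that the resulting file is about 1 GB, and I will have about 1000 such files. I could live with a TB of storage for this, but I wonder whether there is a more efficient way to store it without hurting retrieval performance too much. See also here: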
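This shows where the file size comes from. In a `RangeMap`-style lookup, every position covered by a range gets its own 4-byte `pack 'L'` entry. The packed payload alone is therefore 4 × (total length of all ranges) bytes, before Perl's and Storable's per-element overhead.

Below is a hypothetical helper that estimates this from the raw ranges. It assumes positions 1..max_length, as in `FastRanges`.

```perl
use strict;
use warnings;

# Rough size of the packed payload of a RangeMap-style lookup table:
# one 4-byte pack('L') entry per (position, covering range) pair.
sub estimate_lookup_bytes {
    my ( $max_length, @ranges ) = @_;
    my $entries = 0;
    for my $r (@ranges) {
        my ( $start, $end ) = @$r;
        $entries += $end >= $start
            ? $end - $start + 1                       # simple range
            : ( $max_length - $start + 1 ) + $end;    # wrapped: start..max, 1..end
    }
    return 4 * $entries;
}

print estimate_lookup_bytes( 9999, [12,80], [34,60], [34,9000], [76,743] ), "\n";    # 38924
```

If the estimate is large, the alternative is to store only the raw ranges (two integers each) and rebuild the lookup at load time. That saves disk space at the cost of start-up time.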

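UPDATE 4 - RangeMap bug

Unfortunately, RangeMap has a bug. Thanks to BrowserUK from PerlMonks for pointing this out. For example, create an object with $max_length=10 and the single range [6,2]. Then query [7,8]. The answer should be 1, not 0.

I think this updated package should do the job: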

use strict;
use warnings;

package FastRanges;

sub new($$$) {
    my $class      = shift;
    my $max_length = shift;
    my $ranges_a   = shift;
    my @lookup;
    for ( @{$ranges_a} ) {
        my ( $start, $end ) = @$_;
        my @idx
            = $end >= $start
            ? $start .. $end
            : ( $start .. $max_length, 1 .. $end );
        for my $i (@idx) { $lookup[$i] .= pack 'L', $end }
    }
    bless \@lookup, $class;
}

sub num_ranges_containing($$$) {
    my $self = shift;
    my ( $start, $end ) = @_;    # query range coordinates

    # no ranges overlap the start position of the query
    return 0 unless defined $self->[$start];

    if ( $end >= $start ) {

        # query range is simple:
        # a range in $self->[$start] that ends before $start must wrap
        # around past $start, so it contains the query,
        # and so does any range which ends at or after $end
        return 0 + grep { $_ < $start or $end <= $_ } unpack 'L*',
            $self->[$start];
    }
    else {

        # query range is inverted:
        # only inverted ranges in $self->[$start] which also end
        # at or after $end contain it; simple ranges can't contain
        # the query range
        return 0 + grep { $_ < $start and $end <= $_ } unpack 'L*',
            $self->[$start];
    }
}

1;

Comments are welcome.

Which part are you having trouble with? What have you tried so far? This is a fairly simple task:

  * Iterate through the ranges
  * Foreach range, check if the test range is in it.
  * Profile and benchmark
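For the "profile and benchmark" step, Perl's core `Benchmark` module can compare approaches directly. This is a minimal sketch on synthetic data. To keep it short, it assumes random, non-wrapping ranges on positions 1..1000.

It compares the plain linear scan with a per-position lookup of packed end points, similar to the lookup-table answers below. The timings it prints depend entirely on the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data (assumption): 3,000 non-wrapping ranges on positions 1..1000.
srand(42);
my $max_length = 1000;
my @ranges = map {
    my $s = 1 + int rand $max_length;
    my $e = $s + int rand( $max_length - $s + 1 );
    [ $s, $e ];
} 1 .. 3000;

# Approach 1: linear scan over every range.
sub count_scan {
    my ( $s, $e ) = @_;
    return scalar grep { $_->[0] <= $s and $e <= $_->[1] } @ranges;
}

# Approach 2: preprocess into per-position lists of packed end points.
my @lookup;
for my $r (@ranges) {
    $lookup[$_] .= pack 'L', $r->[1] for $r->[0] .. $r->[1];
}

sub count_lookup {
    my ( $s, $e ) = @_;
    return 0 unless defined $lookup[$s];
    # every range listed at $s starts at or before $s; keep those ending at or after $e
    return scalar grep { $e <= $_ } unpack 'L*', $lookup[$s];
}

# Sanity check before timing: both approaches must agree.
for my $q ( [ 10, 20 ], [ 300, 700 ], [ 990, 1000 ] ) {
    die "mismatch on [@$q]\n" unless count_scan(@$q) == count_lookup(@$q);
}

cmpthese( -1, {
    scan   => sub { count_scan( 300, 700 ) },
    lookup => sub { count_lookup( 300, 700 ) },
} );
```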

This is fairly straightforward:

 my $test = [ $n, $m ];    # query range [start, end]
 my @contains = grep {
      $test->[0] >= $_->[0]
         and
      $test->[1] <= $_->[1]
      } @ranges;
 my $count = @contains;    # number of ranges containing the test range

For wrapping ranges, the trick is to split them into separate ranges before examining them. This is brute-force work.
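This is a sketch of that splitting step, assuming positions run from 1 to `$max_length`. A wrapped range [S,E] becomes [S,max_length] and [1,E]. A range contains the query when each piece of the query fits inside one of the range's pieces.

The helper names `split_range` and `covers` are hypothetical. A range that wraps all the way around (start == end + 1) is not special-cased here; the last answer below handles that case explicitly.

```perl
use strict;
use warnings;

# Split a possibly-wrapping range into plain [start, end] pieces.
sub split_range {
    my ( $max_length, $start, $end ) = @_;
    return $start <= $end
        ? ( [ $start, $end ] )
        : ( [ $start, $max_length ], [ 1, $end ] );
}

# True if every piece of the query lies inside some piece of the range.
sub covers {
    my ( $max_length, $range, $query ) = @_;
    my @pieces = split_range( $max_length, @$range );
    for my $q ( split_range( $max_length, @$query ) ) {
        return 0
            unless grep { $_->[0] <= $q->[0] and $q->[1] <= $_->[1] } @pieces;
    }
    return 1;
}

my @ranges = ( [ 12, 80 ], [ 34, 60 ], [ 34, 9000 ], [ 76, 743 ], [ 8541, 6 ] );
print scalar( grep { covers( 9999, $_, [ 38, 70 ] ) } @ranges ), "\n";     # 2
print scalar( grep { covers( 9999, $_, [ 9000, 3 ] ) } @ranges ), "\n";    # 1
```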

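And, just as a social note: your rate of asking questions is quite high, higher than I would expect from someone who is really trying to solve their own problems. I think you are moving too fast, and rather than getting help, you are effectively outsourcing your work. That's not very nice. We aren't paid at all, and certainly not paid to do the work assigned to you. It might make a big difference if you at least attempted an implementation first, but many of your questions suggest you haven't even tried.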

I'm pretty sure there is a better way to do this, but here is a starting point:

Preprocessing:

Create two lists: one sorted by the ranges' start values and one sorted by their end values.

Once you have your query range:

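  * Use a binary search to locate its start in the start-sorted list.
  * Use another binary search to locate its end in the end-sorted list.
  * Find the ranges that appear in both candidate sets (@start[0..$start_index] and @end[$end_index..$#end]).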
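Here is a sketch of that idea for non-wrapping ranges; wrapped ones would first be split as described in the other answers. The helper names `count_le`, `first_ge` and `count_covering` are hypothetical.

The two binary searches narrow the candidates. However, the final intersection step is still linear in the number of candidates, so each query is not O(log n).

```perl
use strict;
use warnings;

# Example data from the question (non-wrapping ranges only).
my @ranges = ( [ 12, 80 ], [ 34, 60 ], [ 34, 9000 ], [ 76, 743 ] );

# Preprocessing: range indices sorted by start and by end.
my @by_start = sort { $ranges[$a][0] <=> $ranges[$b][0] } 0 .. $#ranges;
my @by_end   = sort { $ranges[$a][1] <=> $ranges[$b][1] } 0 .. $#ranges;

# Binary search: how many leading entries of @$idx have field $f <= $x.
sub count_le {
    my ( $idx, $f, $x ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$idx );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if   ( $ranges[ $idx->[$mid] ][$f] <= $x ) { $lo = $mid + 1 }
        else                                       { $hi = $mid }
    }
    return $lo;
}

# Binary search: first position in @$idx whose field $f is >= $x.
sub first_ge {
    my ( $idx, $f, $x ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$idx );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if   ( $ranges[ $idx->[$mid] ][$f] < $x ) { $lo = $mid + 1 }
        else                                      { $hi = $mid }
    }
    return $lo;
}

sub count_covering {
    my ( $s, $e ) = @_;
    my $n_start = count_le( \@by_start, 0, $s );    # ranges with start <= $s
    my $first   = first_ge( \@by_end, 1, $e );      # ranges with end >= $e
    my %start_ok = map { $_ => 1 } @by_start[ 0 .. $n_start - 1 ];
    return scalar grep { $start_ok{$_} } @by_end[ $first .. $#by_end ];
}

print count_covering( 38, 70 ), "\n";    # 2
```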

use strict;
use warnings;

my @ranges = ([12,80],[34,60],[34,9000],[76,743]);

# Split ranges between normal & wrapped:
my (@normal, @wrapped);

foreach my $r (@ranges) {
  if ($r->[0] <= $r->[1]) {
    push @normal, $r;
  } else {
    push @wrapped, $r;
  }
}

sub count_matches
{
  my ($start, $end, $max_length, $normal, $wrapped) = @_;

  if ($start <= $end) {
    # This is a normal range
    return (grep { $_->[0] <= $start and $_->[1] >= $end } @$normal)
        +  (grep { $end <= $_->[1] or $_->[0] <= $start } @$wrapped);
  } else {
    # This is a wrapped range
    return (grep { $_->[0] <= $start and $_->[1] >= $end } @$wrapped)
        # This part should probably be calculated only once:
        +  (grep { $_->[0] == 1 and $_->[1] == $max_length } @$normal);
  }
} # end count_matches

print count_matches(38,70, 9999, \@normal, \@wrapped)."\n";
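This follows up on the comment "This part should probably be calculated only once". The sketch below counts the full-circle normal ranges ([1,$max_length]) once, during preprocessing, and reuses that number for every wrapped query.

The extra example ranges [9000,5] and [1,9999] are made up so that both branches get exercised. The two-argument `count_matches` is a variant, not the original signature.

```perl
use strict;
use warnings;

my $max_length = 9999;
my @ranges = ( [12,80], [34,60], [34,9000], [76,743], [9000,5], [1,9999] );

# Split ranges between normal & wrapped (once).
my ( @normal, @wrapped );
push @{ $_->[0] <= $_->[1] ? \@normal : \@wrapped }, $_ for @ranges;

# Precomputed once: normal ranges that span the whole circle.
my $n_full = grep { $_->[0] == 1 and $_->[1] == $max_length } @normal;

sub count_matches {
    my ( $start, $end ) = @_;
    if ( $start <= $end ) {
        # normal query
        return ( grep { $_->[0] <= $start and $_->[1] >= $end } @normal )
             + ( grep { $end <= $_->[1] or $_->[0] <= $start } @wrapped );
    }
    # wrapped query: wrapped ranges that contain it, plus full-circle ranges
    return ( grep { $_->[0] <= $start and $_->[1] >= $end } @wrapped ) + $n_full;
}

print count_matches( 38, 70 ), "\n";     # 3
print count_matches( 9500, 3 ), "\n";    # 2
```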

my $max_length = 9999;
my @range = ( [12,80],[34,60],[34,9000] );

my @lookup;

for ( @range ) {
    my ( $start, $end ) = @$_;
    my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 0 .. $end );
    for my $i ( @idx ) { $lookup[$i] .= pack "L", $end }
}

sub num_ranges_containing {
    my ( $start, $end ) = @_;

    return 0 unless defined $lookup[$start];

    # simple ranges can be contained in inverted ranges,
    # but inverted ranges can only be contained in inverted ranges.
    # The closures must grep over @_ (the unpacked end points);
    # without @_ the grep has nothing to filter.
    my $counter = ( $start <= $end )
        ? sub { 0 + grep { $_ < $start or  $end <= $_ } @_ }
        : sub { 0 + grep { $_ < $start and $end <= $_ } @_ };

    return $counter->( unpack 'L*', $lookup[$start] );
}

package RangeMap;

sub new {
    my $class = shift;
    my $max_length = shift;
    my @lookup;
    for ( @_ ) {
        my ( $start, $end ) = @$_;
        my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 0 .. $end );
        for my $i ( @idx ) { $lookup[$i] .= pack 'L', $end }
    }
    bless \@lookup, $class;
}

sub num_ranges_containing {
    my $self = shift;
    my ( $start, $end ) = @_;

    return 0 unless defined $self->[$start];

    # simple ranges can be contained in inverted ranges,
    # but inverted ranges can only be contained in inverted ranges.
    # The closures must grep over @_ (the unpacked end points);
    # without @_ the grep has nothing to filter.
    my $counter = ( $start <= $end )
        ? sub { 0 + grep { $_ < $start or  $end <= $_ } @_ }
        : sub { 0 + grep { $_ < $start and $end <= $_ } @_ };

    return $counter->( unpack 'L*', $self->[$start] );
}

package main;
my $rm = RangeMap->new( 9999, [12,80],[34,60],[34,9000] );
print $rm->num_ranges_containing( 38, 70 ), "\n";    # 2

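use Number::Interval;

my @ranges     = (
    { START => 10,  END => 100 },
    { START => 30,  END => 90 },
    { START => 50,  END => 80 },
    { START => 180, END => 30 }
);
my @intervals;
for my $range ( @ranges ) {
  my $int = Number::Interval->new( Min => $range->{START},
                                   Max => $range->{END} );
  push @intervals, $int;
}

my ( $min, $max ) = ( 30, 70 );    # query range (example values)
my $num_overlap = 0;
for my $int ( @intervals ) {
  # a fresh query interval each time, in case intersection() narrows
  # its invocant; note this counts intervals that overlap the query,
  # which is not the same as intervals that fully contain it
  my $checkinterval = Number::Interval->new( Min => $min, Max => $max );
  $num_overlap++ if $checkinterval->intersection( $int );
}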

package SimpleRange;

sub new {
    my $class = shift;
    my ($m, $n) = @_;
    bless { start => $m, end => $n }, $class;
}

sub start { shift->{start} }
sub end   { shift->{end}   }

sub covers {
    # Returns true if the range covers some other range.
    my ($self, $other) = @_;
    return 1 if $self->start <= $other->start
            and $self->end   >= $other->end;
    return;
}

package WrappingRange;

sub new {
    my $class = shift;
    my ($raw_range, $MIN, $MAX) = @_;
    my ($m, $n) = @$raw_range;

    # Handle special case: a range that wraps all the way around.
    ($m, $n) = ($MIN, $MAX) if $m == $n + 1;

    my $self = {min => $MIN, max => $MAX};
    if ($m <= $n){
        $self->{top}  = SimpleRange->new($m, $n);
        $self->{wrap} = undef;
    }
    else {
        $self->{top}  = SimpleRange->new($m, $MAX);
        $self->{wrap} = SimpleRange->new($MIN, $n);    
    }
    bless $self, $class;
}

sub top  { shift->{top}  }
sub wrap { shift->{wrap} }
sub is_simple { ! shift->{wrap} }

sub simple_ranges {
    my $self = shift;
    return $self->is_simple ? $self->top : ($self->top, $self->wrap);
}

sub covers {
    my @selfR  = shift->simple_ranges;
    my @otherR = shift->simple_ranges;
    while (@selfR and @otherR){
        if ( $selfR[0]->covers($otherR[0]) ){
            shift @otherR;
        }
        else {
            shift @selfR;
        }
    }
    return if @otherR;
    return 1;
}

package main;
main();

sub main {
    my ($MIN, $MAX) = (0, 200);

    my @raw_ranges = (
        [10, 100], [30, 90], [50, 80], [$MIN, $MAX],
        [180, 30], 
        [$MAX, $MAX - 1], [$MAX, $MAX - 2],
        [50, 49], [50, 48],
    );
    my @wrapping_ranges = map WrappingRange->new($_, $MIN, $MAX), @raw_ranges;

    my @tests = ( [1, 10], [30, 70], [160, 10], [190, 5] );
    for my $t (@tests){
        $t = WrappingRange->new($t, $MIN, $MAX);

        my @covers = map $_->covers($t) ? 1 : 0, @wrapping_ranges;

        my $n;
        $n += $_ for @covers;
        print "@covers  N=$n\n";
    }
}

Output:

0 0 0 1 1 1 1 1 1  N=6
1 1 0 1 0 1 1 1 0  N=6
0 0 0 1 0 1 0 1 1  N=4
0 0 0 1 1 1 0 1 1  N=5