How can I efficiently count the ranges that cover a given range in Perl?
I have a database of about 30k ranges, each given as a pair of start and end points:
[12,80],[34,60],[34,9000],[76,743],...
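I want to write a Perl subroutine that takes a range (not one from the database) and returns the number of ranges in the database that fully "include" the given range.
For example, if the database held only these 4 ranges and the query range were [38,70], the subroutine should return 2, because the first and third ranges both fully contain the query range.
The problem: I want queries to be as "cheap" as possible, and I don't mind doing a lot of preprocessing if it helps.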
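The example above can be checked with a few lines of Perl. This is only a sketch of the containment test for simple (non-wrapping) ranges; the variable names @db and @query are illustrative and not part of the original question.

```perl
use strict;
use warnings;

# The four example ranges and the example query from the question.
my @db    = ( [12,80], [34,60], [34,9000], [76,743] );
my @query = ( 38, 70 );

# A simple range [a,b] contains the query [s,e] when a <= s and e <= b.
# grep in scalar context returns the number of matching ranges.
my $count = grep { $_->[0] <= $query[0] and $query[1] <= $_->[1] } @db;

print "$count\n";    # prints 2
```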
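A few notes:
- I used the word "database" loosely.
- There is a given max_length (e.g. 9999), and ranges like [8541,6] are legal (you can think of such a range as a single range that is the union of [8541,9999] and [1,6]).
use strict;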
use warnings;
my $max_length = 200;
my @ranges = (
    { START => 10,  END => 100 },
    { START => 30,  END => 90 },
    { START => 50,  END => 80 },
    { START => 180, END => 30 }
);

sub n_covering_ranges($) {
    my ($query_h) = shift;
    my $start = $query_h->{START};
    my $end   = $query_h->{END};
    my $count = 0;
    if ( $end >= $start ) {
        # query range is normal
        foreach my $range_h (@ranges) {
            # a range whose END is before its START hangs over the edge
            my $wrapped = $range_h->{END} < $range_h->{START};
            if (   ( $start >= $range_h->{START} and $end <= $range_h->{END} )
                or ( $wrapped and $range_h->{START} <= $start )
                or ( $wrapped and $range_h->{END} >= $end ) )
            {
                $count++;
            }
        }
    }
    else {
        # query range is hanging over edge
        # only other hanging over edges can contain it
        foreach my $range_h (@ranges) {
            next unless $range_h->{END} < $range_h->{START};
            if ( $start >= $range_h->{START} and $end <= $range_h->{END} ) {
                $count++;
            }
        }
    }
    return $count;
}

print n_covering_ranges( { START => 1,  END => 10 } ), "\n";
print n_covering_ranges( { START => 30, END => 70 } ), "\n";
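Congratulations to Aristotle Pagaltzis! Your implementation is very fast!
However, to use this solution I would obviously like to do the preprocessing (creating the object) only once. Once the object has been created, can I store it (nstore)? I have never done this before. How should I retrieve it? Is there anything special to watch out for? Hopefully retrieval will be fast, so that it does not hurt the overall performance of this great data structure.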
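Update 3
I tried a simple nstore and retrieve of the RangeMap object. This seems to work well. The only problem is that the resulting file is about 1GB, and I will have about 1000 such files. I can live with a terabyte of storage for this, but I wonder whether there is some other way to store it more efficiently without hurting retrieval performance too much.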
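For reference, a round trip through the core Storable module can be sketched as follows. The object here is only a small stand-in for the real preprocessed structure (a blessed array of packed strings), and the temporary file name comes from File::Temp; both are assumptions made for the illustration.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Stand-in for the preprocessed object: positions 12..80 are covered
# by one range ending at 80, stored as a packed 32-bit integer.
my @lookup;
$lookup[$_] .= pack 'L', 80 for 12 .. 80;
my $obj = bless \@lookup, 'RangeMap';

# Temporary file, removed automatically when the program exits.
my ( undef, $file ) = tempfile( UNLINK => 1 );

nstore( $obj, $file );          # write in network byte order
my $copy = retrieve($file);     # read back; the blessing is preserved

print ref($copy), " ", scalar(@$copy), "\n";    # prints RangeMap 81
```

Nothing special is needed on the retrieving side beyond loading the package that defines the class before calling methods on the retrieved object.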
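Update 4 - RangeMap bug
Unfortunately, RangeMap has a bug. Thanks to BrowserUk from PerlMonks for pointing it out. For example, create an object with $max_length = 10 and the single range [6,2], then query for [7,8]. The answer should be 1, not 0.
I think this updated package should do the job: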
use strict;
use warnings;
package FastRanges;
sub new($$$) {
    my $class      = shift;
    my $max_length = shift;
    my $ranges_a   = shift;
    my @lookup;
    for ( @{$ranges_a} ) {
        my ( $start, $end ) = @$_;
        my @idx
            = $end >= $start
            ? $start .. $end
            : ( $start .. $max_length, 1 .. $end );
        for my $i (@idx) { $lookup[$i] .= pack 'L', $end }
    }
    bless \@lookup, $class;
}

sub num_ranges_containing($$$) {
    my $self = shift;
    my ( $start, $end ) = @_;    # query range coordinates

    # no ranges overlap the start position of the query
    return 0 unless defined $self->[$start];

    if ( $end >= $start ) {
        # query range is simple
        # any inverted range in {LOOKUP}[$start] must contain it,
        # and so does any simple range which ends at or after $end
        return 0 + grep { $_ < $start or $end <= $_ } unpack 'L*',
            $self->[$start];
    }
    else {
        # query range is inverted
        # only inverted ranges in {LOOKUP}[$start] which also end
        # at or after $end contain it. simple ranges can't contain
        # the query range
        return 0 + grep { $_ < $start and $end <= $_ } unpack 'L*',
            $self->[$start];
    }
}
1;
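One way to cross-check the reported case independently of the lookup table is a slow reference test that expands both ranges into explicit positions. This is only a sketch: the helper names positions and covers_slow are made up for the illustration, and positions are assumed to run from 1 to $max_length, as in the package above.

```perl
use strict;
use warnings;

# Expand a (possibly wrapping) range into the list of positions it covers.
sub positions {
    my ( $start, $end, $max_length ) = @_;
    return $start <= $end
        ? ( $start .. $end )
        : ( $start .. $max_length, 1 .. $end );
}

# True if every position of the query lies inside the range.
sub covers_slow {
    my ( $range, $query, $max_length ) = @_;
    my %in_range = map { $_ => 1 } positions( @$range, $max_length );
    return !grep { !$in_range{$_} } positions( @$query, $max_length );
}

# The case from the bug report: range [6,2], query [7,8], max_length 10.
print covers_slow( [ 6, 2 ], [ 7, 8 ], 10 ) ? 1 : 0, "\n";    # prints 1
```

A check like this is far too slow for 30k ranges per query, but it is handy for comparing against the fast implementation on small random inputs.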
Comments are welcome.

Which part are you having trouble with? What have you tried so far? It's a fairly simple task:
* Iterate through the ranges
* Foreach range, check if the test range is in it.
* Profile and benchmark
That's fairly simple:
my $test = [ $n, $m ];
my @contains = map {
    $test->[0] >= $_->[0]
        and
    $test->[1] <= $_->[1]
} @ranges;
For the wrap-around ranges, the trick is to decompose them into separate ranges before you look at them. It's brute-force work.
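The decomposition can be sketched as below. The function name split_range is hypothetical, and positions are assumed to run from 1 to $max_length, as in the question's example of [8541,6].

```perl
use strict;
use warnings;

# Split a range into simple (non-wrapping) pieces.
# A simple range is returned unchanged; a wrap-around range becomes
# one piece up to $max_length and one piece starting again at 1.
sub split_range {
    my ( $start, $end, $max_length ) = @_;
    return ( [ $start, $end ] ) if $start <= $end;
    return ( [ $start, $max_length ], [ 1, $end ] );
}

for my $piece ( split_range( 8541, 6, 9999 ) ) {
    print "[$piece->[0],$piece->[1]]\n";
}
# prints:
# [8541,9999]
# [1,6]
```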
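Also, as a social note, your rate of asking questions is rather high: higher than I would expect from someone who is really trying to solve their own problems. I think you are moving too fast, and rather than getting help you are effectively outsourcing your work. That's really not very nice. We don't get paid at all, and we especially don't get paid to do the work that was assigned to you. It might make a big difference if you had at least attempted an implementation of the problem, but many of your questions seem to indicate that you haven't even tried.

Pretty sure there is a better way to do this, but here is a starting point:

Preprocessing:
- Create two lists, one sorted by the start value of the ranges, one sorted by the end value

For each query range:
- Use a binary search to match its start in the start-sorted list
- Use another binary search to match its end in the end-sorted list
- Find the ranges that appear in both lists (@start[0..$start_index] and @end[$end_index..$#end])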
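The steps above might be sketched like this. It is only an illustration under stated assumptions: it handles simple (non-wrapping) ranges only, and the names @by_start, @by_end, count_prefix and count_covering are invented for the example. The final intersection is done with a hash, which is simple but not the fastest possible choice.

```perl
use strict;
use warnings;

my @ranges = ( [12,80], [34,60], [34,9000], [76,743] );

# Preprocessing: range indices sorted by start value and by end value.
my @by_start = sort { $ranges[$a][0] <=> $ranges[$b][0] } 0 .. $#ranges;
my @by_end   = sort { $ranges[$a][1] <=> $ranges[$b][1] } 0 .. $#ranges;

# Binary search: number of leading elements of @$idx for which $pred
# is true, assuming $pred is true on a prefix and false afterwards.
sub count_prefix {
    my ( $idx, $pred ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$idx );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if   ( $pred->( $idx->[$mid] ) ) { $lo = $mid + 1 }
        else                             { $hi = $mid }
    }
    return $lo;
}

sub count_covering {
    my ( $s, $e ) = @_;

    # Ranges with start <= $s form a prefix of @by_start.
    my $n_start = count_prefix( \@by_start, sub { $ranges[ $_[0] ][0] <= $s } );

    # Ranges with end < $e form a prefix of @by_end; the rest end at or after $e.
    my $n_short = count_prefix( \@by_end, sub { $ranges[ $_[0] ][1] < $e } );

    # Count the ranges that appear in both candidate sets.
    my %ends_late = map { $_ => 1 } @by_end[ $n_short .. $#by_end ];
    return scalar grep { $ends_late{$_} } @by_start[ 0 .. $n_start - 1 ];
}

print count_covering( 38, 70 ), "\n";    # prints 2
```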

Here's one approach to a brute-force solution:
use strict;
use warnings;
my @ranges = ([12,80],[34,60],[34,9000],[76,743]);

# Split ranges between normal & wrapped:
my (@normal, @wrapped);

foreach my $r (@ranges) {
    if ($r->[0] <= $r->[1]) {
        push @normal, $r;
    } else {
        push @wrapped, $r;
    }
}

sub count_matches
{
    my ($start, $end, $max_length, $normal, $wrapped) = @_;

    if ($start <= $end) {
        # This is a normal range
        return (grep { $_->[0] <= $start and $_->[1] >= $end } @$normal)
            +  (grep { $end <= $_->[1] or $_->[0] <= $start } @$wrapped);
    } else {
        # This is a wrapped range
        return (grep { $_->[0] <= $start and $_->[1] >= $end } @$wrapped)
            # This part should probably be calculated only once:
            +  (grep { $_->[0] == 1 and $_->[1] == $max_length } @$normal);
    }
} # end count_matches

print count_matches(38,70, 9999, \@normal, \@wrapped)."\n";

You have a lot of ...
my $max_length = 9999;
my @range = ( [12,80],[34,60],[34,9000] );

my @lookup;

for ( @range ) {
    my ( $start, $end ) = @$_;
    my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 0 .. $end );
    for my $i ( @idx ) { $lookup[$i] .= pack "L", $end }
}

sub num_ranges_containing {
    my ( $start, $end ) = @_;

    return 0 unless defined $lookup[$start];

    # simple ranges can be contained in inverted ranges,
    # but inverted ranges can only be contained in inverted ranges
    # (each counter greps over the unpacked end points passed in @_)
    my $counter = ( $start <= $end )
        ? sub { 0 + grep { $_ < $start or  $end <= $_ } @_ }
        : sub { 0 + grep { $_ < $start and $end <= $_ } @_ };

    return $counter->( unpack 'L*', $lookup[$start] );
}
package RangeMap;

sub new {
    my $class = shift;
    my $max_length = shift;

    my @lookup;

    for ( @_ ) {
        my ( $start, $end ) = @$_;
        my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 0 .. $end );
        for my $i ( @idx ) { $lookup[$i] .= pack 'L', $end }
    }

    bless \@lookup, $class;
}

sub num_ranges_containing {
    my $self = shift;
    my ( $start, $end ) = @_;

    return 0 unless defined $self->[$start];

    # simple ranges can be contained in inverted ranges,
    # but inverted ranges can only be contained in inverted ranges
    # (each counter greps over the unpacked end points passed in @_)
    my $counter = ( $start <= $end )
        ? sub { 0 + grep { $_ < $start or  $end <= $_ } @_ }
        : sub { 0 + grep { $_ < $start and $end <= $_ } @_ };

    return $counter->( unpack 'L*', $self->[$start] );
}

package main;
my $rm = RangeMap->new( 9999, [12,80],[34,60],[34,9000] );
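
To make the data structure used above more concrete, here is a small sketch of what a single lookup slot holds: the end points of every range covering one position, appended as packed 32-bit unsigned integers. The end points and the query [38,70] are taken from the examples in this thread; the variable names $slot, @ends and $n are illustrative.

```perl
use strict;
use warnings;

# One lookup slot, e.g. for position 38: the end points of the ranges
# [12,80], [34,60] and [34,9000], each stored in 4 bytes ('L').
my $slot = '';
$slot .= pack 'L', $_ for 80, 60, 9000;

# Unpacking the slot gives back the end points in insertion order.
my @ends = unpack 'L*', $slot;
print "@ends\n";                    # prints 80 60 9000
print length($slot), " bytes\n";    # prints 12 bytes

# Simple query [38,70]: a covering range either ends before the query
# starts (so it must be inverted) or ends at or after the query's end.
my $n = grep { $_ < 38 or 70 <= $_ } @ends;
print "$n\n";                       # prints 2
```

The 4 bytes per covered position per range are what make the structure fast to query but large on disk.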
use Number::Interval;

my @ranges = (
    { START => 10,  END => 100 },
    { START => 30,  END => 90 },
    { START => 50,  END => 80 },
    { START => 180, END => 30 }
);

my @intervals;
for my $range ( @ranges ) {
    my $int = Number::Interval->new( Min => $range->{START},
                                     Max => $range->{END} );
    push @intervals, $int;
}

# Query range; these values are only an example
# (the original snippet left $min and $max undefined).
my ( $min, $max ) = ( 30, 70 );

# Count the stored ranges that intersect the query range.
my $num_overlap = 0;
my $checkinterval = Number::Interval->new( Min => $min, Max => $max );
for my $int ( @intervals ) {
    $num_overlap++ if $checkinterval->intersection( $int );
}
package SimpleRange;

sub new {
    my $class = shift;
    my ($m, $n) = @_;
    bless { start => $m, end => $n }, $class;
}

sub start { shift->{start} }
sub end   { shift->{end}   }

sub covers {
    # Returns true if the range covers some other range.
    my ($self, $other) = @_;
    return 1 if $self->start <= $other->start
            and $self->end   >= $other->end;
    return;
}

package WrappingRange;

sub new {
    my $class = shift;
    my ($raw_range, $MIN, $MAX) = @_;
    my ($m, $n) = @$raw_range;

    # Handle special case: a range that wraps all the way around.
    ($m, $n) = ($MIN, $MAX) if $m == $n + 1;

    my $self = {min => $MIN, max => $MAX};
    if ($m <= $n){
        $self->{top}  = SimpleRange->new($m, $n);
        $self->{wrap} = undef;
    }
    else {
        $self->{top}  = SimpleRange->new($m, $MAX);
        $self->{wrap} = SimpleRange->new($MIN, $n);
    }
    bless $self, $class;
}

sub top  { shift->{top}  }
sub wrap { shift->{wrap} }
sub is_simple { ! shift->{wrap} }

sub simple_ranges {
    my $self = shift;
    return $self->is_simple ? $self->top : ($self->top, $self->wrap);
}

sub covers {
    my @selfR  = shift->simple_ranges;
    my @otherR = shift->simple_ranges;
    while (@selfR and @otherR){
        if ( $selfR[0]->covers($otherR[0]) ){
            shift @otherR;
        }
        else {
            shift @selfR;
        }
    }
    return if @otherR;
    return 1;
}

package main;
main();

sub main {
    my ($MIN, $MAX) = (0, 200);

    my @raw_ranges = (
        [10, 100], [30, 90], [50, 80], [$MIN, $MAX],
        [180, 30],
        [$MAX, $MAX - 1], [$MAX, $MAX - 2],
        [50, 49], [50, 48],
    );
    my @wrapping_ranges = map WrappingRange->new($_, $MIN, $MAX), @raw_ranges;

    my @tests = ( [1, 10], [30, 70], [160, 10], [190, 5] );
    for my $t (@tests){
        $t = WrappingRange->new($t, $MIN, $MAX);

        my @covers = map $_->covers($t) ? 1 : 0, @wrapping_ranges;

        my $n;
        $n += $_ for @covers;
        print "@covers N=$n\n";
    }
}
0 0 0 1 1 1 1 1 1 N=6
1 1 0 1 0 1 1 1 0 N=6
0 0 0 1 0 1 0 1 1 N=4
0 0 0 1 1 1 0 1 1 N=5