Ruby regex to match a string with a repeating pattern (Ruby, Regex) - Fatal编程技术网



I'm trying to find a regex that will match URLs with three or more repeated segments (which may include any number of directories), for example:

  • s1='http://www.foo.com/bar/bar/bar/'
  • s2='http://www.foo.com/baz/biz/baz/biz/baz/biz/etc'
  • s3='/foo/bar/foo/bar/foo/bar/'
and that will not match URLs such as:

  • s4='/foo/bar/foo/bar/foo/barbaz'
First, I tried:

re1 = /((.+\/)+)\1\1/
which works:

re1 === s1 #=> true
re1 === s2 #=> true
However, the time the regex takes to match grows exponentially as the number of segments increases:

require 'benchmark'
Benchmark.bm do |b|
  (10..15).each do |num|
    str = '/foo/bar' * num
    puts str
    b.report("#{num} repeats:") { /((.+\/)+)\1\1/ === str }
  end
end

       user     system      total        real
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
10 repeats:  0.060000   0.000000   0.060000 (  0.054839)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
11 repeats:  0.210000   0.000000   0.210000 (  0.213492)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
12 repeats:  0.870000   0.000000   0.870000 (  0.871879)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
13 repeats:  3.370000   0.010000   3.380000 (  3.399224)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
14 repeats: 13.580000   0.110000  13.690000 ( 13.790675)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
15 repeats: 54.090000   0.210000  54.300000 ( 54.562672)
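One rough way to see why the time explodes (an intuition, not a trace of what Ruby's regex engine actually does): the nested quantifier in ((.+\/)+) can divide the same text into slash-terminated pieces in many different ways, and when no match can start at a given position, a backtracking engine may have to try many of those divisions before giving up. The hypothetical helper splittings below counts the divisions with a small dynamic program. Each extra '/foo/bar' adds two slashes and multiplies the count by four; the timings above also grow by roughly a factor of four per repeat, which is consistent with this picture.

```ruby
# Hypothetical helper (illustration only): counts the ways `text` can be cut into
# consecutive pieces that each match /.+\// (1+ characters, then '/').
# Assumes `text` contains no newlines, since '.' does not match "\n".
def splittings(text)
  ways = Array.new(text.length + 1, 0) # ways[i]: number of ways to split text[0...i]
  ways[0] = 1                          # the empty prefix splits in exactly one way
  (1..text.length).each do |i|
    next unless text[i - 1] == '/'     # every piece must end with '/'
    # the last piece is text[j...i]; it needs at least 2 characters, so j <= i - 2
    (0..i - 2).each { |j| ways[i] += ways[j] }
  end
  ways[text.length]
end

(1..6).each do |n|
  text = '/foo/bar' * n + '/'
  puts "#{n} repeats: #{splittings(text)} ways"
end
# 1 repeats: 2 ways, 2 repeats: 8 ways, ..., 6 repeats: 2048 ways
```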
Then I tried a regex similar to the one given:

It has no performance problems and matches the strings I want it to match:

re2 === s3 #=> true
but it also matches strings I don't want it to match, such as:

re2 === s4 #=> true, but should be false

I'm close with the second regex. What am I missing?

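Change the .+ in the first regex to [^\/]+. This reduces the complexity of the regex, because it no longer tries to match "any" character: since [^\/] cannot match a slash, each [^\/]+\/ piece must run exactly up to the next '/', which leaves the engine far fewer ways to divide the string when it backtracks.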

require 'benchmark'

Benchmark.bm do |b|
  (10..15).each do |num|
    str = '/foo/bar' * num
    puts str
    b.report("#{num} repeats:") { /(([^\/]+\/)+)\1\1/ === str }
  end
end

10 repeats:  0.000000   0.000000   0.000000 (  0.000015)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
11 repeats:  0.000000   0.000000   0.000000 (  0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
12 repeats:  0.000000   0.000000   0.000000 (  0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
13 repeats:  0.000000   0.000000   0.000000 (  0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
14 repeats:  0.000000   0.000000   0.000000 (  0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
15 repeats:  0.000000   0.000000   0.000000 (  0.000005)
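To check this regex against the question's examples, here is a small sketch (s1 to s4 are copied from the question with the stray quote characters removed). It also shows one caveat: nothing forces the repeated group to begin at a segment boundary, so it can start partway through a segment. re_anchored is a hypothetical variant, not part of this answer, that adds a (?<=\/) lookbehind so the group must begin right after a slash.

```ruby
# The answer's regex: each piece is 1+ non-slash characters followed by '/'
re = /(([^\/]+\/)+)\1\1/
# Hypothetical variant (not from the answer): the group must start right after a '/'
re_anchored = /(?<=\/)(([^\/]+\/)+)\1\1/

s1 = 'http://www.foo.com/bar/bar/bar/'
s2 = 'http://www.foo.com/baz/biz/baz/biz/baz/biz/etc'
s3 = '/foo/bar/foo/bar/foo/bar/'
s4 = '/foo/bar/foo/bar/foo/barbaz'

[s1, s2, s3, s4].each { |s| puts "#{s} => #{re.match?(s)}" }
# s1, s2 and s3 print true; s4 prints false

edge = '/xbar/bar/bar/'        # only two whole '/bar' segments in a row
puts re.match?(edge)           # true: the first 'bar/' is taken from inside 'xbar'
puts re_anchored.match?(edge)  # false
```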

Definitions

Suppose:

str = 'http://www.example.com/dog/baz/biz/baz/biz/baz/biz/cat/'
We can define '/dog', '/baz', '/biz', and so on, as segments. A group consists of one or more contiguous segments, such as '/dog', '/baz', '/dog/baz', '/baz/biz', and so on.
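To make these definitions concrete, here is a small sketch that uses Ruby's standard uri library to drop the scheme and host and split the path into segments (segments and pairs are illustrative names; the answer itself works directly on the raw string rather than through URI):

```ruby
require 'uri'

str = 'http://www.example.com/dog/baz/biz/baz/biz/baz/biz/cat/'

# Segments: a '/' followed by one or more non-slash characters, taken from the path only
segments = URI(str).path.scan(%r{/[^/]+})
p segments
# prints ["/dog", "/baz", "/biz", "/baz", "/biz", "/baz", "/biz", "/cat"]

# Every group made of two contiguous segments
pairs = segments.each_cons(2).map(&:join)
p pairs
# prints ["/dog/baz", "/baz/biz", "/biz/baz", "/baz/biz", "/biz/baz", "/baz/biz", "/biz/cat"]
```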

The problem

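As I understand it, the problem is to determine whether a given string contains three (or more) contiguous, equal groups followed by a forward slash. s2 satisfies this test by virtue of the substring: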

'/baz/biz/baz/biz/baz/biz/'
Algorithm

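I don't believe this determination can be made with a single regex, but we can write a regex that determines whether there are at least three (or any given number of) contiguous, equal groups, given the number of segments in each group. Suppose this is done by a method named contiguous_fixed_group_size?, which is called as follows: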

contiguous_fixed_group_size?(str, segments_per_group, nbr_groups)
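and returns true or false. To determine whether the string has at least 3 contiguous, equal groups (for a given value of segments_per_group), we call this method with nbr_groups = 3. I think it best to defer the construction of this method; for now, assume it is available.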

My approach is to call this method for different values of segments_per_group and determine whether it returns true for at least one of them.

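The main method

The first step is to determine the number of segments in the string (where str holds the string above):

nbr_segments = str.scan(/(?<!\/)\/(?!\/)/).size - 1
  #=> 8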
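A sketch of how that count works out for this string, reusing the counting expression from contiguous? (all_slashes and lone_slashes are illustrative names, not part of the answer): the lookbehind and lookahead skip both slashes of "//", and 1 is subtracted because the final slash ends '/cat' rather than starting a new segment.

```ruby
str = 'http://www.example.com/dog/baz/biz/baz/biz/baz/biz/cat/'

all_slashes  = str.count('/')                    # every '/', including both in "//"
lone_slashes = str.scan(/(?<!\/)\/(?!\/)/).size  # only slashes with no '/' on either side
nbr_segments = lone_slashes - 1                  # the final '/' closes '/cat' and starts nothing

puts all_slashes       # 11
puts lone_slashes      # 9
puts nbr_segments      # 8
puts nbr_segments / 3  # 2 (integer division): the largest segments_per_group worth trying for 3 groups
```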

Since nbr_groups groups, each made of segments_per_group segments, must fit within the string's nbr_segments segments, segments_per_group * nbr_groups <= nbr_segments, and therefore:

segments_per_group <= nbr_segments/nbr_groups

For the string above and nbr_groups = 3, this gives:

segments_per_group <= 8/3 => 2

We can therefore determine whether str contains (at least) nbr_groups contiguous, equal groups as follows:

(1..nbr_segments/nbr_groups).any? do |segs_per_group|
  contiguous_fixed_group_size?(str, segs_per_group, nbr_groups)
end
  #=> true

We can wrap this in a method:

def contiguous?(str, nbr_groups)
  nbr_segments = str.scan(/(?<!\/)\/(?!\/)/).size - 1
  (1..nbr_segments/nbr_groups).any? do |segs_per_grp|
    contiguous_fixed_group_size?(str, segs_per_grp, nbr_groups)
  end
end

It remains to construct contiguous_fixed_group_size?. The trailing \/ in the regex ensures that the last repetition ends at a segment boundary:

def contiguous_fixed_group_size?(str, segments_per_group, nbr_groups)
  r = /(?<!\/)((?:\/[^\/]+){#{segments_per_group}})\1{#{nbr_groups-1}}\//
  str.match?(r)
end

For

str = s2
segments_per_group = 2
nbr_groups = 3

the regular expression is:

r #=> /(?<!\/)((?:\/[^\/]+){2})\1{2}\//

Written in free-spacing mode, it is:

r = /
    (?<!\/)                    # match is not to be preceded by a forward slash
                               # (negative lookbehind)
    (                          # begin capture group 1
      (?:                      # begin non-capture group
        \/[^\/]+               # match a forward slash followed by 1+ other characters
      )                        # end non-capture group
      {#{segments_per_group}}  # match the non-capture group segments_per_group times
    )                          # end capture group 1
    \1{#{nbr_groups-1}}        # match the contents of capture group 1
                               # nbr_groups-1 more times
    \/                         # match a forward slash
    /x                         # free-spacing regex definition mode

Some examples, continuing with str = s2:

contiguous?(str, 3) #=> true
contiguous?(str, 2) #=> true
contiguous?(str, 1) #=> true
contiguous?(str, 4) #=> false
str = 'http://www.example.com/dog/baz/biz/baz/bix/baz/biz/cat/'
contiguous?(str, 3) #=> false
contiguous?(str, 2) #=> false
contiguous?(str, 1) #=> true
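The reason the regex ends with \/ can be seen on the question's s4. This sketch writes out the segments_per_group = 2, nbr_groups = 3 instance by hand (with_slash and without_slash are illustrative names, and the lookbehind is left out because these strings contain no "//"): without the final \/, the third repetition of '/foo/bar' can stop inside the segment '/barbaz'.

```ruby
s3 = '/foo/bar/foo/bar/foo/bar/'
s4 = '/foo/bar/foo/bar/foo/barbaz'

# segments_per_group = 2, nbr_groups = 3, written out by hand
with_slash    = /((?:\/[^\/]+){2})\1{2}\//
without_slash = /((?:\/[^\/]+){2})\1{2}/

puts with_slash.match?(s3)     # true
puts with_slash.match?(s4)     # false: after the third '/foo/bar' comes 'baz', not '/'
puts without_slash.match?(s4)  # true: the third '/foo/bar' is just the start of '/foo/barbaz'
```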