Regex: Regular expression generator for number ranges

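I checked the Stack Exchange guidelines, and algorithm questions are one of the allowed topics, so here goes.

Given a range as input, where the start and end numbers have the same number of digits (say 2, 3, or 4), I want to write code that generates a set of regular expressions which, when a number is checked against each of them in turn, tell me whether that number is within the original range.

For example: if the range is 145-387, then 146, 200, and 280 would each match one of the generated regular expressions, while 144, 390 (this used to say 290), and 445 (this used to say 345) would not.

I have been thinking the result would be a list of regular expressions like: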

14[5-9]             // match 145-149
1[5-9][0-9]         // 150-199
2[0-9][0-9]         // 200-299
3[0-7][0-9]         // 300-379
38[0-7]             // 380-387
The software would then check a number to see whether the 3-digit code under test matches any of these.
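A minimal sketch of that check in Python, assuming the codes arrive as strings. The names `PATTERNS` and `in_range` are made up for the illustration; `re.fullmatch` is used so that a pattern has to cover the whole code, not just part of it.

```python
import re

# The five patterns for the range 145-387.
PATTERNS = ["14[5-9]", "1[5-9][0-9]", "2[0-9][0-9]", "3[0-7][0-9]", "38[0-7]"]


def in_range(code, patterns=PATTERNS):
    """Return True if `code` fully matches at least one of the patterns."""
    return any(re.fullmatch(p, code) for p in patterns)
```

For instance, `in_range("146")` is true while `in_range("144")` is false.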

So, what is the best way to generate the set of expressions?

The latest approach I have come up with (in a series) is:

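  • Determine the first digit at which the two range numbers differ (for 1145-1158, the first differing digit is the third).
  • For the differing digits, determine whether they differ by more than one; if so, the range between them gets its own regular expression (200-299 in our example).
  • To get the lower ranges: for each remaining digit, prefix it with the leading digits from the start of the range, increment the digit by one, pad with 0s to the same length, and pair it with the number that has 9 in the digit position and in all the padded positions. In our example, increment 4 to 5 and pad to get 150, then generate the regular expression that handles 150-199.
  • To get the upper ranges: for each remaining digit, prefix it with the leading digits from the end of the range, decrement the digit by one, pad the remaining positions with 0s, and pair it with the number that has 9s in all the padded positions and the decremented digit. In our example, this yields the regular expression that handles 300-379.

Am I missing something? Even in the above I am glossing over some details; this seems like a problem that would benefit from an algorithmic sword cutting through the details. But everything else I have come up with is even messier than this.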

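Here is a recursive solution in Python that works for an arbitrary range of positive numbers. The idea is to split the range into three subranges:

  • from the start to the next multiple of 10 (if the start is not a multiple of 10)
  • from the last multiple of 10 to the end (if the end is not a multiple of 10)
  • the range between these two multiples of 10, which can be handled recursively by stripping the last digit and then appending the regular expression [0-9] to all generated regular expressions

The code also optimizes single-value ranges such as [1-1] to 1. The function to call is genrangergex (start is inclusive, end is exclusive).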
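As a rough illustration of this three-way split (not the answer's own code), a minimal sketch could look like the following. The names `range_regexes`, `_digits`, and `_prefix` are made up for the example, and it assumes `start >= 1` with an exclusive `end`, as described above.

```python
def _digits(lo, hi):
    """Character class for the digits lo..hi, collapsed to one digit if lo == hi."""
    return str(lo) if lo == hi else f"[{lo}-{hi}]"


def _prefix(n):
    """All digits of n except the last one ('' for single-digit numbers)."""
    return str(n // 10) if n >= 10 else ""


def range_regexes(start, end):
    """Regex fragments that together match the integers in [start, end)."""
    if start < 1:
        raise ValueError("this sketch only handles positive numbers")
    if start >= end:
        return []
    last = end - 1
    # Whole range inside one block of ten: a single class on the last digit.
    if start // 10 == last // 10:
        return [_prefix(start) + _digits(start % 10, last % 10)]
    head, tail = [], []
    lo10, hi10 = start, end
    if start % 10:  # start up to the next multiple of 10
        head = [_prefix(start) + _digits(start % 10, 9)]
        lo10 = (start // 10 + 1) * 10
    if end % 10:  # last multiple of 10 up to the end
        tail = [_prefix(end) + _digits(0, last % 10)]
        hi10 = end // 10 * 10
    # Middle part: drop the last digit, recurse, then append [0-9].
    middle = [r + "[0-9]" for r in range_regexes(lo10 // 10, hi10 // 10)]
    return head + middle + tail
```

Under these assumptions, `range_regexes(145, 388)` reproduces the five expressions from the question.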

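One option (for the range [n, m]) is to generate the regexp n|n+1|...|m-1|m. However, I think you are after something more optimized. You can still do essentially the same thing: generate an FSM that matches each number via a distinct path through the state machine, then use any of the well-known FSM minimization algorithms to produce a smaller machine, and then convert that into a more compact regular expression (since "regular expressions" without the Perl extensions are isomorphic to finite state machines).

Suppose we are looking at the range [107, 112]: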

    state1:
      1 -> state2
      * -> NotOK
    state2:
      0 -> state2.0
      1 -> state2.1
      * -> NotOK
    state2.0:
      7 -> OK
      8 -> OK
      9 -> OK
      * -> NotOK
    state2.1:
      0 -> OK
      1 -> OK
      2 -> OK
      * -> NotOK
    
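We cannot really reduce this machine any further. We can see that state2.0 corresponds to the RE [789], and state2.1 corresponds to [012]. We can then see that state2 is (0[789])|(1[012]), and the whole thing is 1((0[789])|(1[012])).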
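To make the machine concrete, here is a small sketch that walks the transition table for [107, 112] character by character. The dictionary layout and the name `accepts` are assumptions made for the illustration; any missing transition plays the role of the `* -> NotOK` rows.

```python
# Transition table for the range [107, 112], mirroring the states listed above.
DFA = {
    "state1": {"1": "state2"},
    "state2": {"0": "state2.0", "1": "state2.1"},
    "state2.0": {"7": "OK", "8": "OK", "9": "OK"},
    "state2.1": {"0": "OK", "1": "OK", "2": "OK"},
}


def accepts(text):
    """Return True if the machine ends in the OK state after reading `text`."""
    state = "state1"
    for ch in text:
        # A missing entry is the catch-all "* -> NotOK" transition.
        state = DFA.get(state, {}).get(ch, "NotOK")
        if state == "NotOK":
            return False
    return state == "OK"
```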


Further reading can be found on Wikipedia (and the pages linked from there).

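You cannot cover your needs with character groups alone. Imagine the range 129-131. The pattern 1[2-3][1-9] would also match 139, which is out of range (and it would miss 130).

So in this example you need to change the last group to something else, such as 1[2-3](1|9); note that even this still admits 121 and 139. You can find the same effect for the tens and hundreds digits as well, which leads to the conclusion that a pattern representing each valid number as a fixed sequence of digits is basically the only valid solution (unless you want an algorithm that has to track overflow to decide whether to use [2-8] or (8,9,0,1,2)).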
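Over-matching of this kind is easy to detect by brute force. The sketch below is one possible way to do it; `mismatches` is a hypothetical helper that compares a candidate pattern against the intended range over a small universe of numbers.

```python
import re


def mismatches(pattern, lo, hi, universe=range(0, 1000)):
    """Return (false_positives, false_negatives) of `pattern` for the range [lo, hi]."""
    rx = re.compile(pattern)
    false_pos = [n for n in universe if rx.fullmatch(str(n)) and not lo <= n <= hi]
    false_neg = [n for n in universe if not rx.fullmatch(str(n)) and lo <= n <= hi]
    return false_pos, false_neg
```

Calling `mismatches("1[2-3][1-9]", 129, 131)` reports 139 among the false positives and 130 as a false negative, while `mismatches("129|13[0-1]", 129, 131)` comes back empty on both sides.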

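If you generate the pattern automatically, keep it simple. The range

    128-132

can be written as (for readability I have left out the ?: that would make the group non-capturing):

    (128|129|130|131|132)

The algorithm should be obvious: a for loop, an array, string concatenation, and a join.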
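In Python the loop, array, and join collapse into a single expression. This is only a sketch; `naive_range_pattern` is a made-up name, and it includes the non-capturing `?:` that was left out above.

```python
def naive_range_pattern(lo, hi):
    """Alternation listing every number in [lo, hi], wrapped in a non-capturing group."""
    return "(?:" + "|".join(str(n) for n in range(lo, hi + 1)) + ")"
```

The pattern grows linearly with the size of the range, which is fine for small ranges and is the motivation for the factoring described next.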

This already works as expected, but if you want it more compact you can also apply some "optimization":

    (128|129|130|131|132) <=>
    1(28|29|30|31|32) <=>
    1(2(8|9)|3(0|1|2))
    
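The algorithm for that last step is out there; look for factorization. A simple approach is to push all the numbers into a tree, keyed by character position: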

    1
      2
        8
        9
      3
        0
        1
        2
    
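Finally, iterate over the tree to form the pattern 1(2(8|9)|3(0|1|2)). As a last step, replace any pattern of the form (a|(b|)*?c) with [a-c], which gives 1(2([8-9])|3([0-2])).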
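The tree approach can be sketched with nested dictionaries. This is an illustration under the assumption that all numbers have the same number of digits; `build_trie` and `render` are hypothetical names, and the optional `use_classes` flag stands in for the final replacement of runs of consecutive digits by a character class.

```python
def build_trie(numbers):
    """Insert each digit string into a nested-dict tree, one level per position."""
    trie = {}
    for text in numbers:
        node = trie
        for ch in text:
            node = node.setdefault(ch, {})
    return trie


def render(node, use_classes=False):
    """Turn the tree back into a factored pattern."""
    if not node:
        return ""
    digits = sorted(node)
    all_leaves = all(not node[d] for d in digits)
    if use_classes and all_leaves and len(digits) > 1:
        # Sorted, distinct digits are consecutive iff the span equals the count.
        if int(digits[-1]) - int(digits[0]) == len(digits) - 1:
            return f"[{digits[0]}-{digits[-1]}]"
    parts = [d + render(node[d], use_classes) for d in digits]
    return parts[0] if len(parts) == 1 else "(" + "|".join(parts) + ")"
```

For the numbers 128 through 132, `render` returns `1(2(8|9)|3(0|1|2))`, and with `use_classes=True` it returns `1(2[8-9]|3[0-2])`.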

The same works for 11-29:

    11-29 <=>
    (11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29) <=>
    (1(1|2|3|4|5|6|7|8|9)|2(0|1|2|3|4|5|6|7|8|9)) <=>
    (1([1-9])|2([0-9]))

When the branches end in the same sub-pattern you can keep factoring. That is not the case for 11-29, because 20 has to stay in, but it is for a range such as 10-29:

    (1([0-9])|2([0-9])) <=>
    (1|2)[0-9] <=>
    [1-2][0-9]

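Here is my solution, an algorithm with complexity O(log n) (where n is the end of the range). I believe it is the simplest one here.

Basically, split the task into these steps:

  • Gradually "weaken" the start of the range.
  • Gradually "weaken" the end of the range.
  • Merge the two.

By "weaken" I mean finding the end of the range that can be expressed as a simple regular expression for this particular number.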
    
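Weakening the start of 145-387, for example:

    145 -> 149, 150 -> 199, 200 -> 999, 1000 -> etc.

And backwards, for the end of the range:

    387 -> 380, 379 -> 300, 299 -> 0

Merging the two gives this list of numbers:

    145, 149, 150, 199, 200, 299, 300, 379, 380, 387

Taken in pairs, these are the ranges:

    145-149, 150-199, 200-299, 300-379, 380-387

or, as regular expressions:

    14[5-9], 1[5-9][0-9], 2[0-9][0-9], 3[0-7][0-9], 38[0-7]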
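The last step, turning each pair into a regular expression, is a position-by-position comparison. A minimal sketch, assuming both ends are passed as equal-length digit strings and that the pair is already a "weakened" range (`pair_to_regex` is a made-up name):

```python
def pair_to_regex(lo, hi):
    """Regex for a weakened range: equal digits stay literal, others become a class."""
    assert len(lo) == len(hi)
    return "".join(a if a == b else f"[{a}-{b}]" for a, b in zip(lo, hi))
```

This only gives correct results for pairs of the shape shown above; applied to an arbitrary pair such as 129 and 131 it would produce `1[2-3][9-1]`, which is not even a valid character class.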
    
Here is how the weakening could look in code:

    public static int next(int num) {
        //Convert to String for easier operations
        final char[] chars = String.valueOf(num).toCharArray();
        //Go through all digits backwards
        for (int i=chars.length-1; i>=0;i--) {
            //Skip the 0 changing it to 9. For example, for 190->199
            if (chars[i]=='0') {
                chars[i] = '9';
            } else { //If any other digit is encountered, change that to 9, for example, 195->199, or with both rules: 150->199
                chars[i] = '9';
                break;
            }
        }
    
        return Integer.parseInt(String.valueOf(chars));
    }
    
    //Same thing, but reversed. 387 -> 380, 379 -> 300, etc
    public static int prev(int num) {
        final char[] chars = String.valueOf(num).toCharArray();
        for (int i=chars.length-1; i>=0;i--) {
            if (chars[i] == '9') {
                chars[i] = '0';
            } else {
                chars[i] = '0';
                break;
            }
        }
    
        return Integer.parseInt(String.valueOf(chars));
    }
    
For the range 1-321654 this yields:

    [1-9]
    [1-9][0-9]
    [1-9][0-9][0-9]
    [1-9][0-9][0-9][0-9]
    [1-9][0-9][0-9][0-9][0-9]
    [1-2][0-9][0-9][0-9][0-9][0-9]
    3[0-1][0-9][0-9][0-9][0-9]
    320[0-9][0-9][0-9]
    321[0-5][0-9][0-9]
    3216[0-4][0-9]
    32165[0-4]
    
and for 129-131:

    129
    13[0-1]
    
    package numbers;
    
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;
    
    /**
     * Has methods for generating regular expressions to match ranges of numbers.
     */
    public class RangeRegexGenerator
    {
      public static void main(String[] args)
      {
        RangeRegexGenerator rrg = new RangeRegexGenerator();
    
    //    do
    //    {
    //      Scanner scanner = new Scanner(System.in);
    //      System.out.println("enter start, <return>, then end and <return>");
    //      int start = scanner.nextInt();
    //      int end = scanner.nextInt();
    //      System.out.println(String.format("for %d-%d", start, end));
    
          List<String> regexes = rrg.getRegex("0015", "0213");
          for (String s: regexes) { System.out.println(s); }
    //    } 
    //    while(true);
      }
    
      /**
       * Return a list of regular expressions that match the numbers
       * that fall within the range of the given numbers, inclusive.
       * Assumes the given strings are numbers of the same length,
       * and 0-left-pads the resulting expressions, if necessary, to the
       * same length. 
       * @param begStr
       * @param endStr
       * @return
       */
      public List<String> getRegex(String begStr, String endStr)
      {
          int start = Integer.parseInt(begStr);
          int end   = Integer.parseInt(endStr);
          int stringLength = begStr.length();
          List<Integer> pairs = getRegexPairs(start, end);
          List<String> regexes = toRegex(pairs, stringLength);
          return regexes;
      }
    
      /**
       * Return a list of regular expressions that match the numbers
       * that fall within the range of the given numbers, inclusive.
       * @param beg
       * @param end
       * @return
       */
      public List<String> getRegex(int beg, int end)
      {
        List<Integer> pairs = getRegexPairs(beg, end);
        List<String> regexes = toRegex(pairs);
        return regexes;
      }
    
      /**
       * return the list of integers that are the paired integers
       * used to generate the regular expressions for the given
       * range. Each pair of integers in the list -- 0,1, then 2,3,
       * etc., represents a range for which a single regular expression
       * is generated.
       * @param start
       * @param end
       * @return
       */
      private List<Integer> getRegexPairs(int start, int end)
      {
          List<Integer> pairs = new ArrayList<>();
    
          ArrayList<Integer> leftPairs = new ArrayList<>();
          int middleStartPoint = fillLeftPairs(leftPairs, start, end);
          ArrayList<Integer> rightPairs = new ArrayList<>();
          int middleEndPoint = fillRightPairs(rightPairs, middleStartPoint, end);
    
          pairs.addAll(leftPairs);
          // >= rather than >: a single uncovered number (e.g. the range 5-5) is still a range
          if (middleEndPoint >= middleStartPoint)
          {
            pairs.add(middleStartPoint);
            pairs.add(middleEndPoint);
          }
          pairs.addAll(rightPairs);
          return pairs;
      }
    
      /**
       * print the given list of integer pairs - used for debugging.
       * @param list
       */
      @SuppressWarnings("unused")
      private void printPairList(List<Integer> list)
      {
        if (list.size() > 0)
        {
          System.out.print(String.format("%d-%d", list.get(0), list.get(1)));
          int i = 2;
          while (i < list.size())
          {
            System.out.print(String.format(", %d-%d", list.get(i), list.get(i + 1)));
            i = i + 2;
          }
          System.out.println();
        }
      }
    
      /**
       * return the regular expressions that match the ranges in the given
       * list of integers. The list is in the form firstRangeStart, firstRangeEnd, 
       * secondRangeStart, secondRangeEnd, etc.
       * @param pairs
       * @return
       */
      private List<String> toRegex(List<Integer> pairs)
      {
        return toRegex(pairs, 0);
      }
    
      /**
       * return the regular expressions that match the ranges in the given
       * list of integers. The list is in the form firstRangeStart, firstRangeEnd, 
       * secondRangeStart, secondRangeEnd, etc. Each regular expression is 0-left-padded,
       * if necessary, to match strings of the given width.
       * @param pairs
       * @param minWidth
       * @return
       */
      private List<String> toRegex(List<Integer> pairs, int minWidth)
      {
        List<String> list = new ArrayList<>();
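        // a width of 0 would produce the invalid format "%00d", so fall back to plain "%d"
        String numberWithWidth = minWidth > 0 ? String.format("%%0%dd", minWidth) : "%d";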
        for (Iterator<Integer> iterator = pairs.iterator(); iterator.hasNext();)
        {
          String start = String.format(numberWithWidth, iterator.next()); // String.valueOf(iterator.next());
          String end = String.format(numberWithWidth, iterator.next());
    
          list.add(toRegex(start, end));
        }
        return list;
      }
    
      /**
       * return a regular expression string that matches the range
       * with the given start and end strings.
       * @param start
       * @param end
       * @return
       */
      private String toRegex(String start, String end)
      {
        assert start.length() == end.length();
    
        StringBuilder result = new StringBuilder();
    
        for (int pos = 0; pos < start.length(); pos++)
        {
          if (start.charAt(pos) == end.charAt(pos))
          {
            result.append(start.charAt(pos));
          } else
          {
            result.append('[').append(start.charAt(pos)).append('-')
                .append(end.charAt(pos)).append(']');
          }
        }
        return result.toString();
      }
    
      /**
       * Return the integer at the end of the range that is not covered
       * by any pairs added to the list.
       * @param rightPairs
       * @param start
       * @param end
       * @return
       */
      private int fillRightPairs(List<Integer> rightPairs, int start, int end)
      {
        int firstBeginRange = end;    // the end of the range not covered by pairs
                                      // from this routine.
        int y = end;
        int x = getPreviousBeginRange(y);
    
        while (x >= start)
        {
          rightPairs.add(y);
          rightPairs.add(x);
          y = x - 1;
          firstBeginRange = y;
          x = getPreviousBeginRange(y);
        }
        Collections.reverse(rightPairs);
        return firstBeginRange;
      }
    
      /**
       * Return the integer at the start of the range that is not covered
       * by any pairs added to its list. 
       * @param leftInts
       * @param start
       * @param end
       * @return
       */
      private int fillLeftPairs(ArrayList<Integer> leftInts, int start, int end)
      {
        int x = start;
        int y = getNextLeftEndRange(x);
    
        while (y < end)
        {
          leftInts.add(x);
          leftInts.add(y);
          x = y + 1;
          y = getNextLeftEndRange(x);
        }
        return x;
      }
    
      /**
       * given a number, return the number altered such
       * that any 9s at the end of the number remain, and
       * one more 9 replaces the number before the other
       * 9s.
       * @param num
       * @return
       */
      private int getNextLeftEndRange(int num)
      {
        char[] chars = String.valueOf(num).toCharArray();
        for (int i = chars.length - 1; i >= 0; i--)
        {
          if (chars[i] == '0')
          {
            chars[i] = '9';
          } else
          {
            chars[i] = '9';
            break;
          }
        }
    
        return Integer.parseInt(String.valueOf(chars));
      }
    
      /**
       * given a number, return the number altered such that
       * any 9 at the end of the number is replaced by a 0,
       * and the number preceding any 9s is also replaced by
       * a 0.
       * @param num
       * @return
       */
      private int getPreviousBeginRange(int num)
      {
        char[] chars = String.valueOf(num).toCharArray();
        for (int i = chars.length - 1; i >= 0; i--)
        {
          if (chars[i] == '9')
          {
            chars[i] = '0';
          } else
          {
            chars[i] = '0';
            break;
          }
        }
    
        return Integer.parseInt(String.valueOf(chars));
      }
    }
    
    20-239 is covered by [2-9][0-9], 1[0-9][0-9], 2[0-3][0-9]
    2 -23  is covered by [2-9],      1[0-9],      2[0-3]
    
    13-247 = 13-19, 20-239, 240-247
    20-247 =        20-239, 240-247
    13-239 = 13-19, 20-239
    20-239 =        20-239
    
    private static List<Integer> getRegexPairs(int start, int end)
    {
      List<Integer> pairs = new ArrayList<>();   
      if (start > end) return pairs; // empty range
      int firstEndingWith0 = 10*((start+9)/10); // first number ending with 0
      if (firstEndingWith0 > end) // not in range?
      {
        // start and end differ only at last digit
        pairs.add(start);
        pairs.add(end);
        return pairs;
      }
    
      if (start < firstEndingWith0) // start is not ending in 0
      {
        pairs.add(start);
        pairs.add(firstEndingWith0-1);
      }
    
      // last number ending with 9 that is still in range (may be end itself)
      int lastEndingWith9 = 10*((end+1)/10)-1;
      // all regex for the range [firstEndingWith0,lastEndingWith9] end with [0-9];
      // skip the recursion when that range is empty, otherwise start == 0
      // would recurse forever (-1/10 == 0 in Java)
      if (firstEndingWith0 <= lastEndingWith9)
      {
        List<Integer> pairsMiddle = getRegexPairs(firstEndingWith0/10, lastEndingWith9/10);
        for (int i=0; i<pairsMiddle.size(); i+=2)
        {
          // blow up each pair by adding all possibilities for appended digit
          pairs.add(pairsMiddle.get(i)  *10+0);
          pairs.add(pairsMiddle.get(i+1)*10+9);
        }
      }
    
      if (lastEndingWith9 < end) // end is not ending in 9
      {
        pairs.add(lastEndingWith9+1);
        pairs.add(end);
      }
    
      return pairs;
    }
    
    ^0*(([5-9]([.][0-9]{1,2})?)|[1-9][0-9]{1}?([.][0-9]{1,2})?|[12][0-9][0-9]([.][0-9]{1,2})?|300([.]0{1,2})?)$
    
    ^0*([1-9][0-9]?([.][0-9]{1,2})?|[12][0-9][0-9]([.][0-9]{1,2})?|300([.]0{1,2})?)$
    
        #include <cassert>
        #include <cstdlib>  // std::div
        #include <utility>  // std::pair
        #include <vector>

        // Find the next number that is advantageous for regular expressions.
        //
        // Starting at the right most decimal digit convert all zeros to nines. Upon
        // encountering the first non-zero convert it to a nine and stop. The output
        // always has the same number of digits as the input.
        // examples: 100->999, 0->9, 5->9, 9->9, 14->19, 120->199, 10010->10099
        static int Next(int val)
        {
           assert(val >= 0);
    
           // keep track of how many nines to add to val.
           int addNines = 0;
    
           do {
              auto res = std::div(val, 10);
              val = res.quot;
              ++addNines;
              if (res.rem != 0) {
                 break;
              }
           } while (val != 0);
    
           // add the nines
           for (int i = 0; i < addNines; ++i) {
              val = val * 10 + 9;
           }
    
           return val;
        }
    
        // Find the previous number that is advantageous for regular expressions.
        //
        // If the number is a single digit number convert it to zero and stop. Else...
        // Starting at the right most decimal digit convert all trailing 9's to 0's
        // unless the digit is the most significant digit - change that 9 to a 1. Upon
        // encounter with first non-nine digit convert it to a zero (or 1 if most
        // significant digit) and stop. The output always has the same number of digits
        // as the input.
        // examples: 0->0, 1->0, 29->10, 999->100, 10199->10000, 10->10, 399->100
        static int Prev(int val)
        {
           assert(val >= 0);
    
           // special case all single digit numbers reduce to 0
           if (val < 10) {
              return 0;
           }
    
           // keep track of how many zeros to add to val.
           int addZeros = 0;
    
           for (;;) {
              auto res = std::div(val, 10);
              val = res.quot;
              ++addZeros;
              if (res.rem != 9) {
                 break;
              }
    
              if (val < 10) {
                 val = 1;
                 break;
              }
           }
    
           // add the zeros
           for (int i = 0; i < addZeros; ++i) {
              val *= 10;
           }
    
           return val;
        }
    
        // Create a vector of ranges that covers [start, end] that is advantageous for
        // regular expression creation. Must satisfy end>=start>=0.
        static std::vector<std::pair<int, int>> MakeRegexRangeVector(const int start,
                                                                     const int end)
        {
           assert(start <= end);
           assert(start >= 0);
    
           // keep track of the remaining portion of the range not yet placed into
           // the forward and reverse vectors.
           int remainingStart = start;
           int remainingEnd = end;
    
           std::vector<std::pair<int, int>> forward;
           while (remainingStart <= remainingEnd) {
              auto nextNum = Next(remainingStart);
              // is the next number within the range still needed.
              if (nextNum <= remainingEnd) {
                 forward.emplace_back(remainingStart, nextNum);
                 // increase remainingStart as portions of the numeric range are
                 // transferred to the forward vector.
                 remainingStart = nextNum + 1;
              } else {
                 break;
              }
           }
           std::vector<std::pair<int, int>> reverse;
           while (remainingEnd >= remainingStart) {
              auto prevNum = Prev(remainingEnd);
              // is the previous number within the range still needed.
              if (prevNum >= remainingStart) {
                 reverse.emplace_back(prevNum, remainingEnd);
                 // reduce remainingEnd as portions of the numeric range are transferred
                 // to the reverse vector.
                 remainingEnd = prevNum - 1;
              } else {
                 break;
              }
           }
    
           // is there any part of the range not accounted for in the forward and
           // reverse vectors?
           if (remainingStart <= remainingEnd) {
              // add the unaccounted for part - this is guaranteed to be expressible
              // as a single regex substring.
              forward.emplace_back(remainingStart, remainingEnd);
           }
    
           // Concatenate, in reverse order, the reverse vector to forward.
           forward.insert(forward.end(), reverse.rbegin(), reverse.rend());
    
           // Some sanity checks.
           // size must be non zero.
           assert(forward.size() > 0);
    
           // verify starting and ending points of the range
           assert(forward.front().first == start);
           assert(forward.back().second == end);
    
           return forward;
        }
    
    generateRegEx(String begStr, String endStr)
    generateRegEx(int beg, int end)
    
    regexArray - String Array where each element is a valid regular expression range.
    regexList  - List of String elements where each element is a valid regular expression range.
    
    000[6-9]
    00[1-9][0-9]
    0[1-8][0-9][0-9]
    09[0-6][0-9]
    097[0-7]