Regex: a regular expression generator for number ranges
I checked the Stack Exchange description, and algorithm questions are one of the allowed topics. So here goes.
Given the input of a range, where the begin and end numbers have the same number of digits (say 2, 3, or 4), I want to write code that generates a set of regular expressions which, when checked against a number in turn, tell me whether that number is in the original range.
For example: if the range is 145-387, then 146, 200, and 280 would all match one of the generated regular expressions, while 144, 390 (this used to say 290), and 445 (this used to say 345) would not.
I have been thinking the result would be a list of regular expressions like:
14[5-9] // match 145-149
1[5-9][0-9] // 150-199
2[0-9][0-9] // 200-299
3[0-7][0-9] // 300-379
38[0-7] // 380-387
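The software would then check the number to see whether the 3-digit code being tested matches any of these.
So what is the best way to generate the set of expressions?
The latest approach I have come up with (in a series) is: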
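Have I missed anything? Even in the above there are details I am glossing over; this seems like something that would benefit from an algorithmic sword slashing through the details. But the other things I have come up with are even messier than this.
Here is a recursive solution in Python that works for an arbitrary range of positive numbers. The idea is to divide the range into three sub-ranges:
- from start to the next multiple of 10 (if start is not already a multiple of 10)
- from the last multiple of 10 to end (if end is not already a multiple of 10)
- the range between these two multiples of 10, which can be handled recursively by dropping the last digit and then appending the regular expression [0-9] to every generated regex
Single-value classes such as [1-1] are reduced to 1. The function to call is genRangeRegex (start is inclusive, end is exclusive):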
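The listing for that answer is not reproduced here, so the following is only a minimal sketch of the three-way split described above, not the answer's own code. The names `range_to_regexes`, `_split` and `_digit_class` are hypothetical, and the sketch assumes positive integers with an inclusive start and an exclusive end.

```python
def _digit_class(lo, hi):
    # A single-value class such as [1-1] collapses to the bare digit.
    return str(lo) if lo == hi else "[%d-%d]" % (lo, hi)


def _prefix(n):
    # All digits of n except the last one ("" for single-digit numbers).
    return str(n // 10) if n >= 10 else ""


def _split(lo, hi):
    # Both bounds are inclusive here.
    if lo > hi:
        return []
    first_tens = (lo + 9) // 10 * 10   # first multiple of 10 that is >= lo
    if first_tens > hi:
        # lo and hi differ only in their last digit
        return [_prefix(lo) + _digit_class(lo % 10, hi % 10)]
    result = []
    if lo < first_tens:
        # head: from lo up to the number just before the next multiple of 10
        result.append(_prefix(lo) + _digit_class(lo % 10, 9))
    last_tens = (hi + 1) // 10 * 10    # one past the last complete block of ten
    # middle: recurse with the last digit dropped, then append [0-9]
    result += [r + "[0-9]" for r in _split(first_tens // 10, last_tens // 10 - 1)]
    if last_tens <= hi:
        # tail: from the last multiple of 10 up to hi
        result.append(_prefix(hi) + _digit_class(0, hi % 10))
    return result


def range_to_regexes(start, end):
    """Regexes that together match every integer in [start, end)."""
    if start <= 0:
        raise ValueError("this sketch only supports ranges of positive numbers")
    return _split(start, end - 1)


print(range_to_regexes(145, 388))
# ['14[5-9]', '1[5-9][0-9]', '2[0-9][0-9]', '3[0-7][0-9]', '38[0-7]']
```

Each returned pattern is meant to be matched against the whole number, for example with `re.fullmatch`.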
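One option would be (for a range [n, m]) to generate the regexp n|n+1|...|m-1|m.
However, I think you are after something more optimal. You can still do essentially the same thing: generate an FSM that matches each number via a distinct path through the state machine, use any of the well-known FSM minimization algorithms to produce a smaller machine, and then turn that into a more condensed regular expression (a "regular expression" without the Perl extensions is isomorphic to a finite state machine).
Say we are looking at the range [107, 112]: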
state1:
  1 -> state2
  * -> NotOK
state2:
  0 -> state2.0
  1 -> state2.1
  * -> NotOK
state2.0:
  7 -> OK
  8 -> OK
  9 -> OK
  * -> NotOK
state2.1:
  0 -> OK
  1 -> OK
  2 -> OK
  * -> NotOK
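We can't really reduce this machine any further. We can see that state2.0 corresponds to the RE [789] and that state2.1 corresponds to [012]. We can then see that state2 corresponds to (0[789])|(1[012]), and the whole machine to 1((0[789])|(1[012])); the outer group is needed so that the leading 1 applies to both alternatives.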
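As a quick sanity check, the state machine for [107, 112] can be written as a transition table and compared with the regular expression read off it. This is only an illustrative sketch: the `TRANSITIONS` table and the `accepts` helper are hypothetical names, and the pattern groups the alternation so that the leading 1 applies to both branches.

```python
import re

# Transition table for the machine above; any missing entry means "NotOK".
TRANSITIONS = {
    "state1": {"1": "state2"},
    "state2": {"0": "state2.0", "1": "state2.1"},
    "state2.0": {"7": "OK", "8": "OK", "9": "OK"},
    "state2.1": {"0": "OK", "1": "OK", "2": "OK"},
}


def accepts(text):
    """Run the machine over the characters of text."""
    state = "state1"
    for ch in text:
        # "OK" and "NotOK" have no outgoing transitions, so extra input fails.
        state = TRANSITIONS.get(state, {}).get(ch, "NotOK")
    return state == "OK"


# Regular expression read off the machine.
pattern = re.compile(r"1((0[789])|(1[012]))")

matched = [n for n in range(1000) if pattern.fullmatch(str(n))]
print(matched)  # [107, 108, 109, 110, 111, 112]
print(all(accepts(str(n)) == bool(pattern.fullmatch(str(n))) for n in range(1000)))  # True
```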
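Further reading on the topic can be found on Wikipedia (and the pages linked from there).
You cannot cover your requirement with character groups alone. Consider the range 129-131. The pattern 1[2-3][1-9] would also match 139, which is out of range.
So in this example the last group has to become something other than a range, such as 1[2-3](1|9). Even that still accepts 121 and 139 and rejects 130; an exact pattern for this range is 129|13[0-1].
The same effect appears for the tens and hundreds digits, which leads to the conclusion that, basically, a pattern that lists each valid number as a fixed sequence of digits is the only solution that always works (unless you want an algorithm that has to track overflows in order to decide whether to use [2-8] or (8,9,0,1,2)).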
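If you autogenerate the pattern anyway, keep it simple: 128-132 can be written as follows (I left out the non-capturing group marker ?: for better readability):
(128|129|130|131|132)
The algorithm should be obvious: a for loop, an array, string concatenation, and a join.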
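A minimal sketch of that loop-and-join approach follows; the function name `naive_range_regex` is made up for illustration, and the range is assumed to be inclusive on both ends.

```python
def naive_range_regex(start, end):
    """Build an alternation that lists every number in [start, end]."""
    numbers = []
    for n in range(start, end + 1):
        numbers.append(str(n))
    return "(" + "|".join(numbers) + ")"


print(naive_range_regex(128, 132))  # (128|129|130|131|132)
```

The pattern grows linearly with the size of the range, which is the price paid for the simplicity.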
That already works as expected, but if you want it more compact you can also apply some "optimization" to it:
(128|129|130|131|132) <=>
1(28|29|30|31|32) <=>
1(2(8|9)|3(0|1|2))
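Algorithms for that last step are out there; look for factorization. An easy way is to push all the numbers into a tree according to character position:
1
  2
    8
    9
  3
    0
    1
    2
and finally iterate over the tree to form the pattern 1(2(8|9)|3(0|1|2)). As a last step, replace anything of the form (a|(b|)*?c) with [a-c].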
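The tree-based factorization can be sketched as below. This is one possible reading of the idea, not the answer's own code: `build_trie` and `trie_to_regex` are hypothetical names, and the sketch assumes all numbers have the same number of digits, so every branch of the tree has the same depth. It folds the last step in by emitting a character class directly at the leaf level.

```python
def build_trie(numbers):
    """Nested dicts keyed by digit character, one level per character position."""
    root = {}
    for number in numbers:
        node = root
        for ch in str(number):
            node = node.setdefault(ch, {})
    return root


def trie_to_regex(node):
    """Turn a trie of equal-length digit strings into a factored pattern."""
    if not node:
        return ""
    digits = sorted(node)
    if all(not child for child in node.values()):
        # Leaf level: only single digits remain, so (a|b|c) becomes a class.
        if len(digits) == 1:
            return digits[0]
        if int(digits[-1]) - int(digits[0]) == len(digits) - 1:
            return "[%s-%s]" % (digits[0], digits[-1])   # contiguous run
        return "[%s]" % "".join(digits)                   # digits with gaps
    branches = [d + trie_to_regex(node[d]) for d in digits]
    if len(branches) == 1:
        return branches[0]
    return "(" + "|".join(branches) + ")"


print(trie_to_regex(build_trie(range(128, 133))))  # 1(2[8-9]|3[0-2])
print(trie_to_regex(build_trie(range(11, 30))))    # (1[1-9]|2[0-9])
```

Merging sibling branches that share the same tail, for example (1[0-9]|2[0-9]) into [1-2][0-9], is left out of this sketch.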
The same goes for 11-29:
11-29 <=>
(11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29) <=>
(1(1|2|3|4|5|6|7|8|9)|2(0|1|2|3|4|5|6|7|8|9)) <=>
(1([1-9])|2([0-9]))
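As an addition, when every branch ends in the same group you can continue the factorization, for example:
(1([1-9])|2([1-9])) <=>
(1|2)[1-9] <=>
[1-2][1-9]
Note that [1-2][1-9] does not match 20, so for the full range 11-29 the two branches have to stay separate.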
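Here is my solution, an algorithm with complexity O(log n) (n is the end of the range). I believe it is the simplest one here:
Basically, split the task into these steps:
start
end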
1(2([8-9])|3([0-2]))
145 -> 149, 150 -> 199, 200 -> 999, 1000 -> etc.
387 -> 380, 379 -> 300, 299 -> 0
145, 149, 150, 199, 200, 299, 300, 379, 380, 387
145-149, 150-199, 200-299, 300-379, 380-387
14[5-9], 1[5-9][0-9], 2[0-9][0-9], 3[0-7][0-9], 38[0-7]
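The step from the paired boundary numbers to the final patterns can be sketched digit by digit. The helper name `pair_to_regex` is hypothetical, and the sketch assumes each pair is one of the "simple" ranges listed above, where every position after the first differing digit spans the digits between the two bounds (for instance 150-199).

```python
def pair_to_regex(low, high):
    """Pattern for a simple range whose bounds have the same number of digits."""
    low_s, high_s = str(low), str(high)
    if len(low_s) != len(high_s):
        raise ValueError("bounds must have the same number of digits")
    parts = []
    for a, b in zip(low_s, high_s):
        # Equal digits are copied; differing digits become a class.
        parts.append(a if a == b else "[%s-%s]" % (a, b))
    return "".join(parts)


boundaries = [145, 149, 150, 199, 200, 299, 300, 379, 380, 387]
# Take the numbers two at a time: (145, 149), (150, 199), ...
pairs = list(zip(boundaries[0::2], boundaries[1::2]))
print([pair_to_regex(lo, hi) for lo, hi in pairs])
# ['14[5-9]', '1[5-9][0-9]', '2[0-9][0-9]', '3[0-7][0-9]', '38[0-7]']
```

For an arbitrary pair such as 145-387 this digit-wise rule would over-match, which is why the range is first cut at the boundary numbers.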
public static int next(int num) {
    // Convert to String for easier operations
    final char[] chars = String.valueOf(num).toCharArray();
    // Go through all digits backwards
    for (int i = chars.length - 1; i >= 0; i--) {
        // Skip the 0, changing it to 9. For example, for 190->199
        if (chars[i] == '0') {
            chars[i] = '9';
        } else { // If any other digit is encountered, change that to 9, for example, 195->199, or with both rules: 150->199
            chars[i] = '9';
            break;
        }
    }
    return Integer.parseInt(String.valueOf(chars));
}

// Same thing, but reversed. 387 -> 380, 379 -> 300, etc
public static int prev(int num) {
    final char[] chars = String.valueOf(num).toCharArray();
    for (int i = chars.length - 1; i >= 0; i--) {
        if (chars[i] == '9') {
            chars[i] = '0';
        } else {
            chars[i] = '0';
            break;
        }
    }
    return Integer.parseInt(String.valueOf(chars));
}
[1-9]
[1-9][0-9]
[1-9][0-9][0-9]
[1-9][0-9][0-9][0-9]
[1-9][0-9][0-9][0-9][0-9]
[1-2][0-9][0-9][0-9][0-9][0-9]
3[0-1][0-9][0-9][0-9][0-9]
320[0-9][0-9][0-9]
321[0-5][0-9][0-9]
3216[0-4][0-9]
32165[0-4]
129
13[0-1]
package numbers;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Has methods for generating regular expressions to match ranges of numbers.
 */
public class RangeRegexGenerator
{
    public static void main(String[] args)
    {
        RangeRegexGenerator rrg = new RangeRegexGenerator();

        // do
        // {
        //     Scanner scanner = new Scanner(System.in);
        //     System.out.println("enter start, <return>, then end and <return>");
        //     int start = scanner.nextInt();
        //     int end = scanner.nextInt();
        //     System.out.println(String.format("for %d-%d", start, end));

        List<String> regexes = rrg.getRegex("0015", "0213");
        for (String s: regexes) { System.out.println(s); }

        // }
        // while(true);
    }

    /**
     * Return a list of regular expressions that match the numbers
     * that fall within the range of the given numbers, inclusive.
     * Assumes the given strings are numbers of the same length,
     * and 0-left-pads the resulting expressions, if necessary, to the
     * same length.
     * @param begStr
     * @param endStr
     * @return
     */
    public List<String> getRegex(String begStr, String endStr)
    {
        int start = Integer.parseInt(begStr);
        int end = Integer.parseInt(endStr);
        int stringLength = begStr.length();
        List<Integer> pairs = getRegexPairs(start, end);
        List<String> regexes = toRegex(pairs, stringLength);
        return regexes;
    }

    /**
     * Return a list of regular expressions that match the numbers
     * that fall within the range of the given numbers, inclusive.
     * @param beg
     * @param end
     * @return
     */
    public List<String> getRegex(int beg, int end)
    {
        List<Integer> pairs = getRegexPairs(beg, end);
        List<String> regexes = toRegex(pairs);
        return regexes;
    }

    /**
     * return the list of integers that are the paired integers
     * used to generate the regular expressions for the given
     * range. Each pair of integers in the list -- 0,1, then 2,3,
     * etc., represents a range for which a single regular expression
     * is generated.
     * @param start
     * @param end
     * @return
     */
    private List<Integer> getRegexPairs(int start, int end)
    {
        List<Integer> pairs = new ArrayList<>();

        ArrayList<Integer> leftPairs = new ArrayList<>();
        int middleStartPoint = fillLeftPairs(leftPairs, start, end);
        ArrayList<Integer> rightPairs = new ArrayList<>();
        int middleEndPoint = fillRightPairs(rightPairs, middleStartPoint, end);

        pairs.addAll(leftPairs);
        if (middleEndPoint > middleStartPoint)
        {
            pairs.add(middleStartPoint);
            pairs.add(middleEndPoint);
        }
        pairs.addAll(rightPairs);
        return pairs;
    }

    /**
     * print the given list of integer pairs - used for debugging.
     * @param list
     */
    @SuppressWarnings("unused")
    private void printPairList(List<Integer> list)
    {
        if (list.size() > 0)
        {
            System.out.print(String.format("%d-%d", list.get(0), list.get(1)));
            int i = 2;
            while (i < list.size())
            {
                System.out.print(String.format(", %d-%d", list.get(i), list.get(i + 1)));
                i = i + 2;
            }
            System.out.println();
        }
    }

    /**
     * return the regular expressions that match the ranges in the given
     * list of integers. The list is in the form firstRangeStart, firstRangeEnd,
     * secondRangeStart, secondRangeEnd, etc.
     * @param pairs
     * @return
     */
    private List<String> toRegex(List<Integer> pairs)
    {
        return toRegex(pairs, 0);
    }

    /**
     * return the regular expressions that match the ranges in the given
     * list of integers. The list is in the form firstRangeStart, firstRangeEnd,
     * secondRangeStart, secondRangeEnd, etc. Each regular expression is 0-left-padded,
     * if necessary, to match strings of the given width.
     * @param pairs
     * @param minWidth
     * @return
     */
    private List<String> toRegex(List<Integer> pairs, int minWidth)
    {
        List<String> list = new ArrayList<>();
        // A width of 0 would yield the invalid format "%00d", so use plain "%d" in that case.
        String numberWithWidth = minWidth > 0 ? String.format("%%0%dd", minWidth) : "%d";
        for (Iterator<Integer> iterator = pairs.iterator(); iterator.hasNext();)
        {
            String start = String.format(numberWithWidth, iterator.next()); // String.valueOf(iterator.next());
            String end = String.format(numberWithWidth, iterator.next());

            list.add(toRegex(start, end));
        }
        return list;
    }

    /**
     * return a regular expression string that matches the range
     * with the given start and end strings.
     * @param start
     * @param end
     * @return
     */
    private String toRegex(String start, String end)
    {
        assert start.length() == end.length();

        StringBuilder result = new StringBuilder();

        for (int pos = 0; pos < start.length(); pos++)
        {
            if (start.charAt(pos) == end.charAt(pos))
            {
                result.append(start.charAt(pos));
            } else
            {
                result.append('[').append(start.charAt(pos)).append('-')
                        .append(end.charAt(pos)).append(']');
            }
        }
        return result.toString();
    }

    /**
     * Return the integer at the end of the range that is not covered
     * by any pairs added to the list.
     * @param rightPairs
     * @param start
     * @param end
     * @return
     */
    private int fillRightPairs(List<Integer> rightPairs, int start, int end)
    {
        int firstBeginRange = end;    // the end of the range not covered by pairs
                                      // from this routine.
        int y = end;
        int x = getPreviousBeginRange(y);

        while (x >= start)
        {
            // pairs are added end-first; the list is reversed below
            rightPairs.add(y);
            rightPairs.add(x);
            y = x - 1;
            firstBeginRange = y;
            x = getPreviousBeginRange(y);
        }
        Collections.reverse(rightPairs);
        return firstBeginRange;
    }

    /**
     * Return the integer at the start of the range that is not covered
     * by any pairs added to its list.
     * @param leftInts
     * @param start
     * @param end
     * @return
     */
    private int fillLeftPairs(ArrayList<Integer> leftInts, int start, int end)
    {
        int x = start;
        int y = getNextLeftEndRange(x);

        while (y < end)
        {
            leftInts.add(x);
            leftInts.add(y);
            x = y + 1;
            y = getNextLeftEndRange(x);
        }
        return x;
    }

    /**
     * given a number, return the number altered such
     * that any 9s at the end of the number remain, and
     * one more 9 replaces the number before the other
     * 9s.
     * @param num
     * @return
     */
    private int getNextLeftEndRange(int num)
    {
        char[] chars = String.valueOf(num).toCharArray();
        for (int i = chars.length - 1; i >= 0; i--)
        {
            if (chars[i] == '0')
            {
                chars[i] = '9';
            } else
            {
                chars[i] = '9';
                break;
            }
        }

        return Integer.parseInt(String.valueOf(chars));
    }

    /**
     * given a number, return the number altered such that
     * any 9 at the end of the number is replaced by a 0,
     * and the number preceding any 9s is also replaced by
     * a 0.
     * @param num
     * @return
     */
    private int getPreviousBeginRange(int num)
    {
        char[] chars = String.valueOf(num).toCharArray();
        for (int i = chars.length - 1; i >= 0; i--)
        {
            if (chars[i] == '9')
            {
                chars[i] = '0';
            } else
            {
                chars[i] = '0';
                break;
            }
        }

        return Integer.parseInt(String.valueOf(chars));
    }
}
20-239 is covered by [2-9][0-9], 1[0-9][0-9], 2[0-3][0-9]
2-23 is covered by [2-9], 1[0-9], 2[0-3]
13-247 = 13-19, 20-239, 240-247
20-247 = 20-239, 240-247
13-239 = 13-19, 20-239
20-239 = 20-239
private static List<Integer> getRegexPairs(int start, int end)
{
    List<Integer> pairs = new ArrayList<>();
    if (start > end) return pairs; // empty range
    int firstEndingWith0 = 10*((start+9)/10); // first number ending with 0
    if (firstEndingWith0 > end) // not in range?
    {
        // start and end differ only at last digit
        pairs.add(start);
        pairs.add(end);
        return pairs;
    }

    if (start < firstEndingWith0) // start is not ending in 0
    {
        pairs.add(start);
        pairs.add(firstEndingWith0-1);
    }

    // (end+1)/10 keeps an end that itself ends in 9 (e.g. 239) inside the middle part
    int lastEndingWith9 = 10*((end+1)/10)-1; // last number in range ending with 9
    if (firstEndingWith0 <= lastEndingWith9) // skip the recursion when there is no full block of ten
    {
        // all regex for the range [firstEndingWith0,lastEndingWith9] end with [0-9]
        List<Integer> pairsMiddle = getRegexPairs(firstEndingWith0/10, lastEndingWith9/10);
        for (int i=0; i<pairsMiddle.size(); i+=2)
        {
            // blow up each pair by adding all possibilities for appended digit
            pairs.add(pairsMiddle.get(i) *10+0);
            pairs.add(pairsMiddle.get(i+1)*10+9);
        }
    }

    if (lastEndingWith9 < end) // end is not ending in 9
    {
        pairs.add(lastEndingWith9+1);
        pairs.add(end);
    }

    return pairs;
}
^0*(([5-9]([.][0-9]{1,2})?)|[1-9][0-9]{1}?([.][0-9]{1,2})?|[12][0-9][0-9]([.][0-9]{1,2})?|300([.]0{1,2})?)$
^0*([1-9][0-9]?([.][0-9]{1,2})?|[12][0-9][0-9]([.][0-9]{1,2})?|300([.]0{1,2})?)$
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Find the next number that is advantageous for regular expressions.
//
// Starting at the right most decimal digit convert all zeros to nines. Upon
// encountering the first non-zero convert it to a nine and stop. The output
// always has the same number of digits as the input.
// examples: 100->999, 0->9, 5->9, 9->9, 14->19, 120->199, 10010->10099
static int Next(int val)
{
    assert(val >= 0);

    // keep track of how many nines to add to val.
    int addNines = 0;

    do {
        auto res = std::div(val, 10);
        val = res.quot;
        ++addNines;
        if (res.rem != 0) {
            break;
        }
    } while (val != 0);

    // add the nines
    for (int i = 0; i < addNines; ++i) {
        val = val * 10 + 9;
    }

    return val;
}

// Find the previous number that is advantageous for regular expressions.
//
// If the number is a single digit number convert it to zero and stop. Else...
// Starting at the right most decimal digit convert all trailing 9's to 0's
// unless the digit is the most significant digit - change that 9 to a 1. Upon
// encounter with first non-nine digit convert it to a zero (or 1 if most
// significant digit) and stop. The output always has the same number of digits
// as the input.
// examples: 0->0, 1->0, 29->10, 999->100, 10199->10000, 10->10, 399->100
static int Prev(int val)
{
    assert(val >= 0);

    // special case all single digit numbers reduce to 0
    if (val < 10) {
        return 0;
    }

    // keep track of how many zeros to add to val.
    int addZeros = 0;

    for (;;) {
        auto res = std::div(val, 10);
        val = res.quot;
        ++addZeros;
        if (res.rem != 9) {
            break;
        }

        if (val < 10) {
            val = 1;
            break;
        }
    }

    // add the zeros
    for (int i = 0; i < addZeros; ++i) {
        val *= 10;
    }

    return val;
}

// Create a vector of ranges that covers [start, end] that is advantageous for
// regular expression creation. Must satisfy end>=start>=0.
static std::vector<std::pair<int, int>> MakeRegexRangeVector(const int start,
                                                             const int end)
{
    assert(start <= end);
    assert(start >= 0);

    // keep track of the remaining portion of the range not yet placed into
    // the forward and reverse vectors.
    int remainingStart = start;
    int remainingEnd = end;

    std::vector<std::pair<int, int>> forward;
    while (remainingStart <= remainingEnd) {
        auto nextNum = Next(remainingStart);
        // is the next number within the range still needed.
        if (nextNum <= remainingEnd) {
            forward.emplace_back(remainingStart, nextNum);
            // increase remainingStart as portions of the numeric range are
            // transferred to the forward vector.
            remainingStart = nextNum + 1;
        } else {
            break;
        }
    }

    std::vector<std::pair<int, int>> reverse;
    while (remainingEnd >= remainingStart) {
        auto prevNum = Prev(remainingEnd);
        // is the previous number within the range still needed.
        if (prevNum >= remainingStart) {
            reverse.emplace_back(prevNum, remainingEnd);
            // reduce remainingEnd as portions of the numeric range are
            // transferred to the reverse vector.
            remainingEnd = prevNum - 1;
        } else {
            break;
        }
    }

    // is there any part of the range not accounted for in the forward and
    // reverse vectors?
    if (remainingStart <= remainingEnd) {
        // add the unaccounted for part - this is guaranteed to be expressible
        // as a single regex substring.
        forward.emplace_back(remainingStart, remainingEnd);
    }

    // Concatenate, in reverse order, the reverse vector to forward.
    forward.insert(forward.end(), reverse.rbegin(), reverse.rend());

    // Some sanity checks.
    // size must be non zero.
    assert(forward.size() > 0);

    // verify starting and ending points of the range
    assert(forward.front().first == start);
    assert(forward.back().second == end);

    return forward;
}
generateRegEx(String begStr, String endStr)
generateRegEx(int beg, int end)
regexArray - String Array where each element is a valid regular expression range.
regexList - List of String elements where each element is a valid regular expression range.
000[6-9]
00[1-9][0-9]
0[1-8][0-9][0-9]
09[0-6][0-9]
097[0-7]