Java: poor performance/speed of regex with lookaheads - Fatal编程技术网

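I have been observing very slow execution for expressions that contain several lookaheads.

I suppose this is due to the underlying data structures, but it seems quite extreme, and I wonder whether I am doing something wrong or whether there is a known workaround.

The problem is to determine whether a set of words all appear in a string, in any order. For example, we want to know whether the two terms term1 and term2 both occur somewhere in the string. I do this with the expression: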

(?=.*\bterm1\b)(?=.*\bterm2\b)
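But what I observe is that this is an order of magnitude slower than first checking just

\bterm1\b
and then

\bterm2\b
This seems to suggest that I should use an array of patterns instead of a single pattern with lookaheads... Is that right? It seems wrong.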
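The two approaches in the question can be generalized to n terms. Below is a minimal Java sketch, not the asker's code: it assumes the terms are plain words to be matched literally between word boundaries (hence Pattern.quote), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AllTermsSketch {

    // Builds (?=.*\bT1\b)(?=.*\bT2\b)...(?=.*\bTn\b), quoting each term literally.
    static Pattern combinedLookahead(List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            sb.append("(?=.*\\b").append(Pattern.quote(t)).append("\\b)");
        }
        return Pattern.compile(sb.toString());
    }

    // Builds one \bT\b pattern per term (the "array of patterns" approach).
    static List<Pattern> perTermPatterns(List<String> terms) {
        List<Pattern> patterns = new ArrayList<>();
        for (String t : terms) {
            patterns.add(Pattern.compile("\\b" + Pattern.quote(t) + "\\b"));
        }
        return patterns;
    }

    // Single pattern: find() retries the lookaheads at each start position.
    static boolean containsAllLookahead(Pattern combined, String s) {
        return combined.matcher(s).find();
    }

    // Separate patterns: stops at the first term that is not found.
    static boolean containsAllSeparately(List<Pattern> perTerm, String s) {
        for (Pattern p : perTerm) {
            if (!p.matcher(s).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("term1", "term2");
        Pattern combined = combinedLookahead(terms);
        List<Pattern> separate = perTermPatterns(terms);
        for (String s : new String[] {"foo term2 bar term1", "term1 only", "term1\nterm2"}) {
            System.out.println(containsAllLookahead(combined, s) + " " + containsAllSeparately(separate, s));
        }
    }
}
```

On the third input the two approaches disagree: without Pattern.DOTALL, "." does not match line terminators, so the single lookahead pattern only succeeds when all terms are on the same line, while the separate patterns search the whole string.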

Here is some sample test code and the resulting timings:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSpeed {

  public static void speedLookAhead() {
    Matcher m, m1, m2;
    boolean find;
    // note: on a real 2000-character string each lookahead find() is far more
    // expensive than on a short one, so this count may need to be lowered
    int its = 1000000;

    // create long non-matching string
    char[] str = new char[2000];
    for (int i = 0; i < str.length; i++) {
      str[i] = 'x';
    }
    // new String(char[]) copies the characters; str.toString() would only
    // return an identity string such as "[C@1b6d3586", not 2000 x's
    String test = new String(str);

    // First method: use one expression with lookaheads
    m = Pattern.compile("(?=.*\\bterm1\\b)(?=.*\\bterm2\\b)").matcher(test);
    long time = System.currentTimeMillis();
    for (int i = 0; i < its; i++) {
      m.reset(test);
      find = m.find();
    }
    time = System.currentTimeMillis() - time;
    System.out.println(time);

    // Second method: use two expressions and AND the results
    m1 = Pattern.compile("\\bterm1\\b").matcher(test);
    m2 = Pattern.compile("\\bterm2\\b").matcher(test);
    time = System.currentTimeMillis();
    for (int i = 0; i < its; i++) {
      m1.reset(test);
      m2.reset(test);
      find = m1.find() && m2.find();
    }
    time = System.currentTimeMillis() - time;
    System.out.println(time);
  }

  public static void main(String[] args) {
    speedLookAhead();
  }
}

You need to put each loop in a separate method; if you swap the order of the tests, you will get different results.
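A hedged sketch of what the comment suggests: each approach lives in its own method, both are warmed up before anything is timed, and the timings are taken in both orders. The class name, the iteration count and the warm-up scheme are assumptions; for serious measurements a dedicated harness such as JMH (the OpenJDK Java Microbenchmark Harness) is more reliable than hand-written timing loops.

```java
import java.util.function.IntSupplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparateLoopsBenchmark {

    static final Pattern LOOKAHEAD = Pattern.compile("(?=.*\\bterm1\\b)(?=.*\\bterm2\\b)");
    static final Pattern TERM1 = Pattern.compile("\\bterm1\\b");
    static final Pattern TERM2 = Pattern.compile("\\bterm2\\b");

    // Results are stored here so the JIT is less likely to drop the measured work as dead code.
    static volatile int sink;

    // First approach in its own method: one pattern with lookaheads.
    static int runLookahead(String test, int its) {
        Matcher m = LOOKAHEAD.matcher(test);
        int hits = 0;
        for (int i = 0; i < its; i++) {
            m.reset(test);
            if (m.find()) hits++;
        }
        return hits;
    }

    // Second approach in its own method: two patterns ANDed together.
    static int runSeparate(String test, int its) {
        Matcher m1 = TERM1.matcher(test);
        Matcher m2 = TERM2.matcher(test);
        int hits = 0;
        for (int i = 0; i < its; i++) {
            m1.reset(test);
            m2.reset(test);
            if (m1.find() && m2.find()) hits++;
        }
        return hits;
    }

    // Elapsed wall-clock time in milliseconds, measured with System.nanoTime().
    static long timeMillis(IntSupplier work) {
        long start = System.nanoTime();
        sink = work.getAsInt();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String test = "x".repeat(2000); // String.repeat needs Java 11+
        int its = 100;                  // hypothetical count, kept small for the slow lookahead case
        // Warm up both methods before timing either of them.
        for (int round = 0; round < 3; round++) {
            sink = runLookahead(test, its) + runSeparate(test, its);
        }
        // Time them in both orders to see whether the order still matters.
        System.out.println("lookahead: " + timeMillis(() -> runLookahead(test, its)) + " ms");
        System.out.println("separate:  " + timeMillis(() -> runSeparate(test, its)) + " ms");
        System.out.println("separate:  " + timeMillis(() -> runSeparate(test, its)) + " ms");
        System.out.println("lookahead: " + timeMillis(() -> runLookahead(test, its)) + " ms");
    }
}
```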


Could you compare it with test.indexOf('A') >= 0 && test.indexOf('B') >= 0? I imagine that could be much faster.
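For single characters the suggested comparison needs no regex at all. A minimal sketch with hypothetical method names; the n-term variant uses String.contains, which is a plain substring search and, unlike \bterm\b, ignores word boundaries.

```java
public class IndexOfSketch {

    // The commenter's check: both characters occur somewhere in the string.
    static boolean containsBothChars(String test, char a, char b) {
        return test.indexOf(a) >= 0 && test.indexOf(b) >= 0;
    }

    // Generalization to n literal terms (substring match, no \b semantics).
    static boolean containsAllSubstrings(String test, String... terms) {
        for (String t : terms) {
            if (!test.contains(t)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsBothChars("xxAxxBxx", 'A', 'B'));                 // true
        System.out.println(containsBothChars("xxAxx", 'A', 'B'));                    // false
        System.out.println(containsAllSubstrings("term10 term2", "term1", "term2")); // true: no word-boundary check
    }
}
```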

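The regex you posted

(?=.\A\b)(?=.\B\b)
doesn't match the one in the code

.(?=.*B)(?=.*A)
In fact, the first regex cannot possibly match what it appears to be intended to match.

Could you give some example inputs: things that should match and things that shouldn't?

Here is an explanation of the regex in the code: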

Match any single character that is not a line break character «.»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*B)»
   Match any single character that is not a line break character «.*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the character “B” literally «B»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*A)»
   Match any single character that is not a line break character «.*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the character “A” literally «A»
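Read step by step, the explanation means the leading "." consumes one character before either lookahead is tried, so an A or B in the first position cannot count toward its own lookahead. A small Java sketch (class name hypothetical) that runs the pattern from the code against a few inputs:

```java
import java.util.regex.Pattern;

public class DotLookaheadDemo {

    // The pattern from the code: one character, then B and A must both
    // occur somewhere after it on the same line.
    static final Pattern P = Pattern.compile(".(?=.*B)(?=.*A)");

    static boolean matches(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"xAB", "xBA", "ABA", "AB", "BA"}) {
            System.out.println(s + " -> " + matches(s));
        }
        // prints: xAB -> true, xBA -> true, ABA -> true, AB -> false, BA -> false
        // (one per line)
    }
}
```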

This might shave off some time:

Greedy: ([AB]).*(?!\1)[AB]

Non-greedy: ([AB]).*?(?!\1)[AB]
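A minimal Java sketch of the greedy and non-greedy forms, written as ([AB]).*(?!\1)[AB] and ([AB]).*?(?!\1)[AB] on the assumption of single-character terms; (?!\1) forces the second letter to differ from the captured first one, so a match exists exactly when both A and B occur. The class and method names are hypothetical. The last input shows the practical difference: the greedy form reaches for the last qualifying letter, the non-greedy form stops at the nearest one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefSketch {

    // Greedy: .* first runs to the end of the line, then backtracks until it
    // finds a letter that differs from the captured one.
    static final Pattern GREEDY = Pattern.compile("([AB]).*(?!\\1)[AB]");
    // Non-greedy: .*? grows one character at a time from the captured letter.
    static final Pattern LAZY = Pattern.compile("([AB]).*?(?!\\1)[AB]");

    // Returns the matched text, or null if the string does not contain both A and B.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"xxAxxBxx", "BxxA", "AAAA", "AxBxB"}) {
            System.out.println(s + "  greedy=" + firstMatch(GREEDY, s) + "  lazy=" + firstMatch(LAZY, s));
        }
    }
}
```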

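Rework:

I have my own take on this. A regex that matches one term at a time, like /term/, as opposed to two terms in one regex, will always take less time because it does not need to backtrack. It's that simple. Doing the two conditions separately is then much faster.

If you can define the terms so that there is no possibility of overlap, that is the way to go, i.e. /term1/ && /term2/.

There is no way to combine the terms into a single regex without invoking backtracking.

That said, if you really do care about overlap, there are techniques that minimize backtracking.

/(?=.*A)(?=.*B)/ is just like /A/ && /B/, except that it appears to be a lot slower; neither one accounts for overlap.

So if you really care about overlap, which I strongly recommend, there are two methods that can be combined for maximum efficiency:

/(A|B).*(?!\1)(?:A|B)/

/A/ && /B/ && /(A|B).*(?!\1)(?:A|B)/

The last one adds a small amount of relative overhead, but it short-circuits the logic chain: it requires that A and B at least exist before it checks for overlap.

And depending on where A and B are in the string, /(A|B).*(?!\1)(?:A|B)/ can also take a while, but when everything is averaged out it is still the fastest way.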
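A Java sketch of the short-circuit chain /A/ && /B/ && /(A|B).*(?!\1)(?:A|B)/, borrowing the two terms from the Perl benchmark ("term" and "m2a") as an assumption. The class and method names are hypothetical; Pattern.quote treats the terms as literal text, and unlike the Perl version the pattern is written without /x-style spaces.

```java
import java.util.regex.Pattern;

public class ChainedCheckSketch {

    final Pattern a;
    final Pattern b;
    final Pattern nonOverlapping;

    ChainedCheckSketch(String termA, String termB) {
        String qa = Pattern.quote(termA);
        String qb = Pattern.quote(termB);
        a = Pattern.compile(qa);
        b = Pattern.compile(qb);
        // (A|B).*(?!\1)(?:A|B): one term, then a *different* term that starts
        // after the first one ends, so the two occurrences cannot overlap.
        nonOverlapping = Pattern.compile(
                "(" + qa + "|" + qb + ").*(?!\\1)(?:" + qa + "|" + qb + ")");
    }

    // Cheap checks first: if either term is missing, && short-circuits and the
    // backtracking pattern is never run.
    boolean bothPresentWithoutOverlap(String s) {
        return a.matcher(s).find()
                && b.matcher(s).find()
                && nonOverlapping.matcher(s).find();
    }

    public static void main(String[] args) {
        ChainedCheckSketch check = new ChainedCheckSketch("term", "m2a");
        // The first three sample strings of the Perl benchmark
        String[] samples = {
            " xaaaaaaa term2ater  ",
            " xaaaaaaa term2aterm ",
            " xaaaaaaa ter2ater  ",
        };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + check.bothPresentWithoutOverlap(s));
        }
    }
}
```

The results agree with the "Found non-overlaped terms" lines in the Perl output for the same strings: only the second sample contains both terms without overlap.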

Below is a Perl program that benchmarks a few possible sample scenario strings.

Good luck.

use strict;
use warnings;

use Benchmark ':hireswallclock';
my ($t0,$t1);

my ($term1, $term2) = ('term','m2a');
my  @samples = (
   ' xaaaaaaa term2ater  ',
   ' xaaaaaaa term2aterm ',
   ' xaaaaaaa ter2ater  ',
   ' Aaa term2ater ' . 'x 'x100 . 'xaaaaaaa mta ',
   ' Baa term      ' . 'x 'x100 . 'xaaaaaaa mta ',
   ' Caa m2a       ' . 'x 'x100 . 'xaaaaaaa term ',
   ' Daa term2a       ' . 'x 'x100 . 'xaaaaaaa term ',
);

my $rxA  = qr/$term1/;
my $rxB  = qr/$term2/;
my $rxAB = qr/ ($term1|$term2) .* (?!\1)(?:$term1|$term2) /x;


for (@samples)
{
    printf "Checking string:  '%.40s'\n-------------\n", $_;

    if (/$term1/ && /$term2/ ) {
       print "  Found possible candidates (A && B)\n";
    }
    if (/ ($term1|$term2) .* ((?!\1)(?:$term1|$term2)) /x) {
       print "  Found non-overlaped terms: '$1'  '$2'\n";
    }
    else {
       print "  No (A|B) .* (?!\\1)(A|B) terms found!\n";
    }
    print "\n   Bench\n";

    $t0 = new Benchmark;
    for my $cnt (1 .. 500_000) {
       /$rxA/  &&  /$rxB/;
    }
    $t1 = new Benchmark;
    print "    $rxA && $rxB\n    -took: ", timestr(timediff($t1, $t0)), "\n\n";

    $t0 = new Benchmark;
    for my $cnt (1 .. 500_000) {
       /$rxAB/;
    }
    $t1 = new Benchmark;
    print "    $rxAB\n    -took: ", timestr(timediff($t1, $t0)), "\n\n";

    $t0 = new Benchmark;
    for my $cnt (1 .. 500_000) {
       /$rxA/  &&  /$rxB/ && /$rxAB/;
    }
    $t1 = new Benchmark;
    print "    $rxA && $rxB &&\n    $rxAB\n    -took: ", timestr(timediff($t1, $t0)), "\n\n";

}
Output

Checking string:  ' xaaaaaaa term2ater  '
-------------
  Found possible candidates (A && B)
  No (A|B) .* (?!\1)(A|B) terms found!

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 1.46875 wallclock secs ( 1.47 usr +  0.00 sys =  1.47 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 3.3748 wallclock secs ( 3.34 usr +  0.00 sys =  3.34 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 5.0623 wallclock secs ( 5.06 usr +  0.00 sys =  5.06 CPU)

Checking string:  ' xaaaaaaa term2aterm '
-------------
  Found possible candidates (A && B)
  Found non-overlaped terms: 'm2a'  'term'

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 1.48403 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 3.89044 wallclock secs ( 3.89 usr +  0.00 sys =  3.89 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 5.40607 wallclock secs ( 5.38 usr +  0.00 sys =  5.38 CPU)

Checking string:  ' xaaaaaaa ter2ater  '
-------------
  No (A|B) .* (?!\1)(A|B) terms found!

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 0.765321 wallclock secs ( 0.77 usr +  0.00 sys =  0.77 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 1.29674 wallclock secs ( 1.30 usr +  0.00 sys =  1.30 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 0.874842 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU)

Checking string:  ' Aaa term2ater x x x x x x x x x x x x x'
-------------
  Found possible candidates (A && B)
  No (A|B) .* (?!\1)(A|B) terms found!

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 1.46842 wallclock secs ( 1.47 usr +  0.00 sys =  1.47 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 28.078 wallclock secs (28.08 usr +  0.00 sys = 28.08 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 29.4531 wallclock secs (29.45 usr +  0.00 sys = 29.45 CPU)

Checking string:  ' Baa term      x x x x x x x x x x x x x'
-------------
  No (A|B) .* (?!\1)(A|B) terms found!

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 1.68716 wallclock secs ( 1.69 usr +  0.00 sys =  1.69 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 15.1563 wallclock secs (15.16 usr +  0.00 sys = 15.16 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 1.64033 wallclock secs ( 1.64 usr +  0.00 sys =  1.64 CPU)

Checking string:  ' Caa m2a       x x x x x x x x x x x x x'
-------------
  Found possible candidates (A && B)
  Found non-overlaped terms: 'm2a'  'term'

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 1.62448 wallclock secs ( 1.63 usr +  0.00 sys =  1.63 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 3.0154 wallclock secs ( 3.02 usr +  0.00 sys =  3.02 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 4.56226 wallclock secs ( 4.56 usr +  0.00 sys =  4.56 CPU)

Checking string:  ' Daa term2a       x x x x x x x x x x x '
-------------
  Found possible candidates (A && B)
  Found non-overlaped terms: 'm2a'  'term'

   Bench
    (?-xism:term) && (?-xism:m2a)
    -took: 1.45252 wallclock secs ( 1.45 usr +  0.00 sys =  1.45 CPU)

    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 16.1404 wallclock secs (16.14 usr +  0.00 sys = 16.14 CPU)

    (?-xism:term) && (?-xism:m2a) &&
    (?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
    -took: 17.6719 wallclock secs (17.67 usr +  0.00 sys = 17.67 CPU)

Lookaheads require backtracking, which takes a long time.

Er, in your test you forgot to reset m2. By the way, what's wrong with A.*B|B.*A?

Thanks, I added m2.reset. A.*B|B.*A works for two terms, but for n terms you need to check all permutations. I'll change my question to specify that I'm interested in the n-term case.

@biziclop A.*B|B.*A doesn't work when A and B overlap, but (?=.*A)(?=.*B) does.

@tchrist When the question was asked, A and B really were A and B, just single characters.

The regexes weren't displayed correctly because he used markup to format them; the asterisks were interpreted as markup. @Hugo: just indent code blocks by four spaces and let the software do the formatting.

Thanks, good to know. However, you still get more than an order of magnitude with the lookaheads. In this case it may be a precision issue.

Right, that would be faster, but I'm looking for a more general solution where A and B can be regexes like \bterm\b.

Unfortunately, I'm looking for the n-term case; I modified my original question to reflect this.

@Hugo - added some comments to my post.

Thanks.
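The exchange about A.*B|B.*A can be checked directly. A hedged Java sketch, borrowing the overlapping pair term/m2a from the Perl benchmark as an assumption: the alternation needs the terms one after the other, while (?=.*A)(?=.*B) accepts overlapping occurrences. The helper that builds one branch per ordering is hypothetical and shows why the alternation grows as n! for n terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class AlternationVsLookahead {

    // One branch per ordering of the terms, e.g. A.*B and B.*A for two terms.
    static List<String> orderingBranches(List<String> terms) {
        List<String> branches = new ArrayList<>();
        permute(new ArrayList<>(terms), 0, branches);
        return branches;
    }

    // Swaps each remaining term into position k, recurses, then swaps back.
    private static void permute(List<String> terms, int k, List<String> out) {
        if (k == terms.size()) {
            List<String> quoted = new ArrayList<>();
            for (String t : terms) {
                quoted.add(Pattern.quote(t));
            }
            out.add(String.join(".*", quoted));
            return;
        }
        for (int i = k; i < terms.size(); i++) {
            Collections.swap(terms, k, i);
            permute(terms, k + 1, out);
            Collections.swap(terms, k, i);
        }
    }

    // A.*B|B.*A style: the terms must occur one after another, without overlapping.
    static boolean foundByAlternation(List<String> terms, String s) {
        return Pattern.compile(String.join("|", orderingBranches(terms))).matcher(s).find();
    }

    // (?=.*A)(?=.*B) style: each lookahead checks one term independently.
    static boolean foundByLookaheads(List<String> terms, String s) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            sb.append("(?=.*").append(Pattern.quote(t)).append(")");
        }
        return Pattern.compile(sb.toString()).matcher(s).find();
    }

    public static void main(String[] args) {
        List<String> two = List.of("term", "m2a");
        String overlapping = "term2a"; // "term" and "m2a" share the "m"
        System.out.println(foundByAlternation(two, overlapping)); // false
        System.out.println(foundByLookaheads(two, overlapping));  // true
        System.out.println(orderingBranches(List.of("a", "b", "c")).size()); // 6 (3!)
    }
}
```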