Java: match any character except ">" unless it is preceded by "%"?

java regex

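What is the fastest way (in processing time) to match any string of characters, except one that contains >? However, if that > is preceded by % (%>), it is fine.

I want to match "dhg87y93..r,y9w", "dhkajdah%>daadas%>", "adsdsa%>\r\n%>" (containing a line break) and even "" (the empty string), but not "adhajs>dadsadas".

I tried

([^>]*(>)?[^>]*)*

but it takes too much processing power to work properly.

Thanks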
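To make the requirements concrete, here is a small sketch that encodes the sample strings above and runs the pattern from the question against them with String.matches, which requires the whole input to match. It assumes "\r\n" stands for a real line break; the class name TriedPatternCheck is hypothetical. Besides being slow, the tried pattern turns out to accept "adhajs>dadsadas" as well, because (>)? lets a bare > through.

```java
import java.util.List;

// Runs the pattern tried in the question against the question's samples.
// TriedPatternCheck and its members are hypothetical names for this sketch.
class TriedPatternCheck {
    // The pattern from the question.
    static final String TRIED = "([^>]*(>)?[^>]*)*";

    // Strings the question says should match (including the empty string).
    static final List<String> SHOULD_MATCH = List.of(
            "dhg87y93..r,y9w",
            "dhkajdah%>daadas%>",
            "adsdsa%>\r\n%>",
            "");

    // The string the question says should NOT match.
    static final String SHOULD_NOT_MATCH = "adhajs>dadsadas";

    static boolean accepts(String regex, String input) {
        // String.matches anchors the pattern to the whole input.
        return input.matches(regex);
    }

    public static void main(String[] args) {
        for (String s : SHOULD_MATCH) {
            System.out.println(accepts(TRIED, s)); // true for every sample
        }
        // Also true: (>)? lets a bare '>' through, so this pattern does not
        // enforce the "% before >" rule at all.
        System.out.println(accepts(TRIED, SHOULD_NOT_MATCH));
    }
}
```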

Treating %> as a single unit:

([^>]|%>)*
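A minimal sketch (the class name AlternationCheck is hypothetical) that runs this pattern over the question's sample inputs with String.matches, so the whole string must be consumed:

```java
import java.util.List;

// Checks ([^>]|%>)* against the question's samples.
class AlternationCheck {
    // Each repetition consumes either one non-'>' character or the pair "%>".
    static final String PATTERN = "([^>]|%>)*";

    static boolean accepts(String input) {
        // matches() requires the whole input to be consumed, so a '>' that is
        // not directly after '%' makes the match fail.
        return input.matches(PATTERN);
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "dhg87y93..r,y9w", "dhkajdah%>daadas%>", "adsdsa%>\r\n%>", "", "adhajs>dadsadas");
        for (String s : inputs) {
            System.out.println(accepts(s)); // expected: true, true, true, true, false
        }
    }
}
```

Because [^>] also matches %, the engine first consumes % as an ordinary character and only backtracks into the %> branch when the following > fails.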

Try

^([^%>]|%>?)*$

The part before the | matches everything except % and >. The second part matches %, and then allows a > after it.
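A sketch of the same idea with the group made non-capturing and the ^/$ anchors dropped, both of which are suggested in the comments; since String.matches already anchors at both ends, the accepted strings should be the same. The class name LeoNonCapturing is hypothetical.

```java
// Leo Aso's pattern with a non-capturing group and implicit anchors.
class LeoNonCapturing {
    // Original form: ^([^%>]|%>?)*$
    static final String PATTERN = "(?:[^%>]|%>?)*";

    static boolean accepts(String input) {
        // matches() supplies the ^ and $ anchors.
        return input.matches(PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(accepts("dhkajdah%>daadas%>")); // true
        System.out.println(accepts("100%"));               // true: a lone '%' is allowed by %>?
        System.out.println(accepts("adhajs>dadsadas"));    // false: bare '>'
    }
}
```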

You can use this regex:

^[^>]*(?:%>[^>]*)*$

Java code:

// anchors are implicit in String.matches(regex)

boolean isMatch = str.matches("[^>]*(?:%>[^>]*)*");

Total steps taken to complete (on regex101): 85

Regex breakdown:

^                # start (implicit in matches())
[^>]*            # match zero or more of any character except >
(?:              # start of non-capturing group
   %>            # match %>
   [^>]*         # then zero or more of any character except >
)*               # end of non-capturing group; match this group zero or more times
$                # end (implicit in matches())
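The breakdown can also live in the pattern itself. A sketch assuming Java's Pattern.COMMENTS flag, under which whitespace is ignored and # starts a comment that runs to the end of the line; the class name CommentedPattern is hypothetical:

```java
import java.util.regex.Pattern;

// anubhava's regex, annotated inline using COMMENTS mode.
class CommentedPattern {
    static final Pattern P = Pattern.compile(
              "[^>]*        # zero or more characters other than >\n"
            + "(?:          # non-capturing group\n"
            + "   %>        # a literal %>\n"
            + "   [^>]*     # followed by zero or more characters other than >\n"
            + ")*           # repeat the group zero or more times\n",
            Pattern.COMMENTS);

    static boolean accepts(String input) {
        // matches() supplies the ^ and $ anchors implicitly.
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("adsdsa%>\r\n%>"));  // true
        System.out.println(accepts("adhajs>dadsadas")); // false
    }
}
```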

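I'm adding this answer to warn people against trusting answers like "this is the fastest regex…". And no, the number of steps reported by sites like regex101 is only an indication, not an absolute number that guarantees how fast a particular regex matches. I've put together a scratch file that uses all the examples from the question and all the regex answers: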

import java.util.ArrayList;
import java.util.List;

public class scratch_5 {
    public static void main(String[] args) {

        List<String> tests = new ArrayList<String>() {{
            add("dhg87y93..r,y9w");
            add("dhkajdah%>daadas%>");
            add("adsdsa %>/r/n%>");
            add("but not \"adhajs>dadsadas");
        }};
        List<String> patterns = new ArrayList<String>() {{
            add("([^%>]|%>?)*");      // Leo Aso
            add("[^>]*(?:%>[^>]*)*"); // anubhava
            add("([^>]|%>)*");        // John Kugelman
        }};

        int i = 0;
        for (String test : tests) {
            System.out.println("string " + test);
            System.out.println(new String(new char[test.length() + 7]).replace("\0", "="));
            for (String pattern : patterns) {
                long startTime = System.nanoTime();
                boolean res = test.matches(pattern);
                long endTime = System.nanoTime();
                long duration = (endTime - startTime);
                System.out.format("with pattern %d: %s with duration %d%n", (i++ % 3) + 1, res, duration);
            }
            System.out.println();
        }
    }
}
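A single System.nanoTime() pair around one String.matches call mostly measures one-off costs: String.matches compiles the regex on every call, and the JIT has not warmed up yet. A rough sketch of a steadier comparison, assuming arbitrary warm-up and iteration counts (WARMUP, MEASURED are hypothetical values) and using the question's own strings with a real line break; for numbers you intend to rely on, a dedicated harness such as JMH is the usual tool:

```java
import java.util.List;
import java.util.regex.Pattern;

// Compiles each pattern once, warms it up, then averages many matches() calls.
class RepeatedTiming {
    static final int WARMUP = 10_000;     // assumed warm-up iterations
    static final int MEASURED = 100_000;  // assumed measured iterations

    // Keeps the match results observable so the loop bodies are less likely
    // to be optimized away.
    static long blackhole;

    // Returns the average nanoseconds per matches() call.
    static double averageNanos(Pattern p, String input) {
        for (int i = 0; i < WARMUP; i++) {
            if (p.matcher(input).matches()) blackhole++;
        }
        long start = System.nanoTime();
        for (int i = 0; i < MEASURED; i++) {
            if (p.matcher(input).matches()) blackhole++;
        }
        return (System.nanoTime() - start) / (double) MEASURED;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of(
                "([^%>]|%>?)*",       // Leo Aso
                "[^>]*(?:%>[^>]*)*",  // anubhava
                "([^>]|%>)*");        // John Kugelman
        List<String> tests = List.of(
                "dhg87y93..r,y9w", "dhkajdah%>daadas%>", "adsdsa%>\r\n%>", "adhajs>dadsadas");
        for (String test : tests) {
            for (int k = 0; k < patterns.size(); k++) {
                Pattern p = Pattern.compile(patterns.get(k));
                System.out.printf("pattern %d on %s: %.1f ns/match%n",
                        k + 1, test.replace("\r\n", "\\r\\n"), averageNanos(p, test));
            }
        }
    }
}
```

The absolute numbers will vary by machine and JVM; the point of the sketch is only that each reported figure averages many calls on an already compiled pattern.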

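We have to disregard the lengthy first result of the first test, which is due to initialization. We can conclude that, on average, the second regex is the fastest, but that is not always the case. It shows that which one is faster depends on the string you are matching. So the correct answer to this question is: it depends.
To be absolutely sure how fast a regex parses a particular string, you need to know the strategy the parser uses.
Addendum 1: if you compile the patterns, the results differ even more: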

// replaces the loop in main() above; needs
// import java.util.regex.Matcher; and import java.util.regex.Pattern;
Matcher matcher;
Pattern cp;
for (String test : tests) {
    System.out.println("string " + test);
    System.out.println(new String(new char[test.length() + 7]).replace("\0", "="));
    for (String pattern : patterns) {
        cp = Pattern.compile(pattern);
        long startTime = System.nanoTime();
        matcher = cp.matcher(test);
        long endTime = System.nanoTime();
        long duration = (endTime - startTime);
        System.out.format("with pattern %d: %s with duration %d%n", (i++ % 3) + 1, matcher.find(), duration);
    }
    System.out.println();
}

The results:

string dhg87y93..r,y9w
======================
with pattern 1: true with duration 39342
with pattern 2: true with duration 2296
with pattern 3: true with duration 1520

string dhkajdah%>daadas%>
=========================
with pattern 1: true with duration 2365
with pattern 2: true with duration 2428
with pattern 3: true with duration 2452

string adsdsa %>/r/n%>
======================
with pattern 1: true with duration 2449
with pattern 2: true with duration 2147
with pattern 3: true with duration 1505

string but not "adhajs>dadsadas
===============================
with pattern 1: true with duration 1663
with pattern 2: true with duration 1569
with pattern 3: true with duration 2003
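Two details of the addendum code affect how its output reads. The timed region only spans cp.matcher(test), which creates the Matcher without matching anything; the matching itself happens in matcher.find(), after endTime is taken. And find() looks for a match anywhere in the input, while every pattern here can match the empty string, so find() succeeds on every input, including the one with a bare >. A minimal sketch of the difference (the class name FindVersusMatches is hypothetical):

```java
import java.util.regex.Pattern;

// find() versus matches() on a pattern that can match the empty string.
class FindVersusMatches {
    static final Pattern P = Pattern.compile("[^>]*(?:%>[^>]*)*");

    // Succeeds if any substring matches; here at least the prefix at index 0.
    static boolean viaFind(String s) {
        return P.matcher(s).find();
    }

    // Succeeds only if the entire input matches.
    static boolean viaMatches(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String bad = "adhajs>dadsadas";
        System.out.println(viaFind(bad));    // true: matches the prefix "adhajs"
        System.out.println(viaMatches(bad)); // false: the bare '>' cannot be consumed
    }
}
```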

One thing is clear: if you need to speed things up, compile your patterns. I guess that's preaching to the choir.
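Following that advice in application code usually means compiling the pattern once and reusing it. A minimal sketch with hypothetical class and method names; it relies on Pattern being immutable and safe to share between threads, whereas a Matcher is not, so a fresh Matcher is created per call:

```java
import java.util.regex.Pattern;

// Compile once, reuse everywhere.
final class PercentGtValidator {
    private static final Pattern ALLOWED = Pattern.compile("[^>]*(?:%>[^>]*)*");

    private PercentGtValidator() {}

    // True if every '>' in the input is immediately preceded by '%'.
    static boolean isAllowed(CharSequence input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("dhkajdah%>daadas%>")); // true
        System.out.println(isAllowed("adhajs>dadsadas"));    // false
    }
}
```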
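@TimBiegeleisen It will. The %>? part matches both % and %>.

I believe this answer is the "fastest way" because it needs no backtracking, although you should make the group non-capturing. If used with matches(), the ^ and $ anchors are redundant.

@Andreas: On regex101 this regex takes 187 steps, John's regex takes 203 steps and mine takes 132 steps.

@MarcLambrichs: I have updated my regex further; it now takes only 85 steps. Fewer steps means faster execution of the regex. These are not just parsing steps.

@anubhava Brilliant. I've added an answer comparing the provided answers. The numbers speak for themselves. Leo's answer is certainly not the fastest. I suspect the newlines in the input string give John's an edge over yours.

Getting rid of the non-capturing group makes it faster in all test cases.

The non-capturing group is there to save the unnecessary memory spent on capturing groups.

I know what it's for. The OP asked for the fastest, not the cheapest.

Consuming memory also slows down execution when you need to parse very large strings.

Here we go again: although everything you say is true, my point is that you don't give any numbers and you don't measure. Whenever you do give numbers (number of steps), they don't explain the speed (or, in some cases, the lack of it).

"... and even "" (the empty string), but not "adhajs>dadsadas"" :D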