Java regex matches more than it should

java regex

I am doing this:

List<String> listOfLinks = new ArrayList<String>();
String regex = startMatch + "(.*)" + endMatch;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(html);
while (matcher.find()) {
    listOfLinks.add(matcher.group(1));
}

The result I get is:

http://www.sportscraft.com.au/longline-vest--9344961510736.html" title="Longline Vest "> <img class="alpha" src="http://demandware.edgesuite.net/sits_pod19/dw/image/v2/AAJZ_PRD/on/demandware.static/Sites-Sportscraft-Site/Sites-sc-master/default/v1427554286311/images/hi-res/1102031_black_a.jpg?sw=180&amp;sh=215&amp;sm=fit" alt="Longline Vest , BLACK, hi-res" title="Longline Vest , BLACK" height="214" /> <img class="beta" src="http://demandware.edgesuite.net/sits_pod19/dw/image/v2/AAJZ_PRD/on/demandware.static/Sites-Sportscraft-Site/Sites-sc-master/default/v1427554286311/images/hi-res/1102031_black_b.jpg?sw=180&amp;sh=215&amp;sm=fit" alt="Longline Vest , BLACK, hi-res

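This means the first part of the regex works correctly. But the second part, title, does not stop at its first match; it keeps going until it finds a later match.

When I test the same regex in a regex tester, I get the correct result. I think I need to set some option to make this "non-greedy", but I am not sure which one, because I cannot reproduce the error in the regex tester.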

Try using the following:

String regex = "^(.*?[^ ]) .*?"; // remove the ^; I have tried this on your input string
Output:
[http://www.sportscraft.com.au/longline-vest--9344961510736.html"]
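For comparison with the pattern above, here is a minimal sketch of the reluctant-quantifier fix applied to the question's own startMatch + "(.*)" + endMatch shape. The delimiter values href=" and " title, the sample HTML, and the class and method names are assumptions for illustration only, since the question does not show the real values. Pattern.quote is used on the assumption that both delimiters are literal text rather than regex fragments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original question or answer.
class ReluctantMatchSketch {

    // Collects the text between startMatch and endMatch.
    // reluctant == true  -> "(.*?)" stops at the FIRST endMatch after each start.
    // reluctant == false -> "(.*)"  runs to the LAST endMatch on the line (the question's behavior).
    static List<String> extractBetween(String text, String startMatch,
                                       String endMatch, boolean reluctant) {
        String middle = reluctant ? "(.*?)" : "(.*)";
        // Pattern.quote treats the delimiters as literals (an assumption about the inputs).
        String regex = Pattern.quote(startMatch) + middle + Pattern.quote(endMatch);
        Matcher matcher = Pattern.compile(regex).matcher(text);

        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample input with two occurrences of the end delimiter.
        String html = "<a href=\"http://example.com/a.html\" title=\"A\">"
                + "<img alt=\"A\" title=\"A\" /></a>";

        // Greedy: overshoots to the last '" title' on the line.
        System.out.println(extractBetween(html, "href=\"", "\" title", false));
        // Reluctant: stops at the first '" title'.
        System.out.println(extractBetween(html, "href=\"", "\" title", true));
    }
}
```

The only difference between the two calls is the ? after the *. No Pattern flag is involved, which would explain why looking for an "option" to set did not turn anything up.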

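Non-greedy is ?, placed after the quantifier, as in (.*?). You probably shouldn't use regexp to parse HTML, though. It is error-prone. It is better to use an HTML parsing library.

@cbednarski - Thanks. The thing is that some of the content I need is not in the HTML but in various JS scripts. Regex has turned out to work very well overall. You just need to "clean the source HTML first", i.e. remove any whitespace longer than 2 spaces and remove the newlines.

@cbednarski - Thanks again for this. I don't know why I run into so much trouble every time I use regex.

I'm not sure how you got the "correct" result in the regex tester, unless you typed something wrong. You did not provide the original HTML, but I tried with HTML you might be using, and it gave the same (incorrect) result as Java.

@ankur In your example, someone could use single quotes, e.g. href='example.com'. This is valid HTML, but it will break your regexp. An HTML parsing library will help you avoid pitfalls like this. If you want to see the page the way a browser does, it may be easier to actually drive a browser by remote control (phantom.js, selenium, or a similar tool).

Thanks. For some reason I thought the problem was with Java regex rather than with the regex itself.

@cbednarski's solution works well. But I will also try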
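To make the single-quote pitfall from the comments concrete, here is a small sketch of a pattern that accepts either quote style by capturing the opening quote and requiring the same character to close the value. The class name, method name, and sample markup are hypothetical. This only illustrates one failure mode; it still misses unquoted attribute values and other legal HTML, so it is not a substitute for the parsing library the comments recommend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class QuotedHrefSketch {

    // group(1): the opening quote, either " or '
    // group(2): the attribute value, matched reluctantly
    // \1      : back-reference, so the value must end with the same quote it started with
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1");

    static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up input mixing both quote styles.
        String html = "<a href='example.com'>x</a> "
                + "<a href=\"http://example.org/it's.html\">y</a>";
        System.out.println(extractHrefs(html));
    }
}
```

Because of the back-reference, an apostrophe inside a double-quoted value does not end the match early.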

http://www.sportscraft.com.au/longline-vest--9344961510736.html
