Regex: extracting a link from HTML with a regular expression


regex


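Possible duplicate:

Given a string containing HTML, for example:

I only want to use a regex to get the link to the historical data. However, my program does not seem to find the link, and I cannot see the problem, because the regular expression works in a regex tester. Could you take a look at what is wrong?

I know that regex is not the best tool to use with HTML, but I just wanted to try it. Thanks, everyone.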

Pattern data = Pattern.compile("Archives.*\s.*historicaldata");
Matcher test1 = data.matcher(inputHTML);
while (test1.find()) {
    System.out.println("test: now matched"); // does not print
}

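If you just want to match "Archives/historicaldata", then your regex string should be Archives\/historicaldata. In fact, you can simply use Archives/historicaldata in the pattern.

In the pattern Archives.*\s.*historicaldata, \s means whitespace [1]. Because there is no whitespace in /Archives/directory.htm, the pattern does not match. Try: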

Pattern data = Pattern.compile("Archives.*historicaldata");
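To make the difference concrete, here is a minimal sketch that runs both patterns against the same input. The HTML string, the class name ArchiveLinkDemo, and the helper firstMatch are all assumptions made for illustration, since the asker's actual HTML sample is not shown in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchiveLinkDemo {
    // Hypothetical input: the original HTML sample is not shown in the question.
    static final String INPUT_HTML =
            "<a href=\"/Archives/historicaldata.htm\">History</a>";

    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Requires whitespace between "Archives" and "historicaldata": no match here.
        System.out.println(firstMatch("Archives.*\\s.*historicaldata", INPUT_HTML)); // prints null
        // No whitespace requirement: the link fragment is found.
        System.out.println(firstMatch("Archives.*historicaldata", INPUT_HTML)); // prints Archives/historicaldata
    }
}
```

Note that .* is greedy, so on input containing several occurrences of "historicaldata" the match would extend to the last one on the line.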

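[1] The \s as written is also incorrect: to get it into the pattern, you have to escape the backslash in the Java string literal, making it "Archives.*\\s.*historicaldata".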
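The escaping point in the footnote can be checked with a short sketch. It assumes nothing beyond java.util.regex; the class name BackslashEscapeDemo, the helper found, and the two test strings are made up for illustration. A lone "\s" in a Java string literal is not the regex token (compilers before Java 15 reject it as an illegal escape character), whereas "\\s" hands the regex engine the two characters it expects.

```java
import java.util.regex.Pattern;

public class BackslashEscapeDemo {
    // In Java source, "\\s" is two characters: a backslash followed by 's'.
    static final String REGEX = "Archives.*\\s.*historicaldata";

    // True if the regex is found anywhere in the input.
    static boolean found(String input) {
        return Pattern.compile(REGEX).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("\\s".length());                         // prints 2
        System.out.println(found("Archives page, historicaldata")); // prints true
        System.out.println(found("Archives/historicaldata"));       // prints false
    }
}
```

With the backslash escaped, the pattern compiles and behaves as intended: it only matches when at least one whitespace character sits between "Archives" and "historicaldata".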

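Duck, before the "ARRGGH! Parsing HTML with regex!" crowd throws shoes at you.

Too late. I'll see your parsing HTML with regex and raise you tables for layout, CSS classes named after colors, and .htm instead of .html in file names.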
