Java regex: matching URLs that contain spaces and parentheses
Using a Java regular expression, I can't match URLs that contain spaces or the parentheses ( and ). A code sample is below; any help is appreciated. Only the last URL, the one ending in E.jpeg, is matched correctly.
Code:
public static void main(String[] args) {
    String content = "Lorem ipsum https://example.com/A B 123 4.pdf https://example.com/(C.jpeg https://example.com/D).jpeg https://example.com/E.jpeg";
    extractUrls(content);
}

public static void extractUrls(String text) {
    Pattern pat = Pattern.compile("(https?)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pat.matcher(text);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}
Output:
https://example.com/A
https://example.com/
https://example.com/D
https://example.com/E.jpeg
Expected output:
https://example.com/A B 123 4.pdf
https://example.com/(C.jpeg
https://example.com/D).jpeg
https://example.com/E.jpeg
Take a look at the code below:
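import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyClass {
    public static void main(String[] args) {
        // The URLs are separated by two spaces here; the pattern below relies on
        // that to tell a space inside a file name from a gap between two URLs.
        String content = "Lorem ipsum  https://example.com/A B 123 4.pdf  https://example.com/(C.jpeg  https://example.com/D).jpeg  https://example.com/E.jpeg";
        extractUrls(content);
    }

    public static void extractUrls(String text) {
        // After the scheme: runs of non-whitespace, each optionally followed by ONE whitespace character.
        Pattern pat = Pattern.compile("(https?)://(([\\S]+)(\\s)?)*", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pat.matcher(text);
        while (matcher.find()) {
            // trim() drops the single trailing space the pattern may consume.
            System.out.println(matcher.group().trim());
        }
    }
}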
Output:
https://example.com/A B 123 4.pdf
https://example.com/(C.jpeg
https://example.com/D).jpeg
https://example.com/E.jpeg
Explanation:
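I'm assuming that a file name never contains two consecutive spaces, as in the example.
(https?):// matches the substring http:// or https://.
After that we have two groups: ([\\S]+)(\\s)?. Together they match one or more non-whitespace characters followed by at most one whitespace character.
The * quantifier lets this step repeat any number of times.
As a result, the expression treats two or more consecutive spaces as the separator between two file names.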
I hope this helps.
The answer from user "The fourth bird" solved the problem. The regex should be:
http.*?\.(?:pdf|jpe?g)
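As a rough illustration of how that pattern behaves on the sample string from the question, here is a minimal sketch. The class name LazyUrlDemo and the list-returning helper are assumptions made for this example and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the thread.
public class LazyUrlDemo {
    // The lazy ".*?" grows one character at a time and stops at the first
    // ".pdf", ".jpg" or ".jpeg", so spaces and parentheses before the
    // extension stay inside the match.
    private static final Pattern URL =
            Pattern.compile("http.*?\\.(?:pdf|jpe?g)", Pattern.CASE_INSENSITIVE);

    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String content = "Lorem ipsum https://example.com/A B 123 4.pdf "
                + "https://example.com/(C.jpeg https://example.com/D).jpeg "
                + "https://example.com/E.jpeg";
        // Prints the four URLs from the question, one per line.
        extractUrls(content).forEach(System.out::println);
    }
}
```

One limitation of this approach: the match always ends at one of the listed extensions, so a URL with a different extension would run on into the next URL that ends in .pdf, .jpg or .jpeg.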
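Try using a non-greedy quantifier:
http.*?\.(?:pdf|jpe?g)
Or use a character class to make the match more specific. I thought URLs used "+" instead of spaces.
Hello "The fourth bird": for my text I used https.*?\.(?:jpg|jpeg|png|pdf|doc|docx), but it drops the "x" from "docx".
Use docx|doc, so that the longer alternative is tried first.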
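The effect of alternation order can be seen in a small sketch. Java tries alternatives from left to right and takes the first one that lets the overall match succeed, so doc wins over docx when it is listed first. The class name, the helper method and the sample URL below are made up for illustration; the text from the original comment was not preserved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing why "docx|doc" behaves differently from "doc|docx".
public class AlternationOrderDemo {
    // Returns the first match of regex in text, or null if there is none.
    public static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Made-up sample text, assumed only for this example.
        String text = "see https://example.com/my report.docx for details";

        // "doc" is tried before "docx", and nothing after the group forces
        // the engine to backtrack, so the trailing "x" is left out.
        System.out.println(firstMatch("https.*?\\.(?:pdf|doc|docx)", text));

        // Listing the longer alternative first keeps the full extension.
        System.out.println(firstMatch("https.*?\\.(?:pdf|docx|doc)", text));
    }
}
```

Writing the alternative as docx? should give the same result as docx|doc, since the optional x is greedy and is matched whenever it is present.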