
Java: regular expression


I have an HTML string that contains many image tags. I need to extract each tag and modify it. For example:

String imageRegex = "(<img.+(src=\".+\").+/>){1}";
String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
Matcher matcher = Pattern.compile(imageRegex, Pattern.CASE_INSENSITIVE).matcher(str);
int i = 0;
while (matcher.find()) {
    i++;
    Log.i("TAG", matcher.group());
}


The result is:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />hello world<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />


But that is not what I want. The result I want is:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />

What is wrong with my regular expression?

Try this (the code is C#, but the same pattern works in Java):

        String imageRegex = "(<img)(.*?)(/>)";
        String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
        System.Text.RegularExpressions.MatchCollection match = System.Text.RegularExpressions.Regex.Matches(str, imageRegex, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
        StringBuilder sb = new StringBuilder();
        foreach (System.Text.RegularExpressions.Match m in match)
        {
            sb.AppendLine(m.Value);
        }
        System.Windows.MessageBox.Show(sb.ToString());
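
Since the question is about Java, here is a rough Java sketch of the same idea. The class name ImgTagFinder and the method findImgTags are made up for illustration, and plain System.out is used instead of Android's Log so the snippet runs anywhere. It assumes each tag sits on a single line, because "." does not match line breaks by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class ImgTagFinder {
    // Same pattern as in the C# answer: ".*?" is lazy, so each match
    // ends at the first "/>" that follows "<img".
    private static final Pattern IMG_TAG =
            Pattern.compile("(<img)(.*?)(/>)", Pattern.CASE_INSENSITIVE);

    // Returns every <img ... /> tag found in the input, in order.
    static List<String> findImgTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = IMG_TAG.matcher(html);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />"
                + "hello world"
                + "<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
        // Prints each of the two tags on its own line.
        for (String tag : findImgTags(str)) {
            System.out.println(tag);
        }
    }
}
```

To change a tag rather than just list it, the same Matcher can be used with appendReplacement and appendTail, or the pattern can be passed to String.replaceAll.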

Result:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />

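David M is correct that you really should not try to do this, but your specific problem is that the + quantifier in your regular expression is greedy, so it matches the longest substring it possibly can.

For more details on quantifiers, see the linked reference.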
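
To make the greedy behaviour concrete, here is a small sketch comparing a greedy and a lazy version of a simplified pattern. The class GreedyVsLazy, its constants and the shortened sample string are invented for this illustration and are not the asker's exact code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class GreedyVsLazy {
    // Greedy: ".+" grabs as much as it can, then backtracks to the LAST "/>".
    static final String GREEDY = "<img.+/>";
    // Lazy: ".+?" grabs as little as it can and stops at the FIRST "/>".
    static final String LAZY = "<img.+?/>";

    // Collects every match of the given regex in the input.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<img src=\"9.gif\" />hello world<img src=\"7.gif\" />";
        System.out.println(allMatches(GREEDY, html).size()); // prints 1 (the whole string)
        System.out.println(allMatches(LAZY, html).size());   // prints 2 (one per tag)
    }
}
```

The greedy pattern returns the whole string as a single match, which is the behaviour described in the question. The lazy one returns each tag separately.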


I would not recommend using regular expressions to parse HTML. Please consider JTAX or a similar solution.


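May I refer you to this answer:

Is there something wrong with just regexing out the tags?

Yes, there is. The problem is that HTML is not a regular language, so it is not a good candidate for regex parsing. Sometimes you can make it work when you have to (and this may be one of those cases), but it is a bit like hammering a nail with an old shoe. It may get the job done, but it is not really the right tool. As the comments on the question I linked say, there is a big difference between parsing and matching.

I like this answer.

Regular expressions process strings, and HTML is built from strings, so why can't regular expressions be used to process HTML? "HTML is not a regular language" has nothing to do with languages; there are only strings, so why not?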
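
As one small illustration of the "parsing versus matching" point in the comments above, here is a sketch of an input where the lazy pattern from the answer finds the wrong end of the tag. The class RegexLimitDemo and the sample tag are made up for this example. The tag is valid markup, since a ">" may appear inside a quoted attribute value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class RegexLimitDemo {
    // Returns the first text matched by the lazy <img ... /> pattern, or null.
    static String firstImgTag(String html) {
        Matcher m = Pattern.compile("<img.*?/>", Pattern.CASE_INSENSITIVE).matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The alt text itself contains "/>", which the regex cannot tell
        // apart from the real end of the tag.
        String html = "<img alt=\"a /> b\" src=\"x.gif\" />";
        System.out.println(firstImgTag(html)); // prints <img alt="a />
    }
}
```

The match stops inside the alt attribute, because the pattern has no notion of quoted values. For trusted, uniform input like the smiley tags in the question this may never come up, but it is the kind of case an HTML parser handles and a simple pattern does not.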