Java: getting the text inside an HTML tag, given the tag name with attributes
I have a string like this:
<h3 class="media__title">
<a class="media__link" href="/news/world-europe41644527" rev="video|headline">
The equestrian champion with no legs
</a> </h3>
But I'm still not getting anywhere. Can someone tell me what I should change in this regex pattern?

Try this:
String regex = "<h3 (.*)>((.|\\s)+?)<\\/h3>";
The main problem with your approach is that the '.' character does not match line terminators.
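A minimal sketch of that point with java.util.regex, using a shortened version of the input: a plain '.' cannot cross the line breaks, while (.|\s) or the Pattern.DOTALL flag can. DOTALL is an alternative added here for comparison, not something the answer used; the class name DotNewlineDemo and the helper finds are hypothetical.

```java
import java.util.regex.Pattern;

class DotNewlineDemo {
    // Shortened sample: the text sits on its own line between the tags.
    static final String INPUT = "<h3>\nThe equestrian champion with no legs\n</h3>";

    // True if `regex`, compiled with `flags`, matches somewhere in `input`.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Plain '.' stops at '\n', so the lazy group can never reach </h3>.
        System.out.println(finds("<h3>(.+?)</h3>", 0, INPUT));              // false
        // (.|\s) also accepts whitespace, which includes '\n'.
        System.out.println(finds("<h3>((.|\\s)+?)</h3>", 0, INPUT));        // true
        // DOTALL lets '.' match line terminators as well.
        System.out.println(finds("<h3>(.+?)</h3>", Pattern.DOTALL, INPUT)); // true
    }
}
```

On long inputs, repeated alternation such as (.|\s)+? can be costly in Java's backtracking engine and may even end in a StackOverflowError, so (.+?) with DOTALL is usually the cheaper way to get the same effect.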
Explanation:
<h3 (.*)> matches an opening h3 tag together with all of its attributes (you could also use a different pattern if you are interested in the attributes themselves)
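((.|\s)+?) matches everything inside the h3 tag: '.' matches any character except line terminators, and \s matches any whitespace character, line terminators included, so (.|\s) matches any character at all. The lazy +? stops at the first </h3> that follows.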
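<\/h3> matches the closing h3 tag. The slash is escaped because / is a regex delimiter in some languages (for example JavaScript regex literals); Java regexes have no delimiters, so in Java the escape is harmless but not required.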
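A sketch of how the answer's pattern might be used from Java, assuming the input is exactly the string from the question. Group 2 holds everything between the h3 tags, including the inner <a> element, so the hypothetical text helper strips the remaining tags and trims the whitespace. That stripping step is an addition for illustration and only suits simple markup like this; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class H3Extractor {
    // The answer's pattern as a Java string literal.
    // Group 1 = attributes of <h3>, group 2 = everything up to </h3>.
    static final Pattern H3 = Pattern.compile("<h3 (.*)>((.|\\s)+?)<\\/h3>");

    static final String HTML =
        "<h3 class=\"media__title\">\n"
        + "<a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">\n"
        + "The equestrian champion with no legs\n"
        + "</a> </h3>";

    // Returns the raw contents of the first <h3 ...>...</h3>, or null if none.
    static String innerHtml(String html) {
        Matcher m = H3.matcher(html);
        return m.find() ? m.group(2) : null; // group 2, not group 1
    }

    // Hypothetical helper: removes any remaining tags and surrounding whitespace.
    static String text(String html) {
        String inner = innerHtml(html);
        return inner == null ? null : inner.replaceAll("<[^>]+>", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(text(HTML)); // The equestrian champion with no legs
    }
}
```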
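Keep in mind that the group you are looking for is now the second one, not the first.

How can I put an HTML attribute and its value into this regex pattern? I want to match the h3 tag that has the class="media__title" attribute. Thanks

Something like this? <h3 class="media__title">((.|\s)+?)<\/h3> Or, if you want to match all h3 tags that have class="media__title" and/or other attributes, try this: (.|\s)+?)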
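One way to restrict the match to h3 tags whose attributes include class="media__title", while tolerating other attributes, is sketched below. The pattern is an assumption of this sketch, not the one from the answer: [^>]* keeps each gap inside the opening tag, \b keeps "h3" and "class" from matching inside longer words, and DOTALL lets the lazy group cross line breaks. It only matches the exact value class="media__title" in double quotes, so something like class="media__title big" would need a looser pattern. ClassFilteredH3 and firstMediaTitle are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassFilteredH3 {
    // <h3 ... class="media__title" ...>CONTENT</h3>, where CONTENT may span lines.
    static final Pattern MEDIA_TITLE = Pattern.compile(
        "<h3\\b[^>]*\\bclass=\"media__title\"[^>]*>(.+?)</h3>",
        Pattern.DOTALL);

    // Returns the contents (group 1) of the first matching <h3>, or null.
    static String firstMediaTitle(String html) {
        Matcher m = MEDIA_TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
            "<h3 class=\"other\">skip me</h3>\n"
            + "<h3 id=\"x\" class=\"media__title\">\n"
            + "The equestrian champion with no legs\n"
            + "</h3>";
        // The first <h3> is skipped because its tag lacks class="media__title".
        System.out.println(firstMediaTitle(html).trim());
        // The equestrian champion with no legs
    }
}
```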
String regex = <h3 class=\"medial__title\">(.+?)</h3>