Java: text inside an HTML tag, given the tag name and its attributes (tags: java, regex)


I have a string like this:

  <h3 class="media__title"> 
  <a class="media__link" href="/news/world-europe41644527" rev="video|headline">
  The equestrian champion with no legs                                                         
  </a> </h3>
But I am still making no progress. Can someone tell me what I should change in this regex pattern?

Try this:

String regex = "<h3 (.*)>((.|\\s)+?)<\\/h3>";
The main problem with your approach is that the . character does not match line terminators.
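To make the point about line terminators concrete, here is a minimal sketch. The class name DotallDemo and the helper finds are made up for illustration, and the Pattern.DOTALL flag is an alternative that the answer itself does not use; it is shown only because it changes what . matches.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows that '.' skips line terminators by default.
class DotallDemo {
    // Returns true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // The tag content starts on a new line, as in the question's string.
        String html = "<h3 class=\"media__title\">\n"
                + "The equestrian champion with no legs\n"
                + "</h3>";
        String regex = "<h3 class=\"media__title\">(.+?)</h3>";

        // Default flags: '.' cannot consume '\n', so nothing matches.
        System.out.println(finds(regex, 0, html));              // prints false
        // DOTALL: '.' also matches line terminators, so the pattern matches.
        System.out.println(finds(regex, Pattern.DOTALL, html)); // prints true
    }
}
```

With DOTALL, a plain (.+?) behaves like the ((.|\s)+?) group used in this answer, so either form should work for the string in the question.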

Explanation:

<h3 (.*)> matches an opening h3 tag together with all of its attributes (you could use a different pattern here if you are interested in the attributes themselves).

((.|\s)+?) matches everything inside the h3 tag. (.|\s) means any character at all: . covers everything except line terminators, and \s covers whitespace, which includes line terminators.

<\/h3> matches the closing h3 tag. The / is escaped out of habit from languages where / is a regex delimiter; in Java the escape is harmless but not required.

Keep in mind that the group you are looking for is now the second group, not the first.
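Putting the pieces together, a minimal sketch of how the pattern might be used with java.util.regex follows. The class name H3TextExtractor and the method extract are hypothetical, and stripping the nested a tag with a second regex is an added assumption about what "the text" means here; the answer itself stops at group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern from the answer.
class H3TextExtractor {
    // Group 1 = the attributes of the h3 tag, group 2 = everything inside the tag.
    static final Pattern H3 = Pattern.compile("<h3 (.*)>((.|\\s)+?)<\\/h3>");

    // Returns the text of the first h3 element, or null if there is none.
    static String extract(String html) {
        Matcher m = H3.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Assumption: nested tags such as <a ...> are not wanted in the result.
        return m.group(2).replaceAll("<[^>]+>", "").trim();
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "  <h3 class=\"media__title\"> ",
                "  <a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">",
                "  The equestrian champion with no legs   ",
                "  </a> </h3>");
        System.out.println(extract(html)); // prints: The equestrian champion with no legs
    }
}
```

Here m.group(1) would hold class="media__title", which is why the content moves to group 2.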

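How can I supply an HTML attribute and its value to this regex pattern? I want to match only h3 tags that have the attribute class="media__title". Thanks.

Something like this? ((.|\s)+?) Or, if you want to match every h3 that has class="media__title" and/or other attributes, try the following: (.|\s)+?)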
String regex = "<h3 class=\"media__title\">(.+?)</h3>";
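The request in the comments, matching only h3 tags that carry class="media__title" while tolerating other attributes, could be sketched as follows. Everything here is an assumption made for illustration: the class name H3ByClass, the method textsWithClass, the use of Pattern.DOTALL in place of (.|\s), and the simplification that the class attribute holds exactly one class name in double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the text of every h3 with a given class attribute.
class H3ByClass {
    static List<String> textsWithClass(String html, String cssClass) {
        // [^>]* lets other attributes appear before and after class="...".
        // Pattern.quote keeps the class name from being read as regex syntax.
        Pattern p = Pattern.compile(
                "<h3\\b[^>]*\\sclass=\"" + Pattern.quote(cssClass) + "\"[^>]*>(.*?)</h3>",
                Pattern.DOTALL); // let '.' cross line breaks inside the tag
        Matcher m = p.matcher(html);
        List<String> texts = new ArrayList<>();
        while (m.find()) {
            // Assumption: nested tags are dropped and only the visible text is kept.
            texts.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<h3 class=\"other\">skip</h3>\n"
                + "<h3 id=\"x\" class=\"media__title\">\n"
                + "<a href=\"#\">keep</a>\n"
                + "</h3>";
        System.out.println(textsWithClass(html, "media__title")); // prints [keep]
    }
}
```

Because the attributes are no longer captured, the content is back in group 1 in this variant.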