Java: Removing HTML and XML from text

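I have a series of text entries that I am trying to clean of HTML and XML. I am using StringEscapeUtils from Apache Commons in Java, and in general its methods handle this well when applied to a string s, like so:

    s = unescapeHtml(s);
    s = unescapeXml(s);

But if I have something like this: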

    This is text. So is this. <img alt="" height="0" width="0" border="0"style="display:none"
    src="http://segment-pixel.invitemedia.com/pixel?code=TechBiz
    &partnerID=167&key=segment"/><img alt="" height="0" width="0" border="0" style="display:none" src="http://pixel.quantserve.com/pixel/p-8bUhLiluj0fAw.gif?labels=pub.28834.rss.TechBiz
    .7020,cat.TechBiz.rss"/>
then the Apache utilities have no effect.

Can anyone recommend an alternative?

You can try using Jsoup:

    String text = Jsoup.parse(html).text();

This will strip all HTML.

Thanks, I'll try this. Java Swing also has javax.swing.text.html.parser.*, and that works well.
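
For anyone who wants to try the Swing route mentioned in the comment without adding a dependency, here is a minimal sketch. The class name `SwingHtmlStripper` and the method `stripTags` are made up for illustration; only `ParserDelegator` and `HTMLEditorKit.ParserCallback` come from the JDK. It collects the text chunks the parser reports and ignores the tags. As far as I know the Swing parser is built around an old HTML 3.2 DTD, so treat this as a starting point, not a robust sanitizer, and check its output on your own data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of any library.
final class SwingHtmlStripper {

    // Returns the text content of the given HTML, with all tags dropped.
    static String stripTags(String html) {
        final StringBuilder out = new StringBuilder();

        // The parser calls handleText once per run of text between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data);
            }
        };

        try {
            // ignoreCharSet = true: do not abort on a charset <meta> declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>This is text. <b>So is this.</b></p>";
        System.out.println(stripTags(html));
    }
}
```

If Jsoup is already on your classpath, the one-liner above is probably the simpler choice, since Jsoup is designed to cope with real-world markup.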