Java StringEscapeUtils.escapeXml is converting UTF-8 characters that it should not convert
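The escapeXml function is converting ѭ Ѯ to &#1133; &#1134;, which I think it should not do. From what I have read, it only supports the five basic XML entities (gt, lt, quot, amp, apos).

Is there a function that converts only these five basic XML entities?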
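public String escapeXml(String s) {
    // Replace & first so the ampersands introduced by the later replacements are not escaped again.
    return s.replaceAll("&", "&amp;")
            .replaceAll(">", "&gt;")
            .replaceAll("<", "&lt;")
            .replaceAll("\"", "&quot;")
            .replaceAll("'", "&apos;");
}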
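
As a quick sanity check of the chain above, here is a minimal JDK-only sketch. The class name FiveEntityDemo and the second, deliberately wrong method are made up for illustration. It uses String.replace (literal matching) rather than replaceAll (regex), which should give the same result for these five characters. It shows that the two Cyrillic letters from the question pass through unchanged, and why & has to be handled first.

```java
final class FiveEntityDemo {

    // Escapes only the five predefined XML entities; every other character passes through.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must come first
                .replace(">", "&gt;")
                .replace("<", "&lt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Deliberately wrong order: the & produced for < is escaped a second time.
    static String escapeXmlWrongOrder(String s) {
        return s.replace("<", "&lt;")
                .replace("&", "&amp;");
    }

    public static void main(String[] args) {
        // U+046D and U+046E are the two letters from the question.
        System.out.println(escapeXml("<a>\u046D & \u046E</a>"));
        System.out.println(escapeXmlWrongOrder("<"));
    }
}
```

With the wrong order, a single < ends up as &amp;lt; instead of &lt;.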
Note that Unicode characters greater than 0x7f are, as of 3.0, no longer escaped. If you still wish this functionality, you can achieve it via the following:
StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
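So you are probably using an older version of the library. Update your dependencies (or reimplement the escaping yourself: this is not rocket science).

In the age of UTF-8, XML documents with readable characters are sometimes preferred. The following should work, and the regular expression is compiled only once.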
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The field and method below go inside any class of your choice.

// Compiled once; matches exactly the five characters that have predefined XML entities.
private static final Pattern ESCAPE_XML_CHARS = Pattern.compile("[\"&'<>]");

public static String escapeXml(String s) {
    Matcher m = ESCAPE_XML_CHARS.matcher(s);
    StringBuffer buf = new StringBuffer();
    while (m.find()) {
        // Each match is a single character, so its first code point identifies it.
        switch (m.group().codePointAt(0)) {
            case '"':
                m.appendReplacement(buf, "&quot;");
                break;
            case '&':
                m.appendReplacement(buf, "&amp;");
                break;
            case '\'':
                m.appendReplacement(buf, "&apos;");
                break;
            case '<':
                m.appendReplacement(buf, "&lt;");
                break;
            case '>':
                m.appendReplacement(buf, "&gt;");
                break;
        }
    }
    // Copy whatever follows the last match.
    m.appendTail(buf);
    return buf.toString();
}
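
For comparison, the same single-pass idea can be sketched without a regular expression. This is only an illustrative alternative, not part of the answer above; the class name XmlEscapeLoop is made up. It walks the string once and copies every character that is not one of the five special ones, so non-ASCII text stays readable.

```java
final class XmlEscapeLoop {

    // Single pass over the UTF-16 chars; only the five XML special characters are rewritten.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("&quot;"); break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c);        // everything else, including non-ASCII, is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("\"Tom\" & 'Jerry' <\u046D\u046E>"));
    }
}
```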
The javadoc of StringEscapeUtils.escapeXml tells us to use
StringEscapeUtils.ESCAPE_XML.with( new UnicodeEscaper(Range.between(0x7f, Integer.MAX_VALUE)) );
However, NumericEntityEscaper has to be used instead of UnicodeEscaper. UnicodeEscaper converts everything to \u1234 notation, whereas NumericEntityEscaper escapes to &#123;, which is what was expected.
package mypackage;

import org.apache.commons.lang3.StringEscapeUtils;
import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
import org.apache.commons.lang3.text.translate.NumericEntityEscaper;

public class XmlEscaper {

    public static void main(final String[] args) {
        // Assumption: the blanks between the underscores were non-breaking spaces (U+00A0) in the
        // original post, so they are written as explicit escapes here.
        final String xmlToEscape = "<hello>Hi</hello>" + "_\u00A0_" + "__\u00A0__" + "___\u00A0___" + "after "; // the line cont

        // no Unicode escape
        final String escapedXml = StringEscapeUtils.escapeXml(xmlToEscape);

        // escape Unicode as numeric codes. For instance, escape non-breaking space as &#160;
        final CharSequenceTranslator translator = StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
        final String escapedXmlWithUnicode = translator.translate(xmlToEscape);

        System.out.println("xmlToEscape: " + xmlToEscape);
        System.out.println("escapedXml: " + escapedXml); // does not escape Unicode characters like non-breaking space
        System.out.println("escapedXml with unicode: " + escapedXmlWithUnicode); // escapes Unicode characters
    }
}
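
To make the difference between the two notations concrete, here is a rough JDK-only sketch. It does not use or reproduce the Commons Lang classes; the class and method names (NotationDemo, javaStyle, xmlNumeric) are hypothetical and only mimic the two output styles described above for characters at or above 0x7f.

```java
final class NotationDemo {

    // Java-source style: each UTF-16 unit at or above 0x7f becomes backslash, 'u', four hex digits.
    static String javaStyle(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x7f) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // XML style: each code point at or above 0x7f becomes a decimal numeric character reference.
    static String xmlNumeric(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp >= 0x7f) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String nbsp = "a\u00A0b"; // 'a', non-breaking space, 'b'
        System.out.println(javaStyle(nbsp));
        System.out.println(xmlNumeric(nbsp));
    }
}
```

Only the second form is meaningful to an XML parser; the first is a Java source convention and would appear as literal text in the document.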