Java StringEscapeUtils.escapeXml is converting UTF-8 characters that it should not convert

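The escapeXml function is converting ѭѮ to the numeric character references &#1133; &#1134;, which I think it should not do. From what I have read, it only supports the five basic XML entities (gt, lt, quot, amp, apos).

Is there a function that converts only these five basic XML entities?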

public String escapeXml(String s) {
    return s.replaceAll("&", "&amp;").replaceAll(">", "&gt;").replaceAll("<", "&lt;").replaceAll("\"", "&quot;").replaceAll("'", "&apos;");
}
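
The order of the replacements above matters: the ampersand has to be handled first, otherwise the "&" of entities produced by earlier steps gets escaped a second time. A minimal sketch of that effect follows; the class and method names are hypothetical, and it uses String.replace (literal matching) rather than replaceAll, since no regex is needed here.

```java
final class EscapeOrderDemo {
    // Ampersand first: entities produced by the later steps are left intact.
    static String escapeAmpFirst(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Ampersand last (deliberately wrong): the '&' of "&lt;" and "&gt;"
    // is escaped again, producing double-escaped output.
    static String escapeAmpLast(String s) {
        return s.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("&", "&amp;");
    }

    public static void main(String[] args) {
        // Non-ASCII characters are untouched by either variant.
        System.out.println(escapeAmpFirst("a < b & \u046D\u046E"));
        System.out.println(escapeAmpLast("a < b"));
    }
}
```

Neither variant touches characters outside the five basic entities, which is the behaviour the question asks for.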

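Note that Unicode characters greater than 0x7f are, as of 3.0, no longer escaped. If you still wish this functionality, you can achieve it via the following: StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );


So you are probably using an older version of the library. Update your dependency (or reimplement the escape yourself: it is not rocket science).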
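
To make the older behaviour concrete, here is a minimal sketch in plain Java of what escaping everything above 0x7f as decimal character references looks like. This is not the commons-lang implementation; the class and method names are hypothetical and only illustrate why ѭѮ ends up as numeric entities.

```java
final class NumericEntitySketch {
    // Replaces every code point above 0x7f with a decimal character
    // reference of the form &#N; and copies everything else unchanged.
    static String escapeAbove7f(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7f) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by one or two chars, so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAbove7f("\u046D\u046E")); // the two characters from the question
    }
}
```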

In the age of UTF-8, XML documents with readable characters are sometimes preferred. This should work, and the string is recomposed only once.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern ESCAPE_XML_CHARS = Pattern.compile("[\"&'<>]");

public static String escapeXml(String s) {
    Matcher m = ESCAPE_XML_CHARS.matcher(s);
    StringBuffer buf = new StringBuffer();
    while (m.find()) {
        switch (m.group().codePointAt(0)) {
            case '"':
                m.appendReplacement(buf, "&quot;");
                break;
            case '&':
                m.appendReplacement(buf, "&amp;");
                break;
            case '\'':
                m.appendReplacement(buf, "&apos;");
                break;
            case '<':
                m.appendReplacement(buf, "&lt;");
                break;
            case '>':
                m.appendReplacement(buf, "&gt;");
                break;
        }
    }
    m.appendTail(buf);
    return buf.toString();
}
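
The same single-pass idea can also be written without a regex, by walking the characters and appending to a StringBuilder. The following is only a sketch under that assumption, with a hypothetical class name; it escapes the five basic entities and leaves every other character, including non-ASCII ones, as it is.

```java
final class BasicXmlEscaper {
    // Escapes only the five predefined XML entities in a single pass.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("&quot;"); break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c); // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("<a href=\"x\">\u046D & 'y'</a>"));
    }
}
```

Because the input is copied char by char, surrogate pairs pass through untouched and no entity is ever escaped twice.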

The javadoc of StringEscapeUtils.escapeXml tells us to use

StringEscapeUtils.ESCAPE_XML.with( new UnicodeEscaper(Range.between(0x7f, Integer.MAX_VALUE)) );

However, NumericEntityEscaper has to be used instead of UnicodeEscaper. UnicodeEscaper changes everything to \u1234 notation, whereas NumericEntityEscaper escapes to &#123; as expected.
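
To illustrate the difference between the two notations, here is a small plain-Java sketch that formats one character both ways. It does not call commons-lang; the class and method names are hypothetical and only show the shape of each output.

```java
final class EscapeStyleDemo {
    // Java-style escape: backslash, 'u', then four hex digits.
    // This is the notation the answer attributes to UnicodeEscaper.
    static String asJavaEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // XML decimal character reference, the notation expected in XML output.
    static String asNumericEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        char nbsp = '\u00A0'; // non-breaking space
        System.out.println(asJavaEscape(nbsp));
        System.out.println(asNumericEntity(nbsp));
    }
}
```

Only the second form is meaningful to an XML parser; the first would appear as literal text in the document.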

package mypackage;

import org.apache.commons.lang3.StringEscapeUtils;
import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
import org.apache.commons.lang3.text.translate.NumericEntityEscaper;

public class XmlEscaper {
    public static void main(final String[] args) {
        final String xmlToEscape = "<hello>Hi</hello>" + "_ _" + "__ __"  + "___ ___" + "after &nbsp;"; // the line cont

        // no Unicode escape
        final String escapedXml = StringEscapeUtils.escapeXml(xmlToEscape);

        // escape Unicode as numeric codes. For instance, escape non-breaking space as &#160;
        final CharSequenceTranslator translator = StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
        final String escapedXmlWithUnicode = translator.translate(xmlToEscape);

        System.out.println("xmlToEscape: " + xmlToEscape);
        System.out.println("escapedXml: " + escapedXml); // does not escape Unicode characters like non-breaking space
        System.out.println("escapedXml with unicode: " + escapedXmlWithUnicode); // escapes Unicode characters
    }
}