Writing 16-bit characters to an .xlsx file using Apache POI in Java


java apache


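I have a problem with Apache POI. I am trying to put a 16-bit character value (such as a character from CJK Unified Ideographs Extension B) into an .xlsx file. However, in the generated .xlsx file the cell value turns into question marks (like ??).

Does anyone know how to handle 16-bit character values with Apache POI in the .xlsx format?

My POI version is 3.14.

A code sample follows: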

XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet sheet = workbook.createSheet("Test");

XSSFRow row1 = sheet.createRow(0);
XSSFCell r1c1 = row1.createCell(0);
r1c1.setCellValue("…"); // original literal missing here; it contained a CJK Extension B character

The problem exists, but not with 16-bit (2-byte) Unicode characters from 0x0000 to 0xFFFF. It occurs with characters that need more than 2 bytes in UTF-16. Those are the characters described in terms of Unicode code points in the documentation of Java's Character class: "Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding." The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters (characters whose code points are greater than U+FFFF) are represented as a pair of char values, the first from the high-surrogates range (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).

The problem is with org.apache.xmlbeans.impl.store.Saver. It works with a private char[] _buf. Since the maximum value of a char is 0xFFFF, Unicode code points from 0x10000 to 0x10FFFF cannot be stored in a single char, so they are stored as a pair of char values.
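
To make the code point versus code unit distinction concrete, here is a minimal sketch that uses only the standard library. The class name SurrogatePairDemo and the helper fromCodePoint are made up for this illustration; U+20000 is used because it is the first code point of CJK Extension B and is also the example mentioned in the comments.

```java
public class SurrogatePairDemo {

    /** Builds a String from a single Unicode code point (hypothetical helper). */
    static String fromCodePoint(int codePoint) {
        // Character.toChars returns one char for BMP code points
        // and a high/low surrogate pair for supplementary ones.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x20000);

        // Number of UTF-16 code units (char values) in the string.
        System.out.println("chars:       " + s.length());
        // Number of Unicode code points in the string.
        System.out.println("code points: " + s.codePointCount(0, s.length()));

        // The two char values are a high surrogate followed by a low surrogate.
        System.out.println("high: " + Integer.toHexString(s.charAt(0)));
        System.out.println("low:  " + Integer.toHexString(s.charAt(1)));
        System.out.println(Character.isHighSurrogate(s.charAt(0))
                && Character.isLowSurrogate(s.charAt(1)));
    }
}
```

Any code that inspects such a string one char at a time therefore sees two surrogate values, never the code point 0x20000 itself.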

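Saver contains the following method, which is applied to each char individually. Because its parameter is a char, the test ch >= 0x10000 can never succeed, and both halves of a surrogate pair (0xD800 to 0xDFFF) fall outside the accepted ranges, so they are treated as bad characters: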

    /**
     * Test if a character is valid in xml character content. See
     * http://www.w3.org/TR/REC-xml#NT-Char
     */

    private boolean isBadChar ( char ch )
    {
        return ! (
            (ch >= 0x20 && ch <= 0xD7FF ) ||
            (ch >= 0xE000 && ch <= 0xFFFD) ||
            (ch >= 0x10000 && ch <= 0x10FFFF) ||
            (ch == 0x9) || (ch == 0xA) || (ch == 0xD)
            );
    }
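
The effect of this check on a supplementary character can be sketched in isolation. In the sketch below, isBadChar is a copy of the method quoted above, while the class name and the sanitize helper are hypothetical: sanitize only assumes that the saver replaces each rejected char with '?', which matches the "??" seen in the question but is not taken from the xmlbeans source here.

```java
public class OriginalBadCharDemo {

    /** Copy of the check quoted above. */
    static boolean isBadChar(char ch) {
        return !(
            (ch >= 0x20 && ch <= 0xD7FF) ||
            (ch >= 0xE000 && ch <= 0xFFFD) ||
            (ch >= 0x10000 && ch <= 0x10FFFF) || // never true for a 16-bit char
            (ch == 0x9) || (ch == 0xA) || (ch == 0xD)
        );
    }

    /**
     * Hypothetical stand-in for the saver: walks the text char by char
     * and replaces every rejected char with '?' (an assumption).
     */
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            out.append(isBadChar(ch) ? '?' : ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+20000 is stored as the surrogate pair D840 DC00.
        String extB = new String(Character.toChars(0x20000));
        // Both surrogates are rejected, so one character becomes two '?'.
        System.out.println(sanitize("A" + extB)); // prints "A??"
    }
}
```

Characters in the Basic Multilingual Plane, such as U+4E03, pass through this check unchanged, which is why ordinary 16-bit characters are not affected.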

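A workaround is to change this method so that it accepts the whole range from 0x20 to 0xFFFD, which includes the surrogates: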
    /**
     * Test if a character is valid in xml character content. See
     * http://www.w3.org/TR/REC-xml#NT-Char
     */
    private boolean isBadChar ( char ch )
    {
        return ! (
            (ch >= 0x20 && ch <= 0xFFFD ) ||
            (ch == 0x9) || (ch == 0xA) || (ch == 0xD)
            );
    }

Make this change in both static final class OptimizedForSpeedSaver and static final class TextSaver, then compile Saver.java.

Store a backup of xmlbeans-2.6.0.jar somewhere outside the classpath.
In xmlbeans-2.6.0.jar, under /org/apache/xmlbeans/impl/store/, replace Saver$OptimizedForSpeedSaver.class and Saver$TextSaver.class with the newly compiled ones.
Unicode code points from U+10000 upward are now stored in sharedStrings.xml.

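Disclaimer:
This is not well tested, so do not use it in production. It is shown here only to describe the problem. Perhaps one of the developers at xmlbeans.apache.org will find the time to fix the problem in org.apache.xmlbeans.impl.store.Saver properly.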

Update
A patch is now available and has also been integrated. The patched method is reproduced at the end of this page.
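Comments:

Can you share some code that writes these characters to Excel?

Thanks for your reply. Sample code added.

Also try the u-escaped text "\u4e03" (七), so that the Java source encoding cannot interfere.

@AxelRichter U+20000 = "\ud840\udc00" (2 UTF-16 chars) == new String(new int[] {0x20000}, 0, 1). Thanks though; there may indeed be a bug with char pairs, such as single-char handling. (My comment was only meant to rule out the unlikely error that the Java compiler uses a different encoding than the editor.)

@Joop Eggen: That is a good point, thanks. I had not thought of that. So we could patch org.apache.xmlbeans.impl.store.Saver so that it no longer excludes the high-surrogate range (\uD800-\uDBFF) and the low-surrogate range (\uDC00-\uDFFF) as bad characters. I will try this afternoon.

Hi Axel Richter. Thanks for the explanation! Does this mean it is impossible to put 16-bit characters into .xlsx using POI?

@leonlai: There is no problem with 16-bit characters. The problem is with Unicode characters that need more than 16 bits. See my last comment below your question. I will look into a patch.

Hi Axel Richter. Thanks for the explanation, but how can we fix the problem in org.apache.xmlbeans.impl.store.Saver?

Sorry, may I ask how to replace these classes in xmlbeans-2.6.0.jar? I tried to replace the files, but it shows the following message: 0 Error: duplicate file name: 1 org\apache\xmlbeans\xml\stream\Location.class 2 org\apache\xmlbeans\xml\stream\Location.class

@Alex Richter: Thanks for your help. It is OK now. I got the xmlbeans source, updated Saver.java and built a new jar. But may I ask why changing the code as you suggested solves the problem?

Patched method: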
/**
 * Test if a character is valid in xml character content. See
 * http://www.w3.org/TR/REC-xml#NT-Char
 */
static boolean isBadChar ( char ch )
{
    return ! (
        Character.isHighSurrogate(ch) ||
        Character.isLowSurrogate(ch) ||
        (ch >= 0x20 && ch <= 0xD7FF ) ||
        (ch >= 0xE000 && ch <= 0xFFFD) ||
        (ch >= 0x10000 && ch <= 0x10FFFF) ||
        (ch == 0x9) || (ch == 0xA) || (ch == 0xD)
    );
}
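
As a partial answer to the last comment, the three variants of the check discussed on this page can be compared side by side. This is only a sketch: the class BadCharVariants, the CharCheck interface and the countRejected helper are hypothetical names, and the three methods are copies of the snippets shown on this page rather than code taken from a particular xmlbeans release.

```java
public class BadCharVariants {

    /** Hypothetical functional interface so the variants can be passed around. */
    interface CharCheck {
        boolean isBad(char ch);
    }

    /** The check as originally quoted: surrogates fall outside every range. */
    static boolean original(char ch) {
        return !((ch >= 0x20 && ch <= 0xD7FF)
              || (ch >= 0xE000 && ch <= 0xFFFD)
              || (ch >= 0x10000 && ch <= 0x10FFFF) // unreachable for a char
              || ch == 0x9 || ch == 0xA || ch == 0xD);
    }

    /** The workaround from the answer: one range that spans the surrogates. */
    static boolean workaround(char ch) {
        return !((ch >= 0x20 && ch <= 0xFFFD)
              || ch == 0x9 || ch == 0xA || ch == 0xD);
    }

    /** The patch from the update: surrogates are accepted explicitly. */
    static boolean patched(char ch) {
        return !(Character.isHighSurrogate(ch)
              || Character.isLowSurrogate(ch)
              || (ch >= 0x20 && ch <= 0xD7FF)
              || (ch >= 0xE000 && ch <= 0xFFFD)
              || (ch >= 0x10000 && ch <= 0x10FFFF)
              || ch == 0x9 || ch == 0xA || ch == 0xD);
    }

    /** Counts how many char values of the text a given variant rejects. */
    static int countRejected(String text, CharCheck check) {
        int rejected = 0;
        for (int i = 0; i < text.length(); i++) {
            if (check.isBad(text.charAt(i))) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        String extB = new String(Character.toChars(0x20000)); // surrogate pair
        System.out.println("original:   " + countRejected(extB, BadCharVariants::original));
        System.out.println("workaround: " + countRejected(extB, BadCharVariants::workaround));
        System.out.println("patched:    " + countRejected(extB, BadCharVariants::patched));
    }
}
```

The original check rejects both halves of the pair, while the workaround and the patch let them through, so the saver can write the pair unchanged. Note that all three variants still look at one char at a time, so none of them verifies that a high surrogate is actually followed by a low surrogate.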