writeUTF in java.io.DataOutputStream

java utf-8

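I know that in Java a UTF-8 character needs 1 to 4 bytes. But when I use the readUTF/writeUTF methods in java.io.DataInputStream/DataOutputStream, I found that the method only solves the cases where a character needs 1 to 3 bytes.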

static int writeUTF(String str, DataOutput out) throws IOException {
    int strlen = str.length();
    int utflen = 0;
    int c, count = 0;

   /* use charAt instead of copying String to char array */
    for (int i = 0; i < strlen; i++) {
        c = str.charAt(i);
        if ((c >= 0x0001) && (c <= 0x007F)) {
            utflen++;
        } else if (c > 0x07FF) {
            utflen += 3;
        } else {
            utflen += 2;
        }
    }

    if (utflen > 65535)
        throw new UTFDataFormatException(
            "encoded string too long: " + utflen + " bytes");

    byte[] bytearr = null;
    if (out instanceof DataOutputStream) {
        DataOutputStream dos = (DataOutputStream)out;
        if(dos.bytearr == null || (dos.bytearr.length < (utflen+2)))
            dos.bytearr = new byte[(utflen*2) + 2];
        bytearr = dos.bytearr;
    } else {
        bytearr = new byte[utflen+2];
    }

    bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
    bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

    int i=0;
    for (i=0; i<strlen; i++) {
       c = str.charAt(i);
       if (!((c >= 0x0001) && (c <= 0x007F))) break;
       bytearr[count++] = (byte) c;
    }

    for (;i < strlen; i++){
        c = str.charAt(i);
        if ((c >= 0x0001) && (c <= 0x007F)) {
            bytearr[count++] = (byte) c;

        } else if (c > 0x07FF) {
            bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
            bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
            bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
        } else {
            bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
            bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
        }
    }
    out.write(bytearr, 0, utflen+2);
    return utflen + 2;
}

Why doesn't it solve the case where a character needs 4 bytes?

This is explained in the documentation, although it takes an extra click.

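The docs for writeUTF mention that it uses a "modified UTF-8" encoding. That link is in the original JavaDocs (I didn't add it just for this answer), and if you follow it, you'll see a page explaining the encoding. Note in particular the part near the bottom of the summary (just before the method summary section):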

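The differences between this format and the standard UTF-8 format are the following:

• Only the 1-byte, 2-byte, and 3-byte formats are used.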

So, while you're right that UTF-8 uses up to 4 bytes, writeUTF uses a modified version, and one of the modifications is that it only supports up to 3 bytes.
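To make the difference concrete, here is a minimal sketch (the class name ModifiedUtf8Demo and its helpers are made up for this illustration, not part of the JDK). It assumes the usual explanation for why three bytes are enough: a Java String is a sequence of UTF-16 chars, so a code point above U+FFFF is stored as a surrogate pair, and writeUTF encodes each surrogate separately as a 3-byte sequence. The result is 6 bytes where standard UTF-8 would use 4.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Hypothetical helper: the bytes writeUTF emits for s,
    // with the leading 2-byte length prefix stripped off.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Hypothetical helper: space-separated upper-case hex dump.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane,
        // so the String holds it as two chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println("chars in String: " + emoji.length());          // 2
        System.out.println("standard UTF-8:  "
                + hex(emoji.getBytes(StandardCharsets.UTF_8)));             // F0 9F 98 80
        System.out.println("writeUTF:        " + hex(modifiedUtf8(emoji))); // ED A0 BD ED B8 80
    }
}
```

Because the loop in writeUTF works on individual char values (at most 0xFFFF), the `c > 0x07FF` branch is the largest case it can ever see, which is why no 4-byte branch exists. readUTF reverses the same scheme, so the string round-trips, but the bytes are not valid standard UTF-8 and should not be handed to a generic UTF-8 decoder.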

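What does "slove" mean? At first I thought it was a typo, but you used the word twice, spelled exactly the same way.

I meant "solve". Sorry for the misspelling.

OK, I thought that might be it. Unfortunately, I'm still not clear on what you're asking. What do you mean by "solve the situation"?

In this method there are 3 cases for the char c: it needs 1 to 3 bytes. But as I understand it, a UTF-8 character may need 1 to 4 bytes. Why doesn't it consider a character that needs 4 bytes?

Oh, I see. When I read the original JavaDocs, I overlooked the words "modified UTF-8". Thank you very much.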