UTF-8: how do I check whether a byte continues the current character or starts a new one?
If you are reading a file encoded as UTF-8 and have just read one byte, how can you tell whether that byte is a continuation of the current character rather than the start of a new one?
If the byte's binary value is 10xxxxxx (where each x can be 0 or 1), it is a UTF-8 continuation byte. The initial (lead) byte of a UTF-8 sequence follows one of these patterns:
0xxxxxxx - start (and end) of 1-byte sequence
110xxxxx - start of 2-byte sequence (followed by one continuation byte)
1110xxxx - start of 3-byte sequence (followed by two continuation bytes)
11110xxx - start of 4-byte sequence (followed by three continuation bytes)
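The bit patterns above translate directly into mask-and-compare checks. The following is a minimal sketch in Python, not a full validator: the function names `is_continuation_byte` and `sequence_length` are made up for illustration, and the code only inspects the leading bits, so it does not reject overlong encodings or lead bytes outside the range that valid UTF-8 permits.

```python
def is_continuation_byte(b: int) -> bool:
    """Return True if the byte value b (0-255) matches 10xxxxxx."""
    # Keep only the top two bits and compare them with binary 10.
    return (b & 0xC0) == 0x80


def sequence_length(b: int) -> int:
    """Return the sequence length implied by a lead byte.

    Returns 0 if b is not a lead-byte pattern, i.e. it is a
    continuation byte (10xxxxxx) or has the form 11111xxx.
    """
    if (b & 0x80) == 0x00:  # 0xxxxxxx
        return 1
    if (b & 0xE0) == 0xC0:  # 110xxxxx
        return 2
    if (b & 0xF0) == 0xE0:  # 1110xxxx
        return 3
    if (b & 0xF8) == 0xF0:  # 11110xxx
        return 4
    return 0


# Classify every byte of a string with 1-, 2-, 3- and 4-byte characters.
data = "a\u00e9\u20ac\U0001F600".encode("utf-8")
kinds = [
    "cont" if is_continuation_byte(b) else f"lead{sequence_length(b)}"
    for b in data
]
print(kinds)
```

When reading a file byte by byte, a byte for which `is_continuation_byte` returns True belongs to the character whose lead byte came before it; any other byte begins a new character.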
Read up again on how UTF-8 bytes are encoded.