Converting an InputStream to a String using UTF-8 and round-tripping in Java
Consider the following code snippet:
byte[] b = new byte[]{ 0, 0, 0, -127 }; // possible Byte Array
// converted byte array to String using UTF-8
String s = new String(b, StandardCharsets.UTF_8);
Now try to convert the string back to a byte array:
b = s.getBytes(StandardCharsets.UTF_8);
Now, when we compare the result with the original byte array, the values are not the same after the round trip:
[0, 0, 0, -17, -65, -67]
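The extra bytes have a simple explanation: -127 is 0x81, which is not a valid UTF-8 sequence on its own, so the decoder substitutes the replacement character U+FFFD, and that character encodes to the three bytes EF BF BD (-17, -65, -67). A minimal sketch that reproduces this; the class name Utf8RoundTripDemo and the helper roundTrip are illustrative names, not part of the question:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTripDemo {

    // Decode the bytes as UTF-8, then encode the resulting String as UTF-8 again.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {0, 0, 0, -127}; // -127 is 0x81, a lone continuation byte

        String decoded = new String(original, StandardCharsets.UTF_8);
        // The malformed byte was replaced by U+FFFD while decoding.
        System.out.println(decoded.charAt(3) == '\uFFFD'); // prints true

        // U+FFFD encodes to EF BF BD, i.e. -17, -65, -67 as signed bytes.
        System.out.println(Arrays.toString(roundTrip(original))); // prints [0, 0, 0, -17, -65, -67]
    }
}
```

The information in the original byte is lost at decoding time, so no later call on the String can recover it.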
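Can anyone suggest how we can convert the string back to the original byte array?

The most stable answer is to convert between the byte array and a hexadecimal string, in which 1 byte == 2 characters between 0 and F. Such a string contains only ASCII characters, so it is always valid UTF-8. To get the original data back, convert the hex string back to a byte array. Other Stack Overflow questions cover how to do each direction (byte to hex, and hex to byte).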
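A minimal sketch of both directions, assuming a hand-written conversion is acceptable; the class name HexRoundTrip and the methods toHex and fromHex are hypothetical names chosen for illustration, not taken from the linked answers:

```java
import java.util.Arrays;

class HexRoundTrip {

    // Encode each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high nibble
            sb.append(Character.forDigit(b & 0xF, 16));        // low nibble
        }
        return sb.toString();
    }

    // Decode a hex string produced by toHex back into the original bytes.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {0, 0, 0, -127};
        String hex = toHex(original);
        System.out.println(hex); // prints 00000081
        System.out.println(Arrays.equals(original, fromHex(hex))); // prints true
    }
}
```

The hex string is twice as long as the input, which is the price of keeping every byte value representable as plain text.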
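Although I don't understand why you need an invalid UTF-8 string, here is a solution. Paste this code into a TestDrive class (a runnable class containing a static void main(String[] args) method) and add import java.util.Arrays; at the top of the file: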
public static void main(String[] args) {
    byte[] bytes1 = new byte[]{0, 0, 0, -127};
    int[] unsigned = toUnsignedInt(bytes1);
    String utf8String = toUtf8String(unsigned);
    char[] chars = utf8String.toCharArray();
    byte[] bytes2 = toBytes(chars);
    System.out.println(Arrays.equals(bytes1, bytes2)); // prints true
}

// Widens each byte to its signed int value (-128..127). Not used by main.
private static int[] toSigned(byte[] unsigned) {
    int[] signed = new int[unsigned.length];
    for (int i = 0; i < unsigned.length; i++) {
        signed[i] = unsigned[i];
    }
    return signed;
}

// Maps each byte to its unsigned value (0..255).
private static int[] toUnsignedInt(byte[] signed) {
    int[] unsigned = new int[signed.length];
    for (int i = 0; i < signed.length; i++) {
        unsigned[i] = Byte.toUnsignedInt(signed[i]);
    }
    return unsigned;
}

// Despite the name, no UTF-8 decoding happens here:
// each value 0..255 becomes exactly one char of the String.
private static String toUtf8String(int[] unsigned) {
    char[] chars = toChars(unsigned);
    return new String(chars);
}

private static char[] toChars(int[] unsigned) {
    char[] chars = new char[unsigned.length];
    for (int i = 0; i < unsigned.length; i++) {
        chars[i] = (char) unsigned[i];
    }
    return chars;
}

// Reverses toChars: each char (0..255) is narrowed back to one byte.
private static byte[] toBytes(char[] chars) {
    int[] unsigned = toUnsignedInt(chars);
    byte[] bytes = new byte[unsigned.length];
    for (int i = 0; i < unsigned.length; i++) {
        bytes[i] = (byte) unsigned[i];
    }
    return bytes;
}

private static int[] toUnsignedInt(char[] chars) {
    int[] unsigned = new int[chars.length];
    for (int i = 0; i < chars.length; i++) {
        unsigned[i] = (int) chars[i];
    }
    return unsigned;
}
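The one-char-per-byte mapping used in the code above is the same mapping that the ISO-8859-1 charset defines, so a shorter variant is possible. This is a sketch under that assumption, not the answerer's own method; the class name Latin1RoundTrip and its methods are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {

    // ISO-8859-1 maps every byte value 0x00..0xFF to the char with the same code,
    // so decoding never hits a malformed sequence.
    static String toLatin1String(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encoding with the same charset restores the original bytes.
    static byte[] fromLatin1String(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {0, 0, 0, -127};
        String s = toLatin1String(original);
        System.out.println(Arrays.equals(original, fromLatin1String(s))); // prints true
    }
}
```

The resulting String is only a transport wrapper: it contains control characters and must be encoded with ISO-8859-1 again, not UTF-8, to get the bytes back.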
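The most stable answer is not to use a String at all but to pass the byte array around, avoiding the String and the round trip altogether. String is not a container for binary data.
Your byte array is not a valid UTF-8 string. Look at how many bytes a UTF-8 character takes.
If your input stream contains these bytes, it was not read from a UTF-8 source. Make sure you use the correct encoding first.
I still can't understand why the UTF-8 conversion would need signed values.
There is no InputStream here.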
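To check whether a byte array is valid UTF-8 before decoding it, one option is a strict CharsetDecoder that reports malformed input instead of silently replacing it. A minimal sketch; the class name Utf8Validator and the method isValidUtf8 are hypothetical names for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Validator {

    // Returns true only if the whole array decodes as UTF-8 without errors.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[]{0, 0, 0, -127})); // prints false
        System.out.println(isValidUtf8("h\u00e9llo".getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

If the check fails, the data is better kept as a byte array or moved through a binary-safe text form such as hex.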