
Java: converting bytes from Korean to UTF-8, what am I not getting here?


My understanding is clearly incomplete somewhere. If I run the code below, I expect to see:

 translateTest:: start
  start_korean: (6)  c0 af c8 f1 c8 c6
 expected_utf8: (6)  c7 20 d7 6c d6 c8
    found_utf8: (6)  c7 20 d7 6c d6 c8
 expected utf8 matches found? true
What I get instead is:

 translateTest:: start
  start_korean: (6)  c0 af c8 f1 c8 c6
 expected_utf8: (6)  c7 20 d7 6c d6 c8
    found_utf8: (9)  ec 9c a0 ed 9d ac ed 9b 88
 expected utf8 matches found? false
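I thought that creating a String from the bytes, declaring them to be x-windows-949, and then getting the bytes back as utf-8 would convert them from one encoding to the other. Clearly I am wrong about something here.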

public class translateTest {

  public static void main (String [] Argv) {
      (new translateTest()).translate();
  }

  void translate() {
    System.out.println("translateTest:: start");

    try {

      // pages below linked from http://msdn.microsoft.com/en-US/goglobal/cc305154

      // Please ignore the lame bytesToHex helper method. Including it for completeness.

      // from http://msdn.microsoft.com/en-US/goglobal/gg696909
      //
      // 0xC0AF =  U+C720 = HANGUL SYLLABLE IEUNG YU

      // from http://msdn.microsoft.com/en-US/goglobal/gg696960
      //
      // 0xC8F1 =  U+D76C = HANGUL SYLLABLE HIEUH YI

      // also from http://msdn.microsoft.com/en-US/goglobal/gg696960
      //
      // 0xC8C6 =  U+D6C8 = HANGUL SYLLABLE HIEUH U NIEUN

      byte[] start_korean = new byte[] { (byte)0xC0, (byte)0xAF, (byte)0xC8, (byte)0xF1, (byte)0xC8, (byte)0xC6 };
      byte[] expected_utf8 = new byte[] { (byte)0xC7, (byte)0x20, (byte)0xD7, (byte)0x6C, (byte)0xD6, (byte)0xC8 };
      String str = new String(start_korean, "x-windows-949");
      byte[] found_utf8 = str.getBytes("utf8");

      boolean isEqual = java.util.Arrays.equals(expected_utf8, found_utf8);

      System.out.println(" start_korean: "+bytesToHex(start_korean));
      System.out.println("expected_utf8: "+bytesToHex(expected_utf8));
      System.out.println("   found_utf8: "+bytesToHex(found_utf8));

      System.out.println("expected utf8 matches found? "+isEqual);

    } catch (java.io.UnsupportedEncodingException uee) {
      System.err.println(uee.getMessage());
    }
  }

  public static String bytesToHex(byte[] b) {
    StringBuffer str = new StringBuffer("("+b.length+") ");
    for (int idx = 0; idx < b.length; idx++) {
      str.append(" "+byteToHex(b[idx]));
    }
    return str.toString();
  }

  public static String byteToHex(byte b) {
    String hex = Integer.toHexString(b);
    while (hex.length() < 2) hex = "0"+hex;
    if (hex.length() > 2)
      hex = hex.substring(hex.length()-2);
    return hex;
  }
}
Your problem is that the "expected UTF-8" values are actually the Unicode code points, not the UTF-8 encoding of those code points. I added this to your code:

    StringBuilder buf = new StringBuilder();
    for (int i=0; i<str.length(); i++) buf.append(", ").append(Integer.toHexString(str.codePointAt(i)));
    System.out.println("     internal: "+buf.substring(2));
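To make the difference between a code point and its UTF-8 encoding concrete, here is a minimal sketch that encodes the three code points from the question by hand and compares the result with what `getBytes` produces. The class name `CodePointVsUtf8` and the helpers `hex` and `encode3` are made up for this illustration, and `StandardCharsets` assumes Java 7 or later (the question was run on 1.6, where `"UTF-8"` would be passed by name instead).

```java
import java.nio.charset.StandardCharsets;

public class CodePointVsUtf8 {

  // Format bytes as space-separated two-digit hex.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  // Hand-encode a code point in the range U+0800..U+FFFF (surrogates excluded)
  // as the three-byte UTF-8 form 1110xxxx 10xxxxxx 10xxxxxx.
  static byte[] encode3(int cp) {
    return new byte[] {
      (byte) (0xE0 | (cp >> 12)),
      (byte) (0x80 | ((cp >> 6) & 0x3F)),
      (byte) (0x80 | (cp & 0x3F))
    };
  }

  public static void main(String[] args) {
    String s = "\uc720\ud76c\ud6c8";
    for (int i = 0; i < s.length(); i++) {
      int cp = s.codePointAt(i);
      // Prints e.g. "U+C720 -> ec 9c a0"
      System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
          + " -> " + hex(encode3(cp)));
    }
    // Prints "ec 9c a0 ed 9d ac ed 9b 88", the same nine bytes as found_utf8
    System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));
  }
}
```

Each Hangul syllable here needs three bytes in UTF-8, which is why found_utf8 has length 9 rather than 6. The conversion in the question was working; only the expected value was wrong.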
@JimGarrison is correct: if you convert the original string to UTF-16 big-endian, you will see your "expected utf-8" bytes.

This means the string is:

"\uc720\ud76c\ud6c8"

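The UTF-16 big-endian observation can be checked with a short sketch along these lines. The class name `Utf16Check` and the `hex` helper are hypothetical, `StandardCharsets` assumes Java 7 or later, and the x-windows-949 step is guarded because that charset ships with the JDK's extended charsets and may be missing on a trimmed-down runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Check {

  // Format bytes as space-separated two-digit hex.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String s = "\uc720\ud76c\ud6c8";

    // UTF-16BE writes each of these code points as two bytes, high byte first,
    // so the output is the code point values themselves: c7 20 d7 6c d6 c8
    System.out.println("utf-16be: " + hex(s.getBytes(StandardCharsets.UTF_16BE)));

    // UTF-8 uses three bytes per syllable here: ec 9c a0 ed 9d ac ed 9b 88
    System.out.println("utf-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));

    // Optional cross-check against the byte sequence from the question.
    if (Charset.isSupported("x-windows-949")) {
      byte[] korean = {
        (byte) 0xC0, (byte) 0xAF, (byte) 0xC8, (byte) 0xF1, (byte) 0xC8, (byte) 0xC6
      };
      String decoded = new String(korean, Charset.forName("x-windows-949"));
      System.out.println("decoded equals literal? " + decoded.equals(s));
    } else {
      System.out.println("x-windows-949 is not available on this runtime");
    }
  }
}
```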
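By the way: % java -version java version "1.6.0_51" Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-10M4509) Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode)

+1 for a good question with a compilable and runnable SSCCE. In fact, this is a model of a good question :-)

This makes no sense. Korean is not an encoding.

I agree. Did you look at the code?

If I use (new String(start_korean, "MS949")) I get exactly the same result. By the way, "MS949" and "x-windows-949" both give the same answer.

Yes, I figured either one would work... the docs don't seem to say so.

Thanks for the information. We have users in about 80 different language settings, and we are trying to work out how to know that they all work.... This will help.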
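On the "MS949" versus "x-windows-949" point, one way to see how the runtime treats the two names is to ask `java.nio.charset.Charset` directly. This is only a sketch: the class name `CharsetAliasCheck` and the helper `sameCharset` are invented for the example, and the answer depends on which charsets the installed JDK provides.

```java
import java.nio.charset.Charset;

public class CharsetAliasCheck {

  // True when both names are supported and resolve to the same Charset.
  static boolean sameCharset(String a, String b) {
    return Charset.isSupported(a) && Charset.isSupported(b)
        && Charset.forName(a).equals(Charset.forName(b));
  }

  public static void main(String[] args) {
    String canonical = "x-windows-949";
    if (!Charset.isSupported(canonical)) {
      System.out.println(canonical + " is not available on this runtime");
      return;
    }
    Charset cs = Charset.forName(canonical);
    // The alias set lists the other names this runtime accepts for the charset.
    System.out.println("canonical name: " + cs.name());
    System.out.println("aliases: " + cs.aliases());
    System.out.println("MS949 resolves to the same charset? "
        + sameCharset("MS949", canonical));
  }
}
```

The same check applies to any pair of names, for example "utf8" and "UTF-8", which is why the question's `getBytes("utf8")` call works.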