Java: How to check whether a string can be encoded in a given encoding?

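The following test fails on the conversion to Latin-1, because the illegal characters are replaced with bytes of value 63 (the question mark). The problem is that it would be much better if these characters raised some exception.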

  import java.io.UnsupportedEncodingException;
  import java.util.Arrays;

  import org.junit.Assert;
  import org.junit.Test;

  public class EncodingTest {

    @Test
    public void testEncoding() throws UnsupportedEncodingException {
      final String czech = "Řízeček a šampáňo a žízeň";
      // okay
      final byte[] bytesInLatin2 = czech.getBytes("ISO-8859-2");
      // different bytes, but okay
      final byte[] bytesInWin1250 = czech.getBytes("windows-1250");
      // different bytes, but okay
      final byte[] bytesInUtf8 = czech.getBytes("UTF-8");
      // nonsense: Ř, č, ... are not in the Latin-1 character set, yet no exception is thrown
      final byte[] bytesInLatin1 = czech.getBytes("ISO-8859-1");

      System.out.println(Arrays.toString(bytesInLatin2));
      System.out.println(Arrays.toString(bytesInWin1250));
      System.out.println(Arrays.toString(bytesInUtf8));
      System.out.println(Arrays.toString(bytesInLatin1));
      System.out.flush();

      final String latin2 = new String(bytesInLatin2, "ISO-8859-2");
      final String win1250 = new String(bytesInWin1250, "windows-1250");
      final String utf8 = new String(bytesInUtf8, "UTF-8");
      final String latin1 = new String(bytesInLatin1, "ISO-8859-1");

      Assert.assertEquals("latin2", czech, latin2);
      Assert.assertEquals("win1250", czech, win1250);
      Assert.assertEquals("utf8", czech, utf8);
      Assert.assertEquals("latin1", czech, latin1); // this assertion fails!
    }
  }

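Because of this behavior in Java, data ends up corrupted in many situations. Is there a library that can verify whether a string can be encoded in a given encoding?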
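The silent substitution described in the question is documented behavior for the String.getBytes(Charset) overload, which always replaces unmappable characters with the charset's default replacement bytes; for ISO-8859-1 that is '?' (byte 63). For getBytes(String), as used in the test, the Javadoc leaves the behavior unspecified. A minimal sketch to reproduce it; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows that getBytes(Charset) substitutes instead of throwing.
public class SilentReplacementDemo {

    // Encodes s as ISO-8859-1; unmappable characters become the
    // charset's replacement byte rather than causing an exception.
    static byte[] latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'Ř' (U+0158) is not in Latin-1, 'a' is.
        byte[] bytes = latin1Bytes("Řa");
        System.out.println(Arrays.toString(bytes)); // prints [63, 97]
    }
}
```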

I suspect you're looking for CharsetEncoder.canEncode(CharSequence).
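A self-contained sketch of that check, assuming Java 7 or later; CanEncodeCheck and its canEncode helper are hypothetical names. It creates a fresh encoder per call because CharsetEncoder instances are not safe for concurrent use by multiple threads:

```java
import java.nio.charset.Charset;

// Hypothetical helper class wrapping CharsetEncoder.canEncode.
public class CanEncodeCheck {

    // Returns true if every character of s can be encoded in the named charset.
    // A new encoder is created on each call, so no state is shared between callers.
    static boolean canEncode(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String czech = "Řízeček a šampáňo a žízeň";
        String[] names = {"ISO-8859-2", "windows-1250", "UTF-8", "ISO-8859-1"};
        // Prints true for ISO-8859-2, windows-1250 and UTF-8, false for ISO-8859-1.
        for (String name : names) {
            System.out.println(name + ": " + canEncode(czech, name));
        }
    }
}
```

Note that canEncode only answers yes or no; it does not say which character is the problem.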


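As an alternative to Jon Skeet's suggestion, you can also use the CharsetEncoder class to do the encoding directly (with its encode method), but first call its onMalformedInput and onUnmappableCharacter methods to specify what the encoder should do when it encounters bad input.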


This way, most of the time you are just making a plain encode call, but if anything goes wrong you get an exception.
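A sketch of that strict approach, using a hypothetical encodeStrict helper. A freshly created encoder already reports errors by default, so the onMalformedInput/onUnmappableCharacter calls only make the intent explicit; with REPORT, encode throws a CharacterCodingException (an UnmappableCharacterException for characters the charset lacks) instead of writing '?':

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

// Hypothetical helper class: encodes or fails, never substitutes.
public class StrictEncoding {

    // Encodes s in the named charset, throwing on unencodable input.
    static byte[] encodeStrict(String s, String charsetName) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                // REPORT is already the default for a new encoder;
                // it is set explicitly here to document the intent.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(s));
        // Copy only the bytes actually written; the backing array may be larger.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String czech = "Řízeček a šampáňo a žízeň";
        try {
            System.out.println(Arrays.toString(encodeStrict(czech, "ISO-8859-2")));
            encodeStrict(czech, "ISO-8859-1"); // throws: Ř is not in Latin-1
        } catch (CharacterCodingException e) {
            System.out.println("Not encodable: " + e);
        }
    }
}
```

Compared with calling canEncode first, this produces the encoded bytes in the same pass, so valid input is processed only once.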

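+1, nice one! I didn't even know this existed... Aside: are there CharSequences that cannot be encoded in UTF-8?
@fge: Only invalid ones (e.g. ones containing half of a surrogate pair).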
Applied to the string from the question, the canEncode check looks like this:

  Charset latin2 = Charset.forName("ISO-8859-2");
  boolean validInLatin2 = latin2.newEncoder().canEncode(czech);
  ...
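The remark about half a surrogate pair can be checked directly: for a UTF-8 encoder, canEncode fails only on malformed UTF-16, such as an unpaired surrogate. A minimal sketch, assuming Java 7 or later for StandardCharsets; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the surrogate-pair remark.
public class Utf8SurrogateCheck {

    // True if cs is well-formed UTF-16 and can therefore be encoded as UTF-8.
    static boolean utf8CanEncode(CharSequence cs) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(cs);
    }

    public static void main(String[] args) {
        System.out.println(utf8CanEncode("Řízeček"));      // true
        System.out.println(utf8CanEncode("\uD83D\uDE00")); // true: complete pair (U+1F600)
        System.out.println(utf8CanEncode("abc\uD83D"));    // false: unpaired high surrogate
    }
}
```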