Java: how to check whether a String is encodable in a given encoding?
The following test fails on the Latin-1 conversion, because the illegal characters are replaced by a byte with value 63 (a question mark). The problem is that such characters should preferably raise an exception.
@Test
public void testEncoding() throws UnsupportedEncodingException {
    final String czech = "Řízeček a šampáňo a žízeň";
    // okay
    final byte[] bytesInLatin2 = czech.getBytes("ISO8859-2");
    // different bytes, but okay
    final byte[] bytesInWin1250 = czech.getBytes("Windows-1250");
    // different bytes, but okay
    final byte[] bytesInUtf8 = czech.getBytes("UTF-8");
    // nonsense; Ř,č,... are not in Latin1 code set!!!
    final byte[] bytesInLatin1 = czech.getBytes("ISO8859-1");

    System.out.println(Arrays.toString(bytesInLatin2));
    System.out.println(Arrays.toString(bytesInWin1250));
    System.out.println(Arrays.toString(bytesInUtf8));
    System.out.println(Arrays.toString(bytesInLatin1));
    System.out.flush();

    final String latin2 = new String(bytesInLatin2, "ISO8859-2");
    final String win1250 = new String(bytesInWin1250, "Windows-1250");
    final String utf8 = new String(bytesInUtf8, "UTF-8");
    final String latin1 = new String(bytesInLatin1, "ISO8859-1");

    Assert.assertEquals("latin2", czech, latin2);
    Assert.assertEquals("win1250", czech, win1250);
    Assert.assertEquals("utf8", czech, utf8);
    Assert.assertEquals("latin1", czech, latin1); // this test will fail!
}
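Because of this behavior in Java, data ends up corrupted in many cases. Is there a library that can verify whether a string can be encoded in a given encoding?

I suspect you are looking for CharsetEncoder.canEncode.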
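As an alternative to Jon Skeet's suggestion, you can also use the CharsetEncoder class to do the encoding directly (with its encode method), but first call the onMalformedInput and onUnmappableCharacter methods to specify what the encoder should do when it encounters bad input.
That way, most of the time you are just making a simple encode call, but if anything goes wrong you get an exception.

+1, nice one! I didn't even know this existed... As an aside: are there any CharSequences that cannot be encoded in UTF-8?
@fge: Only invalid ones (for example, ones containing half of a surrogate pair).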
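The strict-encoding approach described in the answer above could look roughly like the following. This is only a sketch: the class name StrictEncoding and the helper encodeStrict are hypothetical names chosen for illustration, and the canonical charset names "ISO-8859-2" and "ISO-8859-1" are assumed to be available on the running JVM.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class; the names here are not from the original post.
class StrictEncoding {

    // Encodes text in the named charset and throws a CharacterCodingException
    // on malformed or unmappable input instead of substituting '?'.
    static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        Charset charset = Charset.forName(charsetName);
        ByteBuffer buffer = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        // Copy only the bytes that were actually written.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String czech = "Řízeček a šampáňo a žízeň";
        // Latin-2 covers Czech, so this call is expected to succeed.
        System.out.println(encodeStrict(czech, "ISO-8859-2").length + " bytes in Latin-2");
        try {
            encodeStrict(czech, "ISO-8859-1");
        } catch (CharacterCodingException e) {
            // Latin-1 has no Ř, č, ..., so the encoder reports an error.
            System.out.println("Not encodable in Latin-1: " + e);
        }
    }
}
```

Unlike String.getBytes, which silently replaces unmappable characters, this version makes the failure visible to the caller, who can then decide whether to reject the input or fall back to another encoding.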
Charset latin2 = Charset.forName("ISO8859-2");
boolean validInLatin2 = latin2.newEncoder().canEncode(czech);
...
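For completeness, the canEncode check from the snippet above can be wrapped in a small self-contained program. This is a minimal sketch, not the answerer's full code: the class name EncodabilityCheck and the helper canEncode are hypothetical, and the last check illustrates the comment about lone surrogates using the escape \uD800.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class; the names here are not from the original post.
class EncodabilityCheck {

    // Returns true if every character of text can be encoded in charset.
    static boolean canEncode(String text, Charset charset) {
        // Encoders are stateful, so a fresh one is created for each call.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String czech = "Řízeček a šampáňo a žízeň";
        Charset latin2 = Charset.forName("ISO-8859-2");

        System.out.println("Latin-2: " + canEncode(czech, latin2));
        System.out.println("Latin-1: " + canEncode(czech, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:   " + canEncode(czech, StandardCharsets.UTF_8));
        // A lone high surrogate is malformed input, even for UTF-8.
        System.out.println("UTF-8, lone surrogate: "
                + canEncode("\uD800", StandardCharsets.UTF_8));
    }
}
```

Running the check before calling getBytes lets the caller reject or reroute a string up front, at the cost of encoding the text twice.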