Java: Encodings which differ from ASCII even for letters


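Are there any character encodings that are reasonably common on consumer devices (as opposed to mainframes) and that map the characters A-Za-z0-9 to something different from ASCII?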

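At the moment I am thinking in terms of a Java application, so I wonder whether a casual user of some Java software, in some country, could end up with a default encoding that returns a result different from ASCII for these characters. I am trying to work out whether I have to address compatibility problems that could arise from different behavior in this respect.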

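I know that, historically, EBCDIC would be the prime example of an ASCII-incompatible encoding. But is it in use on any recent consumer device, or only on IBM mainframes and vintage computers? Does the legacy of EBCDIC live on in a common encoding in some countries?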

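I also know that UTF-16 is ASCII-incompatible, and that encoding files this way is common on Windows. But as far as I know, this always concerns only file contents, not the default application locale. Can users configure their Windows machine to use UTF-16 as the system code page without breaking at least half of their applications?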


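As far as I know, all the pre-Unicode multi-byte encodings used in Asia still map the ASCII range 00-7F to something ASCII-compatible, at least for letters and digits. Is there any Asian encoding still in use that uses more than one byte for all of its code points? Or one on another continent?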
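
To make the kind of difference asked about concrete, here is a minimal sketch that prints the bytes a few charsets produce for a short sample. The class name `EncodingHexDemo`, the helper `hex`, and the sample string "Az0" are illustrative assumptions, not part of the original question. IBM037 (an EBCDIC code page) is only shown if the running JRE happens to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingHexDemo {
    /** Encodes text with cs and returns the bytes as space-separated hex. */
    public static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "Az0"; // hypothetical sample: one letter of each case and a digit
        System.out.println("US-ASCII: " + hex(sample, StandardCharsets.US_ASCII));
        // UTF-16LE uses two bytes per character, so the result cannot match ASCII.
        System.out.println("UTF-16LE: " + hex(sample, StandardCharsets.UTF_16LE));
        // EBCDIC code pages are optional in a JRE, so check before using one.
        if (Charset.isSupported("IBM037")) {
            System.out.println("IBM037:   " + hex(sample, Charset.forName("IBM037")));
        }
    }
}
```

Passing the charset explicitly, as done here, also keeps the result independent of whatever default the user's machine reports.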

Here is a simple program that finds out. It is up to you to decide whether the charsets that fail are common enough.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingTest {
    public static void main(String[] args) {
        String s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        // Reference bytes: for these characters UTF-8 is identical to ASCII.
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        // Try every charset installed in this JRE.
        for (Charset cs : Charset.availableCharsets().values()) {
            try {
                byte[] b2 = s.getBytes(cs);
                if (!Arrays.equals(b, b2)) {
                    System.out.println(cs.displayName() + " doesn't give the same result");
                }
            }
            catch (Exception e) {
                // Raised, for example, by charsets that do not support encoding.
                System.out.println(cs.displayName() + " throws an exception");
            }
        }
    }
}
The result on my machine is:

IBM-Thai doesn't give the same result
IBM01140 doesn't give the same result
IBM01141 doesn't give the same result
IBM01142 doesn't give the same result
IBM01143 doesn't give the same result
IBM01144 doesn't give the same result
IBM01145 doesn't give the same result
IBM01146 doesn't give the same result
IBM01147 doesn't give the same result
IBM01148 doesn't give the same result
IBM01149 doesn't give the same result
IBM037 doesn't give the same result
IBM1026 doesn't give the same result
IBM1047 doesn't give the same result
IBM273 doesn't give the same result
IBM277 doesn't give the same result
IBM278 doesn't give the same result
IBM280 doesn't give the same result
IBM284 doesn't give the same result
IBM285 doesn't give the same result
IBM290 doesn't give the same result
IBM297 doesn't give the same result
IBM420 doesn't give the same result
IBM424 doesn't give the same result
IBM500 doesn't give the same result
IBM870 doesn't give the same result
IBM871 doesn't give the same result
IBM918 doesn't give the same result
ISO-2022-CN throws an exception
JIS_X0212-1990 doesn't give the same result
UTF-16 doesn't give the same result
UTF-16BE doesn't give the same result
UTF-16LE doesn't give the same result
UTF-32 doesn't give the same result
UTF-32BE doesn't give the same result
UTF-32LE doesn't give the same result
x-IBM1025 doesn't give the same result
x-IBM1097 doesn't give the same result
x-IBM1112 doesn't give the same result
x-IBM1122 doesn't give the same result
x-IBM1123 doesn't give the same result
x-IBM1364 doesn't give the same result
x-IBM300 doesn't give the same result
x-IBM833 doesn't give the same result
x-IBM834 doesn't give the same result
x-IBM875 doesn't give the same result
x-IBM930 doesn't give the same result
x-IBM933 doesn't give the same result
x-IBM935 doesn't give the same result
x-IBM937 doesn't give the same result
x-IBM939 doesn't give the same result
x-JIS0208 doesn't give the same result
x-JISAutoDetect throws an exception
x-MacDingbat doesn't give the same result
x-MacSymbol doesn't give the same result
x-UTF-16LE-BOM doesn't give the same result
X-UTF-32BE-BOM doesn't give the same result
X-UTF-32LE-BOM doesn't give the same result

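Thanks for the suggestion. Unfortunately, I find it hard to determine how often these charsets are used as an application's default charset. I have not found any code page prevalence statistics or similar data. The IBM code pages sound like DOS or mainframes. JIS X 0212 is apparently used mostly inside EUC-JP, which is ASCII-compatible. ISO-2022-CN can only be used for decoding.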
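
Since prevalence data is hard to come by, one alternative is to check the default charset at runtime on the user's machine. The following is only a sketch under that assumption: the class name `DefaultCharsetCheck` and the method `isAsciiCompatible` are hypothetical names introduced here, and decode-only charsets are simply reported as incompatible.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetCheck {
    private static final String ALNUM =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    /** Returns true if cs encodes A-Za-z0-9 to the same bytes as US-ASCII. */
    public static boolean isAsciiCompatible(Charset cs) {
        if (!cs.canEncode()) {
            // Decode-only charsets cannot be compared, so treat them as incompatible.
            return false;
        }
        byte[] expected = ALNUM.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(expected, ALNUM.getBytes(cs));
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println(def.name() + " is ASCII-compatible for A-Za-z0-9: "
            + isAsciiCompatible(def));
    }
}
```

An application could run such a check once at startup and log a warning, or avoid the issue altogether by always passing an explicit charset when converting between strings and bytes.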