Read the next character (a full Unicode code point) from a Java input stream


java utf-8


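I need to parse UTF-8 input (from a text file) character by character. By "character" I mean a full Unicode code point decoded from UTF-8, not a Java char.

What approach should I use?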

You can do this easily with an InputStreamReader and its read() method. read() returns an int, which is a code point. See here for more information:

Since Java 8, for example:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.IntStream;

// if you want to work line by line, use Files.readAllLines()
// if you use Guava, there's also Guava's Files.toString() for reading the whole file into a String
byte[] bytes = Files.readAllBytes(Paths.get("test.txt"));
String text = new String(bytes, StandardCharsets.UTF_8);

// one int per Unicode code point; surrogate pairs are already combined
IntStream codePoints = text.codePoints();

// do something with the code points
codePoints.forEach(codePoint -> System.out.println(codePoint));

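In the example at the link provided, there is:

int ch; while ((ch = in.read()) > -1) { buffer.append((char) ch); }

But what happens if ch (an int) does not fit into a 2-byte char?

Good point. You may run into code points above 2^16. Fortunately, the StringBuilder/StringBuffer classes have a method, appendCodePoint(int codePoint), to handle this.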
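To make the difference concrete, here is a small sketch comparing the narrowing cast from the quoted loop with appendCodePoint for a value above 0xFFFF. The class and method names (AppendCodePointDemo, viaCast, viaAppendCodePoint) are made up for illustration. It only shows the StringBuilder side of the problem; it does not change what Reader.read() returns.

```java
class AppendCodePointDemo {
    // Appends with a narrowing cast, as in the quoted loop; the high bits are dropped.
    static String viaCast(int codePoint) {
        return new StringBuilder().append((char) codePoint).toString();
    }

    // Appends with StringBuilder.appendCodePoint, which writes a surrogate pair when needed.
    static String viaAppendCodePoint(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // a code point outside the Basic Multilingual Plane

        // The cast keeps only the low 16 bits, so the original value is lost.
        System.out.println(viaCast(cp).codePointAt(0) == cp);            // false

        // appendCodePoint stores two chars that together encode the code point.
        System.out.println(viaAppendCodePoint(cp).codePointAt(0) == cp); // true
        System.out.println(viaAppendCodePoint(cp).length());             // 2
    }
}
```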

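The int returned by Reader.read() is a char (a UTF-16 code unit), not a code point.

Yes, but what if it is a stream? How can I read it code point by code point without first pulling the whole file into a String?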
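One possible way to read code points from a stream without loading the whole file is to wrap the stream in an InputStreamReader and combine surrogate pairs by hand. The sketch below is only an illustration under that assumption: CodePointReader and readAll are hypothetical names, not JDK classes. The calls it relies on (Reader.read(), Character.isHighSurrogate, Character.isLowSurrogate, Character.toCodePoint) are standard.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a JDK class): returns one full code point per call.
class CodePointReader {
    private static final int NONE = -2; // marker: no code unit buffered
    private final Reader reader;
    private int pending = NONE;

    CodePointReader(InputStream in) {
        // Decode bytes as UTF-8; the reader hands back UTF-16 code units.
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Returns the next code point, or -1 at end of stream.
    int read() throws IOException {
        int first = pending;
        if (first == NONE) {
            first = reader.read();
        } else {
            pending = NONE;
        }
        if (first == -1 || !Character.isHighSurrogate((char) first)) {
            return first;
        }
        int second = reader.read();
        if (second != -1 && Character.isLowSurrogate((char) second)) {
            return Character.toCodePoint((char) first, (char) second);
        }
        // Unpaired high surrogate: return it alone and keep the next unit for later.
        pending = second;
        return first;
    }

    // Convenience for small inputs and tests: collects every code point.
    static List<Integer> readAll(InputStream in) {
        List<Integer> result = new ArrayList<>();
        try {
            CodePointReader r = new CodePointReader(in);
            for (int cp = r.read(); cp != -1; cp = r.read()) {
                result.add(cp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three code points: 'a', U+00E9, and U+1F600 (encoded as a surrogate pair).
        byte[] utf8 = "a\u00E9\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        for (int cp : readAll(new ByteArrayInputStream(utf8))) {
            System.out.printf("U+%04X%n", cp); // U+0061, U+00E9, U+1F600
        }
    }
}
```

For a real file, the same class could be given a FileInputStream, calling read() in a loop until it returns -1. The pending field is defensive: a UTF-8 decoder should not normally produce an unpaired surrogate, but handling it keeps the reader from swallowing the following code unit.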
