How to pad strings containing Unicode characters in Java


I add right padding to strings so that I can print them in a table format:

for (String[] tuple : testData) {
  System.out.format("%-32s -> %s\n", tuple[0], tuple[1]);
}
The result looks like this (random test data):

This approach does not work when multi-byte Unicode characters are present:

As discussed in the comments on the question linked by @Xehpuk, in this discussion on kotlinlang.org, and in this blog post by Daniel Lemire, the following seems to be correct:

The problem is that the Java String class stores text as UTF-16 code units. Any Unicode character that needs more than 16 bits is therefore stored as two separate char values (a surrogate pair). Many String methods ignore this: String.length(), for example, does not return the number of Unicode characters but the number of 16-bit char values in the string, so some emoji count as 2 characters.

The behaviour, however, seems to be implementation-specific.

As Daniel Lemire mentions in his post, you could try the following to get the correct length:

tuple[0].codePointCount(0, tuple[0].length())
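
To connect this back to the padding loop in the question, here is a minimal sketch of a helper that pads by code point count instead of char count. The class name CodePointPad, the method padRight, and the sample data are made up for illustration and do not come from the original posts. It only corrects for surrogate pairs; it does not account for combining characters or for glyphs that a font draws wider than one column.

```java
class CodePointPad {

    // Hypothetical helper: appends spaces until the string spans `width` code points.
    // Strings that are already at least `width` code points long are returned unchanged.
    static String padRight(String s, int width) {
        int codePoints = s.codePointCount(0, s.length());
        StringBuilder sb = new StringBuilder(s);
        for (int i = codePoints; i < width; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data; the second key contains U+1F600, stored as a surrogate pair.
        String[][] testData = {
            {"abc", "first"},
            {"a\uD83D\uDE00c", "second"}
        };
        for (String[] tuple : testData) {
            // Pad manually instead of relying on the %-32s width of format().
            System.out.println(padRight(tuple[0], 32) + " -> " + tuple[1]);
        }
    }
}
```

Both rows end up 32 code points wide before the arrow, although the second row uses one more char than the first.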

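As the Java SE documentation shows, there are actually several issues here, apart from the fact that some fonts draw the flag wider than other characters. I assume you want to count the Chinese flag as a single character (since it is drawn on screen as a single element).

The String class reports the wrong length
The String class works with chars, which are 16-bit integers holding Unicode code points. The problem is that not all code points fit into 16 bits; only code points from the Basic Multilingual Plane (BMP) fit into a single char. String's length() method returns the number of chars, not the number of code points.

String's codePointCount method can help in this case: it counts the number of code points in the given index range. Passing 0 as the first argument and string.length() as the second therefore returns the total number of code points.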
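
The difference between the two counts can be checked with a small sketch like the one below. The class name LengthVsCodePoints and the sample strings are chosen here for illustration. The flag is written as the two regional indicator code points U+1F1E8 and U+1F1F3, each of which needs a surrogate pair.

```java
class LengthVsCodePoints {

    // Counts code points over the whole string, as described above.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4E2D";                     // one BMP code point
        String emoji = "\uD83D\uDE00";             // U+1F600, one supplementary code point
        String flag = "\uD83C\uDDE8\uD83C\uDDF3";  // U+1F1E8 U+1F1F3, two supplementary code points

        System.out.println(bmp.length() + " " + codePoints(bmp));     // prints "1 1"
        System.out.println(emoji.length() + " " + codePoints(emoji)); // prints "2 1"
        System.out.println(flag.length() + " " + codePoints(flag));   // prints "4 2"
    }
}
```

Note that the flag still counts as two code points even though it is drawn as a single element, so codePointCount alone does not make it count as one character.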

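Combining characters
However, there is one more problem.

Does this answer your question?
Thanks, it provides some background, but I still can't solve my specific problem.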