Formatting strings with Unicode characters in Lua

I'm trying to align strings that contain Unicode characters,
but it doesn't work.
The padding comes out wrong :(
The Lua version is 5.1.
What's the problem?

local t = 
{
    "character",
    "루아",           -- korean
    "abc감사합니다123", -- korean
    "ab23",
    "lua is funny",
    "ㅇㅅㅇ",
    "美國大將",         --chinese
    "qwert-54321",
};

for k, v in pairs(t) do
    print(string.format("%30s", v));
end


result:----------------------------------------------
                     character  
                        루아  
          abc감사합니다123   
                          ab23  
                  lua is funny  
                      ㅇㅅㅇ   
                   美國大將 
                   qwert-54321

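The ASCII strings are all formatted correctly, while the non-ASCII strings are not.

The reason is that the length of a string is counted in bytes. For example, with UTF-8 encoding: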

print(string.len("美國大將"))  -- 12
print(string.len("루아"))      -- 6

So %s in string.format treats these two strings as being 12 and 6 characters wide, and adds correspondingly less padding.
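The effect can be checked directly. This is a minimal sketch, assuming the source file is saved as UTF-8: the padded result is always 30 bytes long, so a string made of multi-byte characters receives fewer spaces than its character count would call for.

```lua
-- Assumes this file is saved as UTF-8.
local s = "루아"                          -- 2 characters, 3 bytes each
local padded = string.format("%30s", s)

print(#s)                                -- 6 (bytes, not characters)
print(#padded)                           -- 30 (bytes)

-- Only 30 - 6 = 24 spaces were added, so the result is 26 characters long.
local _, spaces = padded:gsub(" ", "")
print(spaces)                            -- 24
```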

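-- Like string.format, but pads %s arguments by code point count, not byte count.
function utf8format(fmt, ...)
   local args, strings, pos = {...}, {}, 0
   for spec in fmt:gmatch'%%.-([%a%%])' do
      if spec ~= '%' then   -- a literal "%%" consumes no argument
         pos = pos + 1
         local s = args[pos]
         if spec == 's' and type(s) == 'string' and s ~= '' then
            table.insert(strings, s)
            -- placeholder with one byte per code point: \1 followed by \2's
            args[pos] = '\1'..('\2'):rep(#s:gsub("[\128-\191]", "")-1)
         end
      end
   end
   -- format the placeholders, then put the original strings back in order
   return (fmt:format((table.unpack or unpack)(args))
      :gsub('\1\2*', function() return table.remove(strings, 1) end)
   )
end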

local t =
{
   "character",
   "루아",           -- korean
   "abc감사합니다123", -- korean
   "ab23",
   "lua is funny",
   "ㅇㅅㅇ",
   "美國大將",         --chinese
   "qwert-54321",
   "∞"
};

for k, v in pairs(t) do
   print(utf8format("%30s", v));
end

But there is another problem: in most fonts, Korean and Chinese glyphs are wider than Latin letters.
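One possible way to account for the wider glyphs is to pad by estimated display columns rather than by code points. The following is only a sketch under strong assumptions: the output device is assumed to render the listed ranges exactly two columns wide and everything else one column wide, and the range table is a simplified approximation, not the full Unicode East Asian Width data. The names wide, display_width, and pad_left are hypothetical helpers, not part of Lua or of the answer above.

```lua
-- Rough, incomplete list of code point ranges assumed to take two columns.
local wide = {
   {0x1100, 0x115F},   -- Hangul Jamo
   {0x2E80, 0xA4CF},   -- CJK radicals, kana, ideographs, compatibility jamo
   {0xAC00, 0xD7A3},   -- Hangul syllables
   {0xF900, 0xFAFF},   -- CJK compatibility ideographs
   {0xFF00, 0xFF60},   -- fullwidth forms
}

-- Hypothetical helper: estimated column count of a UTF-8 string.
local function display_width(s)
   local width = 0
   -- each match is one non-continuation byte plus its continuation bytes
   for ch in s:gmatch("[^\128-\191][\128-\191]*") do
      local b = ch:byte(1)
      local cp
      if b < 0x80 then cp = b              -- ASCII
      elseif b < 0xE0 then cp = b - 0xC0   -- lead byte of a 2-byte sequence
      elseif b < 0xF0 then cp = b - 0xE0   -- lead byte of a 3-byte sequence
      else cp = b - 0xF0 end               -- lead byte of a 4-byte sequence
      for i = 2, #ch do
         cp = cp * 64 + (ch:byte(i) - 0x80)  -- append 6 payload bits
      end
      local cols = 1
      for _, r in ipairs(wide) do
         if cp >= r[1] and cp <= r[2] then cols = 2; break end
      end
      width = width + cols
   end
   return width
end

-- Hypothetical helper: right-align s in a field of n columns.
local function pad_left(s, n)
   return (" "):rep(math.max(0, n - display_width(s))) .. s
end

print(pad_left("character", 30))
print(pad_left("루아", 30))
print(pad_left("美國大將", 30))
```

Whether the columns actually line up still depends on the font and terminal in use. Characters such as ∞ have ambiguous width and are treated as one column here.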

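Thank you. Got it :D

Formatting Unicode characters can be complicated, because each character takes a variable number of bytes when encoded and also occupies a variable number of columns when displayed. On LuaRocks, this library looks relevant to what you are trying to do.

In case you're wondering: the main trick here is using

s:gsub("[\128-\191]", "")

to get the number of code points in a UTF-8 string. The gsub removes every byte between 128 and 191, which leaves exactly 1 byte of each UTF-8 sequence, no matter how many bytes it had to begin with. Each sequence starts with either a byte < 128 (ASCII) or a byte ≥ 192 that is followed by 1-3 bytes, all of them between 128 and 191.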
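The trick can be tried on its own. This is a minimal sketch, assuming valid UTF-8 input and a source file saved as UTF-8. The name utf8len is a hypothetical helper, not a Lua 5.1 function.

```lua
-- Hypothetical helper: count code points by deleting UTF-8 continuation bytes.
local function utf8len(s)
   -- gsub returns the stripped string and a substitution count;
   -- assigning to a single local keeps only the string
   local stripped = s:gsub("[\128-\191]", "")
   return #stripped
end

print(utf8len("abc"))        -- 3
print(utf8len("루아"))        -- 2
print(utf8len("美國大將"))    -- 4
print(utf8len("∞"))          -- 1
```

On Lua 5.3 and later, the standard utf8.len function covers the same need and also validates the input.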