Regex: replaceAllIn throws StringIndexOutOfBoundsException when generating UTF-8


regex scala unicode utf-8


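I want to replace every occurrence of a sequence of the form "\uXXXX", where "XXXX" is a hexadecimal number identifying a Unicode character, with the corresponding character.

I tried the following Scala code: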

def unscape(s : String) : String = {
 val rex = """\\u([0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z])""".r
 rex.replaceAllIn(s,m => {
     hex2str(m.group(1))
   })
}

def hex2str(s:String): String = {
  Integer.parseInt(s,16).toChar.toString  
}

For example, if I try:

unscape("Hi\\u0024, \\u0024")

it throws the following exception:

java.lang.StringIndexOutOfBoundsException: String index out of range: 1

I have read that Java's handling of Unicode characters appears to be flawed. Is that the problem here?

Try the following:

def unscape(s: String): String = {
    val rex = """\\u([0-9a-fA-F]{4})""".r
    rex.replaceAllIn(s, m => {
        hex2str(m.group(1))
            .replaceAllLiterally("\\", "\\\\")
            .replaceAllLiterally("$", "\\$")
    })
}

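According to the documentation for replaceAllIn:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.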
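The quoted behaviour can be reproduced in isolation. The following is a minimal sketch, not part of the original answer: the object name DollarInReplacement, the pattern "X", and the helper names are made up for illustration. It shows that a bare "$" returned from the replacement function makes replaceAllIn fail, while the same character passed through Regex.quoteReplacement is inserted literally. The exact exception type may vary by JVM version, so the sketch only records that a failure occurred.

```scala
import scala.util.Try
import scala.util.matching.Regex

object DollarInReplacement {
  // Hypothetical pattern used only for this illustration: matches the literal text "X".
  val rex: Regex = "X".r

  // A bare "$" in the replacement is parsed as the start of a group reference,
  // so the replacement fails; Try captures the exception instead of propagating it.
  def rawDollar(s: String): Try[String] =
    Try(rex.replaceAllIn(s, _ => "$"))

  // quoteReplacement escapes "$" (and "\"), so the matcher treats it as a literal character.
  def quotedDollar(s: String): String =
    rex.replaceAllIn(s, _ => Regex.quoteReplacement("$"))

  def main(args: Array[String]): Unit = {
    println(rawDollar("aXb").isFailure) // the unescaped "$" is rejected
    println(quotedDollar("aXb"))        // the escaped "$" is inserted as-is
  }
}
```

This is the same situation as in the question: hex2str("0024") returns "$", which is then handed to replaceAllIn as a replacement string.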

Just to tweak the accepted answer:

  import scala.util.matching.Regex

  def unscape3(s: String): String = {
    val rex = """\\u(\p{XDigit}{4})""".r
    rex.replaceAllIn(s, m => Regex quoteReplacement hex2str(m group 1))
  }

  Console println unscape3("""Hi\u0024, \u0024""")

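Note that the character class is now correct, and that with quoteReplacement you don't have to know what needs escaping.

(It is probably also more efficient than scanning the replacement text multiple times.)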

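But it throws for "\\u004H". You mean A-F. See the other answer.

OK, right... I noticed that later, because it raised an exception again... Thanks as well :)

@som-snytt, my mistake. Thanks a lot.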
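The point raised in the comments can be checked with a short sketch. The object name HexClassDemo and the helper unescapeWith are hypothetical names introduced here; hex2str is copied from the question. With the class [0-9a-zA-Z] the input "\\u004H" matches and Integer.parseInt then fails, whereas with [0-9a-fA-F] the text does not match and is left unchanged.

```scala
import scala.util.Try
import scala.util.matching.Regex

object HexClassDemo {
  // Too permissive: accepts any ASCII letter, including non-hex digits such as 'H'.
  val loose: Regex = """\\u([0-9a-zA-Z]{4})""".r
  // Restricted to hexadecimal digits only.
  val strict: Regex = """\\u([0-9a-fA-F]{4})""".r

  // Same helper as in the question: four hex digits to a one-character string.
  def hex2str(s: String): String = Integer.parseInt(s, 16).toChar.toString

  // Replaces every match of rex with the decoded character, quoted so that
  // "$" and "\" in the result are inserted literally.
  def unescapeWith(rex: Regex, s: String): String =
    rex.replaceAllIn(s, m => Regex.quoteReplacement(hex2str(m.group(1))))

  def main(args: Array[String]): Unit = {
    println(Try(unescapeWith(loose, "\\u004H")).isFailure) // parseInt rejects "004H"
    println(unescapeWith(strict, "\\u004H"))               // no match, input returned unchanged
    println(unescapeWith(strict, "Hi\\u0024, \\u0024"))    // both escapes decoded to "$"
  }
}
```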