Regex: getting the correct index for a string with double-byte characters

regex go

Incorrect index:

r, _ := regexp.Compile("hot")
s := "it‘s hot"
fmt.Println(r.FindStringIndex(s))

[7 10]

Correct index:

r, _ := regexp.Compile("hot")
s := "it‘s hot"
s = strings.ReplaceAll(s, "‘", "'")
fmt.Println(r.FindStringIndex(s))

[5 8]

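As you can see, the character ‘ causes the problem. The question is: is there a more general solution to this? Or do we have to keep collecting strings like this and write our own custom replacement function for these characters?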

You can use utf8.RuneCountInString. Combine it with string slicing to get rune indices from the byte indices:

r, _ := regexp.Compile("hot")
s := "it‘s hot"
idx := r.FindStringIndex(s)
fmt.Println(utf8.RuneCountInString(s[:idx[0]]), utf8.RuneCountInString(s[:idx[1]]))

Output:

5 8
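If there can be several matches, the same idea extends to every pair returned by FindAllStringIndex. The following is only a sketch: runeSpans is a hypothetical helper name, and it assumes the byte spans are in ascending, non-overlapping order (which is what FindAllStringIndex returns), so each part of the string is decoded only once.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// runeSpans (hypothetical helper) converts [start, end) byte spans in s
// into [start, end) rune spans. It assumes the spans are ascending and
// non-overlapping, as returned by FindAllStringIndex.
func runeSpans(s string, byteSpans [][]int) [][]int {
	out := make([][]int, 0, len(byteSpans))
	runes, prev := 0, 0 // invariant: runes is the rune count of s[:prev]
	for _, m := range byteSpans {
		runes += utf8.RuneCountInString(s[prev:m[0]]) // gap before the match
		start := runes
		runes += utf8.RuneCountInString(s[m[0]:m[1]]) // the match itself
		prev = m[1]
		out = append(out, []int{start, runes})
	}
	return out
}

func main() {
	re := regexp.MustCompile("hot")
	s := "it‘s hot, so hot"
	fmt.Println(runeSpans(s, re.FindAllStringIndex(s, -1))) // [[5 8] [13 16]]
}
```

The running count avoids recounting the prefix s[:idx] from the start for every match, which matters only for long inputs with many matches.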

This seems to do it:

package main
import "strings"

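// runeIndex returns the rune (code point) index of the first occurrence
// of substr in s, or -1 if substr is not present.
func runeIndex(s, substr string) int {
   n := strings.Index(s, substr) // byte index
   if n == -1 { return -1 }
   r := []rune(s[:n]) // decode the prefix; its length is the rune index
   return len(r)
}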

func main() {
   n := runeIndex("it‘s hot", "hot")
   println(n == 5)
}
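A possible variation, not part of the answer above: converting the prefix to []rune may allocate a slice just to take its length, so counting with utf8.RuneCountInString gives the same result without building one. The name runeIndexNoAlloc is made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndexNoAlloc (hypothetical name) returns the rune index of the first
// occurrence of substr in s, or -1 if substr is not present.
func runeIndexNoAlloc(s, substr string) int {
	n := strings.Index(s, substr) // byte index
	if n < 0 {
		return -1
	}
	// Count the runes in the prefix instead of materializing a []rune.
	return utf8.RuneCountInString(s[:n])
}

func main() {
	fmt.Println(runeIndexNoAlloc("it‘s hot", "hot"))  // 5
	fmt.Println(runeIndexNoAlloc("it‘s hot", "cold")) // -1
}
```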

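Go "Index" functions return byte indices. How to deal with that depends on what you are doing. What are you trying to accomplish?

Yes, that is what I figured. I need the character index.

Can you solve your problem another way? You mention a "custom replacement function", but the replacement works fine. What do you use the character index for? Are you aware that "character" is a serious oversimplification that leads to ugly bugs when handling Unicode text? Do you know how many "characters" the UTF-32 (for convenience) string 1F926 1F3FC 200D 2640 FE0F has?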
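To make the last comment concrete, here is a small sketch that builds that exact code point sequence and counts it two ways. The constant name facepalm is chosen for this example only. The sequence is an emoji joined with a skin tone modifier, a zero-width joiner, and a gender sign, and it is normally displayed as a single symbol, yet neither the byte count nor the rune count is 1.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// facepalm is the sequence U+1F926 U+1F3FC U+200D U+2640 U+FE0F.
const facepalm = "\U0001F926\U0001F3FC\u200D\u2640\uFE0F"

func main() {
	fmt.Println(len(facepalm))                    // UTF-8 bytes: 17
	fmt.Println(utf8.RuneCountInString(facepalm)) // code points (runes): 5
}
```

So a rune index is a code point index, not an index of user-perceived characters. Counting those requires grapheme cluster segmentation, which as far as I know the Go standard library does not provide.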