How to convert from an encoding to UTF-8 in Go?

unicode go

I'm working on a project where I need to convert text from an encoding (for example, Windows-1256 Arabic) to UTF-8.

How do I do this in Go?

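You can use the golang.org/x/text encoding packages, which include support for Windows-1256 via the package golang.org/x/text/encoding/charmap (in the example below, import this package and use charmap.Windows1256 instead of japanese.ShiftJIS).

Here is a short example that encodes a Japanese UTF-8 string to ShiftJIS and then decodes the ShiftJIS string back to UTF-8. Unfortunately, at the time of writing it does not work on the playground, because the playground does not have the "x" packages.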

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "strings"

    "golang.org/x/text/encoding/japanese"
    "golang.org/x/text/transform"
)

func main() {
    // the string we want to transform
    s := "今日は"
    fmt.Println(s)

    // --- Encoding: convert s from UTF-8 to ShiftJIS
    // declare a bytes.Buffer b and an encoder which will write into this buffer
    var b bytes.Buffer
    w := transform.NewWriter(&b, japanese.ShiftJIS.NewEncoder())
    // encode our string; Close flushes anything the writer still holds
    if _, err := w.Write([]byte(s)); err != nil {
        log.Fatal(err)
    }
    if err := w.Close(); err != nil {
        log.Fatal(err)
    }
    // print the encoded bytes
    fmt.Printf("%#v\n", b.Bytes())
    encS := b.String()
    fmt.Println(encS)

    // --- Decoding: convert encS from ShiftJIS to UTF-8
    // declare a decoder which reads from the string we have just encoded
    r := transform.NewReader(strings.NewReader(encS), japanese.ShiftJIS.NewDecoder())
    // decode our string
    decBytes, err := io.ReadAll(r)
    if err != nil {
        log.Fatal(err)
    }
    decS := string(decBytes)
    fmt.Println(decS)
}

There is a more complete example on the Japanese StackOverflow site. The text is in Japanese, but the code should be self-explanatory:

Use the charmap package from golang.org/x/text. In your case, that would be something like:

b := /* Win1256 bytes here. */
dec := charmap.Windows1256.NewDecoder()
// Take more space just in case some characters need
// more bytes in UTF-8 than in Win1256.
bUTF := make([]byte, len(b)*3)
n, _, err := dec.Transform(bUTF, b, false)
if err != nil {
    panic(err)
}
bUTF = bUTF[:n]

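I looked through the documentation and came up with a way to convert a byte array to (or from) UTF-8.

The difficulty I ran into is that, so far, I have not found an interface that lets me use a locale. Instead, the available methods are limited to a predefined set of encodings.

In my case, I needed to convert UTF-16 (I actually have UCS-2 data, but it still works) to UTF-8. To do that, I need to check the BOM and then do the conversion: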

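// needs "errors", "golang.org/x/text/encoding" and
// "golang.org/x/text/encoding/unicode"

// combine the first two bytes as a little-endian 16-bit value;
// the uint16 conversions keep the arithmetic from overflowing a byte
bom := uint16(buf[0]) | uint16(buf[1])<<8
var enc encoding.Encoding
if bom == 0xFEFF {
    enc = unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
} else if bom == 0xFFFE {
    enc = unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
} else {
    return errors.New("BOM missing")
}

e := enc.NewDecoder()

// convert UCS-2 (LE or BE) to UTF-8
utf8, err := e.Bytes(buf[2:])
if err != nil {
    return err
}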

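Unfortunately, I had to use the "ignore" BOM policy, because in my case a BOM should be forbidden anywhere past the first character. But that is close enough for my situation. These functions are mentioned in a few places but never shown in practice.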
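
For comparison, the same BOM check can be sketched with the standard library alone. This is not the answer's golang.org/x/text approach: it swaps in encoding/binary and unicode/utf16, and the function name utf16ToUTF8 and its error messages are illustrative assumptions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// utf16ToUTF8 (hypothetical helper) decodes BOM-prefixed UTF-16 or
// UCS-2 data into a UTF-8 Go string.
func utf16ToUTF8(buf []byte) (string, error) {
	if len(buf) < 2 {
		return "", errors.New("input too short for a BOM")
	}

	// The BOM is U+FEFF; its byte order on the wire reveals the endianness.
	var order binary.ByteOrder
	switch {
	case buf[0] == 0xFF && buf[1] == 0xFE:
		order = binary.LittleEndian
	case buf[0] == 0xFE && buf[1] == 0xFF:
		order = binary.BigEndian
	default:
		return "", errors.New("BOM missing")
	}

	body := buf[2:]
	if len(body)%2 != 0 {
		return "", errors.New("odd number of bytes after the BOM")
	}

	// Reassemble the 16-bit code units in the detected byte order.
	units := make([]uint16, len(body)/2)
	for i := range units {
		units[i] = order.Uint16(body[2*i:])
	}

	// utf16.Decode joins surrogate pairs; unpaired ones become U+FFFD.
	return string(utf16.Decode(units)), nil
}

func main() {
	le := []byte{0xFF, 0xFE, 'G', 0x00, 'o', 0x00}
	s, err := utf16ToUTF8(le)
	fmt.Println(s, err)
}
```

Because utf16.Decode handles surrogate pairs, the sketch accepts full UTF-16 as well as plain UCS-2 input.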

I made a tool for myself; maybe you can borrow some ideas from it :)

Here is the key code:

_, err = io.Copy(
    transform.NewWriter(output, targetEncoding.NewEncoder()),
    transform.NewReader(input, sourceEncoding.NewDecoder()),
)

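Do you mean an encoding? There is only one Unicode, and Arabic 1256 is not "Unicode".

You're right, I've edited the question. Thanks.

I couldn't find a live example of converting one encoding to another. It is easy to do in .NET, but I'm really a newbie here.

Great live example. Hmm, here we convert from UTF-8 to Japanese ShiftJIS; is it possible to go the other way?

To decode ShiftJIS, use the second part, starting from "declare a decoder...": encS is the string you want to decode and string(decBytes) is the decoded string. Two functions might have been better, but I wanted to keep the example as short as possible...

I'm not very proficient in Go, but allocating a buffer of roughly *2 seems like a bad idea. In theory the UTF-8 output could be four times the size of the input string (though probably never in practice).

It's just an example. Most characters in Win1256 need fewer, and none needs more than three bytes.

There must be a way to determine the buffer size rather than guessing. @rob74's answer seems to illustrate that.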

NewDecoder returns a transform.Transformer. You should not call this method directly the way you did! (For example, like an io.Reader, the transformer is free to transform as little as it likes on each call.) If you want to transform a []byte with a transformer, you should use …