Question: String to UCS-2

Stack Overflow user

Asked 2015-05-31 10:56:15

3 answers, 1.9K views, 0 followers, 2 votes

I want to port a Python program to Go. It converts a Unicode string to a UCS-2 hex string.

In Python, it is quite simple:

u"Bien joué".encode('utf-16-be').encode('hex')
-> 004200690065006e0020006a006f007500e9

I am a Go beginner, and the simplest way I have found is:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "Bien joué" 
    fmt.Printf("str: %s\n", str)

    ucs2HexArray := []rune(str)
    s := fmt.Sprintf("%U", ucs2HexArray)
    a := strings.Replace(s, "U+", "", -1)
    b := strings.Replace(a, "[", "", -1)
    c := strings.Replace(b, "]", "", -1)
    d := strings.Replace(c, " ", "", -1)
    fmt.Printf("->: %s", d)
}

str: Bien joué
->: 004200690065006E0020006A006F007500E9
Program exited.

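I really think this is clearly inefficient. How can I improve it?

Thanks

3 Answers

Stack Overflow user

Accepted answer

Posted 2015-05-31 13:24:32

Make this conversion a function; then you can easily improve the conversion algorithm later. For example,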

package main

import (
    "fmt"
    "strings"
    "unicode/utf16"
)

func hexUTF16FromString(s string) string {
    hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
    return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
}

func main() {
    str := "Bien joué"
    fmt.Println(str)
    hex := hexUTF16FromString(str)
    fmt.Println(hex)
}

Output:

Bien joué
004200690065006e0020006a006f007500e9
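
As an illustration of how the function body could be improved later, here is a minimal sketch that avoids fmt.Sprintf and strings.Replace by writing each UTF-16 code unit into a byte slice in big-endian order and hex-encoding the result. The name hexUTF16BE is hypothetical and not part of the answer above; only standard library packages are assumed.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE (hypothetical name) encodes s as UTF-16 big-endian and
// returns the resulting bytes as a lowercase hex string.
func hexUTF16BE(s string) string {
	units := utf16.Encode([]rune(s)) // UTF-16 code units, one uint16 each
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		// Big-endian: high byte first, matching Python's 'utf-16-be'.
		binary.BigEndian.PutUint16(buf[2*i:], u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué"))
	// prints 004200690065006e0020006a006f007500e9
}
```

Because the conversion is wrapped in a function, callers are unaffected by swapping one body for the other.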

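Note:

You say "convert a Unicode string to a UCS-2 string", but your Python example uses UTF-16:

u"Bien joué".encode('utf-16-be').encode('hex')

The Unicode Consortium
UTF-16 FAQ

Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.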

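Votes: 3

Stack Overflow user

Posted 2015-05-31 15:28:48

For anything other than trivially short input (and possibly even then), I would use the golang.org/x/text/encoding/unicode package to convert to UTF-16 (which, as @peterSo and @JimB point out, is slightly different from the obsolete UCS-2).

The advantages of using this (together with the golang.org/x/text/transform package) over unicode/utf16 are that you get BOM support and a choice of big or little endian, and that besides encoding or decoding short strings or byte slices you can also apply it as a filter to an io.Reader or io.Writer, transforming your data as you process it instead of all up front (i.e. for a large stream of data you do not need to hold it all in memory at once).

For example: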

package main

import (
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "strings"

    "golang.org/x/text/encoding/unicode"
    "golang.org/x/text/transform"
)

const input = "Bien joué"

func main() {
    // Get a `transform.Transformer` for encoding.
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    t := e.NewEncoder()
    // For decoding, allows a Byte Order Mark at the start to
    // switch to corresponding Unicode decoding (UTF-8, UTF-16BE, or UTF-16LE)
    // otherwise we use `e` (UTF-16BE without BOM):
    t2 := unicode.BOMOverride(e.NewDecoder())
    _ = t2 // we don't show/use this

    // If you have a string:
    str := input
    outstr, n, err := transform.String(t, str)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("string:   n=%d, bytes=%02x\n", n, []byte(outstr))

    // If you have a []byte:
    b := []byte(input)
    outbytes, n, err := transform.Bytes(t, b)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("bytes:    n=%d, bytes=%02x\n", n, outbytes)

    // If you have an io.Reader for the input:
    ir := strings.NewReader(input)
    r := transform.NewReader(ir, t)
    // Now just read from r as you normally would and the encoding will
    // happen as you read, good for large sources to avoid pre-encoding
    // everything. Here we'll just read it all in one go though which negates
    // that benefit (normally avoid ioutil.ReadAll).
    outbytes, err = ioutil.ReadAll(r)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("reader: len=%d, bytes=%02x\n", len(outbytes), outbytes)

    // If you have an io.Writer for the output:
    var buf bytes.Buffer
    w := transform.NewWriter(&buf, t)
    _, err = fmt.Fprint(w, input) // or io.Copy from an io.Reader, or whatever
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("writer: len=%d, bytes=%02x\n", buf.Len(), buf.Bytes())
}

// Whichever of these you need you could of
// course put in a single simple function. E.g.:

// NewUTF16BEWriter returns a new writer that wraps w
// by transforming the bytes written into UTF-16-BE.
func NewUTF16BEWriter(w io.Writer) io.Writer {
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    return transform.NewWriter(w, e.NewEncoder())
}

// ToUTF16BE converts UTF-8 `b` into UTF-16-BE.
func ToUTF16BE(b []byte) ([]byte, error) {
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    out, _, err := transform.Bytes(e.NewEncoder(), b)
    return out, err
}

This gives:

string:   n=10, bytes=004200690065006e0020006a006f007500e9
bytes:    n=10, bytes=004200690065006e0020006a006f007500e9
reader: len=18, bytes=004200690065006e0020006a006f007500e9
writer: len=18, bytes=004200690065006e0020006a006f007500e9

Votes: 3

Stack Overflow user

Posted 2015-05-31 12:30:44

The standard library has a built-in function for this: utf16.Encode() (https://golang.org/pkg/unicode/utf16/#Encode).
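
Note that utf16.Encode returns UTF-16 code units as a []uint16 rather than bytes or a hex string, so the byte order and hex formatting still have to be applied by the caller. A minimal sketch of the call and its counterpart utf16.Decode, with roundTrip as a hypothetical helper name used only for this illustration:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// roundTrip (hypothetical helper) encodes s to UTF-16 code units and
// decodes them back, returning the recovered string and the unit count.
func roundTrip(s string) (string, int) {
	units := utf16.Encode([]rune(s)) // []uint16, one element per code unit
	return string(utf16.Decode(units)), len(units)
}

func main() {
	back, n := roundTrip("Bien joué")
	fmt.Println(n, back == "Bien joué")
	// prints: 9 true
}
```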

Votes: -1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei engine for the IT domain.

Original link:

https://stackoverflow.com/questions/30556584