Golang: how to convert escape sequences in a string to their corresponding Unicode code points

My problem is as follows:

From the lexer, I receive a string containing the value of a character literal, for example:

'A'
'B'
'+'
'\n'

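I want to get the corresponding Unicode code point from these strings. Since each one is a string, I first trim the single-quote character "'" from it, which gives the following strings: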

A
B
+
\n

From these strings I want to obtain the Unicode code point and store it. I tried the following:

trimmedValue := strings.Trim(charLiteral, "'")
chars := []rune(trimmedValue)

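This gives me the values for the first three cases: 65, 66 and 43. But how do I convert the string \n to its Unicode value? With this approach I get a slice containing two separate values. In other words, how can I make \n be interpreted as a single value/character? Is there a direct way to do this?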
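A minimal sketch that reproduces the behavior described above (trimmedRunes is a hypothetical helper mirroring the two lines of code in the question). The lexer delivers the source text, so after trimming, `\n` is still a backslash (92) followed by the letter n (110); nothing has interpreted the escape yet.

```go
package main

import (
    "fmt"
    "strings"
)

// trimmedRunes is a hypothetical helper that mirrors the question's approach:
// strip the single quotes, then convert whatever is left to runes.
func trimmedRunes(charLiteral string) []rune {
    return []rune(strings.Trim(charLiteral, "'"))
}

func main() {
    // Raw string literals (backquotes) keep the lexer's text verbatim,
    // so `'\n'` is four characters: quote, backslash, n, quote.
    for _, lit := range []string{`'A'`, `'\n'`} {
        fmt.Println(lit, trimmedRunes(lit))
    }
    // Prints:
    // 'A' [65]
    // '\n' [92 110]
}
```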

Thanks


htzhanglong (reply #1)

By the way: as far as I know, what that gives you are ASCII values.
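For the record, converting a string to []rune yields Unicode code points; they match the ASCII values here only because A, B and + are ASCII characters. A minimal check with non-ASCII input (the helper name codePoints is hypothetical):

```go
package main

import "fmt"

// codePoints is a hypothetical helper: converting a string to []rune
// decodes its UTF-8 bytes into one Unicode code point per character.
func codePoints(s string) []rune {
    return []rune(s)
}

func main() {
    s := "A+é世"
    // For ASCII characters the code point equals the ASCII value;
    // for other characters it is still the Unicode code point.
    fmt.Println(codePoints(s)) // [65 43 233 19990]

    // The string holds 7 bytes (é takes 2, 世 takes 3) but only 4 code points.
    fmt.Println(len(s), len(codePoints(s))) // 7 4
}
```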

You might want to take a look at: string - How can I get the Unicode value of a character in go? - Stack Overflow


gougou168 (reply #2, author)

Character literals

If you have a character literal, you already have the Unicode value.

From the Go Blog post "Strings, bytes, runes and characters in Go":

What you might think of as a character constant is called a rune constant in Go.

And a rune is a Unicode code point; the rune type is an alias for int32 (not uint32).

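If what you have are strings, they are (by convention) UTF-8 encoded, and you can extract a rune (Unicode code point) from a string as described in the link @freeformz mentioned (in short: the utf8 package, unicode/utf8 - Go Packages).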

vueper (reply #3)

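In Go, you can use strconv.Unquote to interpret a quoted Go literal. It strips the quotes and converts escape sequences such as \n, \t and \uXXXX into the characters they denote; for a single-quoted input it returns a string containing exactly one character.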

The following example shows how to turn a character literal containing an escape sequence into its Unicode code point:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

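func getUnicodeFromCharLiteral(charLiteral string) (rune, error) {
    // Strip exactly one quote from each end. strings.Trim would remove every
    // leading/trailing quote and break the escaped-quote literal '\''.
    trimmed := strings.TrimSuffix(strings.TrimPrefix(charLiteral, "'"), "'")

    // strconv.Unquote expects a quoted literal, so wrap the body in
    // single quotes again before unquoting it
    unquoted, err := strconv.Unquote("'" + trimmed + "'")
    if err != nil {
        return 0, err
    }

    // Convert to a rune slice. For single-quoted input Unquote already
    // guarantees exactly one character, so this check is defensive.
    runes := []rune(unquoted)
    if len(runes) != 1 {
        return 0, fmt.Errorf("expected single character, got %d characters", len(runes))
    }

    return runes[0], nil
}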

func main() {
    testCases := []string{
        "'A'",
        "'B'",
        "'+'",
        "'\\n'",
        "'\\t'",
        "'\\u0041'", // Unicode escape sequence
        "'\\x41'",   // hexadecimal escape sequence
    }
    
    for _, testCase := range testCases {
        r, err := getUnicodeFromCharLiteral(testCase)
        if err != nil {
            fmt.Printf("Error processing %s: %v\n", testCase, err)
            continue
        }
        fmt.Printf("Input: %-10s -> Unicode: %d (0x%X) -> Character: %c\n", 
            testCase, r, r, r)
    }
}

Output:

Input: 'A'        -> Unicode: 65 (0x41) -> Character: A
Input: 'B'        -> Unicode: 66 (0x42) -> Character: B
Input: '+'        -> Unicode: 43 (0x2B) -> Character: +
Input: '\n'       -> Unicode: 10 (0xA) -> Character: 

Input: '\t'       -> Unicode: 9 (0x9) -> Character: 	
Input: '\u0041'   -> Unicode: 65 (0x41) -> Character: A
Input: '\x41'     -> Unicode: 65 (0x41) -> Character: A

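If the input still carries its surrounding single quotes (as the lexer delivers it), a more concise version skips the trimming and passes the literal straight to strconv.Unquote: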

package main

import (
    "fmt"
    "strconv"
)

func getUnicodeCodePoint(charLiteral string) (int, error) {
    // Pass the whole quoted literal straight to strconv.Unquote
    unquoted, err := strconv.Unquote(charLiteral)
    if err != nil {
        return 0, err
    }
    
    runes := []rune(unquoted)
    if len(runes) != 1 {
        return 0, fmt.Errorf("expected single character, got %d characters", len(runes))
    }
    
    return int(runes[0]), nil
}

func main() {
    charLiterals := []string{"'A'", "'B'", "'+'", "'\\n'", "'\\u0041'"}
    
    for _, lit := range charLiterals {
        code, err := getUnicodeCodePoint(lit)
        if err != nil {
            fmt.Printf("Error: %v\n", err)
            continue
        }
        fmt.Printf("%s -> %d (0x%X)\n", lit, code, code)
    }
}
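If the lexer has already stripped the quotes, strconv.UnquoteChar is another option worth considering: it decodes one character or escape sequence from the front of a string and returns the unconsumed tail, which also makes it convenient inside a lexer loop. A sketch, where codePointOfBody is a hypothetical helper name and the quote argument '\'' tells UnquoteChar to accept \' as in a rune literal:

```go
package main

import (
    "fmt"
    "strconv"
)

// codePointOfBody is a hypothetical helper: it decodes the text between the
// quotes of a rune literal (e.g. `A` or `\n`) into a single code point.
func codePointOfBody(body string) (rune, error) {
    // UnquoteChar decodes the first character or escape sequence in body
    // and returns whatever follows it as tail.
    value, _, tail, err := strconv.UnquoteChar(body, '\'')
    if err != nil {
        return 0, err
    }
    // Anything left over means the body held more than one character.
    if tail != "" {
        return 0, fmt.Errorf("%q is more than one character", body)
    }
    return value, nil
}

func main() {
    // Bodies as they look after the lexer has stripped the quotes.
    for _, body := range []string{`A`, `+`, `\n`, `\t`, `\u0041`, `\'`} {
        r, err := codePointOfBody(body)
        if err != nil {
            fmt.Println(body, "error:", err)
            continue
        }
        fmt.Printf("%-8s -> %d\n", body, r)
    }
}
```

Unlike the Unquote-based versions, this works on the trimmed text directly, so there is no need to re-add the quotes.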

strconv.Unquote accepts the same escape sequences as Go rune and string literals:

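\a, \b, \f, \n, \r, \t, \v
\\, \' (single-quoted literals only), \" (double-quoted literals only)
\xXX (hexadecimal, exactly two digits)
\uXXXX (Unicode code point, exactly four hex digits)
\UXXXXXXXX (Unicode code point, exactly eight hex digits)
\OOO (octal, exactly three digits)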
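A quick way to check the list above is to spell the same code point, 65 ('A'), with each numeric escape form and run it through strconv.Unquote. unquoteRune is a hypothetical helper; the last call shows that \" is rejected inside a single-quoted literal.

```go
package main

import (
    "fmt"
    "strconv"
)

// unquoteRune is a hypothetical helper: it unquotes a single-quoted rune
// literal and returns its code point.
func unquoteRune(lit string) (rune, error) {
    s, err := strconv.Unquote(lit)
    if err != nil {
        return 0, err
    }
    runes := []rune(s)
    if len(runes) != 1 {
        return 0, fmt.Errorf("expected one character, got %d", len(runes))
    }
    return runes[0], nil
}

func main() {
    // Each literal spells code point 65 ('A') with a different escape form.
    for _, lit := range []string{`'A'`, `'\101'`, `'\x41'`, `'\u0041'`, `'\U00000041'`} {
        r, err := unquoteRune(lit)
        if err != nil {
            fmt.Println(lit, "error:", err)
            continue
        }
        fmt.Println(lit, r)
    }

    // \' is valid inside a rune literal, while \" is rejected there.
    if _, err := unquoteRune(`'\"'`); err != nil {
        fmt.Println(`'\"'`, "error:", err)
    }
}
```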