Using quoted-string-parser, a Rust library for parsing quoted strings and their escapes


quoted-string-parser is a Rust library that implements a parser for the "quoted-string" grammar described in "SIP: Session Initiation Protocol" (RFC 3261).

Grammar

quoted-string  =  SWS DQUOTE *(qdtext / quoted-pair ) DQUOTE
qdtext         =  LWS / %x21 / %x23-5B / %x5D-7E / UTF8-NONASCII
quoted-pair    =  "\" (%x00-09 / %x0B-0C / %x0E-7F)
LWS            =  [*WSP CRLF] 1*WSP ; linear whitespace
SWS            =  [LWS] ; sep whitespace
UTF8-NONASCII  =  %xC0-DF 1UTF8-CONT
               /  %xE0-EF 2UTF8-CONT
               /  %xF0-F7 3UTF8-CONT
               /  %xF8-Fb 4UTF8-CONT
               /  %xFC-FD 5UTF8-CONT
UTF8-CONT      =  %x80-BF
DQUOTE         =  %x22      ; " (Double Quote)
CRLF           =  CR LF     ; Internet standard newline
CR             =  %x0D      ; carriage return
LF             =  %x0A      ; linefeed
WSP            =  SP / HTAB ; whitespace
SP             =  %x20
HTAB           =  %x09      ; horizontal tab
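
To make the productions above concrete, here is a minimal hand-written check that uses only the standard library. It is a sketch for reading the grammar, not the crate's pest-based implementation: the names is_quoted_string and match_lws are made up for illustration, and it assumes the whole input must match quoted-string with nothing left over.

```rust
/// WSP = SP / HTAB
fn is_wsp(c: u8) -> bool {
    c == b' ' || c == b'\t'
}

/// LWS = [*WSP CRLF] 1*WSP
/// Returns the index just past the whitespace run, or None if LWS does not match.
fn match_lws(b: &[u8], start: usize) -> Option<usize> {
    let mut i = start;
    while i < b.len() && is_wsp(b[i]) {
        i += 1;
    }
    // Optional fold: CRLF counts only when at least one WSP follows it.
    if b[i..].starts_with(b"\r\n") {
        let mut j = i + 2;
        while j < b.len() && is_wsp(b[j]) {
            j += 1;
        }
        if j > i + 2 {
            return Some(j);
        }
    }
    // No fold taken: plain 1*WSP.
    if i > start {
        Some(i)
    } else {
        None
    }
}

/// quoted-string = SWS DQUOTE *(qdtext / quoted-pair) DQUOTE
pub fn is_quoted_string(input: &str) -> bool {
    let b = input.as_bytes();
    // SWS = [LWS]: optional leading whitespace.
    let mut i = match_lws(b, 0).unwrap_or(0);
    // Opening DQUOTE.
    if b.get(i) != Some(&b'"') {
        return false;
    }
    i += 1;
    while i < b.len() {
        match b[i] {
            // Closing DQUOTE: valid only if it is the last byte.
            b'"' => return i + 1 == b.len(),
            // quoted-pair: backslash plus any ASCII byte except CR and LF.
            b'\\' => match b.get(i + 1) {
                Some(&c) if c <= 0x7F && c != b'\n' && c != b'\r' => i += 2,
                _ => return false,
            },
            // qdtext single bytes. Bytes >= 0x80 only occur inside well-formed
            // multi-byte sequences in a &str, which all fit UTF8-NONASCII.
            0x21 | 0x23..=0x5B | 0x5D..=0x7E | 0x80..=0xFF => i += 1,
            // Anything else must be linear whitespace.
            _ => match match_lws(b, i) {
                Some(end) => i = end,
                None => return false,
            },
        }
    }
    false // input ended before the closing DQUOTE
}
```

Under these assumptions a bare line feed inside the quotes is rejected, while a CRLF followed by a space or tab is accepted as folded whitespace.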

Basic usage

The QuotedStringParser object provides a simple API for checking whether an input text conforms to the "quoted-string" grammar.

use quoted_string_parser::{QuotedStringParser, QuotedStringParseLevel};

// two qdtext runs separated by a single space
assert!(QuotedStringParser::validate(
  QuotedStringParseLevel::QuotedString, "\"Hello world\""));

// a single quoted-pair
assert!(QuotedStringParser::validate(
  QuotedStringParseLevel::QuotedString, "\"\\\u{7f}\""));

Complete example

use quoted_string_parser::{QuotedStringParser, QuotedStringParseLevel};

fn main() {
    // Validate a simple quoted string
    let simple_quoted = "\"Hello Rust\"";
    let valid = QuotedStringParser::validate(
        QuotedStringParseLevel::QuotedString, 
        simple_quoted
    );
    println!("'{}' is valid: {}", simple_quoted, valid);
    
    // Validate a string that contains escaped characters
    let escaped_quoted = "\"Contains \\\"quote\\\" inside\"";
    let valid = QuotedStringParser::validate(
        QuotedStringParseLevel::QuotedString,
        escaped_quoted
    );
    println!("'{}' is valid: {}", escaped_quoted, valid);
    
    // Validate a string that contains non-ASCII characters
    let non_ascii_quoted = "\"日本語もOK\"";
    let valid = QuotedStringParser::validate(
        QuotedStringParseLevel::QuotedString,
        non_ascii_quoted
    );
    println!("'{}' is valid: {}", non_ascii_quoted, valid);
    
    // Validate an invalid quoted string (missing closing quote)
    let invalid_quoted = "\"Unclosed quote";
    let valid = QuotedStringParser::validate(
        QuotedStringParseLevel::QuotedString,
        invalid_quoted
    );
    println!("'{}' is valid: {}", invalid_quoted, valid);
}

Advanced control

QuotedStringParser implements the Parser trait from the pest crate. If you need finer-grained control over parsing, you can use the operations defined in pest.

License

This project is licensed under either of:

  • Apache License, Version 2.0
  • MIT license

Contributing

Patches and feedback are welcome.

Extended example

The following, more complete example shows how the quoted-string-parser library might be used in an application:

use quoted_string_parser::{QuotedStringParser, QuotedStringParseLevel};

fn parse_and_print(input: &str) {
    match QuotedStringParser::validate(
        QuotedStringParseLevel::QuotedString,
        input
    ) {
        true => println!("✅ valid string: {}", input),
        false => println!("❌ invalid string: {}", input),
    }
}

fn main() {
    // Try a variety of quoted strings
    parse_and_print("\"Basic quoted string\"");
    parse_and_print("\"Contains \\\"escaped quotes\\\"\"");
    parse_and_print("\"Multi-line\nstring\"");
    parse_and_print("\"Unicode text: 日本語\"");
    parse_and_print("\"Missing closing quote");
    parse_and_print("\"Invalid \\escape sequence\\x\"");
    parse_and_print("Not quoted at all");
    
    // Validate a line read from user input
    println!("\nEnter a quoted string to validate:");
    let mut user_input = String::new();
    std::io::stdin().read_line(&mut user_input).unwrap();
    
    let trimmed = user_input.trim();
    parse_and_print(trimmed);
}

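This extended example:

  1. Wraps the validation logic in a parse_and_print function
  2. Tests a range of valid and invalid quoted strings
  3. Validates a line entered interactively by the user
  4. Prints a clear verdict for each input

You can extend the example further to suit your needs, for instance by extracting the string's content or by integrating it into a larger text-processing pipeline.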
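
One such extension is extracting the text between the quotes once validation has succeeded. The sketch below uses only the standard library. The function name unquote is hypothetical and not part of quoted-string-parser, it treats any leading whitespace as SWS, and it assumes the input has already been validated, so it does not re-check the grammar.

```rust
/// Extracts the content of an already-validated quoted-string:
/// drops leading whitespace and the surrounding double quotes, and
/// replaces each quoted-pair (backslash + char) with the char itself.
/// Returns None if a quote is missing or a backslash is left dangling.
pub fn unquote(input: &str) -> Option<String> {
    let inner = input
        .trim_start()
        .strip_prefix('"')?
        .strip_suffix('"')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // quoted-pair: keep only the escaped character.
            out.push(chars.next()?);
        } else {
            out.push(c);
        }
    }
    Some(out)
}
```

A caller would typically run the validation first and call unquote only on inputs that passed, since the function itself accepts characters the grammar would reject.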


1 reply

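A guide to the Rust string-parsing library quoted-string-parser

Introduction

quoted-string-parser is a Rust library dedicated to parsing quoted strings. It is well suited to cases where you need to extract quoted content and handle escape characters, such as configuration files, command-line arguments, and other inputs that contain quoted strings.

Key features

  • Supports single-quoted and double-quoted strings
  • Handles escape characters automatically
  • High-performance parsing implementation
  • Simple, intuitive API
  • Solid error handling

Installation

Add the dependency to Cargo.toml: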

[dependencies]
quoted-string-parser = "0.1"

Basic usage

1. Simple parsing

use quoted_string_parser::parse_quoted_string;

fn main() {
    let input = r#""Hello, \"Rust\"!""#;
    match parse_quoted_string(input) {
        Ok((remaining, parsed)) => {
            println!("Parsed: {}", parsed); // prints: Hello, "Rust"!
            println!("Remaining: {}", remaining); // prints an empty string
        }
        Err(e) => eprintln!("Error: {}", e),
    }
}

2. Single quotes

use quoted_string_parser::parse_quoted_string;

fn main() {
    let input = r#"'It\'s a single-quoted string'"#;
    let (_, result) = parse_quoted_string(input).unwrap();
    println!("{}", result); // prints: It's a single-quoted string
}

3. Mixed content

use quoted_string_parser::{parse_quoted_string, parse_until_quoted};

fn main() {
    let input = r#"text before "quoted part" text after"#;
    let (input, before) = parse_until_quoted(input).unwrap();
    let (input, quoted) = parse_quoted_string(input).unwrap();
    
    println!("Before: {}", before); // prints: text before 
    println!("Quoted: {}", quoted); // prints: quoted part
    println!("After: {}", input); // prints:  text after
}

Advanced usage

1. Custom escape character

use quoted_string_parser::ParserBuilder;

fn main() {
    let custom_parser = ParserBuilder::new()
        .with_escape_char('\\')
        .with_quote_chars(&['\'', '"'])
        .build();
    
    let input = r#""Custom \\ escape""#;
    let (_, result) = custom_parser.parse(input).unwrap();
    println!("{}", result); // prints: Custom \ escape
}

2. Multi-line quoted strings

use quoted_string_parser::parse_quoted_string;

fn main() {
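    // The raw string holds one quoted string that spans several lines
    let input = r#""This is a
multi-line
string""#;
    let (_, result) = parse_quoted_string(input).unwrap();
    println!("{}", result); // prints the text with its line breaks preserved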
}

3. Error handling

use quoted_string_parser::parse_quoted_string;

fn main() {
    let input = r#""Unterminated string"#;
    match parse_quoted_string(input) {
        Ok(_) => println!("Success"),
        Err(e) => eprintln!("Error: {}", e), // prints: Error: Unterminated quoted string
    }
}

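Performance tips

  1. If you parse repeatedly, create a parser instance once and reuse it
  2. For input in a known format, a format-specific parsing method is faster than a general-purpose one
  3. Avoid rebuilding the parser inside hot loops

Practical example

Parsing command-line arguments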

use quoted_string_parser::parse_quoted_string;

fn parse_args(input: &str) -> Vec<String> {
    let mut args = Vec::new();
    let mut remaining = input.trim();
    
    while !remaining.is_empty() {
        if remaining.starts_with('"') || remaining.starts_with('\'') {
            let (new_remaining, arg) = parse_quoted_string(remaining).unwrap();
            args.push(arg.to_string());
            remaining = new_remaining.trim_start();
        } else {
            let split_pos = remaining.find(char::is_whitespace).unwrap_or(remaining.len());
            let (arg, new_remaining) = remaining.split_at(split_pos);
            args.push(arg.to_string());
            remaining = new_remaining.trim_start();
        }
    }
    
    args
}

fn main() {
    let input = r#"cmd -o "output file.txt" --name 'John Doe'"#;
    let args = parse_args(input);
    println!("{:?}", args); // ["cmd", "-o", "output file.txt", "--name", "John Doe"]
}
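
Because of the unwrap() call, the loop above panics when a quote is never closed. As a hedged alternative, here is a sketch of the same splitting logic written against the standard library only, so it does not depend on any parser API. The name split_args is hypothetical and not part of quoted-string-parser. It reports an unterminated quote as an error rather than panicking.

```rust
/// Splits a command line into arguments. Text inside '...' or "..." is
/// kept as one argument, and a backslash inside quotes escapes the next
/// character. Returns Err when a quote is never closed.
pub fn split_args(input: &str) -> Result<Vec<String>, String> {
    let mut args = Vec::new();
    let mut chars = input.chars().peekable();
    loop {
        // Skip whitespace between arguments.
        while chars.next_if(|c| c.is_whitespace()).is_some() {}
        let first = match chars.peek() {
            Some(&c) => c,
            None => break, // end of input
        };
        let mut arg = String::new();
        if first == '"' || first == '\'' {
            chars.next(); // consume the opening quote
            let mut closed = false;
            while let Some(c) = chars.next() {
                match c {
                    // Escape: take the next character literally.
                    '\\' => match chars.next() {
                        Some(escaped) => arg.push(escaped),
                        None => break,
                    },
                    // Matching closing quote ends the argument.
                    c if c == first => {
                        closed = true;
                        break;
                    }
                    c => arg.push(c),
                }
            }
            if !closed {
                return Err(format!("unterminated {} quote", first));
            }
        } else {
            // Unquoted argument: read up to the next whitespace.
            while let Some(c) = chars.next_if(|c| !c.is_whitespace()) {
                arg.push(c);
            }
        }
        args.push(arg);
    }
    Ok(args)
}
```

Returning a Result lets the caller decide how to report malformed input, which matters when the command line comes from a user.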

Complete example code

The following example combines several of the features above to show how quoted-string-parser might be used in a project:

use quoted_string_parser::{parse_quoted_string, ParserBuilder};

fn main() {
    // Example 1: parse a string that contains escaped characters
    let input1 = r#""This is a \"test\" string""#;
    match parse_quoted_string(input1) {
        Ok((remaining, parsed)) => {
            println!("Example 1 - parsed: {}", parsed);
            println!("Example 1 - remaining: '{}'", remaining);
        }
        Err(e) => eprintln!("Example 1 - error: {}", e),
    }

    // Example 2: use a custom parser
    let custom_parser = ParserBuilder::new()
        .with_escape_char('$')  // use $ as the escape character
        .with_quote_chars(&['"', '\''])
        .build();
    
    let input2 = r#""Escape with $$ dollar sign""#;
    match custom_parser.parse(input2) {
        Ok((_, result)) => println!("Example 2 - custom parser result: {}", result),
        Err(e) => eprintln!("Example 2 - error: {}", e),
    }

    // Example 3: parse a configuration-file format
    let config_input = r#"key1 = "value1" 
    key2 = 'value2 with spaces'
    key3 = unquoted_value"#;
    
    let mut config = std::collections::HashMap::new();
    for line in config_input.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        
        if let Some(equal_pos) = line.find('=') {
            let key = line[..equal_pos].trim();
            let value_part = line[equal_pos+1..].trim();
            
            // Try to parse the value as a quoted string
            if let Ok((_, value)) = parse_quoted_string(value_part) {
                config.insert(key.to_string(), value.to_string());
            } else {
                // Not a quoted string: use the whole value as-is
                config.insert(key.to_string(), value_part.to_string());
            }
        }
    }
    
    println!("Example 3 - parsed configuration:");
    for (key, value) in &config {
        println!("  {} = {}", key, value);
    }
}

This complete example shows:

  1. Basic quoted-string parsing
  2. Creating and using a custom parser
  3. Configuration-file parsing as a practical use case

The output will look like this (the order of the keys in example 3 may vary, because HashMap iteration order is unspecified):

Example 1 - parsed: This is a "test" string
Example 1 - remaining: ''
Example 2 - custom parser result: Escape with $ dollar sign
Example 3 - parsed configuration:
  key1 = value1
  key2 = value2 with spaces
  key3 = unquoted_value