Rust PostgreSQL查询解析库pg_query的使用,支持SQL语法解析与查询分析

pg_query.rs

这个Rust库使用实际的PostgreSQL服务器源代码来解析SQL查询并返回PostgreSQL内部解析树。

它还允许您规范化查询(用$1等替换常量值),并将这些规范化查询再次解析为解析树。

当您构建此库时,它会构建PostgreSQL服务器源代码的一部分,然后将其静态链接到此库中。

开始使用

将以下内容添加到您的Cargo.toml

[dependencies]
pg_query = "5.1"

示例

解析查询

use pg_query::NodeRef;

let result = pg_query::parse(“SELECT * FROM contacts”); assert!(result.is_ok()); let result = result.unwrap(); assert_eq!(result.tables(), vec![“contacts”]); assert!(matches!(result.protobuf.nodes()[0].0, NodeRef::SelectStmt(_)));

规范化查询

let result = pg_query::normalize("SELECT 1 FROM x WHERE y = (SELECT 123 FROM a WHERE z = 'bla')").unwrap();
assert_eq!(result, "SELECT $1 FROM x WHERE y = (SELECT $2 FROM a WHERE z = $3)");

查询指纹

let result = pg_query::fingerprint("SELECT * FROM contacts.person WHERE id IN (1, 2, 3, 4);").unwrap();
assert_eq!(result.hex, "643d2a3c294ab8a7");

截断查询

let query = "INSERT INTO \"x\" (a, b, c, d, e, f) VALUES (?)";
let result = pg_query::parse(query).unwrap();
assert_eq!(result.truncate(32).unwrap(), "INSERT INTO x (...) VALUES (...)");

完整示例代码

use pg_query::NodeRef;

fn main() -> Result<(), Box<dyn std::error::Error>> { // 示例1: 解析查询 let parse_result = pg_query::parse(“SELECT * FROM contacts”)?; println!(“解析的表: {:?}”, parse_result.tables());

// 示例2: 规范化查询
let normalized = pg_query::normalize("SELECT 1 FROM x WHERE y = (SELECT 123 FROM a WHERE z = 'bla')")?;
println!("规范化查询: {}", normalized);

// 示例3: 查询指纹
let fingerprint_result = pg_query::fingerprint("SELECT * FROM contacts.person WHERE id IN (1, 2, 3, 4);")?;
println!("查询指纹: {}", fingerprint_result.hex);

// 示例4: 截断查询
let truncate_query = "INSERT INTO \"x\" (a, b, c, d, e, f) VALUES (?)";
let truncate_result = pg_query::parse(truncate_query)?;
let truncated = truncate_result.truncate(32)?;
println!("截断后的查询: {}", truncated);

Ok(())

}

致谢

感谢Paul Mason在pg_parse上的工作,这个crate基于此。

在0.6.0版本之后,Paul将pg_query crate捐赠给了pganalyze团队。pg_parse是一个更轻量级的替代方案,专注于查询解析,而pg_query旨在与Ruby gem具有功能对等性。

许可证

PostgreSQL服务器源代码,根据PostgreSQL许可证使用。
部分版权 (c) 1996-2023, The PostgreSQL Global Development Group
部分版权 (c) 1994, The Regents of the University of California

所有其他部分均根据MIT许可证授权,详见LICENSE文件。
版权 (c) 2021 Paul Mason 版权 (c) 2021-2023, Duboce Labs, Inc. (pganalyze)


1 回复

Rust PostgreSQL查询解析库pg_query的使用

介绍

pg_query是一个用于解析和分析PostgreSQL查询语句的Rust库,它基于PostgreSQL的查询解析器,能够将SQL语句解析为抽象语法树(AST),并提供丰富的查询分析功能。该库特别适合需要处理SQL语句的应用程序,如查询优化器、SQL格式化工具、语法检查器等。

主要特性

  • 支持标准PostgreSQL SQL语法
  • 将SQL语句解析为JSON格式的AST
  • 提供查询分析和重构功能
  • 支持查询标准化和去语义化
  • 跨平台兼容

安装方法

在Cargo.toml中添加依赖:

[dependencies]
pg_query = "0.10"

基本使用方法

1. 解析SQL语句

use pg_query::{Node, NodeEnum, NodeRef};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let sql = "SELECT id, name FROM users WHERE age > 18";
    
    // 解析SQL语句
    let result = pg_query::parse(sql)?;
    
    // 获取解析树
    let parse_tree = result.parse_tree;
    println!("{:#?}", parse_tree);
    
    Ok(())
}

2. 提取查询信息

use pg_query::{Node, NodeEnum};

fn extract_columns(ast: &Node) -> Vec<String> {
    let mut columns = Vec::new();
    
    if let NodeEnum::SelectStmt(select_stmt) = &ast.node {
        if let Some(target_list) = &select_stmt.target_list {
            for target in target_list {
                if let NodeEnum::ResTarget(res_target) = &target.node {
                    if let Some(name) = &res_target.name {
                        columns.push(name.clone());
                    }
                }
            }
        }
    }
    
    columns
}

3. 查询标准化

fn normalize_query(sql: &str) -> Result<String, Box<dyn std::error::Error>> {
    let normalized = pg_query::normalize(sql)?;
    Ok(normalized)
}

// 示例:将查询参数化
let sql = "SELECT * FROM users WHERE id = 123";
let normalized = normalize_query(sql)?;
println!("{}", normalized); // 输出: SELECT * FROM users WHERE id = $1

4. 指纹生成

fn generate_fingerprint(sql: &str) -> Result<String, Box<dyn std::error::Error>> {
    let fingerprint = pg_query::fingerprint(sql)?;
    Ok(fingerprint)
}

// 生成查询指纹用于缓存或比较
let sql1 = "SELECT name FROM users WHERE id = 1";
let sql2 = "SELECT name FROM users WHERE id = 2";
let fp1 = generate_fingerprint(sql1)?;
let fp2 = generate_fingerprint(sql2)?;

println!("指纹是否相同: {}", fp1 == fp2); // 输出: true

高级用法

遍历AST节点

use pg_query::{Node, NodeEnum, NodeRef};

fn walk_ast(node: &Node, depth: usize) {
    let indent = "  ".repeat(depth);
    
    match &node.node {
        NodeEnum::SelectStmt(select) => {
            println!("{}Select语句", indent);
            // 递归处理子节点
        }
        NodeEnum::ColumnRef(column) => {
            println!("{}列引用: {:?}", indent, column);
        }
        // 处理其他节点类型...
        _ => {}
    }
    
    // 递归处理所有子节点
    if let Some(children) = node.get_children() {
        for child in children {
            walk_ast(&child, depth + 1);
        }
    }
}

查询重写示例

fn rewrite_query(sql: &str) -> Result<String, Box<dyn std::error::Error>> {
    let parsed = pg_query::parse(sql)?;
    
    // 这里可以添加自定义的重写逻辑
    // 例如:修改表名、添加条件等
    
    // 将修改后的AST重新转换为SQL
    let rewritten_sql = pg_query::deparse(&parsed.parse_tree)?;
    Ok(rewritten_sql)
}

错误处理

fn safe_parse(sql: &str) -> Result<(), Box<dyn std::error::Error>> {
    match pg_query::parse(sql) {
        Ok(result) => {
            println!("解析成功: {:?}", result);
            Ok(())
        }
        Err(e) => {
            eprintln!("解析错误: {}", e);
            Err(e.into())
        }
    }
}

完整示例demo

use pg_query::{Node, NodeEnum, NodeRef};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // 示例SQL查询
    let sql = "SELECT id, name, email FROM users WHERE age > 18 AND status = 'active' ORDER BY name DESC";
    
    println!("原始SQL: {}", sql);
    println!("{}", "=".repeat(50));
    
    // 1. 解析SQL语句
    println!("1. 解析SQL语句:");
    let result = pg_query::parse(sql)?;
    println!("解析成功,获得AST");
    println!();
    
    // 2. 提取查询信息
    println!("2. 提取查询信息:");
    if let Some(parse_tree) = result.parse_tree {
        let columns = extract_columns(&parse_tree);
        println!("提取的列: {:?}", columns);
        
        let table_name = extract_table_name(&parse_tree);
        println!("查询的表: {:?}", table_name);
        
        let conditions = extract_conditions(&parse_tree);
        println!("查询条件: {:?}", conditions);
    }
    println!();
    
    // 3. 查询标准化
    println!("3. 查询标准化:");
    let normalized = pg_query::normalize(sql)?;
    println!("标准化后的SQL: {}", normalized);
    println!();
    
    // 4. 指纹生成
    println!("4. 指纹生成:");
    let fingerprint = pg_query::fingerprint(sql)?;
    println!("查询指纹: {}", fingerprint);
    println!();
    
    // 5. 遍历AST节点
    println!("5. 遍历AST节点:");
    if let Some(parse_tree) = result.parse_tree {
        walk_ast(&parse_tree, 0);
    }
    println!();
    
    // 6. 查询重写示例
    println!("6. 查询重写示例:");
    let rewritten = rewrite_query_add_limit(sql, 10)?;
    println!("重写后的SQL: {}", rewritten);
    
    Ok(())
}

// 提取列名
fn extract_columns(ast: &Node) -> Vec<String> {
    let mut columns = Vec::new();
    
    if let NodeEnum::SelectStmt(select_stmt) = &ast.node {
        if let Some(target_list) = &select_stmt.target_list {
            for target in target_list {
                if let NodeEnum::ResTarget(res_target) = &target.node {
                    if let Some(name) = &res_target.name {
                        columns.push(name.clone());
                    } else if let Some(NodeEnum::ColumnRef(column_ref)) = res_target.val.as_ref().map(|n| &n.node) {
                        if let Some(fields) = &column_ref.fields {
                            for field in fields {
                                if let NodeEnum::String(str_node) = &field.node {
                                    columns.push(str_node.str.clone());
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    
    columns
}

// 提取表名
fn extract_table_name(ast: &Node) -> Option<String> {
    if let NodeEnum::SelectStmt(select_stmt) = &ast.node {
        if let Some(from_clause) = &select_stmt.from_clause {
            for from_item in from_clause {
                if let NodeEnum::RangeVar(range_var) = &from_item.node {
                    return range_var.relname.clone();
                }
            }
        }
    }
    None
}

// 提取查询条件
fn extract_conditions(ast: &Node) -> Vec<String> {
    let mut conditions = Vec::new();
    
    if let NodeEnum::SelectStmt(select_stmt) = &ast.node {
        if let Some(where_clause) = &select_stmt.where_clause {
            extract_conditions_recursive(where_clause, &mut conditions);
        }
    }
    
    conditions
}

fn extract_conditions_recursive(node: &Node, conditions: &mut Vec<String>) {
    match &node.node {
        NodeEnum::A_Expr(a_expr) => {
            if let (Some(left), Some(right)) = (&a_expr.lexpr, &a_expr.rexpr) {
                let left_str = node_to_string(left);
                let right_str = node_to_string(right);
                let op = a_expr.name.as_ref().and_then(|names| {
                    names.first().and_then(|name| {
                        if let NodeEnum::String(str_node) = &name.node {
                            Some(str_node.str.clone())
                        } else {
                            None
                        }
                    })
                }).unwrap_or_default();
                
                conditions.push(format!("{} {} {}", left_str, op, right_str));
            }
        }
        NodeEnum::BoolExpr(bool_expr) => {
            if let Some(args) = &bool_expr.args {
                for arg in args {
                    extract_conditions_recursive(arg, conditions);
                }
            }
        }
        _ => {}
    }
}

fn node_to_string(node: &Node) -> String {
    match &node.node {
        NodeEnum::ColumnRef(column_ref) => {
            if let Some(fields) = &column_ref.fields {
                fields.iter()
                    .filter_map(|f| {
                        if let NodeEnum::String(str_node) = &f.node {
                            Some(str_node.str.clone())
                        } else {
                            None
                        }
                    })
                    .collect::<Vec<String>>()
                    .join(".")
            } else {
                "?".to_string()
            }
        }
        NodeEnum::A_Const(a_const) => {
            if let Some(val) = &a_const.val {
                match &val.node {
                    NodeEnum::Integer(int_node) => int_node.ival.to_string(),
                    NodeEnum::String(str_node) => format!("'{}'", str_node.str),
                    NodeEnum::Float(float_node) => float_node.str.clone(),
                    _ => "?".to_string(),
                }
            } else {
                "?".to_string()
            }
        }
        _ => "?".to_string(),
    }
}

// 遍历AST节点
fn walk_ast(node: &Node, depth: usize) {
    let indent = "  ".repeat(depth);
    
    match &node.node {
        NodeEnum::SelectStmt(_) => {
            println!("{}Select语句", indent);
        }
        NodeEnum::ColumnRef(column) => {
            if let Some(fields) = &column.fields {
                let column_name = fields.iter()
                    .filter_map(|f| {
                        if let NodeEnum::String(str_node) = &f.node {
                            Some(str_node.str.clone())
                        } else {
                            None
                        }
                    })
                    .collect::<Vec<String>>()
                    .join(".");
                println!("{}列引用: {}", indent, column_name);
            }
        }
        NodeEnum::RangeVar(range_var) => {
            if let Some(name) = &range_var.relname {
                println!("{}表引用: {}", indent, name);
            }
        }
        NodeEnum::A_Expr(a_expr) => {
            let op = a_expr.name.as_ref().and_then(|names| {
                names.first().and_then(|name| {
                    if let NodeEnum::String(str_node) = &name.node {
                        Some(str_node.str.clone())
                    } else {
                        None
                    }
                })
            }).unwrap_or_default();
            println!("{}表达式操作: {}", indent, op);
        }
        NodeEnum::SortBy(sort_by) => {
            if let Some(sort_dir) = &sort_by.sortby_dir {
                let direction = match sort_dir {
                    0 => "ASC",
                    1 => "DESC",
                    _ => "UNKNOWN",
                };
                println!("{}排序方向: {}", indent, direction);
            }
        }
        _ => {}
    }
    
    // 递归处理所有子节点
    if let Some(children) = node.get_children() {
        for child in children {
            walk_ast(&child, depth + 1);
        }
    }
}

// 查询重写:添加LIMIT子句
fn rewrite_query_add_limit(sql: &str, limit: i32) -> Result<String, Box<dyn Error>> {
    let parsed = pg_query::parse(sql)?;
    
    if let Some(mut parse_tree) = parsed.parse_tree {
        // 修改AST添加LIMIT子句
        if let NodeEnum::SelectStmt(select_stmt) = &mut parse_tree.node {
            select_stmt.limit_count = Some(Node {
                node: NodeEnum::A_Const {
                    val: Some(Box::new(Node {
                        node: NodeEnum::Integer {
                            ival: limit,
                        },
                    })),
                },
            });
        }
        
        // 将修改后的AST重新转换为SQL
        let rewritten_sql = pg_query::deparse(&parse_tree)?;
        Ok(rewritten_sql)
    } else {
        Err("解析失败".into())
    }
}

实际应用场景

  1. SQL语法检查器:验证SQL语句的正确性
  2. 查询优化器:分析查询结构进行优化
  3. 权限检查:解析查询以进行访问控制
  4. 查询日志分析:标准化和分类查询
  5. SQL格式化工具:美化SQL输出

注意事项

  • 该库基于PostgreSQL的解析器,支持PostgreSQL特有的语法
  • 解析大型查询时注意性能影响
  • AST结构可能随版本更新而变化

通过pg_query库,开发者可以在Rust应用中轻松实现强大的SQL处理功能,充分利用PostgreSQL成熟的查询解析能力。

回到顶部