Using nucleo, a Rust fuzzy search library: high-performance text search and matching
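Introduction to nucleo
nucleo is a high-performance fuzzy matcher written in Rust. It targets the same use cases as fzf and skim. Compared with fzf, nucleo has a significantly faster matching algorithm.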
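Key features:
- Uses exactly the same scoring system as fzf
- A more faithful implementation of the Smith-Waterman algorithm
- About 6 times faster than skim
- Better Unicode support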
 
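Example code
Below is a complete example of fuzzy searching with nucleo (written against the nucleo 0.5 API):
use nucleo::pattern::{AtomKind, CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // The matcher owns the scratch memory used while matching
    let mut matcher = Matcher::new(Config::DEFAULT);

    // Create the pattern (the search query)
    let pattern = Pattern::new(
        "rust",
        CaseMatching::Smart,
        Normalization::Smart,
        AtomKind::Fuzzy,
    );

    // Prepare the text to search; `buf` backs the UTF-32 view for non-ASCII input
    let text = "Rust is a systems programming language";
    let mut buf = Vec::new();
    let haystack = Utf32Str::new(text, &mut buf);

    // Run the match; matched positions are appended to `indices`
    let mut indices = Vec::new();
    match pattern.indices(haystack, &mut matcher, &mut indices) {
        Some(score) => {
            // Indices are appended per pattern atom, so sort and deduplicate them
            indices.sort_unstable();
            indices.dedup();
            println!("Match found! Score: {}", score);
            println!("Matched positions: {:?}", indices);

            // Highlight the matched characters (the haystack is ASCII here,
            // so positions line up with `chars()`)
            let mut highlighted = String::new();
            for (pos, ch) in text.chars().enumerate() {
                if indices.binary_search(&(pos as u32)).is_ok() {
                    highlighted.push('[');
                    highlighted.push(ch);
                    highlighted.push(']');
                } else {
                    highlighted.push(ch);
                }
            }
            println!("Highlighted: {}", highlighted);
        }
        None => println!("No match found"),
    }
}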
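The highlighting step does not depend on nucleo itself, so it can be pulled out into a small helper. The following is a minimal std-only sketch; the name `highlight` is hypothetical, and it assumes the indices are character positions (true for ASCII haystacks; check the nucleo docs for how non-ASCII text is indexed).

```rust
/// Wraps every character whose position appears in `indices` in square brackets.
/// `indices` are character positions (not byte offsets) and may be unsorted or
/// contain duplicates.
fn highlight(text: &str, indices: &[u32]) -> String {
    // Work on a sorted, deduplicated copy so one forward pass is enough.
    let mut sorted = indices.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    let mut out = String::with_capacity(text.len() + 2 * sorted.len());
    let mut next = sorted.iter().copied().peekable();
    for (pos, ch) in text.chars().enumerate() {
        if next.peek() == Some(&(pos as u32)) {
            next.next();
            out.push('[');
            out.push(ch);
            out.push(']');
        } else {
            out.push(ch);
        }
    }
    out
}
```

For instance, `highlight("Rust is fun", &[0, 1, 2, 3])` returns `[R][u][s][t] is fun`. Iterating over `chars()` rather than slicing by byte offset keeps the helper safe for multi-byte characters.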
Complete example demo
Below is a fuller nucleo example that searches several items and sorts them by score:
use nucleo::pattern::{AtomKind, CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // Create the matcher configuration
    let config = Config::DEFAULT;

    // Initialize the matcher
    let mut matcher = Matcher::new(config);

    // Prepare the items to search
    let items = vec![
        "Rust programming language",
        "Python is easy to learn",
        "Java is widely used",
        "Go is fast and simple",
        "JavaScript for web development",
        "TypeScript is JavaScript with syntax for types",
        "C++ is complex but powerful",
    ];

    // Create the search pattern
    let pattern = Pattern::new(
        "prog",
        CaseMatching::Smart,
        Normalization::Smart,
        AtomKind::Fuzzy,
    );

    // Score every item and keep the ones that match; `buf` is reused across items
    let mut buf = Vec::new();
    let mut results: Vec<(u32, usize)> = items
        .iter()
        .enumerate()
        .filter_map(|(idx, &item)| {
            let haystack = Utf32Str::new(item, &mut buf);
            pattern.score(haystack, &mut matcher).map(|score| (score, idx))
        })
        .collect();

    // Sort by score, best first
    results.sort_by(|a, b| b.0.cmp(&a.0));

    // Print the sorted results
    println!("Matches (sorted by relevance):");
    for (score, idx) in results {
        println!("Score: {} - {}", score, items[idx]);
    }
}
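Sorting on the score alone leaves the order of equal-scoring items up to the sort. One possible way to make the ranking deterministic is to break ties explicitly. The sketch below is std-only; the `Hit` type and the tie-break rules (shorter text first, then original order) are assumptions chosen for illustration, not something nucleo prescribes.

```rust
/// One scored candidate; `index` is its position in the original list.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Hit<'a> {
    text: &'a str,
    score: u32,
    index: usize,
}

/// Sorts hits best-first: higher score, then shorter text, then original order.
fn rank(hits: &mut [Hit<'_>]) {
    hits.sort_by(|a, b| {
        b.score
            .cmp(&a.score)
            .then_with(|| a.text.len().cmp(&b.text.len()))
            .then_with(|| a.index.cmp(&b.index))
    });
}
```

Chaining `then_with` keeps each tie-break rule on its own line, so changing the policy later means editing a single comparison.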
Performance comparison
Benchmark results measured on the Linux kernel source tree:
| Method | Mean time | Samples |
|---|---|---|
| nucleo "never_matches" | 2.30 ms | 2,493/2,500 |
| skim "never_matches" | 17.44 ms | 574/574 |
| nucleo "copying" | 2.12 ms | 2,496/2,500 |
| skim "copying" | 16.85 ms | 593/594 |
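To read the table as a speedup, divide the skim mean by the nucleo mean for the same benchmark. The snippet below does that arithmetic for the two benchmarks listed; it uses only the means copied from the table and ignores variance, so treat the result as a rough ratio. The names `speedup`, `ROWS` and `report` are made up for this sketch.

```rust
/// Ratio of two mean runtimes: how many times faster `fast_ms` is than `slow_ms`.
fn speedup(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}

/// (benchmark name, nucleo mean in ms, skim mean in ms), copied from the table.
const ROWS: [(&str, f64, f64); 2] = [
    ("never_matches", 2.30, 17.44),
    ("copying", 2.12, 16.85),
];

/// Formats one line per benchmark, e.g. "never_matches: 7.6x".
fn report() -> Vec<String> {
    ROWS.iter()
        .map(|&(name, nucleo, skim)| format!("{}: {:.1}x", name, speedup(skim, nucleo)))
        .collect()
}
```

For these two rows the ratio comes out at roughly 7.6x and 7.9x, somewhat above the "about 6 times" quoted in the feature list, which presumably summarizes a wider set of benchmarks.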
Installation
Add this to Cargo.toml:
nucleo = "0.5.0"
Or run:
cargo add nucleo
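Implementation details
nucleo's fuzzy matching algorithm is based on the Smith-Waterman algorithm (with affine gaps), with several optimizations:
- Pre-segmenting Unicode
- Aggressive prefiltering
- Special-casing ASCII
- A fallback for very long matches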
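To make two of these ideas concrete, here is a toy std-only sketch of a cheap subsequence prefilter followed by an affine-gap scoring pass. It is not nucleo's code: the constants `MATCH`, `GAP_OPEN` and `GAP_EXTEND` are invented, case folding is ASCII-only, and the inner loop is the naive quadratic form rather than an optimized one.

```rust
// Invented weights for illustration only.
const MATCH: i32 = 16;
const GAP_OPEN: i32 = 3;
const GAP_EXTEND: i32 = 1;

/// ASCII-only case folding into a char vector.
fn fold(text: &str) -> Vec<char> {
    text.chars().map(|c| c.to_ascii_lowercase()).collect()
}

/// Prefilter: true if every needle char occurs in the haystack, in order.
fn is_subsequence(haystack: &[char], needle: &[char]) -> bool {
    let mut hay = haystack.iter();
    needle.iter().all(|n| hay.any(|h| h == n))
}

/// Affine gap cost: opening a gap costs more than extending it.
fn gap_penalty(len: usize) -> i32 {
    if len == 0 {
        0
    } else {
        GAP_OPEN + GAP_EXTEND * (len as i32 - 1)
    }
}

/// Best score over all in-order placements of the needle, or None if no match.
fn fuzzy_score(haystack: &str, needle: &str) -> Option<i32> {
    let (h, n) = (fold(haystack), fold(needle));
    if !is_subsequence(&h, &n) {
        return None; // rejected cheaply, no scoring table needed
    }
    if n.is_empty() {
        return Some(0);
    }
    // prev[j]: best score with the previous needle char matched at haystack[j]
    let mut prev: Vec<Option<i32>> = h
        .iter()
        .map(|&c| (c == n[0]).then_some(MATCH))
        .collect();
    for &nc in &n[1..] {
        let mut cur = vec![None; h.len()];
        for j in (0..h.len()).filter(|&j| h[j] == nc) {
            cur[j] = (0..j)
                .filter_map(|p| prev[p].map(|s| s + MATCH - gap_penalty(j - p - 1)))
                .max();
        }
        prev = cur;
    }
    prev.into_iter().flatten().max()
}
```

With these weights `fuzzy_score("abc", "abc")` is `Some(48)`, `fuzzy_score("a_b_c", "abc")` is `Some(42)` because two gaps are opened, and `fuzzy_score("a__bc", "abc")` is `Some(44)` because one longer gap is cheaper than two short ones. That last comparison is the point of affine gaps.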
 
Status
nucleo is used in helix-editor. The core matcher implementation is complete and unlikely to see major changes. The nucleo-matcher crate is ready for widespread use.
        
1 reply

A guide to nucleo, a fuzzy search library for Rust
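Overview
nucleo is a high-performance fuzzy search library for Rust, focused on fast and flexible text search and matching. It is a good fit for anything that needs real-time fuzzy search, such as code editors, file finders, or any other application that needs efficient text matching.
Main features
- High-performance fuzzy matching algorithm
- Unicode support
- Configurable matching strategy
- Multithreading support
- Low memory footprint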
 
Installation
Add the dependency to Cargo.toml:
[dependencies]
nucleo = "0.5"  # check crates.io for the latest version
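Basic usage
Simple fuzzy search
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    let mut matcher = Matcher::new(Config::DEFAULT);

    // fuzzy_match takes Utf32Str views; the buffers back non-ASCII input
    let (mut haystack_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let needle = Utf32Str::new("rs", &mut needle_buf);
    let candidates = ["Rust", "JavaScript", "TypeScript", "Python"];

    for candidate in candidates {
        let haystack = Utf32Str::new(candidate, &mut haystack_buf);
        if let Some(score) = matcher.fuzzy_match(haystack, needle) {
            println!("Match found: {} (score: {})", candidate, score);
        }
    }
}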
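More involved matching configuration
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // Config has private fields, so start from DEFAULT and set the public ones
    let mut config = Config::DEFAULT;
    config.ignore_case = true; // case-insensitive matching
    config.normalize = true;   // normalize accented Latin characters

    let mut matcher = Matcher::new(config);
    let (mut haystack_buf, mut needle_buf) = (Vec::new(), Vec::new());

    // The low-level matcher appears to fold only the haystack (the Pattern API
    // does the needle-side folding), so the needle is given already lowercased
    // and unaccented
    let needle = Utf32Str::new("nucleo", &mut needle_buf);
    let haystack = Utf32Str::new("NÚCLEO", &mut haystack_buf);

    let score = matcher.fuzzy_match(haystack, needle);
    println!("Match score: {:?}", score);
}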
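Multithreaded search
use nucleo::{Config, Matcher, Utf32Str};
use rayon::prelude::*;

fn main() {
    let pattern = "rust";
    let candidates = vec![
        "Rust Programming Language",
        "JavaScript: The Good Parts",
        "Programming Rust",
        "Effective TypeScript",
    ];

    // fuzzy_match takes &mut self, so one matcher cannot be shared across
    // threads; map_init gives each rayon worker its own matcher and buffers
    let results: Vec<(&str, u16)> = candidates
        .par_iter()
        .map_init(
            || (Matcher::new(Config::DEFAULT), Vec::<char>::new(), Vec::<char>::new()),
            |(matcher, haystack_buf, needle_buf), &candidate| {
                let haystack = Utf32Str::new(candidate, haystack_buf);
                let needle = Utf32Str::new(pattern, needle_buf);
                matcher.fuzzy_match(haystack, needle).map(|score| (candidate, score))
            },
        )
        .filter_map(|hit| hit)
        .collect();

    for (candidate, score) in results {
        println!("Match: {} (score: {})", candidate, score);
    }
}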
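Advanced usage
Adjusting match scoring
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // Config does not expose individual bonus weights such as
    // `bonus_consecutive` or `bonus_word_start`. As of nucleo-matcher 0.3 the
    // public knobs are `normalize`, `ignore_case`, `prefer_prefix`, and the
    // `match_paths()` preset.
    let mut config = Config::DEFAULT.match_paths(); // path-aware word boundaries
    config.prefer_prefix = true; // favor matches near the start of the haystack

    let mut matcher = Matcher::new(config);
    let (mut haystack_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let needle = Utf32Str::new("mod", &mut needle_buf);
    let candidates = ["module.rs", "models.rs", "mod.rs"];

    for candidate in candidates {
        let haystack = Utf32Str::new(candidate, &mut haystack_buf);
        if let Some(score) = matcher.fuzzy_match(haystack, needle) {
            println!("{} matches with score {}", candidate, score);
        }
    }
}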
Handling large text
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    let mut matcher = Matcher::new(Config::DEFAULT);
    let (mut haystack_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let needle = Utf32Str::new("error", &mut needle_buf);

    let large_text = "This is a large text containing multiple words. \
                     An error occurred while processing the request. \
                     The error code is 404.";

    // Split the text into words and match each word on its own
    for word in large_text.split_whitespace() {
        let haystack = Utf32Str::new(word, &mut haystack_buf);
        if let Some(score) = matcher.fuzzy_match(haystack, needle) {
            println!("Matched word: {} (score: {})", word, score);
        }
    }
}
Complete example
Below is a complete example that combines several of the features above:
use nucleo::{Config, Matcher, Utf32Str};
use rayon::prelude::*;

/// Builds the configuration shared by every matcher in this example
fn config() -> Config {
    let mut config = Config::DEFAULT;
    config.ignore_case = true;   // case-insensitive matching
    config.normalize = true;     // normalize Unicode
    config.prefer_prefix = true; // favor matches near the start of the haystack
    config
}

fn main() {
    // Search pattern
    let pattern = "rust";

    // Candidate texts
    let candidates = vec![
        "The Rust Programming Language",
        "Learning Rust",
        "Rust in Action",
        "Programming Rust: Fast, Safe Systems Development",
        "Beginning Rust: From Novice to Professional",
        "Rust for Rustaceans",
        "The Rust Programming Language (Covers Rust 2018)",
        "Rust Cookbook",
        "Rust in Practice",
        "Zero To Production In Rust",
    ];

    // Fuzzy-match in parallel; each rayon worker gets its own matcher and buffers
    let mut results: Vec<(&str, u16)> = candidates
        .par_iter()
        .map_init(
            || (Matcher::new(config()), Vec::<char>::new(), Vec::<char>::new()),
            |(matcher, haystack_buf, needle_buf), &candidate| {
                let haystack = Utf32Str::new(candidate, haystack_buf);
                let needle = Utf32Str::new(pattern, needle_buf);
                matcher.fuzzy_match(haystack, needle).map(|score| (candidate, score))
            },
        )
        .filter_map(|hit| hit)
        .collect();

    // Sort by score, best first
    results.sort_by(|a, b| b.1.cmp(&a.1));

    // Print the matches
    println!("Matches (sorted by relevance):");
    for (i, (candidate, score)) in results.iter().enumerate() {
        println!("{}. {} (score: {})", i + 1, candidate, score);
    }

    // Large-text example, using a single matcher on the main thread
    println!("\nLarge-text search example:");
    let large_text = "Rust is a multi-paradigm, general-purpose programming language. \
                     Rust emphasizes performance, type safety, and concurrency. \
                     Rust enforces memory safety, that is, that all references point to valid memory, \
                     without requiring the use of a garbage collector or reference counting. \
                     Rust is popular for systems programming but also offers high-level features.";

    let mut matcher = Matcher::new(config());
    let (mut haystack_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let needle = Utf32Str::new(pattern, &mut needle_buf);

    for word in large_text.split_whitespace() {
        let haystack = Utf32Str::new(word, &mut haystack_buf);
        if let Some(score) = matcher.fuzzy_match(haystack, needle) {
            println!("Matched word: {} (score: {})", word, score);
        }
    }
}
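Performance tips
- Reuse the Matcher instance (and its buffers) to avoid repeated memory allocation
- For large data sets, consider processing in multiple threads
- Adjust the Config options to your actual needs for the best performance

nucleo offers configurable fuzzy search whose matching behavior can be adjusted to fit the application.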
        
      
                    
                  
                    
