Using nucleo, a Rust fuzzy-search library for high-performance text search and matching


Introduction to nucleo

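nucleo is a high-performance fuzzy matcher written in Rust. It aims to fill the same use case as fzf and skim. Compared with fzf, nucleo has a significantly faster matching algorithm.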

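Key features:

  • Uses exactly the same scoring system as fzf
  • A more faithful implementation of the Smith-Waterman algorithm
  • About 6x faster than skim
  • Better Unicode support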

Example code

The following is a complete example of fuzzy search with nucleo:

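use nucleo::pattern::{CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // The Matcher holds reusable scratch memory: create it once and reuse it
    let mut matcher = Matcher::new(Config::DEFAULT);

    // Create a pattern (the search query); Smart case is case-insensitive
    // unless the query contains an uppercase letter
    let pattern = Pattern::parse("rust", CaseMatching::Smart, Normalization::Smart);

    // Prepare the text to search; non-ASCII text would be decoded into `buf`
    let text = "Rust is a systems programming language";
    let mut buf = Vec::new();
    let haystack = Utf32Str::new(text, &mut buf);

    // Run the match and collect the positions of the matched characters
    let mut indices = Vec::new();
    if let Some(score) = pattern.indices(haystack, &mut matcher, &mut indices) {
        // With multi-word patterns the indices may be unsorted or repeated
        indices.sort_unstable();
        indices.dedup();
        println!("Match found! Score: {}", score);
        println!("Matched positions: {:?}", indices);

        // Highlight the matched characters; for ASCII text a position is a char offset
        let mut highlighted = String::new();
        for (pos, c) in text.chars().enumerate() {
            if indices.binary_search(&(pos as u32)).is_ok() {
                highlighted.push('[');
                highlighted.push(c);
                highlighted.push(']');
            } else {
                highlighted.push(c);
            }
        }
        println!("Highlighted: {}", highlighted);
    } else {
        println!("No match");
    }
}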
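The positions returned by `Pattern::indices` count characters in the `Utf32Str`, not bytes in the original `&str`, so slicing non-ASCII text with them directly would be wrong. A minimal stdlib sketch of mapping positions back to byte ranges, assuming each position corresponds to exactly one `char` (true for text without combining marks); the helper name `char_positions_to_byte_ranges` is hypothetical:

```rust
/// Hypothetical helper: map matched character positions (assumed sorted) to
/// byte ranges in the original &str so the text can be sliced or highlighted.
fn char_positions_to_byte_ranges(text: &str, positions: &[u32]) -> Vec<(usize, usize)> {
    text.char_indices()
        .enumerate()
        // keep only the characters whose position was reported as a match
        .filter(|(pos, _)| positions.binary_search(&(*pos as u32)).is_ok())
        // a char starting at byte `start` spans len_utf8() bytes
        .map(|(_, (start, c))| (start, start + c.len_utf8()))
        .collect()
}

fn main() {
    let text = "héllo wörld";
    let positions: [u32; 2] = [1, 7]; // the 'é' and the 'ö'
    for (start, end) in char_positions_to_byte_ranges(text, &positions) {
        println!("{start}..{end} = {:?}", &text[start..end]);
    }
}
```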

Complete demo

The following more complete example shows how to search a list of items and sort the results by score:

use nucleo::pattern::{CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // Create the matcher with the default configuration
    let mut matcher = Matcher::new(Config::DEFAULT);

    // Items to search
    let items = vec![
        "Rust programming language",
        "Python is easy to learn",
        "Java is widely used",
        "Go is fast and simple",
        "JavaScript for web development",
        "TypeScript is JavaScript with syntax for types",
        "C++ is complex but powerful",
    ];

    // Create the search pattern
    let pattern = Pattern::parse("prog", CaseMatching::Smart, Normalization::Smart);

    // Score every item and keep (score, index) pairs for the ones that match
    let mut buf = Vec::new();
    let mut results: Vec<_> = items
        .iter()
        .enumerate()
        .filter_map(|(idx, item)| {
            let haystack = Utf32Str::new(item, &mut buf);
            pattern.score(haystack, &mut matcher).map(|score| (score, idx))
        })
        .collect();

    // Sort by score, highest first
    // (Pattern::match_list performs the scoring and sorting in a single call)
    results.sort_by(|a, b| b.0.cmp(&a.0));

    // Print the ranked results
    println!("Results (sorted by relevance):");
    for (score, idx) in results {
        println!("score: {} - {}", score, items[idx]);
    }
}
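`CaseMatching::Smart`, used in both examples, follows fzf's smart-case rule: matching is case-insensitive unless the query contains an uppercase letter. A stdlib-only sketch of that rule combined with a plain in-order (subsequence) check; the function names are hypothetical, and nucleo itself scores matches rather than returning a bool:

```rust
/// Smart case: ignore case unless the query contains an uppercase character.
fn smart_case_ignores_case(query: &str) -> bool {
    !query.chars().any(char::is_uppercase)
}

/// Hypothetical helper: does every query char appear in `haystack` in order,
/// comparing characters according to the smart-case rule?
fn smart_match(haystack: &str, query: &str) -> bool {
    let ignore_case = smart_case_ignores_case(query);
    // Fold to lowercase only when smart case says to ignore case
    // (takes the first char of multi-char lowercase mappings)
    let fold = |c: char| {
        if ignore_case {
            c.to_lowercase().next().unwrap_or(c)
        } else {
            c
        }
    };
    let mut hay = haystack.chars().map(fold);
    // `any` advances the shared iterator, so later query chars must come later
    query.chars().map(fold).all(|q| hay.any(|h| h == q))
}

fn main() {
    for query in ["prog", "Prog"] {
        for item in ["Rust programming language", "Rust Programming Language"] {
            println!("{query:?} in {item:?}: {}", smart_match(item, query));
        }
    }
}
```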

Performance comparison

Benchmark results on the Linux kernel source tree:

Method                   Mean time   Samples
nucleo "never_matches"   2.30 ms     2,493/2,500
skim "never_matches"     17.44 ms    574/574
nucleo "copying"         2.12 ms     2,496/2,500
skim "copying"           16.85 ms    593/594
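Dividing skim's mean time by nucleo's for each benchmark gives the speedup in that row; for the two rows in the table it works out to roughly 7.6x and 7.9x. A trivial sketch of that arithmetic, with the numbers copied from the table:

```rust
/// Speedup of nucleo over skim for one benchmark: skim time / nucleo time.
fn speedup(nucleo_ms: f64, skim_ms: f64) -> f64 {
    skim_ms / nucleo_ms
}

fn main() {
    // (benchmark, nucleo mean ms, skim mean ms) taken from the table above
    let rows = [("never_matches", 2.30, 17.44), ("copying", 2.12, 16.85)];
    for (name, nucleo_ms, skim_ms) in rows {
        println!("{name}: {:.2}x", speedup(nucleo_ms, skim_ms));
    }
}
```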

Installation

Add to Cargo.toml:

[dependencies]
nucleo = "0.5.0"

Or run:

cargo add nucleo

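Implementation details

nucleo's fuzzy-matching algorithm is based on the Smith-Waterman algorithm with affine gap penalties, plus several optimizations:

  • Pre-segmented Unicode text
  • Aggressive prefiltering
  • A special-cased fast path for ASCII
  • A fallback for very long matches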
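To make "Smith-Waterman with affine gaps" concrete, here is a deliberately naive dynamic-programming sketch in that spirit: every needle character must match a haystack character in order, each match earns points (plus a bonus at word boundaries), and skipped haystack characters between two matches cost a gap-opening penalty plus a smaller per-character extension penalty. The constants are assumptions chosen in the style of fzf's scoring and the code is O(n·m²); it is not nucleo's implementation and omits its other bonuses and optimizations.

```rust
// Hypothetical scoring constants in the style of fzf (not nucleo's exact values)
const SCORE_MATCH: i32 = 16;
const GAP_START: i32 = -3;
const GAP_EXTEND: i32 = -1;
const BONUS_BOUNDARY: i32 = 8;

/// Affine gap penalty for skipping `gap` haystack chars between two matches.
fn gap_penalty(gap: usize) -> i32 {
    if gap == 0 {
        0
    } else {
        GAP_START + (gap as i32 - 1) * GAP_EXTEND
    }
}

/// Bonus for matching at index `j`: start of text or after a non-alphanumeric char.
fn bonus(h: &[char], j: usize) -> i32 {
    if j == 0 || !h[j - 1].is_alphanumeric() {
        BONUS_BOUNDARY
    } else {
        0
    }
}

/// Best score for matching `needle` as an in-order subsequence of `haystack`
/// (ASCII case-insensitive), or None if it does not occur in order.
fn fuzzy_score(needle: &str, haystack: &str) -> Option<i32> {
    let n: Vec<char> = needle.chars().map(|c| c.to_ascii_lowercase()).collect();
    let h: Vec<char> = haystack.chars().collect();
    if n.is_empty() {
        return Some(0);
    }
    // prev[j] = best score with the previous needle char matched at haystack index j
    let mut prev: Vec<Option<i32>> = vec![None; h.len()];
    for (i, &nc) in n.iter().enumerate() {
        let mut cur: Vec<Option<i32>> = vec![None; h.len()];
        for j in 0..h.len() {
            if h[j].to_ascii_lowercase() != nc {
                continue;
            }
            let here = SCORE_MATCH + bonus(&h, j);
            cur[j] = if i == 0 {
                Some(here)
            } else {
                // best earlier match plus the affine penalty for the chars skipped
                (0..j)
                    .filter_map(|jp| prev[jp].map(|s| s + gap_penalty(j - jp - 1)))
                    .max()
                    .map(|best| best + here)
            };
        }
        prev = cur;
    }
    prev.into_iter().flatten().max()
}

fn main() {
    for hay in ["ab", "a-b", "axxb", "ba"] {
        println!("{hay:>5}: {:?}", fuzzy_score("ab", hay));
    }
}
```

With these assumed constants a consecutive match ("ab") outranks a gapped one ("axxb"), while a gap that lands on a word boundary ("a-b") can still win thanks to the boundary bonus.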

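Status

nucleo is already used in helix-editor. The core matcher implementation is considered complete and is unlikely to see major changes, and the nucleo-matcher crate is ready for widespread use.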



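A guide to nucleo, the Rust fuzzy-search library

Overview

nucleo is a high-performance fuzzy-search library for Rust, focused on fast, flexible text search and matching. It is especially well suited to applications that need real-time fuzzy search, such as code editors, file finders, or any program that needs efficient text matching.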

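Main features

  • High-performance fuzzy-matching algorithm
  • Unicode support
  • Configurable matching strategies
  • Multithreading support (the high-level nucleo crate runs matching on a worker thread pool)
  • Low memory footprint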
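The configurable matching strategies are most visible in `Pattern::parse`, which accepts fzf-style query syntax: `'` for a substring match, `^` for a prefix, `$` for a suffix, `!` for negation. The classifier below is a hypothetical stdlib sketch of how one whitespace-separated term maps to a match kind; it ignores escaping and is not nucleo's parser, although the `Kind` variant names mirror nucleo's `AtomKind`.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Fuzzy,
    Substring,
    Prefix,
    Postfix,
    Exact,
}

/// Hypothetical parser for one term of an fzf-style query.
/// Returns (match kind, negated?, text to match). Escapes are not handled.
fn classify(term: &str) -> (Kind, bool, &str) {
    let (negated, mut s) = match term.strip_prefix('!') {
        Some(rest) => (true, rest),
        None => (false, term),
    };
    // 'text: substring match (fzf calls this an exact match)
    if let Some(rest) = s.strip_prefix('\'') {
        return (Kind::Substring, negated, rest);
    }
    let anchored_start = match s.strip_prefix('^') {
        Some(rest) => {
            s = rest;
            true
        }
        None => false,
    };
    let anchored_end = match s.strip_suffix('$') {
        Some(rest) => {
            s = rest;
            true
        }
        None => false,
    };
    let kind = match (anchored_start, anchored_end) {
        (true, true) => Kind::Exact,
        (true, false) => Kind::Prefix,
        (false, true) => Kind::Postfix,
        // as in fzf, a bare negated term is an inverse substring match
        (false, false) if negated => Kind::Substring,
        (false, false) => Kind::Fuzzy,
    };
    (kind, negated, s)
}

fn main() {
    for term in "^src main rs$ 'test !tmp".split_whitespace() {
        println!("{term:>6} -> {:?}", classify(term));
    }
}
```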

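Installation

Add the dependencies to Cargo.toml:

[dependencies]
nucleo = "0.5"  # check crates.io for the latest version
rayon = "1"     # only needed for the multithreaded examples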

Basic usage

Simple fuzzy search

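use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    let mut matcher = Matcher::new(Config::DEFAULT);

    let pattern = "rs";
    let candidates = vec!["Rust", "JavaScript", "TypeScript", "Python"];

    // Scratch buffers, only used when a string contains non-ASCII characters
    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    // Config::DEFAULT is case-insensitive, so the needle is written in lowercase
    let needle = Utf32Str::new(pattern, &mut needle_buf);

    for candidate in candidates {
        // Some(score) if every needle character appears in the candidate, in order
        if let Some(score) = matcher.fuzzy_match(Utf32Str::new(candidate, &mut hay_buf), needle) {
            println!("Match found: {} (score: {})", candidate, score);
        }
    }
}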
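`fuzzy_match` only returns a score when every needle character occurs in the candidate in order. Before any expensive scoring, a cheap two-pass scan can both reject non-matches and narrow the window that scoring has to examine: a forward pass finds where the earliest in-order occurrence ends, and a backward pass from there finds the latest possible start. This sketch follows the approach of fzf's original v1 algorithm; it is an illustration, not nucleo's prefilter code, and `match_window` is a hypothetical name.

```rust
/// Returns the char-index window [start, end) of a tight match of `needle`
/// in `haystack` (ASCII case-insensitive), or None if it does not occur in order.
fn match_window(haystack: &str, needle: &str) -> Option<(usize, usize)> {
    let h: Vec<char> = haystack.chars().map(|c| c.to_ascii_lowercase()).collect();
    let n: Vec<char> = needle.chars().map(|c| c.to_ascii_lowercase()).collect();
    if n.is_empty() {
        return Some((0, 0));
    }
    // Forward pass: where does the earliest in-order occurrence end?
    let mut ni = 0;
    let mut end = None;
    for (i, &c) in h.iter().enumerate() {
        if c == n[ni] {
            ni += 1;
            if ni == n.len() {
                end = Some(i + 1);
                break;
            }
        }
    }
    let end = end?; // needle is not a subsequence: reject early
    // Backward pass: match the needle in reverse to find the latest start
    let mut ni = n.len();
    let mut start = end;
    while ni > 0 {
        start -= 1;
        if h[start] == n[ni - 1] {
            ni -= 1;
        }
    }
    Some((start, end))
}

fn main() {
    for hay in ["Rust", "a_ab", "Python"] {
        println!("{hay:?}: {:?}", match_window(hay, "ab"));
    }
}
```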

More advanced matching configuration

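use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // Config has private fields, so start from DEFAULT and set the public ones
    let mut config = Config::DEFAULT;
    config.ignore_case = true; // case-insensitive matching
    config.normalize = true;   // normalize Unicode characters (e.g. Ú -> U)

    let mut matcher = Matcher::new(config);
    // Matcher expects the needle to be pre-folded, so it is given in
    // lowercase, unaccented form; the haystack is normalized during matching
    let pattern = "nucleo";
    let candidate = "NÚCLEO";

    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let score = matcher.fuzzy_match(
        Utf32Str::new(candidate, &mut hay_buf),
        Utf32Str::new(pattern, &mut needle_buf),
    );
    println!("Match score: {:?}", score);
}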
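With `normalize` and `ignore_case` both enabled, each haystack character is effectively reduced to a base letter and lowercased before comparison, which is why "NÚCLEO" can match "nucleo". A toy sketch of that folding step; the normalization table is a hypothetical hand-written subset (nucleo ships its own, much larger table) and the normalize-then-lowercase order is an assumption of this sketch:

```rust
/// Toy normalization table: a hypothetical subset of Latin letters with diacritics.
fn normalize(c: char) -> char {
    match c {
        'á' | 'à' | 'â' | 'ä' => 'a',
        'Á' | 'À' | 'Â' | 'Ä' => 'A',
        'é' | 'è' | 'ê' | 'ë' => 'e',
        'É' | 'È' | 'Ê' | 'Ë' => 'E',
        'í' | 'ì' | 'î' | 'ï' => 'i',
        'Í' | 'Ì' | 'Î' | 'Ï' => 'I',
        'ó' | 'ò' | 'ô' | 'ö' => 'o',
        'Ó' | 'Ò' | 'Ô' | 'Ö' => 'O',
        'ú' | 'ù' | 'û' | 'ü' => 'u',
        'Ú' | 'Ù' | 'Û' | 'Ü' => 'U',
        'ñ' => 'n',
        'Ñ' => 'N',
        _ => c,
    }
}

/// Fold one character: normalize first (if enabled), then lowercase (if enabled).
fn fold(c: char, normalize_chars: bool, ignore_case: bool) -> char {
    let c = if normalize_chars { normalize(c) } else { c };
    if ignore_case {
        // takes the first char of multi-char lowercase mappings
        c.to_lowercase().next().unwrap_or(c)
    } else {
        c
    }
}

fn main() {
    let folded: String = "NÚCLEO".chars().map(|c| fold(c, true, true)).collect();
    println!("{folded}"); // nucleo
}
```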

Multithreaded search

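use nucleo::{Config, Matcher, Utf32Str};
use rayon::prelude::*; // requires the `rayon` crate

fn main() {
    let pattern = "rust";
    let candidates = vec![
        "Rust Programming Language",
        "JavaScript: The Good Parts",
        "Programming Rust",
        "Effective TypeScript",
    ];

    // Build the needle once; Utf32Str is a cheap copyable view shared by all threads
    let mut needle_buf = Vec::new();
    let needle = Utf32Str::new(pattern, &mut needle_buf);

    // fuzzy_match takes &mut self, so one Matcher cannot be shared between threads;
    // map_init gives each rayon job its own Matcher and scratch buffer
    let results: Vec<(&str, u16)> = candidates
        .par_iter()
        .map_init(
            || (Matcher::new(Config::DEFAULT), Vec::<char>::new()),
            |state, &candidate| {
                let (matcher, buf) = state;
                matcher
                    .fuzzy_match(Utf32Str::new(candidate, buf), needle)
                    .map(|score| (candidate, score))
            },
        )
        .filter_map(|r| r)
        .collect();

    for (candidate, score) in results {
        println!("Match: {} (score: {})", candidate, score);
    }
}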
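The key point of the multithreaded version is that every worker owns its own matching state. The same idea without extra dependencies: split the candidates into chunks, filter each chunk on a scoped thread, then concatenate the results in input order. `is_fuzzy_match` is a hypothetical stand-in for `Matcher::fuzzy_match` (it only checks in-order occurrence and does not score); the high-level nucleo crate manages its own worker pool instead.

```rust
use std::thread;

/// Hypothetical stand-in for a real matcher: true if every needle char occurs
/// in `haystack` in order (ASCII case-insensitive).
fn is_fuzzy_match(haystack: &str, needle: &str) -> bool {
    let mut hay = haystack.chars().map(|c| c.to_ascii_lowercase());
    needle
        .chars()
        .map(|c| c.to_ascii_lowercase())
        .all(|n| hay.any(|h| h == n))
}

/// Split `candidates` into roughly equal chunks, filter each chunk on its own
/// scoped thread, and concatenate the results in the original order.
fn parallel_filter<'a>(candidates: &[&'a str], needle: &str, threads: usize) -> Vec<&'a str> {
    let threads = threads.max(1);
    // chunks() panics on a size of 0, so keep at least one item per chunk
    let chunk_size = ((candidates.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = candidates
            .chunks(chunk_size)
            .map(|chunk| {
                // Per-thread state would live here (with nucleo: one Matcher per thread)
                s.spawn(move || {
                    chunk
                        .iter()
                        .copied()
                        .filter(|c| is_fuzzy_match(c, needle))
                        .collect::<Vec<&'a str>>()
                })
            })
            .collect();
        // Joining in spawn order preserves the input order of the matches
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let candidates = [
        "Rust Programming Language",
        "JavaScript: The Good Parts",
        "Programming Rust",
        "Effective TypeScript",
    ];
    for hit in parallel_filter(&candidates, "rust", 2) {
        println!("Match: {hit}");
    }
}
```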

Advanced usage

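Tuning match behavior

use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // The scoring bonuses (consecutive characters, word boundaries) are fixed,
    // fzf-compatible values and are not public Config fields. Config does expose
    // a few switches that change the ranking, such as prefer_prefix.
    let mut config = Config::DEFAULT;
    config.prefer_prefix = true; // favor matches that start near the beginning of the text
    // config.set_match_paths(); // use this when the candidates are file paths

    let mut matcher = Matcher::new(config);
    let pattern = "mod";
    let candidates = vec!["module.rs", "models.rs", "mod.rs"];

    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let needle = Utf32Str::new(pattern, &mut needle_buf);
    for candidate in candidates {
        if let Some(score) = matcher.fuzzy_match(Utf32Str::new(candidate, &mut hay_buf), needle) {
            println!("{} matches with score {}", candidate, score);
        }
    }
}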

Handling large text

use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    let mut matcher = Matcher::new(Config::DEFAULT);
    let pattern = "error";

    let large_text = "This is a large text containing multiple words. \
                     An error occurred while processing the request. \
                     The error code is 404.";

    // Build the needle once and reuse one scratch buffer for every word
    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let needle = Utf32Str::new(pattern, &mut needle_buf);

    // Split the text into words and match each one
    for word in large_text.split_whitespace() {
        if let Some(score) = matcher.fuzzy_match(Utf32Str::new(word, &mut hay_buf), needle) {
            println!("Matched word: {} (score: {})", word, score);
        }
    }
}

Complete example

The following complete example combines several of the features above:

use nucleo::{Config, Matcher, Utf32Str};
use rayon::prelude::*; // requires the `rayon` crate

// Matcher configuration shared by the parallel and the sequential part
fn make_config() -> Config {
    let mut config = Config::DEFAULT;
    config.ignore_case = true;   // case-insensitive (the needle must be lowercase)
    config.normalize = true;     // normalize Unicode
    config.prefer_prefix = true; // favor matches near the start of the text
    config
}

fn main() {
    // Search pattern, converted once and shared by all threads
    let pattern = "rust";
    let mut needle_buf = Vec::new();
    let needle = Utf32Str::new(pattern, &mut needle_buf);

    // Candidate texts
    let candidates = vec![
        "The Rust Programming Language",
        "Learning Rust",
        "Rust in Action",
        "Programming Rust: Fast, Safe Systems Development",
        "Beginning Rust: From Novice to Professional",
        "Rust for Rustaceans",
        "The Rust Programming Language (Covers Rust 2018)",
        "Rust Cookbook",
        "Rust in Practice",
        "Zero To Production In Rust",
    ];

    // Match in parallel: map_init gives each rayon job its own Matcher and buffer
    let mut results: Vec<(&str, u16)> = candidates
        .par_iter()
        .map_init(
            || (Matcher::new(make_config()), Vec::<char>::new()),
            |state, &candidate| {
                let (matcher, buf) = state;
                matcher
                    .fuzzy_match(Utf32Str::new(candidate, buf), needle)
                    .map(|score| (candidate, score))
            },
        )
        .filter_map(|r| r)
        .collect();

    // Sort by score, highest first
    results.sort_by(|a, b| b.1.cmp(&a.1));

    // Print the ranked results
    println!("Results (sorted by relevance):");
    for (i, (candidate, score)) in results.iter().enumerate() {
        println!("{}. {} (score: {})", i + 1, candidate, score);
    }

    // Large-text example: one Matcher and one scratch buffer, reused sequentially
    println!("\nLarge-text search example:");
    let large_text = "Rust is a multi-paradigm, general-purpose programming language. \
                     Rust emphasizes performance, type safety, and concurrency. \
                     Rust enforces memory safety—that is, that all references point to valid memory—\
                     without requiring the use of a garbage collector or reference counting. \
                     Rust is popular for systems programming but also offers high-level features.";

    let mut matcher = Matcher::new(make_config());
    let mut buf = Vec::new();
    for word in large_text.split_whitespace() {
        if let Some(score) = matcher.fuzzy_match(Utf32Str::new(word, &mut buf), needle) {
            println!("Matched word: {} (score: {})", word, score);
        }
    }
}

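Performance tips

  1. Reuse the Matcher instance (and the Vec<char> buffers passed to Utf32Str::new) to avoid repeated memory allocation
  2. For large data sets, consider multithreading, with one Matcher per thread
  3. Adjust the Config settings to get the best results for your use case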
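Tip 1 comes down to keeping allocations alive between calls; nucleo's `Utf32Str::new` takes a caller-provided `Vec<char>`, which lets callers reuse it across candidates. A stdlib sketch of the same pattern with a hypothetical `Scratch` type: `clear()` empties the buffer but keeps its capacity, so decoding many short strings allocates only when a longer one arrives.

```rust
/// Hypothetical reusable decoding state, analogous to reusing a Matcher and
/// the Vec<char> passed to Utf32Str::new instead of allocating per candidate.
struct Scratch {
    buf: Vec<char>,
}

impl Scratch {
    fn new() -> Self {
        Scratch { buf: Vec::new() }
    }

    /// Decode `text` into the reused buffer and return it as a char slice.
    fn load(&mut self, text: &str) -> &[char] {
        self.buf.clear(); // drops the contents, keeps the allocation
        self.buf.extend(text.chars());
        &self.buf
    }
}

fn main() {
    let mut scratch = Scratch::new();
    let mut total = 0;
    for word in ["alpha", "beta", "gamma"] {
        total += scratch.load(word).len();
    }
    println!("decoded {total} chars; buffer capacity {}", scratch.buf.capacity());
}
```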

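The nucleo library provides highly configurable fuzzy search: matching behavior can be adapted to each application through Config (case handling, Unicode normalization, prefix preference, path matching) and the Pattern options (case matching, normalization, match kind).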
