Using nucleo, a fuzzy-search library for Rust: high-performance text search and matching
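Introduction to nucleo
nucleo is a high-performance fuzzy matcher written in Rust that aims to cover the same use cases as fzf and skim. Compared with fzf, nucleo has a significantly faster matching algorithm.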
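Key features:
- Uses exactly the same scoring system as fzf
- A more accurate implementation of the Smith-Waterman algorithm
- About 6x faster than skim
- Better Unicode support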
Example code
Below is a complete example of a fuzzy search with nucleo:
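use nucleo::pattern::{AtomKind, CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32String};

fn main() {
    // The matcher owns the scratch memory used while matching
    let mut matcher = Matcher::new(Config::DEFAULT);
    // Create a pattern (the search query); this is the four-argument
    // signature of the nucleo 0.5 / nucleo-matcher 0.3 releases
    let pattern = Pattern::new(
        "rust",
        CaseMatching::Smart,
        Normalization::Smart,
        AtomKind::Fuzzy,
    );
    // Prepare the text to search
    let haystack = Utf32String::from("Rust is a systems programming language");
    // Run the match; matched positions are appended to `indices`
    let mut indices: Vec<u32> = Vec::new();
    let result = pattern.indices(haystack.slice(..), &mut matcher, &mut indices);
    // Print the result
    if let Some(score) = result {
        // The positions are not guaranteed to be sorted or unique
        indices.sort_unstable();
        indices.dedup();
        println!("Match found! Score: {}", score);
        println!("Match positions: {:?}", indices);
        // Highlight the matched characters with brackets
        let mut highlighted = String::new();
        for (pos, ch) in haystack.slice(..).chars().enumerate() {
            let is_match = indices.binary_search(&(pos as u32)).is_ok();
            if is_match {
                highlighted.push('[');
            }
            highlighted.push(ch);
            if is_match {
                highlighted.push(']');
            }
        }
        println!("Highlighted: {}", highlighted);
    } else {
        println!("No match found");
    }
}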
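The `CaseMatching::Smart` mode used in the example follows the convention known from fzf: matching ignores case unless the query itself contains an uppercase character. Here is a minimal standard-library sketch of just that rule, using a plain subsequence test instead of nucleo's scoring. The helpers `is_subsequence` and `smart_case_match` are hypothetical names for illustration and are not part of nucleo.

```rust
/// Returns true if every char of `needle` occurs in `haystack`, in order.
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut hay = haystack.chars();
    // `any` advances the shared iterator, so later chars must appear later.
    needle.chars().all(|n| hay.any(|h| h == n))
}

/// Smart-case rule: ignore case only when the query has no uppercase char.
fn smart_case_match(query: &str, text: &str) -> bool {
    if query.chars().any(char::is_uppercase) {
        // An uppercase char in the query makes the match case-sensitive.
        is_subsequence(query, text)
    } else {
        is_subsequence(query, &text.to_lowercase())
    }
}

fn main() {
    // Lowercase query: case is ignored.
    println!("{}", smart_case_match("rust", "Rust is fast"));
    // Query with an uppercase char: case must match exactly.
    println!("{}", smart_case_match("Rust", "rust is fast"));
}
```

A query typed in lowercase therefore behaves like a case-insensitive search, while adding a single capital letter is enough to make it strict.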
Complete demo
Below is a fuller nucleo example that searches several items and sorts the results:
use nucleo::pattern::{AtomKind, CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // Matcher configuration
    let config = Config::DEFAULT;
    // Initialize the matcher
    let mut matcher = Matcher::new(config);
    // Items to search
    let items = vec![
        "Rust programming language",
        "Python is easy to learn",
        "Java is widely used",
        "Go is fast and simple",
        "JavaScript for web development",
        "TypeScript is JavaScript with syntax for types",
        "C++ is complex but powerful",
    ];
    // Create the search pattern
    let pattern = Pattern::new(
        "prog",
        CaseMatching::Smart,
        Normalization::Smart,
        AtomKind::Fuzzy,
    );
    // Scratch buffer, only used when an item contains non-ASCII text
    let mut buf = Vec::new();
    // Score every item and keep the ones that match
    let mut results: Vec<(u32, usize)> = items
        .iter()
        .enumerate()
        .filter_map(|(idx, &item)| {
            let haystack = Utf32Str::new(item, &mut buf);
            pattern.score(haystack, &mut matcher).map(|score| (score, idx))
        })
        .collect();
    // Sort by score, best match first
    results.sort_by(|a, b| b.0.cmp(&a.0));
    // Print the sorted results
    println!("Matches (sorted by relevance):");
    for (score, idx) in results {
        println!("Score: {} - {}", score, items[idx]);
    }
}
Performance comparison
Benchmark results on the Linux kernel source tree:
Method | Mean time | Samples |
---|---|---|
nucleo "never_matches" | 2.30 ms | 2,493/2,500 |
skim "never_matches" | 17.44 ms | 574/574 |
nucleo "copying" | 2.12 ms | 2,496/2,500 |
skim "copying" | 16.85 ms | 593/594 |
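To put the table in relative terms, the speedup is simply the ratio of the two mean times. For the two patterns listed it works out to roughly 7.6x and 7.9x. The snippet below only redoes that arithmetic on the numbers from the table; the `speedup` helper is a hypothetical name, not part of any benchmark harness.

```rust
/// Ratio of two mean timings; a value above 1.0 means `baseline_ms` is slower.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // (pattern, skim mean in ms, nucleo mean in ms), copied from the table.
    let rows = [("never_matches", 17.44, 2.30), ("copying", 16.85, 2.12)];
    for (name, skim_ms, nucleo_ms) in rows {
        println!(
            "{name}: nucleo is {:.1}x faster than skim",
            speedup(skim_ms, nucleo_ms)
        );
    }
}
```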
Installation
Add this to Cargo.toml:
nucleo = "0.5.0"
Or run:
cargo add nucleo
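Implementation details
nucleo's fuzzy matching algorithm is based on the Smith-Waterman algorithm (with affine gaps) and adds several optimizations:
- Pre-segmented Unicode
- Aggressive prefiltering
- A special-cased ASCII path
- A fallback mechanism for very long matches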
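To make "affine gaps" and "prefiltering" a little more concrete, here is a toy sketch in plain Rust. It is not nucleo's implementation: the constants `MATCH`, `GAP_OPEN` and `GAP_EXTEND` are arbitrary values chosen for illustration, the function names are hypothetical, and real scorers also award bonuses (word boundaries and so on) that are left out here. The idea shown is that a cheap subsequence check rejects hopeless candidates before any dynamic programming runs, and that opening a gap costs more than extending one.

```rust
const MATCH: i32 = 16; // reward per matched char (arbitrary)
const GAP_OPEN: i32 = 3; // cost of the first skipped char in a gap (arbitrary)
const GAP_EXTEND: i32 = 1; // cost of each further skipped char (arbitrary)
const NEG: i32 = i32::MIN / 2; // "unreachable" marker that tolerates small subtractions

/// Cheap prefilter: is `needle` a subsequence of `haystack` at all?
fn is_subsequence(needle: &[char], haystack: &[char]) -> bool {
    let mut hay = haystack.iter();
    needle.iter().all(|n| hay.any(|h| h == n))
}

/// Best alignment score of `needle` inside `haystack`, or `None` if it cannot match.
fn fuzzy_score(needle: &str, haystack: &str) -> Option<i32> {
    let needle: Vec<char> = needle.chars().collect();
    let hay: Vec<char> = haystack.chars().collect();
    if needle.is_empty() || !is_subsequence(&needle, &hay) {
        return None; // rejected before any DP work
    }
    // m[j]: best score with the current needle char matched at hay[j]
    // g[j]: best score with hay[j] skipped after the current needle char matched
    let mut m_prev = vec![NEG; hay.len()];
    let mut g_prev = vec![NEG; hay.len()];
    for (i, &n) in needle.iter().enumerate() {
        let mut m_cur = vec![NEG; hay.len()];
        let mut g_cur = vec![NEG; hay.len()];
        for j in 0..hay.len() {
            if hay[j] == n {
                if i == 0 {
                    m_cur[j] = MATCH; // chars before the first match are free
                } else if j > 0 {
                    let best_prev = m_prev[j - 1].max(g_prev[j - 1]);
                    if best_prev > NEG / 2 {
                        m_cur[j] = best_prev + MATCH;
                    }
                }
            }
            if j > 0 {
                // Affine gap: opening is expensive, extending is cheap.
                g_cur[j] = (m_cur[j - 1] - GAP_OPEN).max(g_cur[j - 1] - GAP_EXTEND);
            }
        }
        m_prev = m_cur;
        g_prev = g_cur;
    }
    // Chars after the last match are free, so take the best final match cell.
    m_prev.into_iter().max().filter(|&s| s > NEG / 2)
}

fn main() {
    println!("{:?}", fuzzy_score("ab", "ab")); // Some(32): contiguous match
    println!("{:?}", fuzzy_score("ab", "axb")); // Some(29): one gap of length 1
    println!("{:?}", fuzzy_score("abc", "a__bc")); // Some(44): one gap of length 2
    println!("{:?}", fuzzy_score("abc", "a_b_c")); // Some(42): two gaps of length 1
    println!("{:?}", fuzzy_score("ab", "ba")); // None: fails the prefilter
}
```

With an affine penalty, one longer gap (`a__bc`) scores better than two short ones (`a_b_c`), which is why contiguous runs of matched characters tend to rank higher.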
Status
nucleo is already used in helix-editor. The core matcher implementation is complete and unlikely to see major changes. The nucleo-matcher crate is ready for widespread use.
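1 reply
A guide to the nucleo fuzzy-search library for Rust
Overview
nucleo is a high-performance fuzzy-search library for Rust that focuses on fast, flexible text search and matching. It is particularly well suited to applications that need real-time fuzzy search, such as code editors, file finders, or any program that needs efficient text matching.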
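Main features
- High-performance fuzzy matching algorithm
- Unicode support
- Configurable matching behavior
- Multithreading support
- Low memory footprint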
Installation
Add the dependency to Cargo.toml:
[dependencies]
nucleo = "0.5" # check crates.io for the latest version
Basic usage
Simple fuzzy search
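use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    let mut matcher = Matcher::new(Config::DEFAULT);
    let pattern = "rs";
    let candidates = vec!["Rust", "JavaScript", "TypeScript", "Python"];
    // Scratch buffers, only used when the text is not ASCII
    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    for candidate in candidates {
        // `fuzzy_match` takes `Utf32Str` values: haystack first, then needle
        let score = matcher.fuzzy_match(
            Utf32Str::new(candidate, &mut hay_buf),
            Utf32Str::new(pattern, &mut needle_buf),
        );
        if score.is_some() {
            println!("Match found: {} (score: {:?})", candidate, score);
        }
    }
}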
A more involved matching configuration
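use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // `Config` cannot be built with a struct literal from outside the crate,
    // so start from the default and set its public fields
    let mut config = Config::DEFAULT;
    config.ignore_case = true; // case-insensitive matching
    config.normalize = true; // map Latin characters with diacritics to ASCII
    let mut matcher = Matcher::new(config);
    // With this low-level API the needle is expected to be lowercase and
    // already normalized; the haystack is folded while matching
    let pattern = "nucleo";
    let candidate = "Núcleo";
    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    let score = matcher.fuzzy_match(
        Utf32Str::new(candidate, &mut hay_buf),
        Utf32Str::new(pattern, &mut needle_buf),
    );
    println!("Match score: {:?}", score);
}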
Multithreaded search
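// Requires `rayon` as an additional dependency in Cargo.toml
use nucleo::pattern::{AtomKind, CaseMatching, Normalization, Pattern};
use nucleo::{Config, Matcher, Utf32Str};
use rayon::prelude::*;

fn main() {
    let pattern = Pattern::new(
        "rust",
        CaseMatching::Smart,
        Normalization::Smart,
        AtomKind::Fuzzy,
    );
    let candidates = vec![
        "Rust Programming Language",
        "JavaScript: The Good Parts",
        "Programming Rust",
        "Effective TypeScript",
    ];
    let results: Vec<(&str, u32)> = candidates
        .par_iter()
        // Matching needs `&mut Matcher`, so a single matcher cannot be shared;
        // `map_init` gives every rayon worker its own matcher and buffer
        .map_init(
            || (Matcher::new(Config::DEFAULT), Vec::<char>::new()),
            |(matcher, buf), &candidate| {
                pattern
                    .score(Utf32Str::new(candidate, buf), matcher)
                    .map(|score| (candidate, score))
            },
        )
        .filter_map(|hit| hit)
        .collect();
    for (candidate, score) in results {
        println!("Match: {} (score: {})", candidate, score);
    }
}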
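One thing to watch with multithreaded matching: nucleo's `Matcher::fuzzy_match` takes `&mut self`, so one matcher cannot be called from several threads at once and each worker needs its own instance. The sketch below shows that per-worker pattern using only the standard library. `ToyMatcher` and `parallel_filter` are hypothetical stand-ins written for this illustration (a case-insensitive subsequence test, no scoring), not nucleo types.

```rust
use std::thread;

/// Hypothetical stand-in for a matcher that owns reusable scratch memory.
struct ToyMatcher {
    scratch: Vec<char>,
}

impl ToyMatcher {
    fn new() -> Self {
        ToyMatcher { scratch: Vec::new() }
    }

    /// Case-insensitive subsequence test; needs `&mut self` for the scratch buffer.
    fn matches(&mut self, haystack: &str, needle: &str) -> bool {
        self.scratch.clear();
        self.scratch.extend(haystack.chars().flat_map(char::to_lowercase));
        let mut hay = self.scratch.iter();
        needle.chars().all(|n| hay.any(|&h| h == n))
    }
}

/// Splits `candidates` into chunks and filters each chunk on its own thread.
fn parallel_filter<'a>(candidates: &[&'a str], needle: &str, workers: usize) -> Vec<&'a str> {
    let workers = workers.max(1);
    let chunk_len = ((candidates.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = candidates
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    // One matcher per worker: mutable state is never shared.
                    let mut matcher = ToyMatcher::new();
                    chunk
                        .iter()
                        .copied()
                        .filter(|c| matcher.matches(c, needle))
                        .collect::<Vec<&str>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the original candidate order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let candidates = [
        "Rust Programming Language",
        "JavaScript: The Good Parts",
        "Programming Rust",
        "Effective TypeScript",
    ];
    let hits = parallel_filter(&candidates, "rust", 2);
    println!("{hits:?}");
}
```

The same shape carries over to thread-pool libraries: create the mutable matcher state once per worker, not once per item and not once globally.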
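Advanced usage
Tuning the scoring for file paths
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    // The individual scoring bonuses are not public `Config` fields;
    // `match_paths()` selects the bonuses tuned for matching file paths
    let config = Config::DEFAULT.match_paths();
    let mut matcher = Matcher::new(config);
    let pattern = "mod";
    let candidates = vec!["module.rs", "models.rs", "mod.rs"];
    // Scratch buffers, only used when the text is not ASCII
    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    for candidate in candidates {
        let haystack = Utf32Str::new(candidate, &mut hay_buf);
        let needle = Utf32Str::new(pattern, &mut needle_buf);
        if let Some(score) = matcher.fuzzy_match(haystack, needle) {
            println!("{} matches with score {}", candidate, score);
        }
    }
}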
Handling large texts
use nucleo::{Config, Matcher, Utf32Str};

fn main() {
    let mut matcher = Matcher::new(Config::DEFAULT);
    let pattern = "error";
    let large_text = "This is a large text containing multiple words. \
                      An error occurred while processing the request. \
                      The error code is 404.";
    // Scratch buffers, reused for every word
    let (mut hay_buf, mut needle_buf) = (Vec::new(), Vec::new());
    // Split the text into words and match each one
    for word in large_text.split_whitespace() {
        let haystack = Utf32Str::new(word, &mut hay_buf);
        let needle = Utf32Str::new(pattern, &mut needle_buf);
        if let Some(score) = matcher.fuzzy_match(haystack, needle) {
            println!("Matched word: {} (score: {})", word, score);
        }
    }
}
Complete example
Below is a complete example that combines several of these features:
// Requires `rayon` as an additional dependency in Cargo.toml
use nucleo::{Config, Matcher, Utf32Str};
use rayon::prelude::*;

fn main() {
    // Configure the matcher (struct literals are not available for `Config`)
    let mut config = Config::DEFAULT;
    config.ignore_case = true; // case-insensitive matching
    config.normalize = true; // map Latin characters with diacritics to ASCII
    // Search pattern: lowercase ASCII, as the low-level API expects
    let pattern = "rust";
    let needle = Utf32Str::Ascii(pattern.as_bytes());
    // Candidate texts
    let candidates = vec![
        "The Rust Programming Language",
        "Learning Rust",
        "Rust in Action",
        "Programming Rust: Fast, Safe Systems Development",
        "Beginning Rust: From Novice to Professional",
        "Rust for Rustaceans",
        "The Rust Programming Language (Covers Rust 2018)",
        "Rust Cookbook",
        "Rust in Practice",
        "Zero To Production In Rust",
    ];
    // Fuzzy-match in parallel; `fuzzy_match` needs `&mut Matcher`, so every
    // rayon worker builds its own matcher and scratch buffer
    let mut results: Vec<(&str, u16)> = candidates
        .par_iter()
        .map_init(
            || (Matcher::new(config.clone()), Vec::<char>::new()),
            |(matcher, buf), &candidate| {
                matcher
                    .fuzzy_match(Utf32Str::new(candidate, buf), needle)
                    .map(|score| (candidate, score))
            },
        )
        .filter_map(|hit| hit)
        .collect();
    // Sort by score, best match first
    results.sort_by(|a, b| b.1.cmp(&a.1));
    // Print the matches
    println!("Matches (sorted by relevance):");
    for (i, (candidate, score)) in results.iter().enumerate() {
        println!("{}. {} (score: {})", i + 1, candidate, score);
    }
    // Large-text example
    println!("\nLarge-text search example:");
    let large_text = "Rust is a multi-paradigm, general-purpose programming language. \
                      Rust emphasizes performance, type safety, and concurrency. \
                      Rust enforces memory safety, that is, that all references point to valid memory, \
                      without requiring the use of a garbage collector or reference counting. \
                      Rust is popular for systems programming but also offers high-level features.";
    // A single matcher and buffer are reused for every word
    let mut matcher = Matcher::new(config);
    let mut buf = Vec::new();
    for word in large_text.split_whitespace() {
        if let Some(score) = matcher.fuzzy_match(Utf32Str::new(word, &mut buf), needle) {
            println!("Matched word: {} (score: {})", word, score);
        }
    }
}
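Performance tips
- Reuse Matcher instances to avoid repeated memory allocation
- For large data sets, consider processing on multiple threads
- Adjust the Config settings to your actual needs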
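The first tip comes down to keeping scratch memory alive between calls. The standard-library sketch below shows the general idea with a reusable `Vec<char>`: clearing a vector keeps its allocation, so refilling it for the next item does not allocate again as long as the item fits. The helper name `fill_chars` is hypothetical; nucleo's `Utf32Str::new` takes a caller-supplied `&mut Vec<char>` buffer for a similar reason.

```rust
/// Converts `text` into `buf`, reusing whatever capacity `buf` already has.
fn fill_chars<'a>(text: &str, buf: &'a mut Vec<char>) -> &'a [char] {
    buf.clear(); // drops the old contents but keeps the allocation
    buf.extend(text.chars());
    buf.as_slice()
}

fn main() {
    let words = ["error", "request", "processing"];
    // Allocate once, up front.
    let mut buf: Vec<char> = Vec::with_capacity(16);
    let ptr_before = buf.as_ptr();
    for word in words {
        let chars = fill_chars(word, &mut buf);
        println!("{word}: {} chars", chars.len());
    }
    // Every word fit into the initial capacity, so the buffer never moved.
    println!("same allocation: {}", ptr_before == buf.as_ptr());
}
```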
nucleo offers configurable fuzzy search whose matching behavior can be adapted to the needs of a particular application.