How to implement a simple Flask full-text search plugin in Python

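Flask seems to have very few full-text search plugins. There is Flask-WhooshAlchemy, but I could not get it to work after several attempts, so I wrote my own, using Flask-WhooshAlchemy as a reference.

The plugin is built on Whoosh, written in pure Python, and simple to use: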

from flask_msearch import Search
[...]
search = Search()
search.init_app(app)

models.py

class Post(db.Model):
    __tablename__ = 'post'
    # Model columns to include in the search index
    __searchable__ = ['title', 'content']

views.py

@app.route("/search")
def w_search():
    keyword = request.args.get('keyword')
    results = search.whoosh_search(Post, query=keyword, fields=['title'], limit=20)
    return ''

To build an index for data that already exists:

search.create_index()

Custom analyzer:

from jieba.analyse import ChineseAnalyzer
search = Search(analyzer=ChineseAnalyzer())

Project repository: https://github.com/honmaple/flask-msearch

A demo is available: demo

(Many more Whoosh features have not been added yet.)

htzhanglong #1 (OP)

Fine for simple projects. For anything else, go with ES (Elasticsearch).

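bupafengyu #2

To build a simple Flask full-text search plugin, the core task is integrating a lightweight search engine. Either Whoosh or SQLite's FTS5 extension would work; the example here uses Whoosh because it is pure Python and easy to integrate.

First, install the packages: pip install whoosh flask

Next, set up a simple plugin structure. The main idea: initialize the index, provide a search function, and expose a search endpoint through a blueprint.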

# search_plugin.py
import os
from whoosh import index
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser
from flask import Blueprint, request, jsonify, current_app

class FlaskSearch:
    def __init__(self, app=None, index_dir='search_index'):
        self.index_dir = index_dir
        self.schema = Schema(
            id=ID(stored=True, unique=True),
            title=TEXT(stored=True),
            content=TEXT(stored=True)
        )
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        # Make sure the index directory exists (also creates missing parents)
        os.makedirs(self.index_dir, exist_ok=True)
        
        # Create the index, or open it if it already exists
        if not index.exists_in(self.index_dir):
            self.ix = index.create_in(self.index_dir, self.schema)
        else:
            self.ix = index.open_dir(self.index_dir)
        
        # Register the blueprint that exposes the /search endpoint
        self.register_blueprint(app)

    def register_blueprint(self, app):
        bp = Blueprint('search', __name__)

        @bp.route('/search')
        def search():
            query_str = request.args.get('q', '')
            if not query_str:
                return jsonify({'results': []})
            
            with self.ix.searcher() as searcher:
                query = QueryParser("content", self.ix.schema).parse(query_str)
                results = searcher.search(query, limit=10)
                return jsonify({
                    'results': [{
                        'id': hit['id'],
                        'title': hit['title'],
                        'score': hit.score
                    } for hit in results]
                })

        app.register_blueprint(bp)

    def add_document(self, doc_id, title, content):
        writer = self.ix.writer()
        writer.update_document(id=str(doc_id), title=title, content=content)
        writer.commit()

    def remove_document(self, doc_id):
        writer = self.ix.writer()
        writer.delete_by_term('id', str(doc_id))
        writer.commit()

To use it, initialize the plugin in your Flask application; you can then add documents to the index and search them:

# app.py
from flask import Flask
from search_plugin import FlaskSearch

app = Flask(__name__)
search = FlaskSearch(app)

# Add a few sample documents
with app.app_context():
    search.add_document(1, "Flask Guide", "Flask is a lightweight web framework")
    search.add_document(2, "Python Tutorial", "Learn Python programming and Whoosh search")

if __name__ == '__main__':
    app.run(debug=True)

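Once it is running, visit http://localhost:5000/search?q=Flask to query the index. The plugin covers basic add, delete, and search operations and exposes the search endpoint through a blueprint. Note that the endpoint only searches the content field, and that Whoosh's default analyzer does not segment Chinese, so Chinese text needs a tokenizer such as jieba before it can be searched by word. To extend the plugin further, you could add more fields or build an admin interface for managing the index.

Summary: Whoosh is a quick way to integrate full-text search.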
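
The answer above also mentions SQLite's FTS5 extension as an alternative. A minimal sketch of that route, assuming the local SQLite build has FTS5 enabled (most recent builds do); the table name `posts` and the helpers `build_index` and `fts_search` are made up for illustration and are not part of any plugin:

```python
import sqlite3


def build_index(docs):
    """Create an in-memory FTS5 table and load (doc_id, title, content) rows."""
    conn = sqlite3.connect(":memory:")
    # doc_id is stored but not indexed; title and content are full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE posts USING fts5(doc_id UNINDEXED, title, content)"
    )
    conn.executemany(
        "INSERT INTO posts (doc_id, title, content) VALUES (?, ?, ?)", docs
    )
    conn.commit()
    return conn


def fts_search(conn, query, limit=10):
    """Return matching titles, best match first (FTS5's built-in rank)."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE posts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


conn = build_index([
    (1, "Flask Guide", "Flask is a lightweight web framework"),
    (2, "Python Tutorial", "Learn Python programming and Whoosh search"),
])
print(fts_search(conn, "flask"))
print(fts_search(conn, "whoosh"))
```

This needs no extra packages, but like Whoosh's default analyzer, the default FTS5 tokenizer does not do Chinese word segmentation.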

vueper #3

However, searching for the title 美国 ("United States") returns no results.

phonegap100 #4

For word segmentation I would suggest jieba.
ChineseAnalyzer is quite weak.

songsunli #5

The tokenizer was not chosen well.

itying888 #6

Right, ES needs a Java environment. For pure Python there is Whoosh, which is simple and convenient.

h691938207 #7

Right. In the demo I used the default StemmingAnalyzer as the tokenizer, not ChineseAnalyzer. I later tried it locally: once the index is built with ChineseAnalyzer, the search does return results.
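
To illustrate why a default, regex-based analyzer misses Chinese terms, here is a rough standard-library sketch. It assumes a word pattern similar to the one Whoosh's regex tokenizer uses by default; `regex_tokens` is a hypothetical helper, not a Whoosh API, and it leaves out stemming and stop-word filtering:

```python
import re

# Assumption: a run of word characters, optionally joined by single dots,
# roughly what a default regex word tokenizer matches.
WORD_PATTERN = re.compile(r"\w+(\.?\w+)*")


def regex_tokens(text):
    """Split text into lowercase tokens using the regex above."""
    return [m.group(0).lower() for m in WORD_PATTERN.finditer(text)]


# English has spaces between words, so each word becomes its own token.
print(regex_tokens("Flask is a lightweight web framework"))

# Chinese has no spaces, so the whole sentence becomes a single token,
# and a query for 美国 has no matching term in the index.
tokens = regex_tokens("美国总统访问中国")
print(tokens, "美国" in tokens)
```

A segmenter such as jieba splits the sentence into words before indexing, which is why rebuilding the index with ChineseAnalyzer makes the query work.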

htzhanglong #8 (OP)

ChineseAnalyzer is implemented with jieba, though. Using jieba does slow down searching somewhat.

wuwangju #9

What do you mean by search getting slower with jieba?
I could understand it if you said indexing gets slower.

yuanlaile #10

Because jieba is loaded on every use rather than kept in memory, and you can see a noticeable pause when it runs:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 1.450 seconds.
Prefix dict has been built succesfully.

wuwangju #11

Sorry, I just verified this again: jieba only loads from the cache on first use and stays in memory afterwards, so using jieba has little impact on search speed.
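
The load-once behavior described here can be sketched with a toy lazy-loading segmenter. This is not jieba's actual code: `LazySegmenter`, its two-word dictionary, and the simulated delay are all made up to show why only the first call pays the loading cost:

```python
import time


class LazySegmenter:
    """Toy stand-in for a tokenizer that loads its dictionary on first use."""

    def __init__(self):
        self._dictionary = None
        self.load_count = 0

    def _ensure_loaded(self):
        # Load only if the dictionary is not already in memory.
        if self._dictionary is None:
            time.sleep(0.05)  # simulate an expensive dictionary load
            self._dictionary = {"全文", "搜索"}
            self.load_count += 1

    def cut(self, text):
        """Greedy segmentation: prefer two-character dictionary words."""
        self._ensure_loaded()
        words, i = [], 0
        while i < len(text):
            pair = text[i:i + 2]
            if pair in self._dictionary:
                words.append(pair)
                i += 2
            else:
                words.append(text[i])
                i += 1
        return words


segmenter = LazySegmenter()
for attempt in range(3):
    start = time.perf_counter()
    words = segmenter.cut("全文搜索")
    elapsed = time.perf_counter() - start
    print(attempt, words, f"{elapsed:.4f}s")
print("dictionary loads:", segmenter.load_count)
```

In a long-running Flask process the pause therefore shows up once per process, not once per request.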