Full-Text Search
Full-text search enables keyword-based text retrieval using BM25 scoring. Milvus supports this through collection functions that automatically tokenize text and convert it to sparse vectors at insert and query time.
How It Works
- You define a VarChar field for the text (with `enable_analyzer: true`) and a SparseFloatVector field for the generated vectors
- You add a BM25 function to the collection that maps the text field to the sparse vector field
- When you insert text, Milvus automatically tokenizes it and generates sparse vectors
- When you search, you pass text queries and Milvus automatically converts them to sparse vectors for matching
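The BM25 function itself runs inside Milvus. To build intuition for what it computes, here is a minimal sketch of classic BM25 scoring over term-frequency sparse vectors. It is not Milvus's implementation. The tokenizer, the helper names (`tokenize`, `toSparse`, `bm25Score`), and the parameter values k1 = 1.2 and b = 0.75 (common textbook defaults) are all assumptions made for illustration.

```typescript
// Illustrative BM25 sketch for intuition; not Milvus's implementation.
// A sparse vector here maps each term to its frequency in one document.
type SparseVector = Record<string, number>;

// Hypothetical tokenizer: lowercase, then keep runs of ASCII letters and digits.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build a term-frequency sparse vector (prototype-free so any term is a safe key).
function toSparse(text: string): SparseVector {
  const tf: SparseVector = Object.create(null);
  for (const term of tokenize(text)) {
    tf[term] = (tf[term] || 0) + 1;
  }
  return tf;
}

// Classic BM25 score of one document for a tokenized query.
function bm25Score(
  queryTerms: string[],
  doc: SparseVector,
  docLength: number,
  corpus: SparseVector[],
  avgDocLength: number,
  k1 = 1.2,
  b = 0.75
): number {
  const n = corpus.length;
  let score = 0;
  for (const term of queryTerms) {
    const tf = doc[term] || 0;
    if (tf === 0) continue; // terms missing from the document contribute nothing
    const df = corpus.filter((d) => (d[term] || 0) > 0).length; // documents containing the term
    const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5)); // rarer terms weigh more
    const norm = k1 * (1 - b + (b * docLength) / avgDocLength); // longer documents are penalized
    score += (idf * tf * (k1 + 1)) / (tf + norm);
  }
  return score;
}

// Usage: rank two sample bodies against a text query.
const docs = ['Machine learning is a branch of AI', 'Deep learning uses neural networks'];
const vectors = docs.map(toSparse);
const lengths = docs.map((d) => tokenize(d).length);
const avgLen = lengths.reduce((sum, len) => sum + len, 0) / lengths.length;
const query = tokenize('machine learning algorithms');
const scores = vectors.map((v, i) => bm25Score(query, v, lengths[i], vectors, avgLen));
// The first body scores higher: it matches both "machine" and "learning".
console.log(scores);
```

The sparse representation is what makes this cheap: only the terms a document actually contains are stored, so scoring touches just the query's terms.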
Testing Analyzers
Before creating a collection, you can test how an analyzer tokenizes text with `runAnalyzer()`:

```typescript
import { MilvusClient } from '@zilliz/milvus2-sdk-node';

const client = new MilvusClient({ address: 'localhost:19530' });

const result = await client.runAnalyzer({
  text: ['machine learning is great', 'deep learning fundamentals'],
  analyzer_params: {
    type: 'standard',
  },
});

// Each text input produces an AnalyzerResult with tokens
console.log('Results:', result.results);
```

You can also test analyzers that are configured on a collection field, request detailed token data, and include token hashes:
```typescript
const result = await client.runAnalyzer({
  db_name: 'default',
  collection_name: 'articles',
  field_name: 'body',
  analyzer_names: ['english_analyzer'],
  text: ['Running analyzers with offsets and hashes'],
  with_detail: true,
  with_hash: true,
});

result.results[0].tokens.forEach(token => {
  console.log(token.token, token.start_offset, token.end_offset, token.hash);
});
```

`runAnalyzer()` accepts either `analyzer_params` for an ad-hoc analyzer configuration or collection context fields (`db_name`, `collection_name`, `field_name`, `analyzer_names`) to use analyzers defined in a schema. The second example assumes the `body` field defines an analyzer named `english_analyzer`.
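With `with_detail: true`, each token carries start and end offsets, which you can use to map tokens back to the input and see what the analyzer normalized. The helper below is a hypothetical sketch; `TokenInfo` only mirrors the fields logged above and is not the SDK's exported type. It also assumes the offsets are JavaScript string indices; if Milvus reports UTF-8 byte offsets, they will differ from string indices for non-ASCII text.

```typescript
// Hypothetical local type mirroring the offset fields logged above; not the SDK's exported type.
interface TokenInfo {
  token: string;
  start_offset: number;
  end_offset: number;
}

interface AlignedToken {
  token: string; // what the analyzer emitted
  source: string; // the slice of input text it came from
  changed: boolean; // true when the analyzer normalized the text (e.g. lowercasing)
}

// Pair each token with its source span. Assumes offsets are JavaScript string indices.
function alignTokens(input: string, tokens: TokenInfo[]): AlignedToken[] {
  return tokens.map((t) => {
    const source = input.slice(t.start_offset, t.end_offset);
    return { token: t.token, source, changed: t.token !== source };
  });
}

// Usage with hand-written tokens (illustrative, not real runAnalyzer() output).
const sample = 'Running Analyzers';
const aligned = alignTokens(sample, [
  { token: 'running', start_offset: 0, end_offset: 7 },
  { token: 'analyzers', start_offset: 8, end_offset: 17 },
]);
aligned.forEach((a) => console.log(`${a.token} <- "${a.source}" changed=${a.changed}`));
```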
Analyzer Types
| Type | Description |
|---|---|
| `standard` | Standard tokenizer with lowercase filter |
| `english` | English-specific tokenizer with stemming |
| `chinese` | Chinese text tokenizer |
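As a rough mental model of why the `english` analyzer can match more documents than `standard`, the sketch below contrasts lowercase word splitting with word splitting plus stemming. Both functions are toy approximations written for this page: the ASCII-only tokenizer and the naive suffix stripping in `toyStem` are assumptions, not the tokenizer or stemming algorithm Milvus uses. Use `runAnalyzer()` to see real output.

```typescript
// Toy approximations of the analyzer types in the table above; for intuition only.

// "standard"-like: split into ASCII words and lowercase them.
function standardLike(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Hypothetical naive suffix stripping; NOT the stemming algorithm Milvus uses.
function toyStem(word: string): string {
  const suffixes = ['ing', 'ed', 's'];
  for (const suffix of suffixes) {
    // Keep at least three characters of stem so short words are left alone.
    if (word.length > suffix.length + 2 && word.slice(-suffix.length) === suffix) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// "english"-like: standard-like tokens followed by toy stemming.
function englishLike(text: string): string[] {
  return standardLike(text).map(toyStem);
}

console.log(standardLike('Learning learns learned')); // ['learning', 'learns', 'learned']
console.log(englishLike('Learning learns learned')); // ['learn', 'learn', 'learn']
```

With stemming, a query for "learns" and a document containing "learning" share the term `learn`, so they can match in BM25; without it they are different terms.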
Creating a Full-Text Search Collection
```typescript
import { MilvusClient, DataType, FunctionType } from '@zilliz/milvus2-sdk-node';

const client = new MilvusClient({ address: 'localhost:19530' });

// Create a collection with text fields and a sparse vector field
await client.createCollection({
  collection_name: 'articles',
  fields: [
    {
      name: 'id',
      data_type: DataType.Int64,
      is_primary_key: true,
      autoID: true,
    },
    {
      name: 'title',
      data_type: DataType.VarChar,
      max_length: 256,
      enable_analyzer: true,
    },
    {
      name: 'body',
      data_type: DataType.VarChar,
      max_length: 10000,
      enable_analyzer: true,
    },
    { name: 'sparse_vector', data_type: DataType.SparseFloatVector },
  ],
  functions: [
    {
      name: 'bm25',
      type: FunctionType.BM25,
      // Text field to analyze, and the sparse field that receives the generated vectors
      input_field_names: ['body'],
      output_field_names: ['sparse_vector'],
      params: {},
    },
  ],
});
```

Adding Functions to Existing Collections
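You can also add a BM25 function to an existing collection. The function's output must be a SparseFloatVector field in the collection; `title_sparse` is not part of the schema created above, so this example assumes such a field has already been added: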
```typescript
await client.addCollectionFunction({
  collection_name: 'articles',
  function: {
    name: 'title_bm25',
    type: FunctionType.BM25,
    input_field_names: ['title'],
    output_field_names: ['title_sparse'],
    params: {},
  },
});
```

Searching with Full-Text
Before searching, create an index with the `BM25` metric on the sparse vector field and load the collection. Then pass a text string as the search data; Milvus uses the BM25 function to convert it to a sparse vector automatically:
```typescript
// Create index and load
await client.createIndex({
  collection_name: 'articles',
  field_name: 'sparse_vector',
  index_type: 'SPARSE_INVERTED_INDEX',
  metric_type: 'BM25',
});

await client.loadCollectionSync({ collection_name: 'articles' });

// Insert documents; sparse_vector is generated by the BM25 function, so it is omitted
await client.insert({
  collection_name: 'articles',
  data: [
    {
      title: 'Introduction to Machine Learning',
      body: 'Machine learning is a branch of AI...',
    },
    {
      title: 'Deep Learning Basics',
      body: 'Deep learning uses neural networks...',
    },
  ],
});

// Search by text
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning algorithms'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
});

console.log('Results:', results.results);
```

Highlighting Search Results
Use the `highlighter` parameter to highlight matched text fragments in search results.
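Highlighting runs on the Milvus server. As a rough illustration of the lexical case only, the hypothetical helper below wraps whole-word, case-insensitive matches of query terms in a pre/post tag pair. It ignores `fragment_size` and `num_of_fragments`, and its matching rules are an assumption that may differ from Milvus's.

```typescript
// Hypothetical helper illustrating lexical highlighting; not part of the SDK.

// Escape regex metacharacters so terms are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every whole-word, case-insensitive occurrence of a term in preTag/postTag.
function highlightLexical(text: string, terms: string[], preTag: string, postTag: string): string {
  const words = terms.filter((t) => t.length > 0).map(escapeRegExp);
  if (words.length === 0) return text;
  const pattern = new RegExp(`\\b(${words.join('|')})\\b`, 'gi');
  // $1 re-inserts the matched text with its original casing.
  return text.replace(pattern, `${preTag}$1${postTag}`);
}

console.log(highlightLexical('Machine learning is a branch of AI...', ['machine', 'learning'], '<em>', '</em>'));
// <em>Machine</em> <em>learning</em> is a branch of AI...
```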
Lexical Highlighting
Highlights exact keyword matches:
```typescript
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
  highlighter: {
    type: 0, // HighlightType.Lexical
    pre_tags: ['<em>'],
    post_tags: ['</em>'],
    fragment_size: 100,
    num_of_fragments: 3,
  },
});
```

Semantic Highlighting
Highlights semantically similar text:
```typescript
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
  highlighter: {
    type: 1, // HighlightType.Semantic
    queries: ['machine learning'],
    input_fields: ['body'],
    pre_tags: ['<mark>'],
    post_tags: ['</mark>'],
    threshold: 0.5,
  },
});
```

Highlight Results
Search hits include highlight data when highlighting is enabled. Each highlighted field contains text fragments and optional scores.
```typescript
results.results.forEach(hit => {
  console.log('Title:', hit.title);
  console.log('Highlight:', hit.highlight);

  const bodyHighlight = hit.highlight?.body;
  bodyHighlight?.fragments.forEach((fragment, index) => {
    console.log('Fragment:', fragment);
    console.log('Score:', bodyHighlight.scores?.[index]);
  });
});
```

For lexical highlighting, set `highlight_search_text: true` to include the matched search text in highlight output when supported by Milvus. For semantic highlighting, set `highlight_only: true` to return only highlighted fragments instead of the full source text.
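When scores are present, you may want to surface only the strongest fragment per field. The sketch below is a hypothetical helper; `FieldHighlight` is a local interface that assumes the `{ fragments, scores? }` shape used in the loop above, not a type exported by the SDK.

```typescript
// Local mirror of the highlight shape used above; an assumption, not the SDK's type.
interface FieldHighlight {
  fragments: string[];
  scores?: number[];
}

// Return the highest-scoring fragment, or the first fragment when scores are missing
// or do not line up one-to-one with the fragments.
function bestFragment(h: FieldHighlight | undefined): string | undefined {
  if (!h || h.fragments.length === 0) return undefined;
  const scores = h.scores;
  if (!scores || scores.length !== h.fragments.length) return h.fragments[0];
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return h.fragments[best];
}

console.log(bestFragment({ fragments: ['a <mark>b</mark>', '<mark>c</mark> d'], scores: [0.6, 0.9] }));
// <mark>c</mark> d
```

Inside the loop above, this could be called as `bestFragment(hit.highlight?.body)`.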
Managing Collection Functions
```typescript
// List functions on a collection
const desc = await client.describeCollection({ collection_name: 'articles' });
console.log('Functions:', desc.schema.functions);

// Alter a function
await client.alterCollectionFunction({
  collection_name: 'articles',
  function_name: 'bm25',
  function: {
    name: 'bm25',
    type: FunctionType.BM25,
    input_field_names: ['body'],
    output_field_names: ['sparse_vector'],
    params: { key: 'new_value' }, // placeholder: replace with the parameters you want to change
  },
});

// Drop a function
await client.dropCollectionFunction({
  collection_name: 'articles',
  function_name: 'bm25',
});
```

Next Steps
- Learn about Hybrid Search for combining full-text with vector search
- Explore Data Types & Schemas for SparseFloatVector details