Full-Text Search
Full-text search enables keyword-based text retrieval using BM25 scoring. Milvus supports this through collection functions that automatically tokenize text and convert it to sparse vectors at insert and query time.
How It Works
- You define a VarChar field for the raw text (with enable_analyzer: true) and a SparseFloatVector field for the generated vectors
- You add a BM25 function to the collection that maps the text field to the sparse vector field
- When you insert text, Milvus automatically tokenizes it and generates sparse vectors
- When you search, you pass text queries and Milvus automatically converts them to sparse vectors for matching
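The ranking behind this pipeline can be sketched in plain TypeScript. This is an illustrative model, not Milvus's implementation, and it rests on stated assumptions: documents arrive already tokenized, `k1 = 1.2` and `b = 0.75` are the conventional BM25 defaults, and IDF uses the common `ln((N - df + 0.5) / (df + 0.5) + 1)` form. Each query term found in a document contributes its IDF weight multiplied by a term frequency that saturates (via `k1`) and is normalized by document length (via `b`). The helper names `termFrequencies` and `bm25Scores` are hypothetical.

```typescript
// Illustrative BM25 scorer (not Milvus's implementation).
// Assumptions: pre-tokenized input, k1 = 1.2, b = 0.75, Lucene-style IDF.

// Count how often each token occurs: a term-frequency "sparse vector".
function termFrequencies(tokens: string[]): Map<string, number> {
  const tf = new Map<string, number>();
  for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
  return tf;
}

function bm25Scores(docs: string[][], query: string[], k1 = 1.2, b = 0.75): number[] {
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / n;

  // IDF per unique query term: terms found in fewer documents weigh more.
  const idf = new Map<string, number>();
  for (const term of new Set(query)) {
    const df = docs.filter((d) => d.includes(term)).length;
    idf.set(term, Math.log((n - df + 0.5) / (df + 0.5) + 1));
  }

  return docs.map((doc) => {
    const tf = termFrequencies(doc);
    // Documents longer than average are penalized in proportion to b.
    const lenNorm = 1 - b + (b * doc.length) / avgLen;
    let score = 0;
    idf.forEach((weight, term) => {
      const f = tf.get(term) ?? 0;
      // The term-frequency contribution saturates as f grows, controlled by k1.
      score += weight * ((f * (k1 + 1)) / (f + k1 * lenNorm));
    });
    return score;
  });
}

const docs = [
  ['machine', 'learning', 'is', 'a', 'branch', 'of', 'ai'],
  ['deep', 'learning', 'uses', 'neural', 'networks'],
];
const scores = bm25Scores(docs, ['machine', 'learning', 'algorithms']);
console.log(scores); // the first document ranks higher: it matches both "machine" and "learning"
```

A query term that appears in no document (here `algorithms`) contributes nothing, because its term frequency is zero in every document.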
Testing Analyzers
Before creating a collection, you can test how an analyzer tokenizes text using runAnalyzer():
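```typescript
import { MilvusClient } from '@zilliz/milvus2-sdk-node';

const client = new MilvusClient({ address: 'localhost:19530' });

const result = await client.runAnalyzer({
  text: ['machine learning is great', 'deep learning fundamentals'],
  analyzer_params: {
    type: 'standard',
  },
});
console.log('Results:', result.results);
// Each text input produces an AnalyzerResult containing its tokens
```

Analyzer Types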
| Type | Description |
|---|---|
| standard | Standard tokenizer with a lowercase filter |
| english | Standard tokenizer with lowercase, English stemming, and English stop-word filters |
| chinese | Chinese text analyzer built on the Jieba tokenizer |
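To build intuition for the standard analyzer before calling runAnalyzer(), here is a rough approximation in plain TypeScript: split the text into runs of letters and digits, then lowercase each run. This is a hypothetical helper (`approxStandardTokens`) for illustration only. The real Milvus tokenizer follows Unicode word-boundary rules and handles cases such as apostrophes and mixed scripts that this regular expression does not.

```typescript
// Rough approximation of "standard tokenizer + lowercase filter".
// Hypothetical helper for illustration; it is not the Milvus analyzer.
function approxStandardTokens(text: string): string[] {
  // \p{L} matches any letter and \p{N} any number; the "u" flag enables them.
  const words = text.match(/[\p{L}\p{N}]+/gu) ?? [];
  return words.map((w) => w.toLowerCase());
}

const inputs = ['machine learning is great', 'Deep Learning: Fundamentals!'];
const tokenized = inputs.map(approxStandardTokens);
console.log(JSON.stringify(tokenized));
// [["machine","learning","is","great"],["deep","learning","fundamentals"]]
```

Punctuation is dropped and case is normalized, so "Learning:" and "learning" produce the same token, which is what lets BM25 treat them as the same term.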
Creating a Full-Text Search Collection
```typescript
import { MilvusClient, DataType, FunctionType } from '@zilliz/milvus2-sdk-node';

const client = new MilvusClient({ address: 'localhost:19530' });

// 1. Create a collection with text fields and a sparse vector field
await client.createCollection({
  collection_name: 'articles',
  fields: [
    { name: 'id', data_type: DataType.Int64, is_primary_key: true, autoID: true },
    { name: 'title', data_type: DataType.VarChar, max_length: 256, enable_analyzer: true },
    { name: 'body', data_type: DataType.VarChar, max_length: 10000, enable_analyzer: true },
    { name: 'sparse_vector', data_type: DataType.SparseFloatVector },
  ],
  // 2. The BM25 function fills sparse_vector from body on every insert
  functions: [
    {
      name: 'bm25',
      type: FunctionType.BM25,
      input_field_names: ['body'],
      output_field_names: ['sparse_vector'],
      params: {},
    },
  ],
});
```

Adding Functions to Existing Collections
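You can also add a BM25 function to an existing collection. The example below maps title to a title_sparse field, which the schema created above does not define; BM25 output fields are SparseFloatVector fields, so make sure the collection has one under that name: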
```typescript
await client.addCollectionFunction({
  collection_name: 'articles',
  function: {
    name: 'title_bm25',
    type: FunctionType.BM25,
    input_field_names: ['title'],
    output_field_names: ['title_sparse'],
    params: {},
  },
});
```

Searching with Full-Text
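Before searching, index the sparse vector field with the BM25 metric and load the collection. At search time, pass raw text as the search data; Milvus runs it through the BM25 function to produce the sparse query vector automatically: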
```typescript
// Create index and load
await client.createIndex({
  collection_name: 'articles',
  field_name: 'sparse_vector',
  index_type: 'SPARSE_INVERTED_INDEX',
  metric_type: 'BM25',
});
await client.loadCollectionSync({ collection_name: 'articles' });

// Insert documents (sparse_vector is generated by the BM25 function)
await client.insert({
  collection_name: 'articles',
  data: [
    { title: 'Introduction to Machine Learning', body: 'Machine learning is a branch of AI...' },
    { title: 'Deep Learning Basics', body: 'Deep learning uses neural networks...' },
  ],
});

// Search by text
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning algorithms'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
});
console.log('Results:', results.results);
```

Highlighting Search Results
Use the highlighter parameter to highlight matched text fragments in search results.
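Conceptually, lexical highlighting wraps each occurrence of a query term in the configured pre_tags and post_tags. The sketch below is a hypothetical, simplified model in plain TypeScript (`highlightTerms` is not part of the SDK): it matches whole words case-insensitively and ignores fragment_size and num_of_fragments, whereas the Milvus highlighter works on analyzer output and returns fragments, so real results will differ in detail.

```typescript
// Hypothetical, simplified model of lexical highlighting (not the Milvus API).
function highlightTerms(
  text: string,
  terms: string[],
  preTag = '<em>',
  postTag = '</em>',
): string {
  const wanted = new Set(terms.map((t) => t.toLowerCase()));
  // Visit each run of letters/digits and wrap it if it is a query term.
  return text.replace(/[\p{L}\p{N}]+/gu, (word) =>
    wanted.has(word.toLowerCase()) ? `${preTag}${word}${postTag}` : word,
  );
}

const snippet = highlightTerms('Machine learning is a branch of AI...', ['machine', 'learning']);
console.log(snippet); // <em>Machine</em> <em>learning</em> is a branch of AI...
```

The original casing of the matched word is preserved inside the tags, since only the comparison is lowercased.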
Lexical Highlighting
Highlights exact keyword matches:
```typescript
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
  highlighter: {
    type: 0, // HighlightType.Lexical
    pre_tags: ['<em>'],
    post_tags: ['</em>'],
    fragment_size: 100,
    num_of_fragments: 3,
  },
});
```

Semantic Highlighting
Highlights semantically similar text:
```typescript
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
  highlighter: {
    type: 1, // HighlightType.Semantic
    queries: ['machine learning'],
    input_fields: ['body'],
    pre_tags: ['<mark>'],
    post_tags: ['</mark>'],
    threshold: 0.5,
  },
});
```

Managing Collection Functions
```typescript
// List the functions defined on a collection
const desc = await client.describeCollection({ collection_name: 'articles' });
console.log('Functions:', desc.schema.functions);

// Alter a function (the params entry below is a placeholder; substitute real parameters)
await client.alterCollectionFunction({
  collection_name: 'articles',
  function_name: 'bm25',
  function: {
    name: 'bm25',
    type: FunctionType.BM25,
    input_field_names: ['body'],
    output_field_names: ['sparse_vector'],
    params: { key: 'new_value' },
  },
});

// Drop a function
await client.dropCollectionFunction({
  collection_name: 'articles',
  function_name: 'bm25',
});
```

Next Steps
- Learn about Hybrid Search for combining full-text with vector search
- Explore Data Types & Schemas for SparseFloatVector details