
Data Types & Schemas

Milvus supports various data types for storing different kinds of data. Understanding these types is essential for designing effective schemas.

  • Int8: 8-bit signed integer (-128 to 127)
  • Int16: 16-bit signed integer (-32,768 to 32,767)
  • Int32: 32-bit signed integer (-2,147,483,648 to 2,147,483,647)
  • Int64: 64-bit signed integer (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)

import { DataType } from '@zilliz/milvus2-sdk-node';

{
  name: 'age',
  data_type: DataType.Int64,
}

  • Float: 32-bit floating point number
  • Double: 64-bit floating point number

{
  name: 'score',
  data_type: DataType.Float,
}

  • Bool: Boolean value (true/false)

{
  name: 'is_active',
  data_type: DataType.Bool,
}

  • VarChar: Variable-length string (requires max_length)

{
  name: 'title',
  data_type: DataType.VarChar,
  max_length: 256,
}

  • JSON: JSON object

{
  name: 'metadata',
  data_type: DataType.JSON,
}
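The integer ranges listed above can be checked on the client before inserting. The following is a minimal sketch; `fitsIntegerType` and `INT_RANGES` are hypothetical names and are not part of the Milvus SDK. Int64 is left out because a JavaScript number cannot represent the full 64-bit range exactly.

```typescript
// Hypothetical helper, not part of the Milvus SDK.
type SmallIntType = 'Int8' | 'Int16' | 'Int32';

// Inclusive [min, max] range of each signed integer type.
const INT_RANGES: Record<SmallIntType, [number, number]> = {
  Int8: [-128, 127],
  Int16: [-32768, 32767],
  Int32: [-2147483648, 2147483647],
};

// Returns true when `value` is an integer inside the range of `type`.
function fitsIntegerType(value: number, type: SmallIntType): boolean {
  const [min, max] = INT_RANGES[type];
  return Number.isInteger(value) && value >= min && value <= max;
}

console.log(fitsIntegerType(127, 'Int8')); // true
console.log(fitsIntegerType(128, 'Int8')); // false
```

A check like this may help you pick the narrowest type that still covers your data.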

Geometry: geospatial data stored in WKT (Well-Known Text) format.

{
  name: 'location',
  data_type: DataType.Geometry,
}

// Insert as WKT string
{ location: 'POINT(121.47 31.23)' }
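If coordinates live in numeric columns, a small formatter can produce the WKT string. This is a sketch only; `toWktPoint` is a hypothetical helper, and it assumes the usual x-then-y ordering, that is, longitude before latitude.

```typescript
// Hypothetical helper: formats a longitude/latitude pair as a WKT POINT.
// Assumption: x (longitude) comes first, then y (latitude).
function toWktPoint(longitude: number, latitude: number): string {
  if (!Number.isFinite(longitude) || !Number.isFinite(latitude)) {
    throw new RangeError('coordinates must be finite numbers');
  }
  return `POINT(${longitude} ${latitude})`;
}

const geoRow = { location: toWktPoint(121.47, 31.23) };
console.log(geoRow.location); // POINT(121.47 31.23)
```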

Timestamptz: timestamp with time zone information.

{
  name: 'created_at',
  data_type: DataType.Timestamptz,
}

Struct: nested structured data type for complex objects.

{
  name: 'metadata',
  data_type: DataType.Struct,
}

Dense float vector (most common):

{
  name: 'embedding',
  data_type: DataType.FloatVector,
  dim: 128, // Required: dimension of the vector
}

Binary vector (for binary embeddings):

{
  name: 'binary_embedding',
  data_type: DataType.BinaryVector,
  dim: 128, // Must be a multiple of 8
}
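The multiple-of-8 rule reflects that binary vectors are packed eight dimensions per byte, so a 128-dimensional binary vector occupies 16 bytes. If your pipeline produces individual bits, they can be packed along these lines. `packBits` is a hypothetical helper, and the most-significant-bit-first order is an assumption that should be checked against whatever produces your embeddings.

```typescript
// Hypothetical helper: packs an array of 0/1 bits into bytes, 8 bits per byte.
// Assumption: the first bit of each group becomes the most significant bit.
function packBits(bits: number[]): number[] {
  if (bits.length % 8 !== 0) {
    throw new RangeError(`bit count ${bits.length} is not a multiple of 8`);
  }
  const bytes: number[] = [];
  for (let i = 0; i < bits.length; i += 8) {
    let byte = 0;
    for (let j = 0; j < 8; j++) {
      // Shift the accumulated bits left and append the next bit.
      byte = (byte << 1) | (bits[i + j] ? 1 : 0);
    }
    bytes.push(byte);
  }
  return bytes;
}

const sampleBits = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1];
console.log(packBits(sampleBits)); // [129, 15]
```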

Sparse vectors store only non-zero values together with their indices. They are commonly used for BM25 full-text search and keyword-based retrieval.

{
  name: 'sparse_vector',
  data_type: DataType.SparseFloatVector,
}

Sparse vectors can be inserted in two formats:

Dictionary format (recommended):

// Keys are indices, values are float weights
{ 10: 0.5, 100: 0.3, 500: 0.8, 1200: 0.1 }

Array of tuples format:

// [index, value] pairs
[
  [10, 0.5],
  [100, 0.3],
  [500, 0.8],
  [1200, 0.1],
];

Note: Dimension is determined automatically from the maximum index across all vectors.
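The two formats carry the same information, so converting between them is mechanical. The sketch below uses hypothetical helpers (`dictToTuples`, `maxSparseIndex`) to show the conversion and to find the largest index in a batch, which is the value the note above refers to.

```typescript
type SparseDict = Record<number, number>;
type SparseTuples = [number, number][];

// Converts the dictionary format into [index, value] pairs.
function dictToTuples(vector: SparseDict): SparseTuples {
  return Object.entries(vector).map(
    ([index, value]): [number, number] => [Number(index), value],
  );
}

// Largest index used across a batch of sparse vectors (-1 for an empty batch).
function maxSparseIndex(vectors: SparseDict[]): number {
  let max = -1;
  for (const vector of vectors) {
    for (const key of Object.keys(vector)) {
      max = Math.max(max, Number(key));
    }
  }
  return max;
}

const firstSparse: SparseDict = { 10: 0.5, 100: 0.3, 500: 0.8, 1200: 0.1 };
const secondSparse: SparseDict = { 3: 0.9, 42: 0.2 };

console.log(dictToTuples(firstSparse)); // [[10, 0.5], [100, 0.3], [500, 0.8], [1200, 0.1]]
console.log(maxSparseIndex([firstSparse, secondSparse])); // 1200
```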

16-bit float vector:

{
  name: 'f16_embedding',
  data_type: DataType.Float16Vector,
  dim: 128,
}

BFloat16 vector:

{
  name: 'bf16_embedding',
  data_type: DataType.BFloat16Vector,
  dim: 128,
}

8-bit integer vector:

{
  name: 'int8_embedding',
  data_type: DataType.Int8Vector,
  dim: 128,
}

Vector fields can be marked as nullable. Insert or upsert null for rows that do not have an embedding yet; query and search results return null for those fields.

{
  name: 'optional_embedding',
  data_type: DataType.FloatVector,
  dim: 128,
  nullable: true,
}

await client.insert({
  collection_name: 'my_collection',
  data: [
    { id: 1, optional_embedding: [/* 128 floats */] },
    { id: 2, optional_embedding: null },
  ],
});

Nullable vector payloads are supported for FloatVector, BinaryVector, Float16Vector, BFloat16Vector, SparseFloatVector, and Int8Vector.
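When embeddings are computed asynchronously, source records may simply lack the property. A sketch of turning such records into insert rows with an explicit null follows; `PendingDoc` and `toNullableRows` are hypothetical names, and the two-element vectors are shortened for readability (a real row would need the full 128 floats).

```typescript
// Hypothetical input shape: the embedding may not exist yet.
interface PendingDoc {
  id: number;
  embedding?: number[];
}

// Maps documents to insert rows, writing an explicit null when no embedding exists.
function toNullableRows(
  docs: PendingDoc[],
): { id: number; optional_embedding: number[] | null }[] {
  return docs.map((doc) => ({
    id: doc.id,
    optional_embedding: doc.embedding ?? null,
  }));
}

const nullableRows = toNullableRows([{ id: 1, embedding: [0.1, 0.2] }, { id: 2 }]);
console.log(nullableRows.map((row) => row.optional_embedding)); // [[0.1, 0.2], null]
```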

Array of scalar values:

{
  name: 'tags',
  data_type: DataType.Array,
  element_type: DataType.VarChar,
  max_capacity: 100,
}

DataType.ArrayOfVector stores multiple vectors in a single field value. This is useful for documents split into chunks, multi-vector embeddings, or struct array payloads that contain vector arrays.

{
  name: 'chunk_embeddings',
  data_type: DataType.ArrayOfVector,
  element_type: DataType.FloatVector,
  dim: 128,
  max_capacity: 32,
}

await client.insert({
  collection_name: 'my_collection',
  data: [
    {
      id: 1,
      chunk_embeddings: [
        [/* first 128-d vector */],
        [/* second 128-d vector */],
      ],
    },
  ],
});

Element types can be dense float, binary, float16, bfloat16, sparse, or int8 vectors, depending on what your Milvus version supports.
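Because every element must match dim and the element count is bounded by max_capacity, it can be worth checking a value before inserting it. The following is a sketch for dense float elements only; `checkVectorArray` is a hypothetical helper and does not reproduce any check the SDK or server performs.

```typescript
// Hypothetical pre-insert check for an array-of-vectors value.
// Returns a list of human-readable problems; an empty list means the value looks fine.
function checkVectorArray(
  vectors: number[][],
  dim: number,
  maxCapacity: number,
): string[] {
  const problems: string[] = [];
  if (vectors.length > maxCapacity) {
    problems.push(`${vectors.length} vectors exceed max_capacity ${maxCapacity}`);
  }
  vectors.forEach((vector, i) => {
    if (vector.length !== dim) {
      problems.push(`vector ${i} has ${vector.length} dimensions, expected ${dim}`);
    }
  });
  return problems;
}

// Shortened example with dim = 4 instead of 128.
const chunkProblems = checkVectorArray([[1, 2, 3, 4], [1, 2, 3]], 4, 32);
console.log(chunkProblems); // ['vector 1 has 3 dimensions, expected 4']
```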

Nested structure:

{
  name: 'user_info',
  data_type: DataType.Struct,
  element_type: {
    name: 'name',
    data_type: DataType.VarChar,
    max_length: 100,
  },
}

Each field in a collection schema is an object with the properties below. Only name and data_type are always required; the other properties depend on the data type:

{
  name: 'field_name', // Required: field name
  data_type: DataType.Int64, // Required: data type
  description: 'Field description', // Optional
  is_primary_key: false, // Optional: primary key flag
  autoID: false, // Optional: auto-generate IDs
  max_length: 256, // Required for VarChar
  dim: 128, // Required for vector types except SparseFloatVector
  nullable: false, // Optional: allow null values
  default_value: undefined, // Optional: scalar default value
  external_field: undefined, // Optional: source field name for external collections
}

Every collection must have exactly one primary key field:

{
  name: 'id',
  data_type: DataType.Int64,
  is_primary_key: true,
  autoID: true, // Let Milvus generate IDs automatically
}

Or with manual IDs:

{
  name: 'user_id',
  data_type: DataType.Int64,
  is_primary_key: true,
  autoID: false, // You provide IDs
}
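
Basic schema with one vector field and one text field:

import { MilvusClient, DataType } from '@zilliz/milvus2-sdk-node';

// The address is an assumption for a local Milvus instance; adjust it for your deployment.
const client = new MilvusClient({ address: 'localhost:19530' });

const schema = [
  {
    name: 'id',
    data_type: DataType.Int64,
    is_primary_key: true,
    autoID: true,
  },
  {
    name: 'vector',
    data_type: DataType.FloatVector,
    dim: 128,
  },
  {
    name: 'text',
    data_type: DataType.VarChar,
    max_length: 256,
  },
];

await client.createCollection({
  collection_name: 'my_collection',
  fields: schema,
});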

Schema with multiple vector fields and an explicit consistency level:

const multiVectorSchema = [
  {
    name: 'id',
    data_type: DataType.Int64,
    is_primary_key: true,
    autoID: true,
  },
  {
    name: 'text_vector',
    data_type: DataType.FloatVector,
    dim: 768,
  },
  {
    name: 'image_vector',
    data_type: DataType.FloatVector,
    dim: 512,
  },
  {
    name: 'metadata',
    data_type: DataType.JSON,
  },
];

await client.createCollection({
  collection_name: 'my_collection',
  fields: multiVectorSchema,
  consistency_level: 'Bounded', // One of 'Strong', 'Session', 'Bounded', 'Eventually'
});

Enable dynamic schema to add fields without redefining the schema:

await client.createCollection({
  collection_name: 'my_collection',
  fields: [
    {
      name: 'id',
      data_type: DataType.Int64,
      is_primary_key: true,
      autoID: true,
    },
    {
      name: 'vector',
      data_type: DataType.FloatVector,
      dim: 128,
    },
  ],
  enable_dynamic_field: true, // Enable dynamic fields
});

With dynamic schema enabled, you can insert data with additional fields:

await client.insert({
  collection_name: 'my_collection',
  data: [
    {
      vector: [
        /* ... */
      ],
      dynamic_field_1: 'value1', // Automatically added
      dynamic_field_2: 123, // Automatically added
    },
  ],
});
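To see which keys of a row would be treated as dynamic, you can compare the row against the declared field names. This is a client-side sketch only; `splitDynamicFields` is a hypothetical helper and says nothing about how Milvus stores the values internally.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: separates keys declared in the schema from undeclared (dynamic) ones.
function splitDynamicFields(
  row: Row,
  declaredNames: string[],
): { declared: Row; dynamic: Row } {
  const result = { declared: {} as Row, dynamic: {} as Row };
  for (const [key, value] of Object.entries(row)) {
    const target = declaredNames.includes(key) ? result.declared : result.dynamic;
    target[key] = value;
  }
  return result;
}

const split = splitDynamicFields(
  { vector: [0.1, 0.2], dynamic_field_1: 'value1', dynamic_field_2: 123 },
  ['id', 'vector'],
);
console.log(Object.keys(split.dynamic)); // ['dynamic_field_1', 'dynamic_field_2']
```

Logging the dynamic keys during ingestion can surface typos in field names, which would otherwise be silently accepted as new dynamic fields.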

The SDK validates schemas before creating collections. Common validation errors:

  • Missing primary key field
  • Multiple primary key fields
  • Missing dimension for vector fields
  • Missing max_length for VarChar fields
  • Invalid dimension for BinaryVector (must be multiple of 8)
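The checks in this list can be approximated in a few lines. The sketch below is illustrative only and is not the SDK's actual validation logic: `FieldSketch` and `validateSchemaSketch` are hypothetical names, and data types are modelled as plain strings rather than the SDK's DataType enum.

```typescript
// Hypothetical, simplified field shape; the real SDK types are richer.
interface FieldSketch {
  name: string;
  data_type: 'Int64' | 'VarChar' | 'FloatVector' | 'BinaryVector';
  is_primary_key?: boolean;
  max_length?: number;
  dim?: number;
}

// Returns one message per rule violation; an empty array means no issue was found.
function validateSchemaSketch(fields: FieldSketch[]): string[] {
  const errors: string[] = [];
  const primaryKeys = fields.filter((field) => field.is_primary_key);
  if (primaryKeys.length === 0) errors.push('missing primary key field');
  if (primaryKeys.length > 1) errors.push('multiple primary key fields');

  for (const field of fields) {
    const isVector =
      field.data_type === 'FloatVector' || field.data_type === 'BinaryVector';
    if (isVector && field.dim === undefined) {
      errors.push(`${field.name}: missing dim`);
    }
    if (field.data_type === 'VarChar' && field.max_length === undefined) {
      errors.push(`${field.name}: missing max_length`);
    }
    if (
      field.data_type === 'BinaryVector' &&
      field.dim !== undefined &&
      field.dim % 8 !== 0
    ) {
      errors.push(`${field.name}: dim must be a multiple of 8`);
    }
  }
  return errors;
}

const schemaErrors = validateSchemaSketch([
  { name: 'title', data_type: 'VarChar' },
  { name: 'bits', data_type: 'BinaryVector', dim: 100 },
]);
console.log(schemaErrors);
// ['missing primary key field', 'title: missing max_length', 'bits: dim must be a multiple of 8']
```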

Best practices:

  1. Choose appropriate data types: use Int64 for IDs and FloatVector for dense embeddings.
  2. Set a reasonable max_length: size VarChar fields according to the expected content.
  3. Use autoID: let Milvus generate IDs unless you need to control them yourself.
  4. Enable dynamic schema: useful when the set of fields may change over time.
  5. Document fields: use description to record each field's purpose.