Data Types & Schemas

Milvus supports scalar, vector, and complex data types for storing different kinds of data. Understanding these types is essential for designing effective collection schemas.

Supported Data Types

Scalar Types

Integer Types

Int8: 8-bit signed integer (-128 to 127)
Int16: 16-bit signed integer (-32,768 to 32,767)
Int32: 32-bit signed integer (-2,147,483,648 to 2,147,483,647)
Int64: 64-bit signed integer (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)


import { DataType } from '@zilliz/milvus2-sdk-node';

const ageField = {
  name: 'age',
  data_type: DataType.Int64,
};
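Narrower integer types use less memory, so it can help to pick the smallest type that covers a field's expected values. The sketch below does that; `smallestIntType` is a hypothetical helper, not an SDK function, and it only accepts values that a JavaScript number represents exactly (safe integers, up to 2^53 - 1 in magnitude).

```typescript
// Hypothetical helper (not part of @zilliz/milvus2-sdk-node): returns the
// narrowest Milvus integer type whose signed range covers [min, max].
type IntTypeName = 'Int8' | 'Int16' | 'Int32' | 'Int64';

function smallestIntType(min: number, max: number): IntTypeName {
  // JavaScript numbers are exact only for safe integers (|n| <= 2^53 - 1).
  if (!Number.isSafeInteger(min) || !Number.isSafeInteger(max) || min > max) {
    throw new RangeError(`invalid integer range [${min}, ${max}]`);
  }
  if (min >= -128 && max <= 127) return 'Int8';
  if (min >= -32768 && max <= 32767) return 'Int16';
  if (min >= -2147483648 && max <= 2147483647) return 'Int32';
  return 'Int64';
}

console.log(smallestIntType(0, 120)); // Int8
console.log(smallestIntType(0, 40000)); // Int32
```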

Floating Point Types

Float: 32-bit floating point number
Double: 64-bit floating point number


const scoreField = {
  name: 'score',
  data_type: DataType.Float,
};
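A Float field holds 32-bit values, while JavaScript numbers are 64-bit doubles, so a value may not read back exactly as written. Assuming the stored value is the nearest 32-bit float, the standard `Math.fround` function previews it; `approxEqual` is a hypothetical helper for tolerant comparisons. Use Double when you need the extra precision.

```typescript
// Math.fround rounds a 64-bit JavaScript number to the nearest 32-bit float,
// previewing what a Float field stores (assumption: round to nearest).
const written = 0.1;
const stored = Math.fround(written);
console.log(stored); // 0.10000000149011612
console.log(stored === written); // false

// Hypothetical helper: compare Float values with a relative tolerance.
function approxEqual(a: number, b: number, relTol = 1e-6): boolean {
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

console.log(approxEqual(stored, written)); // true
```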

Boolean Type

Bool: Boolean value (true/false)


const isActiveField = {
  name: 'is_active',
  data_type: DataType.Bool,
};

String Types

VarChar: Variable-length string (requires max_length, between 1 and 65,535)


const titleField = {
  name: 'title',
  data_type: DataType.VarChar,
  max_length: 256,
};
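To catch oversized strings before inserting, you can check them on the client. This sketch assumes max_length is enforced on the UTF-8 byte length of the value rather than its character count, so non-ASCII text reaches the limit sooner; `utf8Length` and `fitsVarChar` are hypothetical helpers.

```typescript
// Hypothetical helpers (not SDK functions). Assumption: max_length limits the
// UTF-8 byte length of the value, not its number of characters.
const encoder = new TextEncoder();

function utf8Length(value: string): number {
  return encoder.encode(value).length;
}

function fitsVarChar(value: string, maxLength: number): boolean {
  return utf8Length(value) <= maxLength;
}

const accented = 'h\u00e9llo'; // "héllo": U+00E9 takes 2 bytes in UTF-8
console.log(accented.length, utf8Length(accented)); // 5 6
console.log(fitsVarChar(accented, 5)); // false
console.log(fitsVarChar('hello', 5)); // true
```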

JSON Type

JSON: JSON object


const metadataField = {
  name: 'metadata',
  data_type: DataType.JSON,
};

Special Types

Geometry: Geometric data
Timestamptz: Timestamp with timezone

Vector Types

FloatVector

Dense vector of 32-bit floats (the most common vector type):


const embeddingField = {
  name: 'embedding',
  data_type: DataType.FloatVector,
  dim: 128, // Required: dimension of the vector
};

BinaryVector

Binary vector, one bit per dimension (for binary embeddings):


const binaryEmbeddingField = {
  name: 'binary_embedding',
  data_type: DataType.BinaryVector,
  dim: 128, // Must be a multiple of 8
};
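Each dimension of a binary vector is one bit, so a 128-dimensional vector occupies 128 / 8 = 16 bytes; this is why dim must be a multiple of 8. The sketch below packs a 0/1 array into bytes. `packBits` is hypothetical, and both the byte-array representation and the most-significant-bit-first order are assumptions to check against the SDK documentation.

```typescript
// Hypothetical helper (not an SDK function): packs dim bits into dim / 8 bytes,
// most-significant bit first (the default bit order of numpy.packbits).
function packBits(bits: number[]): number[] {
  if (bits.length % 8 !== 0) {
    throw new RangeError(`dim ${bits.length} is not a multiple of 8`);
  }
  const bytes: number[] = [];
  for (let i = 0; i < bits.length; i += 8) {
    let byte = 0;
    for (let j = 0; j < 8; j++) {
      // Shift the byte left and append the next bit (any truthy value = 1).
      byte = (byte << 1) | (bits[i + j] ? 1 : 0);
    }
    bytes.push(byte);
  }
  return bytes;
}

console.log(JSON.stringify(packBits([1, 0, 0, 0, 0, 0, 0, 1]))); // [129]
console.log(packBits(new Array<number>(128).fill(1)).length); // 16
```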

SparseFloatVector

Sparse float vector (for sparse embeddings):


const sparseEmbeddingField = {
  name: 'sparse_embedding',
  data_type: DataType.SparseFloatVector,
  // No dim: sparse vectors have no fixed dimension
};

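The SDK accepts sparse vectors in several formats. Each example below encodes the same vector, with 1.0 at index 0 and 2.5 at index 3:


// Array format (undefined marks a zero entry)
const sparseArray = [1.0, undefined, undefined, 2.5, undefined];

// Dictionary format (index -> value)
const sparseDict = { '0': 1.0, '3': 2.5 };

// CSR format (parallel arrays of indices and values)
const sparseCsr = {
  indices: [0, 3],
  values: [1.0, 2.5],
};

// COO format (list of index/value pairs)
const sparseCoo = [
  { index: 0, value: 1.0 },
  { index: 3, value: 2.5 },
];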
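Since the four formats carry the same information, converting between them is mechanical. As a sketch, the hypothetical `toSparseDict` below normalizes any of them into the dictionary format; it is not an SDK function and only illustrates how the formats map onto each other.

```typescript
// Hypothetical normalizer (not an SDK function): converts the array, dictionary,
// CSR, or COO sparse format into the dictionary format.
type SparseDict = Record<string, number>;
interface SparseCSR {
  indices: number[];
  values: number[];
}
type SparseCOOEntry = { index: number; value: number };
type SparseInput = Array<number | undefined> | SparseDict | SparseCSR | SparseCOOEntry[];

function toSparseDict(input: SparseInput): SparseDict {
  const out: SparseDict = {};
  if (Array.isArray(input)) {
    const items = input as Array<number | undefined | SparseCOOEntry>;
    for (let i = 0; i < items.length; i++) {
      const entry = items[i];
      if (entry === undefined) continue; // array format: undefined marks a zero
      if (typeof entry === 'number') {
        out[String(i)] = entry; // array format: the position is the index
      } else {
        out[String(entry.index)] = entry.value; // COO format
      }
    }
    return out;
  }
  const csr = input as SparseCSR;
  if (Array.isArray(csr.indices) && Array.isArray(csr.values)) {
    if (csr.indices.length !== csr.values.length) {
      throw new Error('CSR indices and values must have the same length');
    }
    csr.indices.forEach((index, k) => {
      out[String(index)] = csr.values[k];
    });
    return out;
  }
  // Already in dictionary format: copy the entries.
  const dict = input as SparseDict;
  for (const key of Object.keys(dict)) out[key] = dict[key];
  return out;
}

console.log(JSON.stringify(toSparseDict({ indices: [0, 3], values: [1.0, 2.5] }))); // {"0":1,"3":2.5}
```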

Float16Vector

Half-precision (16-bit) float vector:


const f16EmbeddingField = {
  name: 'f16_embedding',
  data_type: DataType.Float16Vector,
  dim: 128,
};

BFloat16Vector

Brain floating-point (bfloat16) vector:


const bf16EmbeddingField = {
  name: 'bf16_embedding',
  data_type: DataType.BFloat16Vector,
  dim: 128,
};
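Float16 uses 5 exponent bits and 10 mantissa bits, while BFloat16 keeps the 8 exponent bits of a 32-bit float and only 7 mantissa bits, so it covers the same range as Float with less precision. The sketch below converts a number to BFloat16 by rounding its 32-bit float representation to the upper 16 bits (round to nearest, ties to even). `toBFloat16Bits` and `fromBFloat16Bits` are hypothetical helpers for illustration; check the SDK documentation for how it expects Float16 and BFloat16 data to be passed.

```typescript
// Hypothetical helpers (not SDK functions) converting between a JavaScript
// number and a 16-bit bfloat16 bit pattern.
function toBFloat16Bits(x: number): number {
  if (Number.isNaN(x)) return 0x7fc0; // canonical quiet NaN
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x); // round to float32 first
  const bits = view.getUint32(0);
  // Round to nearest, ties to even, on the 16 low bits that are dropped.
  const lsb = (bits >>> 16) & 1;
  return ((bits + 0x7fff + lsb) >>> 16) & 0xffff;
}

function fromBFloat16Bits(h: number): number {
  const view = new DataView(new ArrayBuffer(4));
  // A bfloat16 value is the top half of a float32; the low half is zero.
  view.setUint32(0, ((h & 0xffff) << 16) >>> 0);
  return view.getFloat32(0);
}

const piBf16 = fromBFloat16Bits(toBFloat16Bits(3.14159));
console.log(piBf16); // 3.140625
```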

Int8Vector

8-bit signed integer vector (each value from -128 to 127):


const int8EmbeddingField = {
  name: 'int8_embedding',
  data_type: DataType.Int8Vector,
  dim: 128,
};
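If your embeddings are floats, they must be quantized before they fit in an Int8Vector field. The sketch below shows one common approach, symmetric max-abs scaling to the range -127 to 127. It is an illustration only, not how Milvus or any particular model quantizes, and `quantizeToInt8` is a hypothetical helper.

```typescript
// Hypothetical helper: symmetric max-abs quantization of a float vector to int8.
// The largest |x| maps to 127; the scale is returned so values can be
// approximately recovered as q / scale.
function quantizeToInt8(vector: number[]): { values: number[]; scale: number } {
  const maxAbs = vector.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  const scale = maxAbs === 0 ? 1 : 127 / maxAbs;
  // Round each scaled value and clamp to the int8 range as a safeguard.
  const values = vector.map((x) => Math.max(-128, Math.min(127, Math.round(x * scale))));
  return { values, scale };
}

const quantized = quantizeToInt8([0.5, -1, 1]);
console.log(JSON.stringify(quantized.values)); // [64,-127,127]
```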

Complex Types

Array

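Array of scalar values that share one element type. max_capacity (required) sets the maximum number of elements, and VarChar elements also need max_length:


const tagsField = {
  name: 'tags',
  data_type: DataType.Array,
  element_type: DataType.VarChar,
  max_capacity: 100, // Maximum number of elements per array
  max_length: 64,    // Maximum length of each VarChar element
};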
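Rows that violate these limits are rejected on insert, so a client-side check can surface problems earlier. The sketch below checks an array of strings against max_capacity and a per-element max_length. `checkVarCharArray` is hypothetical, and it assumes the field also sets max_length for its VarChar elements and that the limit applies to each element's UTF-8 byte length.

```typescript
// Hypothetical pre-insert check for an Array field with VarChar elements.
// Assumes max_length applies to each element's UTF-8 byte length.
function checkVarCharArray(values: string[], maxCapacity: number, maxLength: number): string[] {
  const errors: string[] = [];
  if (values.length > maxCapacity) {
    errors.push(`${values.length} elements exceed max_capacity ${maxCapacity}`);
  }
  const encoder = new TextEncoder();
  values.forEach((value, i) => {
    const bytes = encoder.encode(value).length;
    if (bytes > maxLength) {
      errors.push(`element ${i} is ${bytes} bytes, over max_length ${maxLength}`);
    }
  });
  return errors;
}

console.log(checkVarCharArray(['news', 'sports'], 100, 64).length); // 0
console.log(checkVarCharArray(['a', 'b', 'c'], 2, 64).length); // 1
```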

Struct

Nested structure:


const userInfoField = {
  name: 'user_info',
  data_type: DataType.Struct,
  element_type: {
    name: 'name',
    data_type: DataType.VarChar,
    max_length: 100,
  },
};

Field Schema Definition

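A field schema is an object with the following properties. Only name and data_type are required for every field; the others are optional or required only for certain types: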


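// Reference of the available properties; a real field sets only those its type needs
const fieldSchema = {
  name: 'field_name',               // Required: field name
  data_type: DataType.Int64,        // Required: data type
  description: 'Field description', // Optional
  is_primary_key: false,            // Optional: marks the primary key field
  autoID: false,                    // Optional: auto-generate IDs (primary key only)
  max_length: 256,                  // Required for VarChar fields
  dim: 128,                         // Required for vector fields (except SparseFloatVector)
};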

Primary Key Fields

Every collection must have exactly one primary key field, of type Int64 or VarChar:


const idField = {
  name: 'id',
  data_type: DataType.Int64,
  is_primary_key: true,
  autoID: true, // Let Milvus generate IDs automatically
};

Or with manual IDs:


const userIdField = {
  name: 'user_id',
  data_type: DataType.Int64,
  is_primary_key: true,
  autoID: false, // You provide IDs
};

Collection Schema Creation

Basic Schema


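import { MilvusClient, DataType } from '@zilliz/milvus2-sdk-node';

// Connect to a Milvus server (adjust the address for your deployment)
const client = new MilvusClient({ address: 'localhost:19530' });

const schema = [
  {
    name: 'id',
    data_type: DataType.Int64,
    is_primary_key: true,
    autoID: true,
  },
  {
    name: 'vector',
    data_type: DataType.FloatVector,
    dim: 128,
  },
  {
    name: 'text',
    data_type: DataType.VarChar,
    max_length: 256,
  },
];

// createCollection returns a Promise; run this in an ES module or async function
await client.createCollection({
  collection_name: 'my_collection',
  fields: schema,
});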
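Rows for this schema omit id, because the primary key uses autoID, and must supply a 128-dimensional vector and a text value within max_length. The sketch below builds such rows; `MyCollectionRow`, `makeRows`, and `fakeEmbed` are hypothetical names, and `fakeEmbed` is a deterministic placeholder standing in for a real embedding model.

```typescript
// Hypothetical row type for 'my_collection' as defined above. 'id' is
// omitted because the primary key uses autoID: true.
interface MyCollectionRow {
  vector: number[]; // exactly 128 numbers (dim: 128)
  text: string; // within max_length: 256
}

const DIM = 128;

function makeRows(texts: string[], embed: (text: string) => number[]): MyCollectionRow[] {
  return texts.map((text) => {
    const vector = embed(text);
    if (vector.length !== DIM) {
      throw new RangeError(`expected ${DIM} dimensions, got ${vector.length}`);
    }
    return { vector, text };
  });
}

// Placeholder embedding for illustration only; use a real embedding model.
const fakeEmbed = (text: string): number[] =>
  Array.from({ length: DIM }, (_, i) => ((text.length + i) % 10) / 10);

const rows = makeRows(['hello', 'vector search'], fakeEmbed);
console.log(rows.length, rows[0].vector.length); // 2 128
// rows can then be passed as `data` to client.insert({ collection_name: 'my_collection', data: rows })
```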

Schema with Multiple Vector Fields


const schema = [
  {
    name: 'id',
    data_type: DataType.Int64,
    is_primary_key: true,
    autoID: true,
  },
  {
    name: 'text_vector',
    data_type: DataType.FloatVector,
    dim: 768,
  },
  {
    name: 'image_vector',
    data_type: DataType.FloatVector,
    dim: 512,
  },
  {
    name: 'metadata',
    data_type: DataType.JSON,
  },
];

Schema with Consistency Level


await client.createCollection({
  collection_name: 'my_collection',
  fields: schema,
  consistency_level: 'Bounded', // 'Strong', 'Session', 'Bounded', 'Eventually'
});

Dynamic Schema

Enable dynamic schema to insert entities with fields that are not defined in the schema:


await client.createCollection({
  collection_name: 'my_collection',
  fields: [
    {
      name: 'id',
      data_type: DataType.Int64,
      is_primary_key: true,
      autoID: true,
    },
    {
      name: 'vector',
      data_type: DataType.FloatVector,
      dim: 128,
    },
  ],
  enable_dynamic_field: true, // Enable dynamic fields
});

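With dynamic schema enabled, you can insert entities that contain keys not defined in the schema. Milvus stores these extra keys in a hidden JSON field named $meta:


await client.insert({
  collection_name: 'my_collection',
  data: [
    {
      vector: [/* 128 float values */],
      dynamic_field_1: 'value1', // Not in the schema: stored in $meta
      dynamic_field_2: 123,      // Not in the schema: stored in $meta
    },
  ],
});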
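Conceptually, each inserted entity is split into the fields declared in the schema and the remaining keys, which go into the $meta JSON field. The sketch below models that split on the client side to make the behavior concrete; `splitRow` is hypothetical and does not reflect how the SDK or server is implemented.

```typescript
// Conceptual model only (not SDK code): keys declared in the schema go to
// their own fields; every other key ends up in the hidden $meta JSON field.
function splitRow(
  row: Record<string, unknown>,
  schemaFields: string[],
): { fields: Record<string, unknown>; $meta: Record<string, unknown> } {
  const declared = new Set(schemaFields);
  const fields: Record<string, unknown> = {};
  const meta: Record<string, unknown> = {};
  for (const key of Object.keys(row)) {
    if (declared.has(key)) fields[key] = row[key];
    else meta[key] = row[key];
  }
  return { fields, $meta: meta };
}

const split = splitRow(
  { vector: [0.1, 0.2], dynamic_field_1: 'value1', dynamic_field_2: 123 },
  ['id', 'vector'],
);
console.log(JSON.stringify(split.$meta)); // {"dynamic_field_1":"value1","dynamic_field_2":123}
```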

Schema Validation

The SDK validates schemas before creating collections. Common validation errors:

Missing primary key field
Multiple primary key fields
Missing dimension for vector fields
Missing max_length for VarChar fields
Invalid dimension for BinaryVector (must be a multiple of 8)
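The rules listed above are simple enough to check before calling createCollection. The sketch below mirrors them; `validateFields` and `FieldSpec` are hypothetical, and type names are plain strings rather than the SDK's DataType enum so the sketch stays self-contained. The actual validation in the SDK and server may cover more cases.

```typescript
// Hypothetical client-side check mirroring the validation errors listed above.
interface FieldSpec {
  name: string;
  data_type: string; // type name as a string, e.g. 'Int64' or 'FloatVector'
  is_primary_key?: boolean;
  max_length?: number;
  dim?: number;
}

// Vector types that need an explicit dim (SparseFloatVector does not).
const DIM_REQUIRED = new Set([
  'FloatVector', 'BinaryVector', 'Float16Vector', 'BFloat16Vector', 'Int8Vector',
]);

function validateFields(fields: FieldSpec[]): string[] {
  const errors: string[] = [];
  const primaryKeys = fields.filter((f) => f.is_primary_key === true);
  if (primaryKeys.length === 0) errors.push('missing primary key field');
  if (primaryKeys.length > 1) {
    errors.push(`multiple primary key fields: ${primaryKeys.map((f) => f.name).join(', ')}`);
  }
  for (const f of fields) {
    if (DIM_REQUIRED.has(f.data_type) && f.dim === undefined) {
      errors.push(`${f.name}: missing dim`);
    }
    if (f.data_type === 'VarChar' && f.max_length === undefined) {
      errors.push(`${f.name}: missing max_length`);
    }
    if (f.data_type === 'BinaryVector' && f.dim !== undefined && f.dim % 8 !== 0) {
      errors.push(`${f.name}: BinaryVector dim must be a multiple of 8`);
    }
  }
  return errors;
}

console.log(validateFields([
  { name: 'id', data_type: 'Int64', is_primary_key: true },
  { name: 'bits', data_type: 'BinaryVector', dim: 100 },
  { name: 'title', data_type: 'VarChar' },
]));
```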

Schema Best Practices

Choose appropriate data types: Use Int64 for IDs, FloatVector for embeddings
Set reasonable max_length: For VarChar fields, set max_length based on expected content
Use autoID: Let Milvus generate IDs unless you need to supply your own
Enable dynamic schema: Useful when entities carry attributes that you cannot define up front
Document fields: Use the description property to record each field's purpose

Next Steps

Learn about Collection Management
Explore Data Operations
Check out Best Practices