Vector Search Guide¶
Complete guide to semantic search in NornicDB
Last Updated: December 11, 2025
Overview¶
NornicDB provides production-ready vector search with:
- Automatic indexing - All node embeddings are indexed automatically
- Cypher integration - `db.index.vector.queryNodes` procedure
- String auto-embedding - Pass text, get results (no pre-computation)
- GPU acceleration - 10-100x speedup with Metal/CUDA/OpenCL
- Hybrid search - RRF fusion of vector + BM25
- Caching - 450,000x speedup for repeated queries
Important: semantic search requires embeddings. Embedding generation is disabled by default in current releases; enable it with `NORNICDB_EMBEDDING_ENABLED=true` (or `nornicdb serve --embedding-enabled`), or provide vectors yourself.
How Vector Search Works¶
Two Types of Indexes¶
NornicDB maintains two complementary vector index systems:
1. Internal Automatic Index (Zero Configuration)¶
NornicDB automatically maintains an internal vector index that:
- Indexes nodes with managed embeddings in `node.ChunkEmbeddings` (the main embedding is `ChunkEmbeddings[0]`)
- Updates automatically when nodes are created, updated, or deleted
- Requires no setup - works out of the box
- Is used by the REST API (`/nornicdb/search`) and hybrid search
// This happens automatically at database startup:
db.searchService = search.NewServiceWithDimensions(storage, 1024)
// Nodes are indexed automatically via storage callbacks:
// OnNodeCreated → searchService.IndexNode(node)
// OnNodeUpdated → searchService.IndexNode(node)
// OnNodeDeleted → searchService.RemoveNode(nodeID)
search.Service.IndexNode() also indexes node.NamedEmbeddings (client-managed vectors, e.g. Qdrant gRPC) under IDs like nodeID-named-{vectorName}.
2. User-Defined Cypher Indexes (Optional)¶
Create named indexes for specific labels/properties:
CALL db.index.vector.createNodeIndex(
'embeddings', -- Your index name
'Document', -- Node label to filter
'embedding', -- Property name to search (also used as a NamedEmbeddings key if present)
1024, -- Vector dimensions
'cosine' -- Similarity: 'cosine', 'euclidean', or 'dot'
)
Key insight: User-defined indexes are metadata only - they specify which nodes to search and where to find embeddings. The actual embeddings come from either:
1. `node.NamedEmbeddings[index.property]` (or `"default"` when no property is configured)
2. The specified property (e.g., `node.Properties["embedding"]` when it contains a vector array)
3. `node.ChunkEmbeddings[0..N]` (best score across chunks)
Embedding Lookup Order (Cypher db.index.vector.queryNodes)¶
When db.index.vector.queryNodes runs, it finds embeddings in this order:
1. NamedEmbeddings[index.property] (or "default")
2. node.Properties[index.property] (vector array)
3. ChunkEmbeddings[0..N]
This means user-defined indexes can match managed embeddings (via ChunkEmbeddings) and/or property vectors.
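The lookup order above can be sketched as a small fallback function. This is a minimal illustration, not NornicDB's actual source; the `Node` type and field shapes are simplified assumptions based on the names used in this guide:

```go
package main

import "fmt"

// Node is a hypothetical, simplified node shape for illustration only.
type Node struct {
	NamedEmbeddings map[string][]float32
	Properties      map[string]any
	ChunkEmbeddings [][]float32
}

// lookupEmbeddings mirrors the documented order: NamedEmbeddings first,
// then a vector stored on the property, then managed chunk embeddings.
func lookupEmbeddings(n *Node, property string) [][]float32 {
	key := property
	if key == "" {
		key = "default"
	}
	// 1. Client-managed named vector (e.g. set via Qdrant gRPC).
	if v, ok := n.NamedEmbeddings[key]; ok {
		return [][]float32{v}
	}
	// 2. Vector array stored directly on the property.
	if raw, ok := n.Properties[property].([]float32); ok {
		return [][]float32{raw}
	}
	// 3. Managed chunk embeddings (best score across chunks).
	return n.ChunkEmbeddings
}

func main() {
	n := &Node{
		Properties:      map[string]any{"embedding": []float32{0.1, 0.2}},
		ChunkEmbeddings: [][]float32{{0.9, 0.9}},
	}
	vecs := lookupEmbeddings(n, "embedding")
	fmt.Println(len(vecs)) // the property vector wins over the chunks
}
```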
Quick Start¶
Cypher (Recommended)¶
-- String query (auto-embedded)
CALL db.index.vector.queryNodes('embeddings', 10, 'machine learning tutorial')
YIELD node, score
RETURN node.title, score
ORDER BY score DESC
-- Direct vector array (Neo4j compatible)
CALL db.index.vector.queryNodes('embeddings', 10, [0.1, 0.2, 0.3, 0.4])
YIELD node, score
Go API¶
// Search for similar content
results, err := db.Search(ctx, "AI and learning algorithms", 10)
for _, result := range results {
fmt.Printf("Found: %s (score: %.3f)\n", result.Title, result.Score)
}
Cypher Vector Search¶
db.index.vector.queryNodes¶
| Parameter | Type | Description |
|---|---|---|
indexName | String | Name of the vector index |
k | Integer | Number of results to return |
queryInput | Array/String/Parameter | Query vector or text |
Query Input Types:
-- 1. String Query (Auto-Embedded) ✨ NORNICDB EXCLUSIVE
CALL db.index.vector.queryNodes('idx', 10, 'database performance')
YIELD node, score
-- 2. Direct Vector Array (Neo4j Compatible)
CALL db.index.vector.queryNodes('idx', 10, [0.1, 0.2, 0.3, 0.4])
YIELD node, score
-- 3. Parameter Reference
CALL db.index.vector.queryNodes('idx', 10, $queryVector)
YIELD node, score
Qdrant gRPC: Text Queries (Upstream Points.Query)¶
If you have the Qdrant gRPC endpoint enabled, you can also run text queries using the upstream Qdrant protobuf contract (no custom protos).
Requirements:
- `NORNICDB_QDRANT_GRPC_ENABLED=true`
- `NORNICDB_EMBEDDING_ENABLED=true` (needed to embed the query text)
Concept:
- Use `qdrant.Points/Query` with `Query.nearest(VectorInput.document(Document{text: ...}))`.
See Qdrant gRPC Endpoint for setup, configuration, and multi-language client examples.
Storing Embeddings via Cypher¶
-- Single property
MATCH (n:Document {id: 'doc1'})
SET n.embedding = [0.7, 0.2, 0.05, 0.05]
-- Multi-line SET with optional user metadata
MATCH (n:Document {id: 'doc1'})
SET n.embedding = [0.7, 0.2, 0.05, 0.05],
n.embedding_dimensions = 1024,
n.embedding_model = 'mxbai-embed-large',
n.has_embedding = true
Creating Vector Indexes¶
CALL db.index.vector.createNodeIndex(
'embeddings', -- index name
'Document', -- node label
'embedding', -- property name (also used as a NamedEmbeddings key if present)
1024, -- dimensions
'cosine' -- similarity function: 'cosine', 'euclidean', or 'dot'
)
💡 Tip: Managed embeddings are stored internally (`ChunkEmbeddings` + `EmbedMeta`). Even if a node has no `node.Properties["embedding"]`, Cypher/vector search can still match it via managed/internal embeddings and/or `NamedEmbeddings`.
REST API (Hybrid Search)¶
The REST API uses NornicDB's internal automatic index for combined vector + BM25 search:
# Hybrid search (vector + BM25 with RRF fusion)
curl -X POST http://localhost:7474/nornicdb/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"query": "machine learning algorithms",
"limit": 10,
"labels": ["Document", "Memory"]
}'
Response:
{
"status": "ok",
"results": [
{
"id": "node-123",
"title": "ML Basics",
"score": 0.92,
"rrf_score": 0.034,
"vector_rank": 1,
"bm25_rank": 3
}
],
"search_method": "hybrid",
"metrics": {
"vector_search_time_ms": 12,
"bm25_search_time_ms": 8,
"fusion_time_ms": 1
}
}
When to Use Each Approach¶
| Use Case | Recommended Approach |
|---|---|
| General semantic search | REST API /nornicdb/search |
| Neo4j driver compatibility | db.index.vector.queryNodes with user index |
| Filter by specific label | User-defined index with label filter |
| Custom embedding property | User-defined index with property name |
| Use managed embeddings | Either (Cypher uses ChunkEmbeddings; HTTP uses search.Service) |
| Hybrid vector + keyword search | REST API (built-in RRF fusion) |
Go API¶
Basic Search¶
// Generate embedding
embedder, _ := embed.New(&embed.Config{
Provider: "ollama",
APIUrl: "http://localhost:11434",
Model: "mxbai-embed-large",
})
embedding, _ := embedder.Embed(ctx, "Machine learning is awesome")
// Store with embedding
memory := &nornicdb.Memory{
Content: "Machine learning enables computers to learn from data",
Title: "ML Basics",
Embedding: embedding,
}
db.Store(ctx, memory)
// Search
results, _ := db.Search(ctx, "AI and learning algorithms", 10)
Batch Embedding¶
texts := []string{
"Python is a programming language",
"Go is fast and concurrent",
"Rust provides memory safety",
}
embeddings, _ := embedder.BatchEmbed(ctx, texts)
// 2-5x faster than sequential embedding
Cached Embeddings (450,000x Speedup)¶
// Wrap any embedder with caching
cached := embed.NewCachedEmbedder(embedder, 10000) // 10K cache
// First call: ~50-200ms
emb1, _ := cached.Embed(ctx, "Hello world")
// Second call: ~111ns (450,000x faster!)
emb2, _ := cached.Embed(ctx, "Hello world")
// Check stats
stats := cached.Stats()
fmt.Printf("Cache: %.1f%% hit rate\n", stats.HitRate)
Server defaults:
nornicdb serve # 10K cache (~40MB)
nornicdb serve --embedding-cache 50000 # Larger cache
nornicdb serve --embedding-cache 0 # Disable
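The memoization idea behind the cached embedder can be sketched in a few lines. The types here are illustrative assumptions, not NornicDB's actual `embed` package API (which takes a context and returns an error):

```go
package main

import (
	"fmt"
	"sync"
)

// Embedder is a stand-in for the wrapped embedding function.
type Embedder func(text string) []float32

// CachedEmbedder memoizes embeddings by exact text match, so repeated
// queries skip the model call entirely.
type CachedEmbedder struct {
	mu     sync.Mutex
	inner  Embedder
	cache  map[string][]float32
	hits   int
	misses int
}

func NewCachedEmbedder(inner Embedder) *CachedEmbedder {
	return &CachedEmbedder{inner: inner, cache: make(map[string][]float32)}
}

func (c *CachedEmbedder) Embed(text string) []float32 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.cache[text]; ok {
		c.hits++
		return v // cache hit: no model call, nanosecond-scale lookup
	}
	c.misses++
	v := c.inner(text)
	c.cache[text] = v
	return v
}

// HitRate reports the percentage of calls served from cache.
func (c *CachedEmbedder) HitRate() float64 {
	total := c.hits + c.misses
	if total == 0 {
		return 0
	}
	return float64(c.hits) / float64(total) * 100
}

func main() {
	calls := 0
	fake := func(text string) []float32 { calls++; return []float32{float32(len(text))} }
	c := NewCachedEmbedder(fake)
	c.Embed("Hello world")
	c.Embed("Hello world")
	fmt.Println(calls, c.HitRate()) // 1 50
}
```

A production cache would also bound its size (the server's `--embedding-cache` flag caps entries); this sketch omits eviction for brevity.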
Async Embedding¶
autoEmbedder.QueueEmbed("doc-1", "Some content",
func(nodeID string, embedding []float32, err error) {
db.UpdateNodeEmbedding(nodeID, embedding)
})
GPU Acceleration¶
Enable GPU¶
gpuConfig := &gpu.Config{
Enabled: true,
PreferredBackend: gpu.BackendMetal, // or CUDA, OpenCL, Vulkan
MaxMemoryMB: 8192,
}
manager, _ := gpu.NewManager(gpuConfig)
index := gpu.NewEmbeddingIndex(manager, gpu.DefaultEmbeddingIndexConfig(1024))
// Add embeddings and sync
for _, emb := range embeddings {
index.Add(nodeID, emb)
}
index.SyncToGPU()
// Search (10-100x faster!)
results, _ := index.Search(queryEmbedding, 10)
GPU Backends¶
| Backend | Platform | Performance | Notes |
|---|---|---|---|
| Metal | Apple Silicon | Excellent | Native M1/M2/M3 |
| CUDA | NVIDIA | Highest | Requires toolkit |
| OpenCL | Cross-platform | Good | Best compatibility |
| Vulkan | Cross-platform | Good | Future-proof |
Hybrid Search¶
Combines vector similarity with BM25 full-text search using RRF (Reciprocal Rank Fusion):
-- Via Cypher
CALL db.index.vector.queryNodes('memories', 20, 'authentication patterns')
YIELD node, score
WHERE node.type IN ['decision', 'code'] AND score >= 0.5
RETURN node
// Via Go API
vectorResults, _ := db.Search(ctx, "machine learning", 10)
fullTextResults, _ := db.SearchFullText(ctx, "machine learning", 10)
combined := mergeResults(vectorResults, fullTextResults)
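The `mergeResults` step above is where RRF comes in. A minimal sketch of Reciprocal Rank Fusion: each document scores `Σ 1/(k + rank_i)` across the result lists it appears in. `k = 60` is the conventional constant from the RRF literature; NornicDB's actual constant is not specified in this guide:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse combines two ranked ID lists with Reciprocal Rank Fusion.
// Ranks are 1-based; documents appearing in both lists accumulate score.
func rrfFuse(vector, bm25 []string, k float64) []string {
	scores := map[string]float64{}
	for rank, id := range vector {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range bm25 {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first.
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	vector := []string{"a", "b", "c"} // vector-similarity ranking
	bm25 := []string{"c", "a", "d"}   // keyword (BM25) ranking
	fmt.Println(rrfFuse(vector, bm25, 60)) // "a" wins: 1/61 + 1/62
}
```

This is why a document ranked 1st by vector search and 3rd by BM25 (as in the REST response above) can outscore one that appears in only a single list.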
Performance Tuning¶
Vector Strategy Selection¶
NornicDB chooses the fastest available vector-search strategy at runtime:
- K-means clustered search (when clustering is enabled and clustered)
- GPU brute-force (exact) (when GPU is enabled and the dataset is within the configured threshold)
- CPU brute-force (exact) for small datasets
- HNSW (ANN) for large CPU-only datasets
GPU brute-force search is exact and, thanks to massive parallelism, remains competitive at much larger N than CPU brute-force. Once brute-force becomes too slow (or no GPU is available), the pipeline switches to HNSW.
Tuning knobs:
# Use GPU brute-force when N is in this range (defaults shown)
export NORNICDB_VECTOR_GPU_BRUTE_MIN_N=5000
export NORNICDB_VECTOR_GPU_BRUTE_MAX_N=15000
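The selection order above can be sketched as a decision tree. This is an illustrative sketch of the documented priorities, not NornicDB's actual selection code, and the field names are assumptions:

```go
package main

import "fmt"

// vectorConfig models the runtime signals the strategy choice depends on.
type vectorConfig struct {
	Clustered    bool // K-means clustering enabled and already built
	GPUEnabled   bool
	GPUMinN      int // NORNICDB_VECTOR_GPU_BRUTE_MIN_N
	GPUMaxN      int // NORNICDB_VECTOR_GPU_BRUTE_MAX_N
	CPUBruteMaxN int // below this, an exact CPU scan is fastest
}

// chooseStrategy walks the documented order: clustered search, then
// GPU brute-force within its N window, then CPU brute-force for small
// datasets, then HNSW for everything else.
func chooseStrategy(cfg vectorConfig, n int) string {
	switch {
	case cfg.Clustered:
		return "kmeans"
	case cfg.GPUEnabled && n >= cfg.GPUMinN && n <= cfg.GPUMaxN:
		return "gpu-brute-force"
	case n < cfg.CPUBruteMaxN:
		return "cpu-brute-force"
	default:
		return "hnsw"
	}
}

func main() {
	cfg := vectorConfig{GPUEnabled: true, GPUMinN: 5000, GPUMaxN: 15000, CPUBruteMaxN: 5000}
	fmt.Println(chooseStrategy(cfg, 1000))  // cpu-brute-force
	fmt.Println(chooseStrategy(cfg, 8000))  // gpu-brute-force
	fmt.Println(chooseStrategy(cfg, 50000)) // hnsw
}
```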
Compressed ANN profile (quality=compressed)¶
NornicDB also supports a compressed ANN profile for large-scale memory economics:
When enabled, the query path uses IVF/PQ compressed candidate generation with bounded exact reranking. If compressed prerequisites are not satisfied for a run, the service logs diagnostics and falls back safely.
High-level tradeoff (latest benchmark snapshot, averaged):
| Dataset size | 1500 | 3000 | 6000 | 12000 |
|---|---|---|---|---|
| HNSW latency | ~5.81us | ~5.75us | ~5.83us | ~5.64us |
| IVFPQ latency | ~23.0us | ~42.6us | ~38.9us | ~48.7us |
| HNSW heap delta | ~1.56MiB | ~1.57MiB | ~1.57MiB | ~1.58MiB |
| IVFPQ heap delta | ~1.57MiB | ~2.08MiB | ~2.08MiB | ~2.08MiB |
Use this profile when memory-scaled ANN operation is more important than lowest-latency single-query performance. For full knobs and operational tuning, see docs/operations/configuration.md ("Compressed ANN mode (IVFPQ)").
Dimensions¶
| Dimensions | Speed | Quality | Model Examples |
|---|---|---|---|
| 384 | Fast | Good | all-MiniLM-L6-v2 |
| 768 | Balanced | Better | e5-base |
| 1024 | Slower | Best | mxbai-embed-large |
| 3072 | Slowest | Highest | OpenAI text-embedding-3-large |
Similarity Thresholds¶
db.Search(ctx, query, 10, 0.9) // Very similar only
db.Search(ctx, query, 10, 0.7) // Moderately similar
db.Search(ctx, query, 10, 0.0) // All results
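For intuition on what those threshold values mean, here is a self-contained sketch of cosine-similarity filtering (the standard formula, not NornicDB's internal implementation):

```go
package main

import (
	"fmt"
	"math"
)

// cosine computes cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] (1 = identical direction).
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// aboveThreshold counts candidates meeting the similarity threshold,
// mirroring the optional threshold argument shown above.
func aboveThreshold(query []float32, candidates [][]float32, threshold float64) int {
	kept := 0
	for _, c := range candidates {
		if cosine(query, c) >= threshold {
			kept++
		}
	}
	return kept
}

func main() {
	q := []float32{1, 0}
	cands := [][]float32{{1, 0}, {0.7, 0.7}, {0, 1}}
	fmt.Println(aboveThreshold(q, cands, 0.9)) // 1: only the identical vector
	fmt.Println(aboveThreshold(q, cands, 0.0)) // 3: all candidates pass
}
```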
Tips¶
- Use caching - 450,000x speedup for repeated queries
- Enable GPU - 10-100x speedup for search
- Set thresholds - Eliminate weak matches early
- Batch operations - 2-5x faster than sequential
Common Patterns¶
RAG (Retrieval-Augmented Generation)¶
// 1. Search for context
results, _ := db.Search(ctx, userQuery, 5)
// 2. Build context
context := ""
for _, r := range results {
context += r.Content + "\n"
}
// 3. Generate with context
response := llm.Generate(userQuery, context)
Semantic K-Means Clustering¶
Configuration¶
Environment Variables¶
NORNICDB_EMBEDDING_ENABLED=true
NORNICDB_EMBEDDING_API_URL=http://localhost:8080
NORNICDB_EMBEDDING_MODEL=mxbai-embed-large
NORNICDB_EMBEDDING_DIMENSIONS=1024
NORNICDB_EMBEDDING_CACHE_SIZE=10000
NORNICDB_KMEANS_MIN_EMBEDDINGS=1000 # Minimum embeddings before K-means clustering
K-Means cluster count (auto by default):
- The number of clusters is chosen from the dataset size when clustering runs: K ≈ √(n/2) (min 10, max 8192). For ~900k embeddings this yields ~670 clusters (~1350 vectors per cluster) instead of a fixed 100.
- Override with `NORNICDB_KMEANS_NUM_CLUSTERS=500` (or any positive value) to use a fixed K.
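The auto-selection rule is easy to verify numerically. A direct transcription of K ≈ √(n/2) with the documented clamps:

```go
package main

import (
	"fmt"
	"math"
)

// autoClusterCount implements K ≈ √(n/2), clamped to [10, 8192],
// as described above.
func autoClusterCount(n int) int {
	k := int(math.Sqrt(float64(n) / 2))
	if k < 10 {
		k = 10
	}
	if k > 8192 {
		k = 8192
	}
	return k
}

func main() {
	fmt.Println(autoClusterCount(1000))   // √(1000/2) ≈ 22
	fmt.Println(autoClusterCount(900000)) // √(450000) ≈ 670, ~1350 vectors/cluster
}
```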
K-Means Clustering Threshold:
- `NORNICDB_KMEANS_MIN_EMBEDDINGS` (default: 1000): Minimum number of embeddings required before K-means clustering is triggered. Below this threshold, brute-force search is used, as it is faster for small datasets.
Performance Scaling (Benchmarked):
- 2,000 embeddings: 14% faster (65ms → 61ms avg)
- 4,500 embeddings: 26% faster (47ms → 35ms avg)
- 10,000+ embeddings: 10-50x faster
Tuning:
- 1000 (default): Safe for most workloads, proven benefit
- 500-1000: Latency-sensitive applications (14-26% speedup)
- 100-500: Testing or small datasets
- 2000+: Very large datasets, maximize speedup
Verify Status¶
NornicDB vs Neo4j¶
| Feature | Neo4j GDS | NornicDB |
|---|---|---|
| Vector array queries | ✅ | ✅ |
| String auto-embedding | ❌ | ✅ |
| Multi-line SET with arrays | ❌ | ✅ |
| Native embedding field | ❌ | ✅ |
| Server-side embedding | ❌ | ✅ |
| GPU acceleration | ❌ | ✅ |
| Embedding cache | ❌ | ✅ |
Troubleshooting¶
| Issue | Solution |
|---|---|
| Slow search | Enable GPU, use caching, reduce dimensions |
| Poor results | Increase dimensions, lower threshold, use hybrid |
| Out of memory | Reduce batch size, enable GPU (uses VRAM) |
| No embedder error | Configure embedding service or use vector arrays |
| Dimension mismatch | Ensure all embeddings use same model |
Related Docs¶
- GPU K-Means - GPU clustering
- Functions Index - Vector similarity functions
- Search Implementation - Hybrid search internals