Vector & Full-Text Search (Cypher API)¶
NornicDB exposes Neo4j-compatible vector and full-text indexes plus search procedures. Most queries that work on Neo4j 5 vector indexes work unchanged on NornicDB; the extensions are documented inline.
Vector indexes¶
Create¶
DDL form (Neo4j-compatible):
CREATE VECTOR INDEX docEmbeddings IF NOT EXISTS
FOR (n:Document) ON (n.embedding)
OPTIONS { indexConfig: {
`vector.dimensions`: 1024,
`vector.similarity_function`: 'cosine' -- 'cosine' | 'euclidean' | 'dot'
}}
Procedure form (handy from drivers that prefer CALL):
CALL db.index.vector.createNodeIndex(
'docEmbeddings', -- index name
'Document', -- node label
'embedding', -- property holding the vector
1024, -- dimensions
'cosine' -- similarity function
)
For relationship vectors:
CALL db.index.vector.createRelationshipIndex(
'edgeEmbeddings', 'CO_ACCESSED', 'embedding', 1024, 'cosine'
)
Drop:
Similarity functions¶
| Name | Range | Use when |
|---|---|---|
cosine (default) | [-1, 1] mapped to [0, 1] for ranking | Most embeddings — the model trained with normalized cosine |
dot | unbounded; valid only on normalized vectors | Speed-optimized cosine on guaranteed-unit-norm vectors |
euclidean | distance, not similarity | Use when the model was trained with L2 distance |
Dimensions¶
Every node indexed under a vector index must have the same dimension as the index's configured vector.dimensions. Mismatched dimensions are silently skipped on write, and produce 400 Bad Request from the search HTTP endpoint when queried.
| Model | Dim |
|---|---|
all-MiniLM-L6-v2 | 384 |
bge-m3, mxbai-embed-large, e5-large | 1024 |
text-embedding-3-small | 1536 |
text-embedding-3-large | 3072 |
Query¶
The flagship procedure accepts three input shapes. Pick whichever fits the call site:
-- 1. With an explicit vector (Neo4j-compatible)
CALL db.index.vector.queryNodes('docEmbeddings', 10, [0.1, 0.2, ...])
YIELD node, score
RETURN node.title, score
-- 2. With a parameter
:param queryVector => [0.1, 0.2, ...]
CALL db.index.vector.queryNodes('docEmbeddings', 10, $queryVector)
YIELD node, score
RETURN node, score
-- 3. With a string — NornicDB extension
-- The server embeds the string using the configured embedder.
CALL db.index.vector.queryNodes('docEmbeddings', 10, 'machine learning patterns')
YIELD node, score
RETURN node.title, score
ORDER BY score DESC
For relationships use db.index.vector.queryRelationships(indexName, k, query). Yields (relationship, score).
Filter results¶
The procedure already orders by score descending. Wrap with WHERE / SKIP / LIMIT for app-side filtering:
CALL db.index.vector.queryNodes('docEmbeddings', 50, $queryVector)
YIELD node, score
WHERE node.tenantId = $tenant AND score >= 0.6
RETURN node.id, score
LIMIT 10
Server-side embedding¶
db.index.vector.embed returns an embedding for arbitrary text using the configured embedder. Useful for chained calls or multi-stage retrieval:
CALL db.index.vector.embed('zero-trust authentication')
YIELD embedding
CALL db.index.vector.queryNodes('docEmbeddings', 20, embedding)
YIELD node, score
RETURN node, score
Requires NORNICDB_EMBEDDING_ENABLED=true. See the nornicdb-managed-embeddings skill.
Full-text (BM25) indexes¶
Create¶
CREATE FULLTEXT INDEX docFulltext IF NOT EXISTS
FOR (n:Document) ON EACH [n.title, n.content]
-- With analyzer option
CREATE FULLTEXT INDEX docFulltext IF NOT EXISTS
FOR (n:Document) ON EACH [n.title, n.content]
OPTIONS { analyzer: 'standard' }
-- Procedure form
CALL db.index.fulltext.createNodeIndex(
'docFulltext',
['Document'],
['title', 'content']
)
-- Available analyzers
CALL db.index.fulltext.listAvailableAnalyzers()
For relationships: db.index.fulltext.createRelationshipIndex(name, types[], properties[]).
Drop:
Query¶
CALL db.index.fulltext.queryNodes('docFulltext', 'authentication security')
YIELD node, score
RETURN node.title, score
ORDER BY score DESC
LIMIT 10
CALL db.index.fulltext.queryRelationships('edgeFulltext', 'co-occurrence')
YIELD relationship, score
RETURN relationship, score
score is BM25; higher is better. NornicDB's BM25 indexer also indexes properties beyond those listed in ON EACH for substring lookup, but only the listed properties contribute to the priority ranking.
Discovery¶
SHOW INDEXES -- standard Cypher form
CALL db.indexes() YIELD name, type, labelsOrTypes, properties, state
CALL db.index.stats() YIELD name, type, totalEntries, uniqueValues, selectivity
Hybrid + RAG¶
For combined vector + BM25 + RRF + optional rerank, see the nornicdb-rag-procedures skill (db.retrieve, db.rretrieve, db.rerank, db.infer).
Common pitfalls¶
- Dimensions must match. Build embeddings with the model the index expects. Mixing models silently corrupts results.
- Cosine vs dot vs euclidean. Pick the function that matches the model's training objective.
cosineis the right default unless you know otherwise. - String-in-vector procedure is a NornicDB extension. Drivers written against Neo4j may strip/escape the input.
db.index.vector.queryNodesrequires the index to exist. It does not auto-create indexes.- Vector index applies to nodes that have the property set. Nodes without the embedding property are simply not searchable through that index.
- Properties of type
LIST<FLOAT>are eligible. Mixed-type lists or shorter-than-expected vectors are dropped. - Suppression interaction. If decay is enabled, suppressed entities are filtered out of vector and BM25 results unless the query is wrapped in
reveal()(see the knowledge-policies skill).
Worked example¶
-- 1. Ensure both indexes
CREATE VECTOR INDEX docEmbeddings IF NOT EXISTS
FOR (n:Document) ON (n.embedding)
OPTIONS { indexConfig: { `vector.dimensions`: 1024, `vector.similarity_function`: 'cosine' } }
CREATE FULLTEXT INDEX docFulltext IF NOT EXISTS
FOR (n:Document) ON EACH [n.title, n.content]
-- 2. Insert with managed embedding
UNWIND $docs AS d
MERGE (n:Document {id: d.id})
SET n.title = d.title,
n.content = d.content
WITH EMBEDDING
RETURN count(n) AS upserted
-- 3. Query both
CALL db.index.vector.queryNodes('docEmbeddings', 20, 'graph database performance')
YIELD node, score
RETURN node.id, node.title, score
Per-database master switches¶
Each database has independent BM25 and vector master switches plus warming triggers. Defaults reproduce today's behaviour: both enabled, both built at startup. Any caller (HTTP, Bolt, GraphQL, gRPC, Cypher procedures) sees the same contract because the gating lives on search.Service.
| Setting (per DB or global) | What changes |
|---|---|
NORNICDB_SEARCH_BM25_ENABLED=false | No BM25 build, no fulltext docs ingested. db.index.fulltext.queryNodes returns empty. |
NORNICDB_SEARCH_VECTOR_ENABLED=false | No ANN substrate built or loaded. db.index.vector.queryNodes returns empty + WARN log; query does not error. |
NORNICDB_SEARCH_*_WARMING=lazy | First inbound search blocks synchronously while the build runs; subsequent calls are warm. |
Both *_ENABLED=false | POST /nornicdb/search returns 503 search_disabled_for_database, retryable: false. |
Health/liveness probes must NOT target /nornicdb/search for warming=lazy or any *_enabled=false database — every probe would either trigger the build or stream 503 responses that look like real failures. Use /nornicdb/health (DB-agnostic) or /admin/databases/{name}/config (lookup-only) instead.
See docs/operations/configuration.md#per-database-search-index-control for the full matrix and yaml examples.