RAG Procedures (Cypher API)
NornicDB exposes the entire retrieval-augmented-generation pipeline as Cypher procedures. The mental model is "Cypher all the way down": write CALL db.<verb>({...}) with a map argument and YIELD the columns you want.
The four verbs
| Procedure | What it does | Stages |
|---|---|---|
| db.retrieve(req) | Hybrid retrieval | vector + BM25, fused with RRF |
| db.rretrieve(req) | Retrieve + rerank when a reranker is configured | adds cross-encoder rerank if NORNICDB_SEARCH_RERANK_ENABLED=true |
| db.rerank(req) | Standalone rerank over user-supplied candidates | cross-encoder only |
| db.infer(req) | LLM generation | calls the configured inference manager |
All four take a single map argument. Most fields support both camelCase and snake_case; pick one and stick with it.
db.retrieve: hybrid retrieval
```cypher
CALL db.retrieve({
  query: 'authentication patterns',   // required
  limit: 10,                          // default 50
  minSimilarity: 0.5,                 // vector floor
  types: ['Document', 'Memory'],      // label filter; alias: labels
  rerankEnabled: false,               // explicit on/off
  rerankTopK: 100,
  rerankMinScore: 0.0,
  embedding: $queryVector             // optional pre-computed vector; aliases: queryEmbedding, query_embedding
})
YIELD node, score, rrf_score, vector_rank, bm25_rank, search_method, fallback_triggered
RETURN node.id, node.title, score, search_method
ORDER BY score DESC
```
Behavior:
- If embedding is omitted and an embedder is configured, NornicDB embeds query server-side.
- Adaptive RRF weights are chosen from the query's word count: 1–2 words → BM25-favored (V=0.5, B=1.5); 3–5 → balanced; 6+ → vector-favored (V=1.5, B=0.5).
- search_method reports which path produced the results: rrf_hybrid, rrf_hybrid+rerank, vector_only, or bm25_only.
- fallback_triggered: true means one strategy returned nothing and the engine fell back.
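The adaptive weighting rule can be pictured as weighted reciprocal rank fusion. The sketch below is a minimal client-side illustration, not NornicDB's implementation: the RRF constant k=60 and the 1.0/1.0 weights for the "balanced" 3–5 word band are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

RRF_K = 60  # assumption: the common RRF constant, not a documented NornicDB value


def adaptive_weights(query: str) -> tuple[float, float]:
    """Return (vector_weight, bm25_weight) chosen from the query's word count."""
    words = len(query.split())
    if words <= 2:
        return 0.5, 1.5  # BM25-favored
    if words <= 5:
        return 1.0, 1.0  # "balanced"; equal weights are an assumption
    return 1.5, 0.5      # vector-favored


def weighted_rrf(query, vector_ids, bm25_ids, k=RRF_K):
    """Fuse two best-first id lists with weighted reciprocal rank fusion.

    Returns (id, rrf_score, vector_rank, bm25_rank) tuples, best first.
    Ranks are 1-based; None means the id was missing from that list.
    """
    v_w, b_w = adaptive_weights(query)
    v_rank = {doc: r for r, doc in enumerate(vector_ids, start=1)}
    b_rank = {doc: r for r, doc in enumerate(bm25_ids, start=1)}
    scores = defaultdict(float)
    for doc, r in v_rank.items():
        scores[doc] += v_w / (k + r)
    for doc, r in b_rank.items():
        scores[doc] += b_w / (k + r)
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [(doc, scores[doc], v_rank.get(doc), b_rank.get(doc)) for doc in ordered]
```

With a one-word query the BM25 list dominates: weighted_rrf('auth', ['a', 'b', 'c'], ['c', 'a']) ranks c first because it tops the BM25 list, even though it is last on the vector side. The returned ranks play the role of the vector_rank and bm25_rank columns.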
db.rretrieve: retrieve + auto-rerank
Same input shape as db.retrieve, but reranks when the reranker is available. Use this when you want "always rerank if you can":
```cypher
CALL db.rretrieve({
  query: 'zero-trust security architecture',
  limit: 20,
  rerankTopK: 100
})
YIELD node, score, search_method
RETURN node.id, score, search_method
```
If no reranker is configured, db.rretrieve behaves identically to db.retrieve and does not raise an error.
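The "always rerank if you can" contract can be sketched as a small dispatch over an already-fused hit list. This is a hypothetical illustration: the reranker callable interface is invented, the sketch assumes the hybrid path succeeded (so it only ever reports rrf_hybrid or rrf_hybrid+rerank), and keeping hits beyond rerankTopK in their fused order is an assumption rather than documented behavior. The silent fallback mirrors the "Reranker degrades silently" note in the failure-modes section.

```python
from typing import Callable, Optional

Hit = tuple[str, float]                      # (node id, score), best first
Reranker = Callable[[list[Hit]], list[Hit]]  # hypothetical reranker interface


def rretrieve(hits: list[Hit], reranker: Optional[Reranker],
              rerank_top_k: int = 100, limit: int = 20) -> tuple[list[Hit], str]:
    """Rerank the head of an already-fused list if a reranker is available.

    Returns (results, search_method). `hits` stands in for db.retrieve output
    and assumes the hybrid path succeeded.
    """
    if reranker is None:
        # No reranker configured: same as plain retrieval, and no error.
        return hits[:limit], "rrf_hybrid"
    try:
        head = reranker(hits[:rerank_top_k])
    except Exception:
        # A reranker failure degrades to the pre-rerank ordering.
        return hits[:limit], "rrf_hybrid"
    # Assumption: hits beyond the rerank pool keep their fused order.
    return (head + hits[rerank_top_k:])[:limit], "rrf_hybrid+rerank"
```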
db.rerank: standalone reranker
Use it when you already have candidates from another source (manual ranking, federated search, cached list) and want to apply the cross-encoder:
```cypher
CALL db.rerank({
  query: 'authentication patterns',
  candidates: [
    { id: 'doc1', content: '...', score: 0.92 },
    { id: 'doc2', content: '...', score: 0.81 }
    // score is optional; aliases: bi_score, rrf_score
  ],
  rerankTopK: 50
})
YIELD id, content, original_rank, new_rank, bi_score, cross_score, final_score
RETURN id, final_score
ORDER BY final_score DESC
```
db.rerank runs the cross-encoder when one is configured (NORNICDB_SEARCH_RERANK_ENABLED=true plus a provider). Without a configured reranker the procedure still succeeds and returns candidates in pass-through order: cross_score and final_score equal the input score, and rank order is unchanged. Every candidate must include a non-empty id; content may be omitted, but missing content hurts rerank quality when the reranker is on.
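The output columns and the pass-through rule can be modeled client-side to reason about what each row means. This is a sketch under stated assumptions: the cross_encoder callable is hypothetical, final_score is simply set to the cross-encoder score (how NornicDB combines bi_score and cross_score when the reranker is on is not documented here), a missing score defaults to 0.0, and candidates beyond rerank_top_k are dropped rather than passed through.

```python
from typing import Callable, Optional

CrossEncoder = Callable[[str, str], float]  # hypothetical (query, content) -> score


def rerank(query: str, candidates: list[dict],
           cross_encoder: Optional[CrossEncoder] = None,
           rerank_top_k: int = 50) -> list[dict]:
    if not candidates:
        raise ValueError("db.rerank requires non-empty candidates")
    rows = []
    # Assumption: candidates beyond rerank_top_k are simply not returned.
    for rank, cand in enumerate(candidates[:rerank_top_k], start=1):
        if not cand.get("id"):
            raise ValueError("every candidate needs a non-empty id")
        content = cand.get("content", "")
        # score is optional; fall back through the documented aliases, then 0.0.
        bi = cand.get("score", cand.get("bi_score", cand.get("rrf_score", 0.0)))
        # Pass-through when no reranker: cross_score echoes the input score.
        cross = bi if cross_encoder is None else cross_encoder(query, content)
        rows.append({"id": cand["id"], "content": content, "original_rank": rank,
                     "bi_score": bi, "cross_score": cross, "final_score": cross})
    if cross_encoder is not None:
        # Only a real reranker may change the order.
        rows.sort(key=lambda row: row["final_score"], reverse=True)
    for new_rank, row in enumerate(rows, start=1):
        row["new_rank"] = new_rank
    return rows
```

Comparing original_rank with new_rank per row is the quickest way to see how much the cross-encoder actually moved things; in pass-through mode the two are always equal.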
db.infer: LLM generation
```cypher
CALL db.infer({
  prompt: 'Summarize: ' + $context,   // alias: query
  model: 'llama3:70b',                // optional; defaults to manager config
  max_tokens: 256,
  temperature: 0.0,
  top_p: 1.0,
  top_k: 0,
  stop_tokens: ['<|eot|>']
})
YIELD text, structured, model, usage, latencyMs, finishReason
RETURN text, finishReason, latencyMs
```
Chat form (when the inference manager exposes a chat endpoint):
```cypher
CALL db.infer({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user', content: 'List three graph DB design patterns.' }
  ],
  max_tokens: 200,
  temperature: 0.0
})
YIELD text, model, usage, finishReason
RETURN text, finishReason
```
Notes:
- If text is valid JSON, structured holds its parsed form; otherwise structured is null.
- usage is { prompt_tokens, completion_tokens, total_tokens } when the provider returns it.
- db.infer requires an inference manager. If none is configured, the procedure fails with the error inference manager is not configured.
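If you consume text through a driver and want the same fallback on the client, the documented structured rule reduces to a guarded JSON parse. A minimal sketch; parse_structured is a hypothetical helper, and whether NornicDB also treats bare scalars such as 42 as structured output is not stated on this page (json.loads accepts them).

```python
import json
from typing import Any, Optional


def parse_structured(text: Optional[str]) -> Optional[Any]:
    """Return the parsed JSON value if text is valid JSON, else None."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # Prose, fenced JSON, or a missing text value all map to None.
        return None
```

Models often wrap JSON in a Markdown code fence; that wrapped string is not valid JSON, so this sketch returns None for it. Asking the model for bare JSON (for example with temperature: 0.0 and an explicit instruction in the prompt) avoids the problem.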
End-to-end RAG in one query
```cypher
CALL db.rretrieve({ query: $userQuestion, limit: 5 })
YIELD node, score
WITH collect(coalesce(node.content, toString(node))) AS context, $userQuestion AS q
CALL db.infer({
  prompt: 'Use only this context:\n' + apoc.text.join(context, '\n---\n')
          + '\n\nQuestion: ' + q + '\nAnswer:',
  max_tokens: 300,
  temperature: 0.0
})
YIELD text
RETURN text AS answer
```
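When the context is gathered in application code instead (for example, from a cached result list), the same prompt can be assembled client-side and passed in as a parameter. A sketch: build_rag_prompt reproduces the string the Cypher statement builds with apoc.text.join, and max_context_chars is a hypothetical client-side guard, not a NornicDB option.

```python
from typing import Optional


def build_rag_prompt(question: str, chunks: list[str],
                     max_context_chars: Optional[int] = None) -> str:
    """Build the same prompt string as the Cypher statement above."""
    kept, used = [], 0
    for chunk in chunks:
        # Hypothetical budget: stop adding chunks once the cap would be exceeded.
        if max_context_chars is not None and used + len(chunk) > max_context_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return ("Use only this context:\n" + "\n---\n".join(kept)
            + "\n\nQuestion: " + question + "\nAnswer:")
```

Because chunks arrive best-first, cutting from the end drops the weakest context first, which keeps the prompt short enough to leave room within max_tokens for the answer.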
Argument shorthand
For one-off interactive use, db.retrieve accepts a string or $param directly and treats it as {query: <value>}:
```cypher
CALL db.retrieve('authentication patterns') YIELD node, score RETURN node, score
CALL db.retrieve($q) YIELD node, score RETURN node, score
```
This shortcut is not available on db.rerank or db.infer, both of which require explicit map arguments.
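If application code builds request maps from mixed sources, normalizing them to one key set before passing them as a parameter keeps the "pick one and stick with it" advice enforceable. A sketch under assumptions: the explicit alias table uses only aliases named on this page (text for query is inferred from the failure-mode note), and the generic snake_case-to-camelCase rule is an assumption based on "most fields support both".

```python
import re
from typing import Any, Mapping, Union

# Aliases named on this page; "text" is inferred from the failure-mode note.
EXPLICIT_ALIASES = {
    "text": "query",
    "labels": "types",
    "queryEmbedding": "embedding",
    "query_embedding": "embedding",
}


def to_camel(key: str) -> str:
    """min_similarity -> minSimilarity; keys without '_x' pass through."""
    return re.sub(r"_([a-z])", lambda m: m.group(1).upper(), key)


def normalize_retrieve_arg(arg: Union[str, Mapping[str, Any]]) -> dict:
    """Mimic the shorthand: a bare string becomes {'query': value}."""
    if isinstance(arg, str):
        return {"query": arg}
    out = {}
    for key, value in arg.items():
        # Resolve documented aliases first, then apply the assumed case rule.
        out[to_camel(EXPLICIT_ALIASES.get(key, key))] = value
    return out
```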
When to use which
- "Just give me good results" → db.retrieve (or db.rretrieve if you want the rerank when available).
- "I already have candidates from somewhere else" → db.rerank.
- "I need the model's answer, not search hits" → db.infer.
- "I want everything in one transaction" → chain them in a single Cypher statement.
Tuning knobs
| Knob | Where | Effect |
|---|---|---|
| limit | db.retrieve request | Number of results returned (default 50) |
| minSimilarity | db.retrieve request | Drops vector results below this cosine score |
| types / labels | db.retrieve request | Restricts results to specific labels |
| rerankTopK | db.retrieve / db.rretrieve / db.rerank | Pool size sent to the cross-encoder |
| rerankMinScore | db.retrieve / db.rretrieve / db.rerank | Floor on final_score after rerank |
| temperature / top_p / top_k | db.infer | Sampling controls; lower = more deterministic |
| max_tokens | db.infer | Length cap |
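To see where the retrieval knobs act, it helps to separate the vector-side floor from the post-rerank floor. The two hypothetical helpers below are a sketch, not the engine's pipeline: the tuple shapes are invented, the label filter is shown on the vector side only for brevity (the table says it restricts results generally), and the exact order in which NornicDB applies these steps is an assumption.

```python
from typing import Iterable, Optional


def vector_side(vector_hits: list[tuple[str, set[str], float]],
                min_similarity: float = 0.0,
                types: Optional[Iterable[str]] = None) -> list[str]:
    """vector_hits: (id, labels, cosine) tuples, best first -> surviving ids."""
    wanted = set(types) if types else None
    # minSimilarity is a floor on cosine; types keeps ids sharing a wanted label.
    return [doc for doc, labels, cosine in vector_hits
            if cosine >= min_similarity and (wanted is None or labels & wanted)]


def final_side(reranked: list[tuple[str, float]],
               rerank_min_score: float = 0.0, limit: int = 50) -> list[tuple[str, float]]:
    """reranked: (id, final_score) tuples, best first -> floored, capped list."""
    return [(doc, s) for doc, s in reranked if s >= rerank_min_score][:limit]
```

The split explains a common surprise: raising minSimilarity only thins the vector list, so a document can still arrive through BM25, whereas rerankMinScore and limit cut the final output directly.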
Failure modes worth knowing
- Empty results. Most often: index missing, no embedder configured, or minSimilarity too high. db.retrieve returns zero rows rather than an error.
- query is required. You called db.retrieve with a map that doesn't have query (or text).
- db.rerank requires non-empty candidates. You passed an empty list. Make sure your candidate-gathering subquery actually produced rows before calling db.rerank.
- inference manager is not configured. db.infer only works when the deployment has an LLM connector wired up. Check the runtime config.
- Reranker degrades silently. If reranking fails or is unavailable, the engine returns the pre-rerank ranking and sets search_method to the non-rerank path. Don't expect a hard error.
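The two argument errors above can be caught before the round trip. A hypothetical client-side pre-flight check: the first two messages match the errors listed above, the candidate-id message is our own wording, and treating db.rretrieve like db.retrieve follows from its "same input shape" description. Server-side conditions such as a missing inference manager cannot be checked this way.

```python
def preflight(procedure: str, req: dict) -> None:
    """Raise ValueError for argument mistakes listed in the failure modes."""
    if procedure in ("db.retrieve", "db.rretrieve"):
        # The map must carry query (or its text alias).
        if "query" not in req and "text" not in req:
            raise ValueError("query is required")
    elif procedure == "db.rerank":
        candidates = req.get("candidates") or []
        if not candidates:
            raise ValueError("db.rerank requires non-empty candidates")
        if any(not c.get("id") for c in candidates):
            # Wording is ours; the page only says ids must be non-empty.
            raise ValueError("every candidate needs a non-empty id")
```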
Configuration recap
```bash
# Hybrid + adaptive RRF: no extra config; on by default

# Reranking
export NORNICDB_SEARCH_RERANK_ENABLED=true
export NORNICDB_SEARCH_RERANK_PROVIDER=local   # local | ollama | openai | http
export NORNICDB_SEARCH_RERANK_MODEL=bge-reranker-v2-m3-Q4_K_M.gguf
export NORNICDB_SEARCH_RERANK_API_URL=...      # for provider=http (Cohere, TEI, ...)
export NORNICDB_SEARCH_RERANK_API_KEY=...
```
Inference (db.infer) provider configuration is part of NornicDB's heimdall/inference subsystem and is configured separately from search.
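Because a misconfigured reranker degrades silently rather than erroring, a quick check of the reranking variables can save debugging time. A hypothetical sketch: it treats only the value true (case-insensitive) as enabled, takes the provider list from the comment in the recap, and assumes provider=http needs NORNICDB_SEARCH_RERANK_API_URL; the server's own parsing rules may differ.

```python
import os
from typing import Mapping, Optional

KNOWN_PROVIDERS = {"local", "ollama", "openai", "http"}  # from the recap comment


def rerank_config_warnings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable warnings about the reranking env vars."""
    env = os.environ if env is None else env
    if env.get("NORNICDB_SEARCH_RERANK_ENABLED", "").lower() != "true":
        return ["reranking off: db.rretrieve will behave like db.retrieve"]
    warnings = []
    provider = env.get("NORNICDB_SEARCH_RERANK_PROVIDER")
    if provider is not None and provider not in KNOWN_PROVIDERS:
        warnings.append(f"unknown rerank provider: {provider!r}")
    if provider == "http" and not env.get("NORNICDB_SEARCH_RERANK_API_URL"):
        warnings.append("provider=http but NORNICDB_SEARCH_RERANK_API_URL is empty")
    return warnings
```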
See also
- nornicdb-vector-search: the vector and full-text indexes the retrieval procedures sit on top of.
- nornicdb-managed-embeddings: generating the embeddings consumed by retrieval.
- nornicdb-knowledge-policies: suppression and decay scoring that filter the candidates retrieval sees.