RAG Procedures (Cypher API)

NornicDB exposes the entire retrieval-augmented-generation pipeline as Cypher procedures. The mental model is "Cypher all the way down": write `CALL db.<verb>({...})` with a map argument and `YIELD` the columns you want.

The four verbs

| Procedure | What it does | Stages |
| --- | --- | --- |
| `db.retrieve(req)` | Hybrid retrieval | vector + BM25, fused with RRF |
| `db.rretrieve(req)` | Retrieve + rerank when a reranker is configured | adds cross-encoder rerank if `NORNICDB_SEARCH_RERANK_ENABLED=true` |
| `db.rerank(req)` | Standalone rerank over user-supplied candidates | cross-encoder only |
| `db.infer(req)` | LLM generation | calls the configured inference manager |

All four take a single map argument. Most fields support both camelCase and snake_case; pick one and stick with it.

`db.retrieve`: hybrid retrieval

CALL db.retrieve({
  query:          'authentication patterns',   // required
  limit:          10,                          // default 50
  minSimilarity:  0.5,                         // vector floor
  types:          ['Document', 'Memory'],      // label filter; alias: labels
  rerankEnabled:  false,                       // explicit on/off
  rerankTopK:     100,
  rerankMinScore: 0.0,
  embedding:      $queryVector                 // optional pre-computed; alias: queryEmbedding / query_embedding
})
YIELD node, score, rrf_score, vector_rank, bm25_rank, search_method, fallback_triggered
RETURN node.id, node.title, score, search_method
ORDER BY score DESC

Behavior:

- If `embedding` is omitted and the embedder is configured, NornicDB embeds `query` server-side.
- Adaptive RRF weights are picked from query length: 1–2 words → BM25-favored (V=0.5, B=1.5); 3–5 → balanced; 6+ → vector-favored (V=1.5, B=0.5).
- `search_method` reports the winning path: `rrf_hybrid`, `rrf_hybrid+rerank`, `vector_only`, or `bm25_only`.
- `fallback_triggered: true` means one strategy returned nothing and the engine fell back to the other.
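To make the adaptive weighting concrete, here is a minimal Python sketch of weighted reciprocal rank fusion driven by query length. It is an illustration, not NornicDB's implementation: the function names, the balanced weights of (1.0, 1.0), and the smoothing constant `k = 60` are assumptions, since this page does not state them.

```python
from collections import defaultdict

RRF_K = 60  # assumed smoothing constant; not stated in this document


def pick_weights(query: str) -> tuple[float, float]:
    """Return (vector_weight, bm25_weight) based on the query's word count."""
    words = len(query.split())
    if words <= 2:
        return 0.5, 1.5  # BM25-favored
    if words <= 5:
        return 1.0, 1.0  # balanced (exact values are an assumption)
    return 1.5, 0.5      # vector-favored


def rrf_fuse(vector_ids, bm25_ids, query, k=RRF_K):
    """Fuse two ranked id lists with weighted reciprocal rank fusion."""
    w_vec, w_bm25 = pick_weights(query)
    scores = defaultdict(float)
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] += w_vec / (k + rank)
    for rank, doc_id in enumerate(bm25_ids, start=1):
        scores[doc_id] += w_bm25 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With a one-word query, a document that ranks well in BM25 outranks one that only the vector search found, which is the intent of the short-query weighting.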

`db.rretrieve`: retrieve + auto-rerank

Same input shape as db.retrieve, but reranks when the reranker is available. Use this when you want "always rerank if you can":

CALL db.rretrieve({
  query:      'zero-trust security architecture',
  limit:      20,
  rerankTopK: 100
})
YIELD node, score, search_method
RETURN node.id, score, search_method

If no reranker is configured, `db.rretrieve` behaves identically to `db.retrieve` and does not raise an error.
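Because the fallback is silent, a client that cares whether reranking actually ran has to inspect `search_method`. The helper below is a hypothetical client-side check, assuming result rows arrive as dictionaries keyed by the yielded column names and that rerank paths end in `+rerank`, as in `rrf_hybrid+rerank`.

```python
def rerank_applied(rows):
    """Return True if any result row reports a rerank path in search_method."""
    return any(
        str(row.get("search_method", "")).endswith("+rerank")
        for row in rows
    )
```

Logging this flag per request makes a missing or failing reranker visible without changing the query.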

`db.rerank`: standalone reranker

Use it when you already have candidates from another source (manual ranking, federated search, cached list) and want to apply the cross-encoder:

CALL db.rerank({
  query:      'authentication patterns',
  candidates: [
    { id: 'doc1', content: '...', score: 0.92 },
    { id: 'doc2', content: '...', score: 0.81 }
    // score is optional; aliases: bi_score, rrf_score
  ],
  rerankTopK: 50
})
YIELD id, content, original_rank, new_rank, bi_score, cross_score, final_score
RETURN id, final_score
ORDER BY final_score DESC

`db.rerank` exercises the cross-encoder when one is configured (`NORNICDB_SEARCH_RERANK_ENABLED=true` plus a provider). Without a configured reranker the procedure still succeeds and returns the candidates in pass-through order: `cross_score` and `final_score` equal the input score, and rank order is unchanged. Every candidate must include a non-empty `id`. Missing `content` is allowed, but it hurts rerank quality when the reranker is on.
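The pass-through behavior can be modeled in a few lines of Python. This is a sketch of the semantics described above, not the server's code: the 1-based ranks, the 0.0 default for a missing score, the empty-string default for missing content, and the wording of the `id` error are all assumptions.

```python
SCORE_ALIASES = ("score", "bi_score", "rrf_score")


def passthrough_rerank(candidates):
    """Model db.rerank output rows when no reranker is configured."""
    if not candidates:
        raise ValueError("db.rerank requires non-empty candidates")
    rows = []
    for position, cand in enumerate(candidates, start=1):
        if not cand.get("id"):
            # Hypothetical message; the document only says id must be non-empty
            raise ValueError("candidate id must be non-empty")
        # Take the first score alias that is present, else assume 0.0
        score = next((cand[k] for k in SCORE_ALIASES if k in cand), 0.0)
        rows.append({
            "id": cand["id"],
            "content": cand.get("content", ""),
            "original_rank": position,
            "new_rank": position,      # order is unchanged
            "bi_score": score,
            "cross_score": score,      # equals the input score
            "final_score": score,
        })
    return rows
```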

`db.infer`: LLM generation

CALL db.infer({
  prompt:      'Summarize: ' + $context,   // alias: query
  model:       'llama3:70b',               // optional; defaults to manager config
  max_tokens:  256,
  temperature: 0.0,
  top_p:       1.0,
  top_k:       0,
  stop_tokens: ['<|eot|>']
})
YIELD text, structured, model, usage, latencyMs, finishReason
RETURN text, finishReason, latencyMs

Chat form (when the inference manager exposes a chat endpoint):

CALL db.infer({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user',   content: 'List three graph DB design patterns.' }
  ],
  max_tokens: 200, temperature: 0.0
})
YIELD text, model, usage, finishReason
RETURN text, finishReason

Notes:

- If `text` is valid JSON, `structured` is its parsed form. Otherwise `structured` is null.
- `usage` is `{ prompt_tokens, completion_tokens, total_tokens }` when the provider returns it.
- `db.infer` requires an inference manager. If none is configured, the procedure errors with `inference manager is not configured`.
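The relationship between `text` and `structured` can be mirrored on the client, for example when replaying cached responses. The sketch below follows the rule as worded above, treating any valid JSON document as structured. Whether the server applies a stricter rule (for instance, objects only) is not stated here, so treat `parse_structured` as a hypothetical helper.

```python
import json


def parse_structured(text):
    """Return the parsed JSON value of text, or None if it is not valid JSON."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a ValueError; TypeError covers non-string input
        return None
```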

End-to-end RAG in one query

CALL db.rretrieve({ query: $userQuestion, limit: 5 })
YIELD node, score
WITH collect(coalesce(node.content, toString(node))) AS context, $userQuestion AS q

CALL db.infer({
  prompt: 'Use only this context:\n' + apoc.text.join(context, '\n---\n')
        + '\n\nQuestion: ' + q + '\nAnswer:',
  max_tokens: 300,
  temperature: 0.0
})
YIELD text
RETURN text AS answer
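When debugging a query like this, it can help to reproduce the prompt string outside the database, for example to check its length against `max_tokens`. The function below is a hypothetical client-side mirror of the string concatenation in the Cypher above; it does not call NornicDB.

```python
def build_prompt(context_chunks, question):
    """Rebuild the prompt that the end-to-end query passes to db.infer."""
    joined = "\n---\n".join(context_chunks)
    return (
        "Use only this context:\n" + joined
        + "\n\nQuestion: " + question + "\nAnswer:"
    )
```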

Argument shorthand

For one-off interactive use, db.retrieve accepts a string or $param directly and treats it as {query: <value>}:

CALL db.retrieve('authentication patterns') YIELD node, score RETURN node, score
CALL db.retrieve($q) YIELD node, score RETURN node, score

This shortcut is not available on db.rerank or db.infer, both of which require explicit map arguments.
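The shorthand amounts to a small normalization step. A Python sketch of that rule, assuming nothing beyond what is described above (the function name and the error for other argument types are hypothetical):

```python
def normalize_retrieve_arg(arg):
    """Expand the db.retrieve string shorthand into a request map."""
    if isinstance(arg, str):
        return {"query": arg}   # 'text' is treated as {query: 'text'}
    if isinstance(arg, dict):
        return dict(arg)        # maps pass through unchanged (copied)
    raise TypeError("expected a query string or a request map")
```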

When to use which

- "Just give me good results" → `db.retrieve` (or `db.rretrieve` if you want reranking whenever it is available).
- "I already have candidates from somewhere else" → `db.rerank`.
- "I need the model's answer, not search hits" → `db.infer`.
- "I want everything in one transaction" → chain them in a single Cypher statement.

Tuning knobs

| Knob | Where | Effect |
| --- | --- | --- |
| `limit` | `db.retrieve` request | Number of results returned |
| `minSimilarity` | `db.retrieve` request | Drops vector results below this cosine score |
| `types` / `labels` | `db.retrieve` request | Restricts to specific labels |
| `rerankTopK` | `db.retrieve` / `db.rretrieve` / `db.rerank` | Pool size sent to the cross-encoder |
| `rerankMinScore` | `db.retrieve` / `db.rretrieve` / `db.rerank` | Floor on `final_score` after rerank |
| `temperature` / `top_p` / `top_k` | `db.infer` | Sampling controls; lower = more deterministic |
| `max_tokens` | `db.infer` | Length cap |
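One way to picture how the rerank knobs interact is the sketch below. The order of operations (pool, rescore, floor, limit), dropping candidates outside the pool, and using the cross-encoder score directly as `final_score` are assumptions made for illustration; this page does not specify them. `cross_score` is a hypothetical callable standing in for the cross-encoder.

```python
def apply_rerank_knobs(ranked, cross_score, rerank_top_k, rerank_min_score=0.0, limit=50):
    """Illustrate rerankTopK, rerankMinScore, and limit on a ranked list.

    ranked:      list of (doc_id, score) pairs, best first
    cross_score: callable mapping doc_id -> cross-encoder score (hypothetical)
    """
    pool = ranked[:rerank_top_k]                     # pool sent to the cross-encoder
    rescored = [(doc_id, cross_score(doc_id)) for doc_id, _ in pool]
    kept = [(d, s) for d, s in rescored if s >= rerank_min_score]  # score floor
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]                              # number of results returned
```

A larger `rerankTopK` gives the cross-encoder more chances to promote a buried result, at the cost of scoring more candidates.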

Failure modes worth knowing

- Empty results. Most often the index is missing, no embedder is configured, or `minSimilarity` is too high. `db.retrieve` returns zero rows rather than raising an error.
- `query is required`. You called `db.retrieve` with a map that has no `query` (or `text`) field.
- `db.rerank requires non-empty candidates`. You passed an empty list. Make sure your candidate-gathering subquery actually produced rows before calling `db.rerank`.
- `inference manager is not configured`. `db.infer` only works when the deployment has an LLM connector wired up. Check the runtime config.
- Reranker degrades silently. If reranking fails or is unavailable, the engine returns the pre-rerank ranking and sets `search_method` to the non-rerank path. Don't expect a hard error.
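A client can turn the three hard errors above into actionable hints. The sketch below matches on the message fragments quoted in this section; matching by case-insensitive substring is an assumption, since the exact error format returned over the wire is not documented here, and `explain_failure` is a hypothetical helper.

```python
FAILURE_HINTS = {
    "query is required":
        "Add a query (or text) field to the db.retrieve request map.",
    "db.rerank requires non-empty candidates":
        "Check that the candidate-gathering step produced rows before calling db.rerank.",
    "inference manager is not configured":
        "db.infer needs an LLM connector; check the runtime config.",
}


def explain_failure(message):
    """Return a hint for a known error message, or None if unrecognized."""
    lowered = message.lower()
    for fragment, hint in FAILURE_HINTS.items():
        if fragment.lower() in lowered:
            return hint
    return None
```

Empty results and silent reranker degradation never raise, so they need result-side checks instead (row count and `search_method`).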

Configuration recap

# Hybrid + adaptive RRF — no extra config; on by default
# Reranking
export NORNICDB_SEARCH_RERANK_ENABLED=true
export NORNICDB_SEARCH_RERANK_PROVIDER=local      # local | ollama | openai | http
export NORNICDB_SEARCH_RERANK_MODEL=bge-reranker-v2-m3-Q4_K_M.gguf
export NORNICDB_SEARCH_RERANK_API_URL=...         # for provider=http (Cohere, TEI, ...)
export NORNICDB_SEARCH_RERANK_API_KEY=...

Inference (db.infer) provider configuration is part of NornicDB's heimdall/inference subsystem and is configured separately from search.
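Since a misconfigured reranker degrades silently, a preflight check on the environment can catch mistakes before deployment. The sketch below is a hypothetical client-side check, not NornicDB's own validation: it assumes the value `true` is matched case-insensitively and that `provider=http` needs `NORNICDB_SEARCH_RERANK_API_URL`, which the comments above suggest but do not guarantee.

```python
import os

KNOWN_PROVIDERS = {"local", "ollama", "openai", "http"}


def check_rerank_env(env=None):
    """Return a list of likely problems with the rerank environment settings."""
    env = os.environ if env is None else env
    if env.get("NORNICDB_SEARCH_RERANK_ENABLED", "").lower() != "true":
        return ["reranking is disabled (NORNICDB_SEARCH_RERANK_ENABLED is not 'true')"]
    problems = []
    provider = env.get("NORNICDB_SEARCH_RERANK_PROVIDER", "")
    if provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown or missing provider: {provider!r}")
    if provider == "http" and not env.get("NORNICDB_SEARCH_RERANK_API_URL"):
        problems.append("provider=http but NORNICDB_SEARCH_RERANK_API_URL is unset")
    return problems
```

An empty list means the settings look consistent; it does not prove the reranker model loads or the endpoint responds.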