
RFC: Heimdall SLM Quality Control for Auto-TLP

Status: Proposal
Author: NornicDB Team
Created: December 2024

Summary

Add an optional LLM-based quality control layer to Auto-TLP (Automatic Topological Link Prediction) that validates relationship suggestions before they're created. This "Heimdall" layer uses a small, local instruction-tuned model (SLM) to review TLP's algorithmic suggestions and can optionally suggest additional relationships.

Motivation

Auto-TLP automatically creates edges based on:

  • Embedding similarity
  • Co-access patterns
  • Temporal proximity
  • Transitive inference

While these algorithms are fast and effective, they can produce false positives:

  • Similarity noise: similar embeddings don't always mean meaningful relationships
  • Spurious co-access: users might access unrelated nodes in the same session
  • Transitive errors: A→B and B→C doesn't always mean A should connect to C

An LLM can provide semantic validation that algorithms can't:

  • "These two notes are about the same project" ✅
  • "These nodes share keywords but aren't actually related" ❌
  • "This relationship would be more accurately typed as INSPIRED_BY" 🔄

Design Goals

  1. Opt-in via feature flags - Disabled by default, zero impact if not enabled
  2. Small model friendly - Works with 1-3B parameter instruction models
  3. Fail-open - LLM failures don't block edge creation
  4. Batch efficient - Multiple suggestions per LLM call
  5. Size aware - Gracefully handles large nodes that exceed context limits
  6. Augmentation capable - LLM can suggest edges TLP missed (optional)

Architecture

                       Auto-TLP Pipeline

  Node Created/Accessed
         │
         ▼
  ┌──────────────────┐
  │ TLP Algorithms   │  Fast, algorithmic candidate generation
  │ • Similarity     │
  │ • Co-access      │
  │ • Temporal       │
  │ • Transitive     │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐     ┌─────────────────────────────────┐
  │ LLM_QC Enabled?  │────▶│ Skip QC, return all candidates  │
  └────────┬─────────┘ No  └─────────────────────────────────┘
           │ Yes
           ▼
  ┌──────────────────┐
  │ Batch & Check    │  Group candidates, check size limits
  │ Size Limits      │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐     ┌─────────────────────────────────┐
  │ Prompt too big?  │────▶│ Log warning, pass batch through │
  └────────┬─────────┘ Yes └─────────────────────────────────┘
           │ No
           ▼
  ┌──────────────────┐
  │ Heimdall SLM     │  Local instruct model reviews batch
  │ Batch Review     │
  └────────┬─────────┘
           │
           ├──────── LLM Error ──────▶ Log, pass through
           │
           ▼
  ┌──────────────────┐
  │ Parse Response   │  Extract approved/rejected indices
  └────────┬─────────┘
           │
           ├──────── Parse Error ────▶ Fuzzy parse or approve
           │
           ▼
  ┌──────────────────┐     ┌─────────────────────────────────┐
  │ Augment Enabled? │────▶│ Include LLM's new suggestions   │
  └────────┬─────────┘ Yes └─────────────────────────────────┘
           │ No
           ▼
  Return approved edges

Feature Flags

Flag                                    Default  Description
NORNICDB_AUTO_TLP_ENABLED               ❌ Off    Enable TLP candidate generation
NORNICDB_AUTO_TLP_LLM_QC_ENABLED        ❌ Off    Enable Heimdall batch review
NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED   ❌ Off    Allow Heimdall to suggest new edges

Progressive enablement:

# Stage 1: TLP only (fast, no LLM)
export NORNICDB_AUTO_TLP_ENABLED=true

# Stage 2: TLP + Heimdall review (higher quality)
export NORNICDB_AUTO_TLP_ENABLED=true
export NORNICDB_AUTO_TLP_LLM_QC_ENABLED=true

# Stage 3: Full hybrid (TLP + review + augmentation)
export NORNICDB_AUTO_TLP_ENABLED=true
export NORNICDB_AUTO_TLP_LLM_QC_ENABLED=true
export NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED=true
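
For illustration, here is a minimal Go sketch of how a caller might gate the pipeline on these flags by reading the environment variables directly. The helper names below are hypothetical; the actual config package accessors may differ.

import (
    "os"
    "strconv"
)

// envFlag reads a boolean feature flag from the environment; unset or
// unparsable values count as false, matching the off-by-default behavior.
func envFlag(name string) bool {
    v, err := strconv.ParseBool(os.Getenv(name))
    return err == nil && v
}

// autoTLPStages reports which stages are active. Each stage only takes
// effect when the stage before it is also enabled (progressive enablement).
func autoTLPStages() (tlp, qc, augment bool) {
    tlp = envFlag("NORNICDB_AUTO_TLP_ENABLED")
    qc = tlp && envFlag("NORNICDB_AUTO_TLP_LLM_QC_ENABLED")
    augment = qc && envFlag("NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED")
    return tlp, qc, augment
}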

Unified SLM Architecture

Heimdall QC uses the same SLM instance as Bifrost commands:

  • Stateless: no context accumulates between calls
  • One-shot: each call is independent and completes in a single pass
  • KV cache: the static system prompt is cached; only the data varies

┌─────────────────────────────────────────────────────────┐
│                    SINGLE SLM INSTANCE                  │
├─────────────────────────────────────────────────────────┤
│  KV Cache (static, loaded once):                        │
│  ┌─────────────────────────────────────────────────────┐│
│  │ [Bifrost Commands] [Heimdall QC Instructions]       ││
│  └─────────────────────────────────────────────────────┘│
├─────────────────────────────────────────────────────────┤
│  Per-call (dynamic):                                    │
│  ┌───────────────────┐  ┌──────────────────────────────┐│
│  │ Bifrost: "CREATE  │  │ Heimdall: "SRC:node-1[Note]  ││
│  │ (n:Person)"       │  │ EDGES:0.node-2→REL(80%)"     ││
│  └───────────────────┘  └──────────────────────────────┘│
└─────────────────────────────────────────────────────────┘

Prompt Format

System Prompt (static, KV cached):

Review graph edges. Output JSON only.
Format: {"approved":[indices],"rejected":[indices],"reasoning":"why"}
Approve if nodes are meaningfully related. Reject spam/duplicates.

User Content (dynamic, per-call):

SRC:node-123[Memory,Note]
 title:Machine Learning Basics
 content:Introduction to neural networks...
EDGES:
0.node-456→RELATES_TO(85%)
1.node-789→RELATES_TO(72%)

Response (JSON only):

{"approved":[0],"rejected":[1],"reasoning":"First related, second unrelated task"}

With augmentation:

{"approved":[0],"additional":[{"target_id":"node-999","type":"INSPIRED_BY","conf":0.8,"reason":"both discuss backprop"}]}

Configuration

type HeimdallQCConfig struct {
    Enabled               bool          // Master switch
    Timeout               time.Duration // Default: 10s
    MaxContextBytes       int           // Default: 4096 (~1000 tokens)
    MaxBatchSize          int           // Default: 5 suggestions per call
    MaxNodeSummaryLen     int           // Default: 200 chars per property
    MinConfidenceToReview float64       // Default: 0.5 (skip weak candidates)
    CacheDecisions        bool          // Default: true
    CacheTTL              time.Duration // Default: 1 hour
}
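
As a rough sketch of how two of these knobs might be applied (reusing the hypothetical Candidate type from the prompt-format sketch above), batching could look like the following; batchCandidates is illustrative, not the actual implementation.

// batchCandidates splits suggestions into review batches of at most
// cfg.MaxBatchSize, so each Heimdall call reviews a bounded number of edges.
// Candidates below MinConfidenceToReview are left out of review entirely.
func batchCandidates(cfg HeimdallQCConfig, candidates []Candidate) [][]Candidate {
    size := cfg.MaxBatchSize
    if size <= 0 {
        size = 5 // documented default
    }

    var toReview []Candidate
    for _, c := range candidates {
        if c.Confidence >= cfg.MinConfidenceToReview {
            toReview = append(toReview, c)
        }
    }

    var batches [][]Candidate
    for start := 0; start < len(toReview); start += size {
        end := start + size
        if end > len(toReview) {
            end = len(toReview)
        }
        batches = append(batches, toReview[start:end])
    }
    return batches
}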

Error Handling

Principle: Fail-open, log, continue

Error               Action
LLM timeout         Log warning, approve batch, continue
LLM crash           Log error, approve batch, continue
Invalid JSON        Fuzzy parse or approve all
Prompt too large    Log warning, skip review, pass through
Context cancelled   Return immediately with current results

No retries: if the LLM fails, we don't retry. We log that the decision was made without LLM input and move on.
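
A hedged sketch of the fail-open pattern for one batch, building on the hypothetical Candidate and parseDecision helpers above; reviewBatch and the callLLM callback are illustrative, not the actual NornicDB API.

import (
    "context"
    "log"
)

// reviewBatch applies Heimdall QC to one batch and fails open: a timeout,
// generation error, or unparsable response approves the whole batch unreviewed.
func reviewBatch(ctx context.Context, cfg HeimdallQCConfig,
    callLLM func(context.Context, string) (string, error),
    userContent string, batch []Candidate) []Candidate {

    ctx, cancel := context.WithTimeout(ctx, cfg.Timeout)
    defer cancel()

    raw, err := callLLM(ctx, userContent)
    if err != nil {
        log.Printf("heimdall qc: llm error, approving batch unreviewed: %v", err)
        return batch // fail-open, no retry
    }

    d, err := parseDecision(raw)
    if err != nil {
        // A fuzzy parse could be attempted here before giving up.
        log.Printf("heimdall qc: unparsable response, approving batch unreviewed: %v", err)
        return batch // fail-open, no retry
    }

    approved := make([]Candidate, 0, len(d.Approved))
    for _, i := range d.Approved {
        if i >= 0 && i < len(batch) {
            approved = append(approved, batch[i])
        }
    }
    return approved
}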

Usage Example

import (
    "context"

    "github.com/orneryd/nornicdb/pkg/config"
    "github.com/orneryd/nornicdb/pkg/heimdall"
    "github.com/orneryd/nornicdb/pkg/inference"
)

// Heimdall QC uses the SAME Generator as Bifrost commands:
// direct llama.cpp via localllm - no HTTP calls.
// engine is the Auto-TLP engine instance, assumed to be in scope here.
func setupHeimdallQC(generator heimdall.Generator) {
    systemPrompt := inference.GetSystemPrompt(config.IsAutoTLPLLMAugmentEnabled())

    heimdallFunc := func(ctx context.Context, userContent string) (string, error) {
        // Combine the static (KV-cached) system prompt with the dynamic user content
        prompt := systemPrompt + "\n\n" + userContent
        return generator.Generate(ctx, prompt, heimdall.GenerateParams{
            MaxTokens:   256,
            Temperature: 0.1, // Low temperature for deterministic QC
        })
    }

    qc := inference.NewHeimdallQC(heimdallFunc, nil)
    engine.SetHeimdallQC(qc)
}

// Both Bifrost commands and Heimdall QC share:
// - Same heimdall.Generator (in-memory llama.cpp)
// - Same KV cache (system prompts cached)
// - Stateless one-shot calls

Performance Expectations

Metric             Without QC    With QC
Latency per node   ~5-20ms       ~100-500ms
Edge quality       Good          Better
False positives    Some          Fewer
LLM calls          0             ~1 per 5 suggestions

Mitigations:

  • Batch processing reduces calls
  • Caching prevents redundant reviews
  • Size limits prevent slow large-context calls
  • Async processing possible for background indexing
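
As one way the decision cache might work (illustrative only; the real keying and storage may differ), a simple TTL map keyed by source, target, and relationship type avoids re-reviewing the same suggestion within CacheTTL:

import (
    "sync"
    "time"
)

// decisionCache remembers recent approve/reject decisions so identical
// suggestions are not re-sent to the model within the TTL.
type decisionCache struct {
    mu      sync.Mutex
    ttl     time.Duration
    entries map[string]cachedDecision
}

type cachedDecision struct {
    approved bool
    expires  time.Time
}

// key identifies a (source, target, relationship type) suggestion.
func key(srcID, targetID, relType string) string {
    return srcID + "|" + targetID + "|" + relType
}

func (c *decisionCache) get(k string) (approved, ok bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    e, found := c.entries[k]
    if !found || time.Now().After(e.expires) {
        return false, false
    }
    return e.approved, true
}

func (c *decisionCache) put(k string, approved bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.entries == nil {
        c.entries = make(map[string]cachedDecision)
    }
    c.entries[k] = cachedDecision{approved: approved, expires: time.Now().Add(c.ttl)}
}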

Alternatives Considered

1. Pre-trained classifier

  • Pro: Faster than LLM
  • Con: Requires training data, less flexible
  • Decision: LLM is more adaptable to diverse data

2. Rule-based filtering

  • Pro: Zero latency
  • Con: Can't understand semantics
  • Decision: TLP already does this; LLM adds semantic layer

3. Post-hoc cleanup

  • Pro: Doesn't slow down creation
  • Con: Edges exist until cleaned; user sees noise
  • Decision: Better to validate before creation

Open Questions

  1. Batch size tuning: Is 5 the right default? Should it auto-tune based on model?

  2. Augmentation scope: Should augmented edges have lower initial confidence?

  3. Model recommendations: Which small models work best? (Qwen 1.5B? Phi-3? Gemma 2B?)

  4. Async mode: Should there be an option to review edges asynchronously?

Feedback Requested

  • Does this solve a real problem for your use case?
  • Are the feature flags granular enough?
  • What small models have you had success with?
  • Should there be a "strict mode" that blocks on LLM errors?
  • Other edge types Heimdall should suggest?

Implementation PR: [Link to PR when ready]

Related Issues:

  • #XXX Auto-TLP implementation
  • #XXX Edge decay system