Heimdall SLM Quality Control for Auto-TLP
Use the Heimdall SLM to validate Auto-TLP relationship suggestions before they are materialized.
This layer is implemented in pkg/inference (HeimdallQC) and is wired through the feature flags below. It is opt-in. When enabled, each batch of TLP-generated candidates is reviewed by the configured Heimdall SLM and only approved suggestions are turned into edges. With augmentation enabled, the SLM may also propose additional edges that TLP missed.
Motivation
Auto-TLP automatically creates edges based on:
- Embedding similarity
- Co-access patterns
- Temporal proximity
- Transitive inference
While these algorithms are fast and effective, they can produce false positives:
- Similarity noise: Similar embeddings don't always mean meaningful relationships
- Spurious co-access: Users might access unrelated nodes in the same session
- Transitive errors: A→B and B→C doesn't always mean A should connect to C
An LLM can provide semantic validation that algorithms can't:
- "These two notes are about the same project" (approve)
- "These nodes share keywords but aren't actually related" (reject)
- "This relationship would be more accurately typed as INSPIRED_BY" (retype)
Design Goals
- Opt-in via feature flags - Disabled by default, zero impact if not enabled
- Small model friendly - Works with 1-3B parameter instruction models
- Fail-open - LLM failures don't block edge creation
- Batch efficient - Multiple suggestions per LLM call
- Size aware - Gracefully handles large nodes that exceed context limits
- Augmentation capable - LLM can suggest edges TLP missed (optional)
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                        Auto-TLP Pipeline                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Node Created/Accessed                                          │
│           │                                                     │
│           ▼                                                     │
│  ┌──────────────────┐                                           │
│  │  TLP Algorithms  │  Fast, algorithmic candidate generation   │
│  │  • Similarity    │                                           │
│  │  • Co-access     │                                           │
│  │  • Temporal      │                                           │
│  │  • Transitive    │                                           │
│  └────────┬─────────┘                                           │
│           │                                                     │
│           ▼                                                     │
│  ┌──────────────────┐     ┌─────────────────────────────────┐   │
│  │ LLM_QC Enabled?  │────▶│ Skip QC, return all candidates  │   │
│  └────────┬─────────┘ No  └─────────────────────────────────┘   │
│           │ Yes                                                 │
│           ▼                                                     │
│  ┌──────────────────┐                                           │
│  │  Batch & Check   │  Group candidates, check size limits      │
│  │  Size Limits     │                                           │
│  └────────┬─────────┘                                           │
│           │                                                     │
│           ▼                                                     │
│  ┌──────────────────┐     ┌─────────────────────────────────┐   │
│  │ Prompt too big?  │────▶│ Log warning, pass batch through │   │
│  └────────┬─────────┘ Yes └─────────────────────────────────┘   │
│           │ No                                                  │
│           ▼                                                     │
│  ┌──────────────────┐                                           │
│  │  Heimdall SLM    │  Local instruct model reviews batch       │
│  │  Batch Review    │                                           │
│  └────────┬─────────┘                                           │
│           │                                                     │
│           ├─────── LLM Error ──────▶ Log, pass through          │
│           │                                                     │
│           ▼                                                     │
│  ┌──────────────────┐                                           │
│  │  Parse Response  │  Extract approved/rejected indices        │
│  └────────┬─────────┘                                           │
│           │                                                     │
│           ├─────── Parse Error ────▶ Fuzzy parse or approve     │
│           │                                                     │
│           ▼                                                     │
│  ┌──────────────────┐     ┌─────────────────────────────────┐   │
│  │ Augment Enabled? │────▶│ Include LLM's new suggestions   │   │
│  └────────┬─────────┘ Yes └─────────────────────────────────┘   │
│           │ No                                                  │
│           ▼                                                     │
│  Return approved edges                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Feature Flags
| Flag | Default | Description |
|---|---|---|
| NORNICDB_AUTO_TLP_ENABLED | Off | Enable TLP candidate generation |
| NORNICDB_AUTO_TLP_LLM_QC_ENABLED | Off | Enable Heimdall batch review |
| NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED | Off | Allow Heimdall to suggest new edges |
Progressive enablement:
# Stage 1: TLP only (fast, no LLM)
export NORNICDB_AUTO_TLP_ENABLED=true
# Stage 2: TLP + Heimdall review (higher quality)
export NORNICDB_AUTO_TLP_ENABLED=true
export NORNICDB_AUTO_TLP_LLM_QC_ENABLED=true
# Stage 3: Full hybrid (TLP + review + augmentation)
export NORNICDB_AUTO_TLP_ENABLED=true
export NORNICDB_AUTO_TLP_LLM_QC_ENABLED=true
export NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED=true
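One way to read the three stages is as a dependency chain: review only matters when TLP is on, and augmentation only matters when review is on. The sketch below encodes that reading. Both the gating and the accepted boolean spellings (whatever `strconv.ParseBool` accepts) are assumptions here, and `Flags`, `envBool`, and `resolveFlags` are hypothetical helpers rather than functions from `pkg/config`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Flags holds the effective Auto-TLP switches after dependency gating.
type Flags struct {
	TLP, QC, Augment bool
}

// envBool parses a boolean variable; unset or unparseable values count as off.
func envBool(getenv func(string) string, name string) bool {
	v, err := strconv.ParseBool(getenv(name))
	return err == nil && v
}

// resolveFlags reads the three flags and applies the assumed stage ordering:
// QC requires TLP, and augmentation requires QC.
func resolveFlags(getenv func(string) string) Flags {
	var f Flags
	f.TLP = envBool(getenv, "NORNICDB_AUTO_TLP_ENABLED")
	f.QC = f.TLP && envBool(getenv, "NORNICDB_AUTO_TLP_LLM_QC_ENABLED")
	f.Augment = f.QC && envBool(getenv, "NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED")
	return f
}

func main() {
	f := resolveFlags(os.Getenv)
	fmt.Printf("tlp=%v qc=%v augment=%v\n", f.TLP, f.QC, f.Augment)
}
```

Passing the lookup function in, instead of calling `os.Getenv` directly, keeps the gating logic testable without mutating the process environment.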
Unified SLM Architecture
Heimdall QC uses the same SLM instance as Bifrost commands:
- Stateless: No context accumulates between calls
- One-shot: Each call is independent and completes in a single pass
- KV Cache: The static system prompt is cached; only the data varies
┌─────────────────────────────────────────────────────────┐
│                   SINGLE SLM INSTANCE                   │
├─────────────────────────────────────────────────────────┤
│ KV Cache (static, loaded once):                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ [Bifrost Commands] [Heimdall QC Instructions]       │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ Per-call (dynamic):                                     │
│ ┌───────────────────┐ ┌───────────────────────────────┐ │
│ │ Bifrost: "CREATE  │ │ Heimdall: "SRC:node-1[Note]   │ │
│ │ (n:Person)"       │ │ EDGES:0.node-2→REL(80%)"      │ │
│ └───────────────────┘ └───────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Prompt Format
System Prompt (static, KV cached):
Review graph edges. Output JSON only.
Format: {"approved":[indices],"rejected":[indices],"reasoning":"why"}
Approve if nodes are meaningfully related. Reject spam/duplicates.
User Content (dynamic, per-call):
SRC:node-123[Memory,Note]
title:Machine Learning Basics
content:Introduction to neural networks...
EDGES:
0.node-456→RELATES_TO(85%)
1.node-789→RELATES_TO(72%)
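A compact layout like this is straightforward to generate. The following sketch reproduces the sample above from structured input. `Node`, `Edge`, `truncate`, and `buildUserContent` are hypothetical names; the real builder in `pkg/inference` may order or abbreviate fields differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Node and Edge are hypothetical, simplified inputs to the prompt builder.
type Node struct {
	ID      string
	Labels  []string
	Title   string
	Content string
}

type Edge struct {
	TargetID   string
	Type       string
	Confidence float64 // 0.0 to 1.0
}

// truncate cuts s to maxLen runes and marks the cut with "...".
func truncate(s string, maxLen int) string {
	r := []rune(s)
	if len(r) <= maxLen {
		return s
	}
	return string(r[:maxLen]) + "..."
}

// buildUserContent renders the dynamic, per-call part of the prompt.
func buildUserContent(src Node, edges []Edge, maxLen int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "SRC:%s[%s]\n", src.ID, strings.Join(src.Labels, ","))
	fmt.Fprintf(&b, "title:%s\n", truncate(src.Title, maxLen))
	fmt.Fprintf(&b, "content:%s\n", truncate(src.Content, maxLen))
	b.WriteString("EDGES:\n")
	for i, e := range edges {
		// The index is what the model echoes back in "approved" / "rejected".
		fmt.Fprintf(&b, "%d.%s→%s(%.0f%%)\n", i, e.TargetID, e.Type, e.Confidence*100)
	}
	return b.String()
}

func main() {
	src := Node{
		ID:      "node-123",
		Labels:  []string{"Memory", "Note"},
		Title:   "Machine Learning Basics",
		Content: "Introduction to neural networks and how they learn",
	}
	edges := []Edge{{"node-456", "RELATES_TO", 0.85}, {"node-789", "RELATES_TO", 0.72}}
	fmt.Print(buildUserContent(src, edges, 31))
}
```

Truncating by runes instead of bytes avoids splitting a multi-byte character, which matters when node text is not plain ASCII.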
Response (JSON only):
With augmentation:
{"approved":[0],"additional":[{"target_id":"node-999","type":"INSPIRED_BY","conf":0.8,"reason":"both discuss backprop"}]}
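Small models often wrap their JSON in extra prose, so the parser needs a tolerant path. The sketch below tries strict decoding first, then retries on the outermost `{...}` span, and finally approves the whole batch. The field names follow the format line in the system prompt and the augmentation example; `QCResponse`, `AdditionalEdge`, and `parseResponse` are hypothetical names, and the brace-extraction step is only one possible reading of "fuzzy parse".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// AdditionalEdge is an edge proposed by the model when augmentation is on.
type AdditionalEdge struct {
	TargetID string  `json:"target_id"`
	Type     string  `json:"type"`
	Conf     float64 `json:"conf"`
	Reason   string  `json:"reason"`
}

// QCResponse mirrors the JSON shape requested in the system prompt.
type QCResponse struct {
	Approved   []int            `json:"approved"`
	Rejected   []int            `json:"rejected"`
	Reasoning  string           `json:"reasoning"`
	Additional []AdditionalEdge `json:"additional"`
}

// parseResponse decodes the model output, falling back to approving
// every index in the batch when nothing parseable is found.
func parseResponse(raw string, batchLen int) QCResponse {
	var strict QCResponse
	if err := json.Unmarshal([]byte(raw), &strict); err == nil {
		return strict
	}
	// Fuzzy pass: keep only the outermost {...} span and retry.
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start >= 0 && end > start {
		var fuzzy QCResponse
		if err := json.Unmarshal([]byte(raw[start:end+1]), &fuzzy); err == nil {
			return fuzzy
		}
	}
	// Fail open: approve the whole batch.
	all := make([]int, batchLen)
	for i := range all {
		all[i] = i
	}
	return QCResponse{Approved: all, Reasoning: "unparseable response; approved by default"}
}

func main() {
	r := parseResponse(`Sure! {"approved":[0],"rejected":[1],"reasoning":"duplicate"}`, 2)
	fmt.Println(r.Approved, r.Rejected, r.Reasoning)
}
```

A fresh struct is used for the fuzzy attempt because a failed `json.Unmarshal` may leave the first one partially filled.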
Configuration
type HeimdallQCConfig struct {
	Enabled               bool          // Master switch
	Timeout               time.Duration // Default: 10s
	MaxContextBytes       int           // Default: 4096 (~1000 tokens)
	MaxBatchSize          int           // Default: 5 suggestions per call
	MaxNodeSummaryLen     int           // Default: 200 chars per property
	MinConfidenceToReview float64       // Default: 0.5 (skip weak candidates)
	CacheDecisions        bool          // Default: true
	CacheTTL              time.Duration // Default: 1 hour
}
Error Handling
Principle: Fail-open, log, continue
| Error | Action |
|---|---|
| LLM timeout | Log warning, approve batch, continue |
| LLM crash | Log error, approve batch, continue |
| Invalid JSON | Fuzzy parse or approve all |
| Prompt too large | Log warning, skip review, pass through |
| Context cancelled | Return immediately with current results |
No retries - If the LLM fails, we don't retry. We log the decision made without LLM input and move on.
Usage Example
import (
	"context"
	"github.com/orneryd/nornicdb/pkg/config"
	"github.com/orneryd/nornicdb/pkg/heimdall"
	"github.com/orneryd/nornicdb/pkg/inference"
)
// Heimdall QC uses the SAME Generator as Bifrost commands.
// Direct llama.cpp via localllm, no HTTP calls.
// NOTE: engine is assumed to be the inference engine created elsewhere in the caller's package.
func setupHeimdallQC(generator heimdall.Generator) {
	// Pick the system prompt variant that matches the augmentation flag.
	systemPrompt := inference.GetSystemPrompt(config.IsAutoTLPLLMAugmentEnabled())
	heimdallFunc := func(ctx context.Context, userContent string) (string, error) {
		// Combine static system prompt + dynamic user content
		prompt := systemPrompt + "\n\n" + userContent
		return generator.Generate(ctx, prompt, heimdall.GenerateParams{
			MaxTokens:   256,
			Temperature: 0.1, // Low temp for deterministic QC
		})
	}
	// A nil config is passed here; it presumably selects the defaults.
	qc := inference.NewHeimdallQC(heimdallFunc, nil)
	engine.SetHeimdallQC(qc)
}
// Both Bifrost commands and Heimdall QC share:
// - Same heimdall.Generator (in-memory llama.cpp)
// - Same KV cache (system prompts cached)
// - Stateless one-shot calls
Performance Expectations
| Metric | Without QC | With QC |
|---|---|---|
| Latency per node | ~5-20ms | ~100-500ms |
| Edge quality | Good | Better |
| False positives | Some | Fewer |
| LLM calls | 0 | ~1 per 5 suggestions |
Mitigations:
- Batch processing reduces calls
- Caching prevents redundant reviews
- Size limits prevent slow large-context calls
- Async processing is possible for background indexing
Related
- Auto-TLP: overview of automatic relationship inference
- Feature Flags: NORNICDB_AUTO_TLP_LLM_QC_ENABLED, NORNICDB_AUTO_TLP_LLM_AUGMENT_ENABLED
- Heimdall AI Assistant: configuring the Heimdall SLM