LLM & AST Security Patterns

Safe patterns for integrating Large Language Models with NornicDB's query system.

Overview

NornicDB uses a stream parse-execute architecture where queries are parsed and executed in a single pass, with a lazy AST built separately for LLM features. This document covers:

  1. Why stream parse-execute is fast
  2. Security considerations for this approach
  3. Safe LLM integration patterns
  4. Plugin security with AST

Architecture: Stream Parse-Execute + Lazy AST

┌─────────────────────────────────────────────────────────────────────────┐
│                     NornicDB Query Architecture                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Traditional DB (Full Parse → AST → Execute):                           │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  Query → [Lexer] → [Parser] → AST → [Optimizer] → [Executor]    │   │
│  │                               ↑                                  │   │
│  │                    Full tree in memory                          │   │
│  │                    Multiple passes                              │   │
│  │                    ~10-50µs overhead                            │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│  NornicDB (Stream Parse-Execute + Lazy AST):                            │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  Query → [Stream Parser+Executor] ─────────────────→ Result     │   │
│  │            ↓ (async/lazy)                                        │   │
│  │          [AST Builder] → Cached AST (for LLM features)          │   │
│  │                                                                  │   │
│  │  • Single pass through query                                    │   │
│  │  • Execute as we parse                                          │   │
│  │  • No intermediate allocations for simple queries               │   │
│  │  • ~1-3µs for simple queries (~10x faster)                     │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Why Stream Parse-Execute is Fast

Performance Benefits

| Aspect | Traditional AST | Stream Parse-Execute |
|---|---|---|
| Memory allocations | Full tree (~100+ nodes) | Minimal (on-demand) |
| Passes over query | 2-4 (lex, parse, optimize, execute) | 1 (combined) |
| Latency to first byte | After full parse | Immediate |
| Simple query overhead | ~10-50µs | ~1-3µs |
| Complex query overhead | ~50-200µs | ~10-50µs |

Why This Works

// Traditional: Parse everything, then execute
ast := parser.Parse(query)      // Allocate full AST
optimized := optimizer.Optimize(ast)  // Another pass
result := executor.Execute(optimized) // Finally execute

// Stream: Execute as we recognize tokens
// MATCH (n:Person) WHERE n.age > 21 RETURN n.name
//   ↓
// See MATCH → start pattern matching
// See (n:Person) → find nodes with label
// See WHERE → filter in-place
// See RETURN → project results
// No intermediate AST needed!
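
To make the "execute as we recognize tokens" idea concrete, here is a deliberately tiny sketch. It is not NornicDB's executor: `Node`, `filter`, and `streamExecute` are hypothetical names, the tokens are pre-split with `strings.Fields` for brevity, and only the single query shape from the comment above is supported. Each clause narrows or projects the working set the moment its keyword is seen, and no tree is ever built.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical in-memory node used only for this sketch.
type Node struct {
	Label string
	Props map[string]any
}

// filter returns the nodes for which keep reports true.
func filter(in []Node, keep func(Node) bool) []Node {
	var out []Node
	for _, n := range in {
		if keep(n) {
			out = append(out, n)
		}
	}
	return out
}

// streamExecute supports only: MATCH (n:Label) WHERE n.prop > N RETURN n.prop
// Each clause is applied to the working set as soon as its keyword is seen.
func streamExecute(query string, nodes []Node) ([]any, error) {
	tokens := strings.Fields(query) // pre-split for brevity; a real lexer would stream
	working := nodes
	var rows []any
	for i := 0; i < len(tokens); i++ {
		switch strings.ToUpper(tokens[i]) {
		case "MATCH": // keep nodes carrying the label
			if i+1 >= len(tokens) {
				return nil, fmt.Errorf("MATCH: missing pattern")
			}
			_, label, ok := strings.Cut(strings.Trim(tokens[i+1], "()"), ":")
			if !ok {
				return nil, fmt.Errorf("MATCH: expected (var:Label), got %q", tokens[i+1])
			}
			working = filter(working, func(n Node) bool { return n.Label == label })
			i++
		case "WHERE": // filter in place
			if i+3 >= len(tokens) || tokens[i+2] != ">" {
				return nil, fmt.Errorf("WHERE: only 'var.prop > N' is supported")
			}
			_, prop, _ := strings.Cut(tokens[i+1], ".")
			threshold, err := strconv.Atoi(tokens[i+3])
			if err != nil {
				return nil, fmt.Errorf("WHERE: %w", err)
			}
			working = filter(working, func(n Node) bool {
				v, ok := n.Props[prop].(int)
				return ok && v > threshold
			})
			i += 3
		case "RETURN": // project one property per surviving node
			if i+1 >= len(tokens) {
				return nil, fmt.Errorf("RETURN: missing expression")
			}
			_, prop, _ := strings.Cut(tokens[i+1], ".")
			for _, n := range working {
				rows = append(rows, n.Props[prop])
			}
			i++
		default:
			return nil, fmt.Errorf("unexpected token %q", tokens[i])
		}
	}
	return rows, nil
}

func main() {
	nodes := []Node{
		{Label: "Person", Props: map[string]any{"name": "Ada", "age": 36}},
		{Label: "Person", Props: map[string]any{"name": "Tim", "age": 12}},
		{Label: "City", Props: map[string]any{"name": "Oslo", "age": 900}},
	}
	rows, err := streamExecute("MATCH (n:Person) WHERE n.age > 21 RETURN n.name", nodes)
	fmt.Println(rows, err) // [Ada] <nil>
}
```

Note that an unsupported token is only discovered when the loop reaches it, after earlier clauses have already run. That is the partial-execution concern discussed under Security Considerations.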

Benchmarks

BenchmarkSimpleQuery/Traditional-16     50000    25000 ns/op   12000 B/op   150 allocs/op
BenchmarkSimpleQuery/StreamExecute-16  500000     2500 ns/op    1200 B/op    15 allocs/op
                                                   ↑ 10x faster    ↑ 10x less memory

Security Considerations: Stream Parse-Execute

✅ Benefits

| Property | Explanation |
|---|---|
| No TOCTOU | Check and use happen atomically; no race between validation and execution |
| Smaller attack surface | No intermediate AST to manipulate |
| Consistent parsing | Same code parses and executes; no semantic drift |
| Memory safety | Less allocation means less chance of buffer issues |

⚠️ Considerations

| Concern | Risk | Mitigation |
|---|---|---|
| Partial execution on error | Side effects before error detected | Transaction rollback, implicit transactions |
| No global semantic check | Can't validate entire query before starting | Validate syntax first, use explicit transactions for critical ops |
| Error recovery | Harder to provide good error messages | Store context during parse for error reporting |
| Optimization opportunities | Can't reorder operations | Accept trade-off for latency; complex queries can use AST path |

Partial Execution Risk

// Risk: What if error occurs mid-query?
CREATE (a:Node) 
CREATE (b:Node)
CREATE (c:Invalid!)  // ← Syntax error here

// Without protection: a and b created, c fails
// With implicit transaction: all rolled back

Our mitigation:

// Implicit transactions wrap non-explicit queries
func (e *Executor) Execute(ctx context.Context, query string, params map[string]any) (*Result, error) {
    // For write operations without explicit transaction
    if isWriteQuery && !inExplicitTransaction {
        tx := e.storage.BeginTransaction()
        defer tx.Rollback()  // Undoes partial writes on any error; assumed to be a no-op after Commit

        result, err := e.executeWithTransaction(tx, query)
        if err != nil {
            return nil, err  // Transaction rolled back
        }

        // Only commit if fully successful (assuming Commit reports failure as an error)
        if err := tx.Commit(); err != nil {
            return nil, err
        }
        return result, nil
    }
    // ...
}
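
The all-or-nothing behaviour can be demonstrated with a self-contained sketch. `Store`, `Tx`, `validLabel`, and `runImplicit` are hypothetical stand-ins, not NornicDB's storage API: writes are staged on the transaction and only published by `Commit`, so an error partway through leaves the store untouched.

```go
package main

import (
	"fmt"
	"unicode"
)

// Store and Tx are hypothetical stand-ins for the storage layer.
type Store struct{ nodes []string }

type Tx struct {
	store   *Store
	pending []string // staged writes, invisible to the store until Commit
	done    bool
}

func (s *Store) Begin() *Tx { return &Tx{store: s} }

func (tx *Tx) Create(label string) { tx.pending = append(tx.pending, label) }

// Commit publishes the staged writes.
func (tx *Tx) Commit() {
	if tx.done {
		return
	}
	tx.store.nodes = append(tx.store.nodes, tx.pending...)
	tx.done = true
}

// Rollback discards the staged writes; it is a no-op after Commit.
func (tx *Tx) Rollback() {
	if tx.done {
		return
	}
	tx.pending = nil
	tx.done = true
}

// validLabel accepts non-empty labels made of letters, digits, and underscores.
func validLabel(s string) bool {
	for _, r := range s {
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
			return false
		}
	}
	return s != ""
}

// runImplicit creates one node per label inside a single implicit transaction.
func runImplicit(s *Store, labels []string) error {
	tx := s.Begin()
	defer tx.Rollback() // discards staged writes unless Commit ran first
	for _, l := range labels {
		if !validLabel(l) {
			return fmt.Errorf("syntax error near %q", l)
		}
		tx.Create(l)
	}
	tx.Commit()
	return nil
}

func main() {
	s := &Store{}
	err := runImplicit(s, []string{"Node", "Node", "Invalid!"})
	fmt.Println(err, len(s.nodes)) // syntax error near "Invalid!" 0

	err = runImplicit(s, []string{"Node", "Node"})
	fmt.Println(err, len(s.nodes)) // <nil> 2
}
```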

Safe LLM Integration Patterns

Pattern 1: Read-Only AST Analysis (SAFE)

// ✅ SAFE: LLM only reads AST, doesn't generate queries
func AnalyzeQueryComplexity(query string) (*Analysis, error) {
    info := analyzer.Analyze(query)
    ast := info.GetAST()

    // LLM analyzes structure
    complexity := llm.AnalyzeComplexity(ast)
    suggestions := llm.SuggestIndexes(ast)

    return &Analysis{
        Complexity: complexity,
        Suggestions: suggestions,
    }, nil
}

Why safe: LLM output is informational only, never executed.

Pattern 2: Query Correction with Validation (SAFE with care)

// ⚠️ REQUIRES VALIDATION: LLM generates corrected query
func CorrectQuery(originalQuery string, queryErr error) (string, error) {
    info := analyzer.Analyze(originalQuery)
    ast := info.GetAST()

    // LLM suggests correction
    correctedQuery := llm.SuggestCorrection(ast, queryErr)

    // ⚠️ CRITICAL: Validate the corrected query
    if err := validateQuerySafety(correctedQuery); err != nil {
        return "", fmt.Errorf("LLM generated unsafe query: %w", err)
    }

    // ⚠️ CRITICAL: User must approve before execution
    return correctedQuery, nil  // Return for user approval, don't auto-execute
}

func validateQuerySafety(query string) error {
    // 1. Parse with our parser (not LLM's interpretation)
    info := analyzer.Analyze(query)

    // 2. Check for dangerous patterns
    if info.HasDelete && !userHasDeletePermission {
        return errors.New("DELETE not permitted")
    }

    // 3. Validate all identifiers
    for _, label := range info.Labels {
        if !isValidIdentifier(label) {
            return fmt.Errorf("invalid label: %s", label)
        }
    }

    return nil
}
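
`isValidIdentifier` is referenced above but not defined in this document. One possible shape is sketched below; the policy itself (ASCII letters, digits, and underscores, at most 64 characters, not starting with a digit) is an assumption for illustration, not NornicDB's actual rule.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierPattern encodes an assumed policy: an ASCII letter or underscore,
// followed by up to 63 letters, digits, or underscores. In Go's regexp
// package, $ without the m flag matches only at the very end of the text,
// so a trailing newline is rejected as well.
var identifierPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]{0,63}$`)

// isValidIdentifier reports whether s is acceptable as a label name.
func isValidIdentifier(s string) bool {
	return identifierPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"Person", "user_2", "Person) DETACH DELETE n //", ""} {
		fmt.Printf("%q valid=%v\n", s, isValidIdentifier(s))
	}
}
```

A strict allowlist pattern like this rejects anything containing quotes, parentheses, whitespace, or comment markers, which is the point: an LLM-produced label that does not look like a plain identifier is refused instead of being escaped.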

Pattern 3: Query Generation from Natural Language (HIGH RISK)

// ❌ DANGEROUS: Direct execution of LLM-generated queries
func DangerousNLToQuery(ctx context.Context, naturalLanguage string) (*Result, error) {
    query := llm.GenerateCypher(naturalLanguage)
    return executor.Execute(ctx, query, nil)  // ❌ NO VALIDATION!
}

// ✅ SAFE: Validated execution with constraints
func SafeNLToQuery(ctx context.Context, naturalLanguage string, constraints QueryConstraints) (*Result, error) {
    query := llm.GenerateCypher(naturalLanguage)

    // 1. Parse and analyze
    info := analyzer.Analyze(query)

    // 2. Enforce constraints
    if !constraints.AllowWrites && info.IsWriteQuery {
        return nil, errors.New("write operations not allowed")
    }

    if !constraints.AllowDelete && info.HasDelete {
        return nil, errors.New("delete operations not allowed")
    }

    // 3. Whitelist labels (relationship types need the same check; omitted here)
    for _, label := range info.Labels {
        if !constraints.AllowedLabels.Contains(label) {
            return nil, fmt.Errorf("label %s not in whitelist", label)
        }
    }

    // 4. Use read-only transaction for safety
    if !info.IsWriteQuery {
        return executor.ExecuteReadOnly(ctx, query, nil)
    }

    // 5. Require explicit user approval for writes
    return nil, errors.New("write query requires user approval")
}
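
`QueryConstraints` and its `AllowedLabels.Contains` method are used above without a definition. A minimal sketch of what they might look like follows; the field names mirror the ones used in `SafeNLToQuery`, while `LabelSet`, `NewLabelSet`, and `FirstDisallowed` are hypothetical helpers.

```go
package main

import "fmt"

// LabelSet is a hypothetical set type backing AllowedLabels.
type LabelSet map[string]struct{}

// NewLabelSet builds a set from the given labels.
func NewLabelSet(labels ...string) LabelSet {
	s := make(LabelSet, len(labels))
	for _, l := range labels {
		s[l] = struct{}{}
	}
	return s
}

// Contains reports membership. Lookups on a nil LabelSet return false,
// so an unconfigured whitelist denies every label by default.
func (s LabelSet) Contains(label string) bool {
	_, ok := s[label]
	return ok
}

// QueryConstraints mirrors the fields used by SafeNLToQuery.
type QueryConstraints struct {
	AllowWrites   bool
	AllowDelete   bool
	AllowedLabels LabelSet
}

// FirstDisallowed returns the first label that is not whitelisted, if any.
func (c QueryConstraints) FirstDisallowed(labels []string) (string, bool) {
	for _, l := range labels {
		if !c.AllowedLabels.Contains(l) {
			return l, true
		}
	}
	return "", false
}

func main() {
	c := QueryConstraints{AllowedLabels: NewLabelSet("Person", "Event")}
	if bad, found := c.FirstDisallowed([]string{"Person", "Secret"}); found {
		fmt.Printf("label %s not in whitelist\n", bad) // label Secret not in whitelist
	}
}
```

Matching is exact and case-sensitive here, so `person` does not pass a whitelist that contains `Person`. The zero value of `QueryConstraints` allows no writes, no deletes, and no labels, which makes forgetting to configure it fail closed.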

Pattern 4: Plugin Query Execution (REQUIRES SANDBOXING)

// Plugin-generated queries need strict sandboxing
type PluginQueryConstraints struct {
    MaxResults     int
    TimeoutMs      int
    AllowedLabels  []string
    AllowedTypes   []string
    ReadOnly       bool
    MaxDepth       int  // For path queries
}

func ExecutePluginQuery(ctx context.Context, plugin Plugin, query string, constraints PluginQueryConstraints) (*Result, error) {
    // 1. Validate plugin has permission for this query type
    info := analyzer.Analyze(query)

    if constraints.ReadOnly && info.IsWriteQuery {
        return nil, errors.New("plugin attempted write in read-only mode")
    }

    // 2. Check labels against plugin's allowed set
    for _, label := range info.Labels {
        if !contains(constraints.AllowedLabels, label) {
            return nil, fmt.Errorf("plugin not authorized for label: %s", label)
        }
    }

    // 3. Inject constraints into query
    constrainedQuery := injectConstraints(query, constraints)

    // 4. Execute with timeout
    ctx, cancel := context.WithTimeout(ctx, time.Duration(constraints.TimeoutMs)*time.Millisecond)
    defer cancel()

    return executor.Execute(ctx, constrainedQuery, nil)
}

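func injectConstraints(query string, c PluginQueryConstraints) string {
    // Add LIMIT if not present.
    // NOTE: this substring check is a heuristic. It also matches "LIMIT" inside
    // labels, string literals, and comments, and it does not cap an existing
    // LIMIT that exceeds MaxResults, so MaxResults should also be enforced
    // when reading results.
    if c.MaxResults > 0 && !strings.Contains(strings.ToUpper(query), "LIMIT") {
        query = query + fmt.Sprintf(" LIMIT %d", c.MaxResults)
    }
    return query
}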
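
A plain substring test for "LIMIT" can be fooled by a label such as `Limits`, by a string literal, or by a trailing comment, and it leaves an oversized `LIMIT 1000000` untouched. The sketch below is one stricter alternative, under stated assumptions: `maskNonCode` and `clampLimit` are hypothetical helpers, only a trailing integer `LIMIT` is recognized, and per-branch limits in `UNION` queries or parameterized limits (`LIMIT $n`) are not handled. It masks quoted text and comments before matching, clamps an existing trailing limit, and appends the new clause on its own line so a trailing `//` comment cannot swallow it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// maskNonCode returns a copy of q with the same byte length in which quoted
// text (strings, backtick identifiers) becomes '#' and comments become ' '.
func maskNonCode(q string) string {
	b := []byte(q)
	i := 0
	for i < len(b) {
		switch {
		case b[i] == '\'' || b[i] == '"' || b[i] == '`':
			quote := b[i]
			b[i] = '#'
			i++
			for i < len(b) && b[i] != quote {
				if b[i] == '\\' && quote != '`' && i+1 < len(b) {
					b[i] = '#' // skip the escaped character as well
					i++
				}
				b[i] = '#'
				i++
			}
			if i < len(b) { // closing quote
				b[i] = '#'
				i++
			}
		case b[i] == '/' && i+1 < len(b) && b[i+1] == '/':
			for i < len(b) && b[i] != '\n' {
				b[i] = ' '
				i++
			}
		case b[i] == '/' && i+1 < len(b) && b[i+1] == '*':
			stop := len(b)
			if end := strings.Index(q[i+2:], "*/"); end >= 0 {
				stop = i + 2 + end + 2
			}
			for ; i < stop; i++ {
				b[i] = ' '
			}
		default:
			i++
		}
	}
	return string(b)
}

var trailingLimit = regexp.MustCompile(`(?i)\bLIMIT\s+(\d+)\s*;?\s*$`)

// clampLimit guarantees the query ends in a LIMIT no larger than max.
func clampLimit(query string, max int) string {
	if max <= 0 {
		return query
	}
	masked := maskNonCode(query)
	if m := trailingLimit.FindStringSubmatchIndex(masked); m != nil {
		n, err := strconv.Atoi(masked[m[2]:m[3]])
		if err == nil && n <= max {
			return query // existing limit is already within bounds
		}
		return query[:m[2]] + strconv.Itoa(max) + query[m[3]:]
	}
	// No trailing LIMIT: cut trailing whitespace, comments, and one ';',
	// then append the clause on a fresh line.
	end := len(strings.TrimRight(masked, " \t\r\n"))
	if end > 0 && masked[end-1] == ';' {
		end--
	}
	return query[:end] + "\nLIMIT " + strconv.Itoa(max)
}

func main() {
	for _, q := range []string{
		"MATCH (n:Limits) RETURN n // LIMIT 5",
		"MATCH (n) RETURN n LIMIT 1000000",
		"RETURN 'LIMIT 5'",
	} {
		fmt.Printf("%q\n", clampLimit(q, 100))
	}
}
```

Text rewriting of this kind is still best treated as defence in depth. A query shape the sketch does not anticipate should fail to parse rather than run unbounded, and a hard cap applied while reading results remains the reliable backstop.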

AST Cache Security

Cache Key Security

// Cache keys include normalized query + parameter hash
type CacheKey struct {
    NormalizedQuery string
    ParamHash       uint64
}

// This prevents:
// 1. Cache confusion between different parameter values
// 2. Cache poisoning from similar queries
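
How `ParamHash` is computed is not shown here. One way it could be derived is sketched below; `hashParams` is a hypothetical helper, and FNV-1a is chosen only because it ships with the standard library. Keys are sorted so the hash does not depend on map iteration order, and every field is length-prefixed so that different key/value splits cannot produce the same byte stream.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// CacheKey matches the struct shown above.
type CacheKey struct {
	NormalizedQuery string
	ParamHash       uint64
}

// hashParams hashes parameters in sorted key order. Each field is written as
// "<length>:<bytes>" so field boundaries are unambiguous, and the value's
// dynamic type is included so int 1 and string "1" hash differently.
func hashParams(params map[string]any) uint64 {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	write := func(s string) { fmt.Fprintf(h, "%d:%s", len(s), s) }
	for _, k := range keys {
		write(k)
		write(fmt.Sprintf("%T=%v", params[k], params[k]))
	}
	return h.Sum64()
}

func main() {
	key := CacheKey{
		NormalizedQuery: "MATCH (n:Person) WHERE n.age > $min RETURN n.name",
		ParamHash:       hashParams(map[string]any{"min": 21}),
	}
	fmt.Printf("%+v\n", key)
}
```

FNV is not collision-resistant against an adversary. If parameter values are attacker-controlled and a collision would return another caller's cached entry, a keyed or cryptographic hash, or storing and comparing the full parameter set on lookup, would be the safer choice.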

Cache Isolation

// Per-user cache isolation (if multi-tenant)
type UserScopedCache struct {
    userID string
    cache  *QueryCache
}

func (c *UserScopedCache) Get(query string, params map[string]any) (*QueryInfo, bool) {
    key := c.makeKey(c.userID, query, params)
    return c.cache.Get(key)
}
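
`makeKey` is not defined above. A simple way to keep tenants apart, sketched here with hypothetical types (`scopedKey`, a minimal `QueryInfo`, and a map-backed store), is to make the user ID its own field of a comparable struct key instead of concatenating it into a string. With concatenation, user `a|b` running query `c` and user `a` running query `b|c` could collide; with a struct key they cannot.

```go
package main

import "fmt"

// QueryInfo is a minimal placeholder for the cached analysis result.
type QueryInfo struct{ IsWriteQuery bool }

// scopedKey is a hypothetical composite key. Go compares struct keys field
// by field, so no delimiter or escaping scheme is needed.
type scopedKey struct {
	UserID    string
	Query     string
	ParamHash uint64
}

// UserScopedCache is a per-user view over a shared map.
// The sketch is not safe for concurrent use.
type UserScopedCache struct {
	userID  string
	entries map[scopedKey]*QueryInfo
}

func (c *UserScopedCache) Put(query string, paramHash uint64, info *QueryInfo) {
	c.entries[scopedKey{c.userID, query, paramHash}] = info
}

func (c *UserScopedCache) Get(query string, paramHash uint64) (*QueryInfo, bool) {
	info, ok := c.entries[scopedKey{c.userID, query, paramHash}]
	return info, ok
}

func main() {
	shared := map[scopedKey]*QueryInfo{}
	alice := &UserScopedCache{userID: "alice", entries: shared}
	bob := &UserScopedCache{userID: "bob", entries: shared}

	alice.Put("MATCH (n) RETURN n", 0, &QueryInfo{})
	_, aliceHit := alice.Get("MATCH (n) RETURN n", 0)
	_, bobHit := bob.Get("MATCH (n) RETURN n", 0)
	fmt.Println(aliceHit, bobHit) // true false
}
```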

Cache Invalidation Security

// Write operations invalidate relevant caches
func (e *Executor) invalidateCachesAfterWrite(info *QueryInfo) {
    // Don't trust the query to tell us what it modified
    // Use actual affected labels from execution
    affectedLabels := e.getActualAffectedLabels()

    e.cache.InvalidateLabels(affectedLabels)
}
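
For `InvalidateLabels` to be cheap, the cache needs to know which entries depend on which labels. One possible structure is sketched below with hypothetical names (`LabelIndexedCache`, `Put`, `Len`): a reverse index from label to cache keys, so invalidation touches only the affected entries.

```go
package main

import "fmt"

// LabelIndexedCache is a hypothetical cache that tracks, per label, which
// cached keys depend on it. Re-putting an existing key with different labels
// is not handled in this sketch.
type LabelIndexedCache struct {
	entries map[string][]string            // cache key -> labels it depends on
	byLabel map[string]map[string]struct{} // label -> set of cache keys
}

func NewLabelIndexedCache() *LabelIndexedCache {
	return &LabelIndexedCache{
		entries: map[string][]string{},
		byLabel: map[string]map[string]struct{}{},
	}
}

// Put registers a cached query together with the labels it reads.
func (c *LabelIndexedCache) Put(key string, labels []string) {
	c.entries[key] = labels
	for _, l := range labels {
		if c.byLabel[l] == nil {
			c.byLabel[l] = map[string]struct{}{}
		}
		c.byLabel[l][key] = struct{}{}
	}
}

// InvalidateLabels drops every entry that depends on any of the given labels
// and returns how many entries were removed.
func (c *LabelIndexedCache) InvalidateLabels(labels []string) int {
	removed := 0
	for _, label := range labels {
		for key := range c.byLabel[label] {
			// Unlink the key from every label it was indexed under.
			// Deleting from a map while ranging over it is allowed in Go.
			for _, l := range c.entries[key] {
				delete(c.byLabel[l], key)
			}
			delete(c.entries, key)
			removed++
		}
	}
	return removed
}

// Len reports the number of live entries.
func (c *LabelIndexedCache) Len() int { return len(c.entries) }

func main() {
	c := NewLabelIndexedCache()
	c.Put("q1", []string{"Person"})
	c.Put("q2", []string{"Person", "Event"})
	c.Put("q3", []string{"City"})

	fmt.Println(c.InvalidateLabels([]string{"Event"}), c.Len()) // 1 2
}
```

The labels passed to `InvalidateLabels` should come from what the write actually touched, as the comment above says; the index only decides which cached entries those labels affect.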

Heimdall Plugin Security

Plugin Query Constraints

# Plugin manifest defines allowed operations
plugin:
  name: analytics-plugin
  permissions:
    queries:
      read_only: true
      allowed_labels: [Event, User, Session]
      allowed_relationships: [TRIGGERED, BELONGS_TO]
      max_results: 10000
      timeout_ms: 5000
    ast_access:
      can_read: true
      can_generate: false  # Cannot generate new queries

Plugin AST Access

// Plugins get read-only AST view
type PluginASTView struct {
    Clauses    []ASTClauseView // Sanitized view
    IsReadOnly bool
    Labels     []string
}

func (ast *AST) ToPluginView() *PluginASTView {
    return &PluginASTView{
        Clauses:    sanitizeClauses(ast.Clauses),
        IsReadOnly: ast.IsReadOnly,
        // Copy the slice so a plugin cannot mutate the AST through a shared backing array
        Labels: append([]string(nil), ast.Labels...),
    }
}

// Plugins cannot:
// - Modify AST
// - Generate queries from AST
// - Access raw query text (potential injection source)

Security Checklist

For LLM Integration

  • LLM output is NEVER directly executed
  • All LLM-generated queries are re-parsed by our parser
  • Write operations require explicit user approval
  • Label/relationship whitelisting enforced
  • Timeout and result limits applied
  • Audit logging for all LLM-generated queries

For Plugin Integration

  • Plugin permissions declared in manifest
  • Read-only mode enforced where declared
  • Label/relationship access controlled
  • Query timeout enforced
  • Result count limited
  • AST access is read-only view

For AST Cache

  • Cache keys include parameter hash
  • Per-user isolation (if multi-tenant)
  • Write operations invalidate affected caches
  • Cache TTL prevents stale data

Summary

| Component | Security Model |
|---|---|
| Stream Parse-Execute | Atomic parse+execute, no intermediate attack surface |
| Lazy AST | Observation only, never in execution path |
| LLM Integration | Re-parse all output, whitelist, require approval |
| Plugin Queries | Sandbox with permissions, timeouts, limits |
| AST Cache | Keyed by query+params, per-user isolation |

See Also

  • Query Cache Security - Cache-specific security
  • HTTP Security - Network-level protections
  • Plugin Development Guide - Building secure plugins