LLM & AST Security Patterns
Safe patterns for integrating Large Language Models with NornicDB's query system.
Overview
NornicDB uses a stream parse-execute architecture where queries are parsed and executed in a single pass, with a lazy AST built separately for LLM features. This document covers:
- Why stream parse-execute is fast
- Security considerations for this approach
- Safe LLM integration patterns
- Plugin security with AST
Architecture: Stream Parse-Execute + Lazy AST
┌─────────────────────────────────────────────────────────────────────────┐
│ NornicDB Query Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Traditional DB (Full Parse → AST → Execute): │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Query → [Lexer] → [Parser] → AST → [Optimizer] → [Executor] │ │
│ │ ↑ │ │
│ │ Full tree in memory │ │
│ │ Multiple passes │ │
│ │ ~10-50µs overhead │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ NornicDB (Stream Parse-Execute + Lazy AST): │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Query → [Stream Parser+Executor] ─────────────────→ Result │ │
│ │ ↓ (async/lazy) │ │
│ │ [AST Builder] → Cached AST (for LLM features) │ │
│ │ │ │
│ │ • Single pass through query │ │
│ │ • Execute as we parse │ │
│ │ • No intermediate allocations for simple queries │ │
│ │ • ~1-3µs for simple queries (about 10x faster) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Why Stream Parse-Execute is Fast
Performance Benefits
| Aspect | Traditional AST | Stream Parse-Execute |
|---|---|---|
| Memory allocations | Full tree (~100+ nodes) | Minimal (on-demand) |
| Passes over query | 2-4 (lex, parse, optimize, execute) | 1 (combined) |
| Latency to first byte | After full parse | Immediate |
| Simple query overhead | ~10-50µs | ~1-3µs |
| Complex query overhead | ~50-200µs | ~10-50µs |
Why This Works
// Traditional: Parse everything, then execute
ast := parser.Parse(query) // Allocate full AST
optimized := optimizer.Optimize(ast) // Another pass
result := executor.Execute(optimized) // Finally execute
// Stream: Execute as we recognize tokens
// MATCH (n:Person) WHERE n.age > 21 RETURN n.name
// ↓
// See MATCH → start pattern matching
// See (n:Person) → find nodes with label
// See WHERE → filter in-place
// See RETURN → project results
// No intermediate AST needed!
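To make the single-pass idea concrete, here is a toy sketch rather than NornicDB's actual parser. It assumes a drastically reduced grammar (`MATCH <Label> [WHERE age > <int>] RETURN name`) and an in-memory slice of nodes; `Node` and `streamExecute` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical in-memory record used only for this sketch.
type Node struct {
	Label string
	Name  string
	Age   int
}

// streamExecute walks the tokens once and narrows the candidate set as each
// clause is recognized, without building a tree first.
// Toy grammar: MATCH <Label> [WHERE age > <int>] RETURN name
func streamExecute(query string, nodes []Node) ([]string, error) {
	tokens := strings.Fields(query)
	candidates := nodes
	for i := 0; i < len(tokens); i++ {
		switch strings.ToUpper(tokens[i]) {
		case "MATCH": // the next token is the label: filter by it immediately
			if i+1 >= len(tokens) {
				return nil, fmt.Errorf("MATCH needs a label")
			}
			i++
			var kept []Node
			for _, n := range candidates {
				if n.Label == tokens[i] {
					kept = append(kept, n)
				}
			}
			candidates = kept
		case "WHERE": // expects exactly: age > <int>
			if i+3 >= len(tokens) || tokens[i+1] != "age" || tokens[i+2] != ">" {
				return nil, fmt.Errorf("unsupported WHERE clause")
			}
			threshold, err := strconv.Atoi(tokens[i+3])
			if err != nil {
				return nil, fmt.Errorf("bad number: %w", err)
			}
			i += 3
			var kept []Node
			for _, n := range candidates {
				if n.Age > threshold {
					kept = append(kept, n)
				}
			}
			candidates = kept
		case "RETURN": // project the name of every surviving node
			names := make([]string, 0, len(candidates))
			for _, n := range candidates {
				names = append(names, n.Name)
			}
			return names, nil
		default:
			return nil, fmt.Errorf("unexpected token %q", tokens[i])
		}
	}
	return nil, fmt.Errorf("query has no RETURN clause")
}

func main() {
	nodes := []Node{{"Person", "Ada", 36}, {"Person", "Tim", 12}, {"City", "Oslo", 900}}
	names, err := streamExecute("MATCH Person WHERE age > 21 RETURN name", nodes)
	fmt.Println(names, err) // prints: [Ada] <nil>
}
```

Each clause acts on the data the moment it is recognized, which is also why an error found late in the query can follow work that has already happened.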
Benchmarks
BenchmarkSimpleQuery/Traditional-16 50000 25000 ns/op 12000 B/op 150 allocs/op
BenchmarkSimpleQuery/StreamExecute-16 500000 2500 ns/op 1200 B/op 15 allocs/op
↑ 10x faster ↑ 10x less memory
Security Considerations: Stream Parse-Execute
✅ Benefits
| Property | Explanation |
|---|---|
| No TOCTOU | Check and use happen atomically - no race between validation and execution |
| Smaller attack surface | No intermediate AST to manipulate |
| Consistent parsing | Same code parses AND executes - no semantic drift |
| Memory safety | Fewer allocations mean fewer buffers to mishandle and less room for allocation-driven resource exhaustion |
⚠️ Considerations
| Concern | Risk | Mitigation |
|---|---|---|
| Partial execution on error | Side effects before error detected | Transaction rollback, implicit transactions |
| No global semantic check | Can't validate entire query before starting | Validate syntax first, use explicit transactions for critical ops |
| Error recovery | Harder to provide good error messages | Store context during parse for error reporting |
| Optimization opportunities | Can't reorder operations | Accept trade-off for latency; complex queries can use AST path |
Partial Execution Risk
// Risk: What if error occurs mid-query?
CREATE (a:Node)
CREATE (b:Node)
CREATE (c:Invalid!) // ← Syntax error here
// Without protection: a and b created, c fails
// With implicit transaction: all rolled back
Our mitigation:
// Implicit transactions wrap queries that run outside an explicit transaction
func (e *Executor) Execute(ctx context.Context, query string, params map[string]any) (*Result, error) {
	// For write operations without explicit transaction
	if isWriteQuery && !inExplicitTransaction {
		tx := e.storage.BeginTransaction()
		defer tx.Rollback() // Rollback on any error; assumed to be a no-op after Commit
		result, err := e.executeWithTransaction(tx, query)
		if err != nil {
			return nil, err // Transaction rolled back by the deferred call
		}
		// Only commit if fully successful, and surface commit failures
		if err := tx.Commit(); err != nil {
			return nil, err
		}
		return result, nil
	}
	// ...
}
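The rollback behaviour can be exercised end to end with a small in-memory stand-in. This is a sketch under assumptions: `Store`, `Tx`, and `runImplicit` are hypothetical, writes are buffered until commit, and an empty step stands in for a syntax error discovered mid-query.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical stand-in for a storage engine.
type Store struct{ nodes []string }

// Tx buffers writes and applies them to the store only on Commit.
type Tx struct {
	store   *Store
	pending []string
	done    bool
}

func (s *Store) Begin() *Tx { return &Tx{store: s} }

// Create records a write in the transaction's buffer.
func (t *Tx) Create(name string) { t.pending = append(t.pending, name) }

// Commit publishes the buffered writes; afterwards Rollback is a no-op.
func (t *Tx) Commit() error {
	if t.done {
		return errors.New("transaction already finished")
	}
	t.store.nodes = append(t.store.nodes, t.pending...)
	t.done = true
	return nil
}

// Rollback discards buffered writes unless the transaction already finished.
func (t *Tx) Rollback() {
	if !t.done {
		t.pending = nil
		t.done = true
	}
}

// runImplicit runs all steps in one transaction; any failure leaves the
// store untouched because nothing is published before Commit.
func runImplicit(s *Store, steps []string) error {
	tx := s.Begin()
	defer tx.Rollback()
	for _, step := range steps {
		if step == "" { // stands in for a syntax error found mid-query
			return errors.New("syntax error mid-query")
		}
		tx.Create(step)
	}
	return tx.Commit()
}

func main() {
	s := &Store{}
	fmt.Println(runImplicit(s, []string{"a", "b", ""}), len(s.nodes))  // syntax error mid-query 0
	fmt.Println(runImplicit(s, []string{"a", "b", "c"}), len(s.nodes)) // <nil> 3
}
```

The deferred `Rollback` only has an effect on the failure path, which keeps the success path free of cleanup branches.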
Safe LLM Integration Patterns
Pattern 1: Read-Only AST Analysis (SAFE)
// ✅ SAFE: LLM only reads AST, doesn't generate queries
func AnalyzeQueryComplexity(query string) (*Analysis, error) {
	info := analyzer.Analyze(query)
	ast := info.GetAST()
	// LLM analyzes structure
	complexity := llm.AnalyzeComplexity(ast)
	suggestions := llm.SuggestIndexes(ast)
	return &Analysis{
		Complexity:  complexity,
		Suggestions: suggestions,
	}, nil
}
Why safe: LLM output is informational only, never executed.
Pattern 2: Query Correction with Validation (SAFE with care)
// ⚠️ REQUIRES VALIDATION: LLM generates corrected query
// (parameter renamed from "error" so it no longer shadows the built-in type)
func CorrectQuery(originalQuery string, queryErr error) (string, error) {
	info := analyzer.Analyze(originalQuery)
	ast := info.GetAST()
	// LLM suggests correction
	correctedQuery := llm.SuggestCorrection(ast, queryErr)
	// ⚠️ CRITICAL: Validate the corrected query
	if err := validateQuerySafety(correctedQuery); err != nil {
		return "", fmt.Errorf("LLM generated unsafe query: %w", err)
	}
	// ⚠️ CRITICAL: User must approve before execution
	return correctedQuery, nil // Return for user approval, don't auto-execute
}
func validateQuerySafety(query string) error {
	// 1. Parse with our parser (not LLM's interpretation)
	info := analyzer.Analyze(query)
	// 2. Check for dangerous patterns
	if info.HasDelete && !userHasDeletePermission {
		return errors.New("DELETE not permitted")
	}
	// 3. Validate all identifiers
	for _, label := range info.Labels {
		if !isValidIdentifier(label) {
			return fmt.Errorf("invalid label: %s", label)
		}
	}
	return nil
}
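The helper `isValidIdentifier` is not shown above. One possible shape is sketched below, assuming a conservative rule that is not taken from NornicDB: an ASCII letter or underscore first, then letters, digits, or underscores, with a 64-character cap.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierPattern encodes an ASSUMED rule for this sketch: ASCII letter or
// underscore first, then up to 63 letters, digits, or underscores.
var identifierPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]{0,63}$`)

// isValidIdentifier reports whether name is safe to treat as a plain label.
func isValidIdentifier(name string) bool {
	return identifierPattern.MatchString(name)
}

func main() {
	labels := []string{"Person", "_tmp1", "1Person", "Person) DETACH DELETE (n", ""}
	for _, label := range labels {
		fmt.Printf("%q -> %v\n", label, isValidIdentifier(label))
	}
}
```

An allowlist of characters is easier to reason about than a denylist of dangerous ones: anything containing spaces, parentheses, or quotes is rejected without needing to enumerate attack strings.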
Pattern 3: Query Generation from Natural Language (HIGH RISK)
// ❌ DANGEROUS: Direct execution of LLM-generated queries
func DangerousNLToQuery(ctx context.Context, naturalLanguage string) (*Result, error) {
	query := llm.GenerateCypher(naturalLanguage)
	return executor.Execute(ctx, query, nil) // ❌ NO VALIDATION!
}
// ✅ SAFE: Validated execution with constraints
func SafeNLToQuery(ctx context.Context, naturalLanguage string, constraints QueryConstraints) (*Result, error) {
	query := llm.GenerateCypher(naturalLanguage)
	// 1. Parse and analyze
	info := analyzer.Analyze(query)
	// 2. Enforce constraints
	if !constraints.AllowWrites && info.IsWriteQuery {
		return nil, errors.New("write operations not allowed")
	}
	if !constraints.AllowDelete && info.HasDelete {
		return nil, errors.New("delete operations not allowed")
	}
	// 3. Whitelist labels (relationship types need an equivalent check)
	for _, label := range info.Labels {
		if !constraints.AllowedLabels.Contains(label) {
			return nil, fmt.Errorf("label %s not in whitelist", label)
		}
	}
	// 4. Use read-only transaction for safety
	if !info.IsWriteQuery {
		return executor.ExecuteReadOnly(ctx, query, nil)
	}
	// 5. Require explicit user approval for writes
	return nil, errors.New("write query requires user approval")
}
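The pattern above calls `constraints.AllowedLabels.Contains(label)` without showing the type behind it. A minimal sketch follows, assuming a set backed by a map; `LabelSet`, `NewLabelSet`, and `checkLabels` are hypothetical names, and the `QueryConstraints` fields simply mirror the ones the pattern reads.

```go
package main

import "fmt"

// LabelSet is a hypothetical allowlist type. Its zero value (nil) contains
// nothing, so a forgotten allowlist denies every label by default.
type LabelSet map[string]struct{}

// NewLabelSet builds a set from the given labels.
func NewLabelSet(labels ...string) LabelSet {
	set := make(LabelSet, len(labels))
	for _, l := range labels {
		set[l] = struct{}{}
	}
	return set
}

// Contains is an exact, case-sensitive membership test.
func (s LabelSet) Contains(label string) bool {
	_, ok := s[label]
	return ok
}

// QueryConstraints mirrors the fields read by the pattern above.
type QueryConstraints struct {
	AllowWrites   bool
	AllowDelete   bool
	AllowedLabels LabelSet
}

// checkLabels rejects the query as soon as one label is outside the allowlist.
func checkLabels(c QueryConstraints, labels []string) error {
	for _, label := range labels {
		if !c.AllowedLabels.Contains(label) {
			return fmt.Errorf("label %q not in whitelist", label)
		}
	}
	return nil
}

func main() {
	c := QueryConstraints{AllowedLabels: NewLabelSet("Person", "City")}
	fmt.Println(checkLabels(c, []string{"Person"}))            // <nil>
	fmt.Println(checkLabels(c, []string{"Person", "Secret"}))  // label "Secret" not in whitelist
	fmt.Println(checkLabels(QueryConstraints{}, []string{"Person"}) != nil) // true
}
```

Denying by default matters here: a constraints value built without an allowlist should fail closed rather than allow every label.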
Pattern 4: Plugin Query Execution (REQUIRES SANDBOXING)
// Plugin-generated queries need strict sandboxing
type PluginQueryConstraints struct {
	MaxResults    int
	TimeoutMs     int
	AllowedLabels []string
	AllowedTypes  []string
	ReadOnly      bool
	MaxDepth      int // For path queries
}
func ExecutePluginQuery(ctx context.Context, plugin Plugin, query string, constraints PluginQueryConstraints) (*Result, error) {
	// 1. Validate plugin has permission for this query type
	info := analyzer.Analyze(query)
	if constraints.ReadOnly && info.IsWriteQuery {
		return nil, errors.New("plugin attempted write in read-only mode")
	}
	// 2. Check labels against plugin's allowed set
	for _, label := range info.Labels {
		if !contains(constraints.AllowedLabels, label) {
			return nil, fmt.Errorf("plugin not authorized for label: %s", label)
		}
	}
	// 3. Inject constraints into query
	constrainedQuery := injectConstraints(query, constraints)
	// 4. Execute with timeout derived from the caller's context
	ctx, cancel := context.WithTimeout(ctx, time.Duration(constraints.TimeoutMs)*time.Millisecond)
	defer cancel()
	return executor.Execute(ctx, constrainedQuery, nil)
}
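func injectConstraints(query string, c PluginQueryConstraints) string {
	// Add LIMIT if not present.
	// NOTE: the substring match is a heuristic. "LIMIT" inside a string literal
	// or identifier also matches, and an existing larger LIMIT is not clamped.
	if c.MaxResults > 0 && !strings.Contains(strings.ToUpper(query), "LIMIT") {
		query = query + fmt.Sprintf(" LIMIT %d", c.MaxResults)
	}
	return query
}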
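Because `injectConstraints` decides by substring, any query that merely contains the text `LIMIT` skips the injected clause. A result-side cap is a useful backstop. The sketch below assumes results are pulled through a hypothetical `RowSource` iterator; it does not describe how NornicDB's executor returns rows.

```go
package main

import "fmt"

// RowSource is a hypothetical pull-style iterator: it returns the next row,
// or false once the underlying result set is exhausted.
type RowSource func() (row string, ok bool)

// capRows wraps src so that at most limit rows are ever returned, regardless
// of what the query text asked for. limit <= 0 means "no cap", matching the
// MaxResults > 0 check above.
func capRows(src RowSource, limit int) RowSource {
	if limit <= 0 {
		return src
	}
	served := 0
	return func() (string, bool) {
		if served >= limit {
			return "", false // stop without pulling more rows from src
		}
		row, ok := src()
		if ok {
			served++
		}
		return row, ok
	}
}

// sliceSource adapts a slice to RowSource for the demo.
func sliceSource(rows []string) RowSource {
	i := 0
	return func() (string, bool) {
		if i >= len(rows) {
			return "", false
		}
		i++
		return rows[i-1], true
	}
}

// collect drains a RowSource into a slice.
func collect(src RowSource) []string {
	var out []string
	for row, ok := src(); ok; row, ok = src() {
		out = append(out, row)
	}
	return out
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	fmt.Println(collect(capRows(sliceSource(rows), 3))) // [a b c]
}
```

The cap holds even when the query text defeats the rewrite, since it never inspects the query at all.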
AST Cache Security
Cache Key Security
// Cache keys include normalized query + parameter hash
type CacheKey struct {
	NormalizedQuery string
	ParamHash       uint64
}
// This prevents:
// 1. Cache confusion between different parameter values
// 2. Cache poisoning from similar queries
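How `ParamHash` is computed is not specified above. One way to get a hash that is stable across map iteration order is to sort the keys first. The sketch below is an assumption, not NornicDB's implementation: `hashParams` is a hypothetical helper, and FNV-1a is a non-cryptographic hash, so it guards against accidental confusion rather than deliberately crafted collisions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// CacheKey repeats the struct from the text so the sketch is self-contained.
type CacheKey struct {
	NormalizedQuery string
	ParamHash       uint64
}

// hashParams hashes parameters in sorted key order so that Go's randomized
// map iteration cannot change the result. Values are rendered with %#v so
// that the string "1" and the integer 1 produce different input bytes.
// Caveat: pointer values would print as addresses and are not stable.
func hashParams(params map[string]any) uint64 {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		fmt.Fprintf(h, "%q=%#v;", k, params[k])
	}
	return h.Sum64()
}

// makeCacheKey combines the normalized query text with the parameter hash.
func makeCacheKey(normalizedQuery string, params map[string]any) CacheKey {
	return CacheKey{NormalizedQuery: normalizedQuery, ParamHash: hashParams(params)}
}

func main() {
	q := "MATCH (n:Person) WHERE n.age > $age RETURN n.name"
	k1 := makeCacheKey(q, map[string]any{"age": 21})
	k2 := makeCacheKey(q, map[string]any{"age": 65})
	fmt.Println(k1 == k2) // false: different parameter values, different keys
}
```

Because `CacheKey` contains only comparable fields, it can be used directly as a Go map key.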
Cache Isolation
// Per-user cache isolation (if multi-tenant)
type UserScopedCache struct {
	userID string
	cache  *QueryCache
}
func (c *UserScopedCache) Get(query string, params map[string]any) (*QueryInfo, bool) {
	key := c.makeKey(c.userID, query, params)
	return c.cache.Get(key)
}
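A sketch of what `makeKey` and the shared cache might look like is shown below. It assumes a struct key rather than a concatenated string, so a user ID can never blend into the query text, and it takes a precomputed parameter hash to stay short. `scopedKey`, `NewQueryCache`, and `Put` are hypothetical additions.

```go
package main

import (
	"fmt"
	"sync"
)

// QueryInfo is reduced to one field for this sketch.
type QueryInfo struct{ Labels []string }

// scopedKey keeps the user ID in its own field. With string concatenation,
// user "ab" + query "c" and user "a" + query "bc" would collide.
type scopedKey struct {
	UserID    string
	Query     string
	ParamHash uint64
}

// QueryCache is a hypothetical shared cache guarded by a mutex.
type QueryCache struct {
	mu      sync.RWMutex
	entries map[scopedKey]*QueryInfo
}

func NewQueryCache() *QueryCache {
	return &QueryCache{entries: make(map[scopedKey]*QueryInfo)}
}

func (c *QueryCache) Put(k scopedKey, info *QueryInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[k] = info
}

func (c *QueryCache) Get(k scopedKey) (*QueryInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	info, ok := c.entries[k]
	return info, ok
}

// UserScopedCache pins every lookup to one user ID.
type UserScopedCache struct {
	userID string
	cache  *QueryCache
}

func (c *UserScopedCache) makeKey(query string, paramHash uint64) scopedKey {
	return scopedKey{UserID: c.userID, Query: query, ParamHash: paramHash}
}

func (c *UserScopedCache) Put(query string, paramHash uint64, info *QueryInfo) {
	c.cache.Put(c.makeKey(query, paramHash), info)
}

func (c *UserScopedCache) Get(query string, paramHash uint64) (*QueryInfo, bool) {
	return c.cache.Get(c.makeKey(query, paramHash))
}

func main() {
	shared := NewQueryCache()
	alice := &UserScopedCache{userID: "alice", cache: shared}
	bob := &UserScopedCache{userID: "bob", cache: shared}

	alice.Put("MATCH (n) RETURN n", 1, &QueryInfo{Labels: []string{"Secret"}})
	_, aliceHit := alice.Get("MATCH (n) RETURN n", 1)
	_, bobHit := bob.Get("MATCH (n) RETURN n", 1)
	fmt.Println(aliceHit, bobHit) // true false
}
```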
Cache Invalidation Security
// Write operations invalidate relevant caches
func (e *Executor) invalidateCachesAfterWrite(info *QueryInfo) {
	// Don't trust the query to tell us what it modified
	// Use actual affected labels from execution
	affectedLabels := e.getActualAffectedLabels()
	e.cache.InvalidateLabels(affectedLabels)
}
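`InvalidateLabels` is only named above. A minimal sketch of label-driven invalidation follows, assuming the cache records which labels each cached query depends on; `LabelCache` and its layout are hypothetical.

```go
package main

import "fmt"

// LabelCache is a hypothetical cache that remembers, per cached query,
// which labels its result depends on.
type LabelCache struct {
	entries map[string][]string // query text -> labels it read
}

// InvalidateLabels drops every entry that depends on at least one affected
// label and returns how many entries were removed.
func (c *LabelCache) InvalidateLabels(affected []string) int {
	hit := make(map[string]bool, len(affected))
	for _, l := range affected {
		hit[l] = true
	}
	removed := 0
	for query, labels := range c.entries {
		for _, l := range labels {
			if hit[l] {
				delete(c.entries, query) // deleting while ranging is safe in Go
				removed++
				break
			}
		}
	}
	return removed
}

func main() {
	c := &LabelCache{entries: map[string][]string{
		"MATCH (p:Person) RETURN p":            {"Person"},
		"MATCH (c:City) RETURN c":              {"City"},
		"MATCH (p:Person)-->(c:City) RETURN p": {"Person", "City"},
	}}
	// A write touched Person nodes: two of the three entries depend on it.
	fmt.Println(c.InvalidateLabels([]string{"Person"}), len(c.entries)) // 2 1
}
```

Feeding this function the labels actually touched during execution, rather than the labels the query text mentions, is what keeps a misleading query from leaving stale entries behind.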
Heimdall Plugin Security
Plugin Query Constraints
# Plugin manifest defines allowed operations
plugin:
  name: analytics-plugin
  permissions:
    queries:
      read_only: true
      allowed_labels: [Event, User, Session]
      allowed_relationships: [TRIGGERED, BELONGS_TO]
      max_results: 10000
      timeout_ms: 5000
    ast_access:
      can_read: true
      can_generate: false # Cannot generate new queries
Plugin AST Access
// Plugins get read-only AST view
type PluginASTView struct {
	Clauses    []ASTClauseView // Sanitized view
	IsReadOnly bool
	Labels     []string
}
func (ast *AST) ToPluginView() *PluginASTView {
	return &PluginASTView{
		Clauses:    sanitizeClauses(ast.Clauses),
		IsReadOnly: ast.IsReadOnly,
		// Copy the slice so a plugin cannot write through to the AST's labels
		Labels: append([]string(nil), ast.Labels...),
	}
}
// Plugins cannot:
// - Modify AST
// - Generate queries from AST
// - Access raw query text (potential injection source)
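A view is only read-only if it shares no mutable memory with the AST. In Go, assigning a slice copies the header but not the backing array, so a view built with `Labels: ast.Labels` lets a plugin overwrite the AST's labels. The sketch below contrasts the two approaches using stripped-down, hypothetical versions of the types.

```go
package main

import "fmt"

// AST and PluginASTView are reduced to one field for this sketch.
type AST struct{ Labels []string }
type PluginASTView struct{ Labels []string }

// sharedView hands out the AST's own slice: writes through the view
// reach the AST because both share one backing array.
func sharedView(a *AST) *PluginASTView {
	return &PluginASTView{Labels: a.Labels}
}

// copiedView gives the plugin its own backing array.
func copiedView(a *AST) *PluginASTView {
	return &PluginASTView{Labels: append([]string(nil), a.Labels...)}
}

func main() {
	ast := &AST{Labels: []string{"User", "Event"}}

	copiedView(ast).Labels[0] = "Tampered"
	fmt.Println(ast.Labels[0]) // User: the copy absorbed the write

	sharedView(ast).Labels[0] = "Tampered"
	fmt.Println(ast.Labels[0]) // Tampered: the AST itself was modified
}
```

The same concern applies to any slice, map, or pointer field exposed through the view, which is presumably part of what `sanitizeClauses` has to handle.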
Security Checklist
For LLM Integration
- LLM output is NEVER directly executed
- All LLM-generated queries are re-parsed by our parser
- Write operations require explicit user approval
- Label/relationship whitelisting enforced
- Timeout and result limits applied
- Audit logging for all LLM-generated queries
For Plugin Integration
- Plugin permissions declared in manifest
- Read-only mode enforced where declared
- Label/relationship access controlled
- Query timeout enforced
- Result count limited
- AST access is read-only view
For AST Cache
- Cache keys include parameter hash
- Per-user isolation (if multi-tenant)
- Write operations invalidate affected caches
- Cache TTL prevents stale data
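The TTL item in the checklist above can be sketched as an expiry check at read time. This is an illustration under assumptions, not NornicDB's cache: `TTLCache`, `fakeClock`, and `demoTTL` are hypothetical, and the clock is injected so that expiry can be demonstrated without sleeping.

```go
package main

import (
	"fmt"
	"time"
)

const demoTTL = time.Minute // assumed TTL, for the demo only

type ttlEntry struct {
	value    string
	storedAt time.Time
}

// TTLCache expires entries lazily when they are read.
type TTLCache struct {
	ttl     time.Duration
	now     func() time.Time // injectable clock
	entries map[string]ttlEntry
}

func NewTTLCache(ttl time.Duration, now func() time.Time) *TTLCache {
	return &TTLCache{ttl: ttl, now: now, entries: make(map[string]ttlEntry)}
}

func (c *TTLCache) Put(key, value string) {
	c.entries[key] = ttlEntry{value: value, storedAt: c.now()}
}

// Get returns the value only while it is younger than the TTL; an expired
// entry is deleted and reported as a miss.
func (c *TTLCache) Get(key string) (string, bool) {
	e, ok := c.entries[key]
	if !ok {
		return "", false
	}
	if c.now().Sub(e.storedAt) >= c.ttl {
		delete(c.entries, key)
		return "", false
	}
	return e.value, true
}

// fakeClock is a manually advanced clock for the demo.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clock := &fakeClock{}
	cache := NewTTLCache(demoTTL, clock.Now)
	cache.Put("q", "cached-ast")

	_, fresh := cache.Get("q")
	clock.Advance(2 * demoTTL)
	_, stale := cache.Get("q")
	fmt.Println(fresh, stale) // true false
}
```

A TTL bounds how long a missed invalidation can serve stale data; it complements write-driven invalidation rather than replacing it.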
Summary
| Component | Security Model |
|---|---|
| Stream Parse-Execute | Atomic parse+execute, no intermediate attack surface |
| Lazy AST | Observation only, never in execution path |
| LLM Integration | Re-parse all output, whitelist, require approval |
| Plugin Queries | Sandbox with permissions, timeouts, limits |
| AST Cache | Keyed by query+params, per-user isolation |
See Also:
- Query Cache Security - Cache-specific security
- HTTP Security - Network-level protections
- Plugin Development Guide - Building secure plugins