Query Analysis & Cache Security¶

How NornicDB safely analyzes and caches Cypher queries.

Overview¶

NornicDB uses a QueryAnalyzer to extract metadata from Cypher queries for caching optimization. This document explains the security model, what protections exist, and edge cases.

Architecture¶

┌─────────────────────────────────────────────────────────────────────┐
│                         Query Execution Flow                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │   Client    │───▶│  QueryAnalyzer   │───▶│   Cache Check    │   │
│  │   Query     │    │  (Metadata only) │    │  (Read-only?)    │   │
│  └─────────────┘    └──────────────────┘    └──────────────────┘   │
│                              │                        │              │
│                              ▼                        ▼              │
│                     ┌──────────────────┐    ┌──────────────────┐   │
│                     │  Query Executor  │◀───│  Cache Hit/Miss  │   │
│                     │  (Full parsing)  │    │                  │   │
│                     └──────────────────┘    └──────────────────┘   │
│                              │                                       │
│                              ▼                                       │
│                     ┌──────────────────┐                            │
│                     │  Storage Engine  │                            │
│                     │  (Data access)   │                            │
│                     └──────────────────┘                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Security Model¶

What QueryAnalyzer Does¶

The QueryAnalyzer performs lightweight keyword detection to determine:

Is this query read-only? (cacheable)
Is this query a write operation? (requires cache invalidation)
What labels are affected? (for smart cache invalidation)

What QueryAnalyzer Does NOT Do¶

NOT used for access control - permissions are checked separately
NOT used for query validation - the executor validates syntax
NOT a full parser - just keyword detection for performance

Threat Analysis¶

✅ Protected: Write Operations Hidden as Reads¶

Threat: An attacker crafts a query that deletes/modifies data but is marked as "read-only" by the analyzer.

Protection: This is NOT POSSIBLE because:

Cypher keywords must be whole words with boundaries
We search ALL occurrences of each keyword
There's no way to "hide" a keyword in valid Cypher syntax

Example attacks that are blocked:

-- Hiding DELETE - NOT POSSIBLE
MATCH (n) WHERE n.name = 'safe' DELETE n
-- ✅ DELETE detected, marked as write

-- Splitting keywords - NOT POSSIBLE (invalid Cypher)
MATCH (n) DE/*comment*/LETE n
-- ✅ Invalid syntax, rejected by executor

-- Unicode tricks - NOT POSSIBLE
MATCH (n) ＤＥＬＥＴＥ n
-- ✅ Not ASCII "DELETE", not valid Cypher

⚡ Accepted: Read Operations Marked as Writes (False Positives)¶

Scenario: A read-only query is incorrectly marked as a write operation.

Impact: Performance only (not security) - Query is not cached - May trigger unnecessary cache invalidation - Query still executes correctly

When this happens:

Scenario	Example Query	What Happens
Property named like keyword	`RETURN n.delete`	Marked as write
Keyword in string literal	`WHERE name = 'DELETE me'`	Marked as write
Keyword in comment	`RETURN n // TODO: delete`	Marked as write

Why we accept this:

Conservative is safer - Better to not cache than to cache incorrectly
Rare occurrence - Few users name properties "delete" or "create"
No security impact - Query executes correctly, just not cached
Simple implementation - Complex parsing adds attack surface

✅ Protected: False Positives Cannot Leak Data¶

Concern: Could an attacker use a false positive to gain unauthorized data access?

Answer: No. The QueryAnalyzer is completely separate from access control.

┌─────────────────────────────────────────────────────────────┐
│                    Security Boundary                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   QueryAnalyzer              Access Control (RBAC)           │
│   ┌────────────────┐         ┌────────────────┐             │
│   │ Determines:    │         │ Determines:    │             │
│   │ • Caching      │         │ • Can user     │             │
│   │ • Invalidation │         │   read nodes?  │             │
│   │                │         │ • Can user     │             │
│   │ Does NOT       │         │   write nodes? │             │
│   │ affect:        │         │                │             │
│   │ • Permissions  │         │ ALWAYS         │             │
│   │ • Authorization│         │ ENFORCED       │             │
│   └────────────────┘         └────────────────┘             │
│          │                          │                       │
│          │ Performance              │ Security              │
│          ▼                          ▼                       │
│   ┌────────────────┐         ┌────────────────┐             │
│   │ Cache miss     │         │ Query allowed  │             │
│   │ (slower, but   │         │ or denied      │             │
│   │ still works)   │         │ (enforced)     │             │
│   └────────────────┘         └────────────────┘             │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Attack scenarios that DO NOT work:

Attack	Why It Fails
"Read secret data by triggering write path"	Write path doesn't grant extra permissions
"Learn label exists via cache invalidation"	Query must succeed first (already requires access)
"Bypass RBAC by confusing the analyzer"	Analyzer has no role in authorization
"Force stale data via false positive"	False positive = no caching = always fresh data

Example:

-- Attacker tries to read :Secret nodes they shouldn't access
MATCH (n:Secret) RETURN n.delete

What happens: 1. QueryAnalyzer sees "delete" → marks as write (false positive) 2. Access control checks if user can read :Secret → DENIED 3. Query rejected - attacker gets nothing 4. Cache invalidation never happens (query failed)

The false positive only affects step 1 (caching decision). The security check in step 2 is completely independent and always enforced.

Cache Security¶

Result Caching¶

Only read-only queries are cached:

// Cached (read-only):
MATCH (n:Person) RETURN n.name           // ✅ Cached
CALL db.labels()                          // ✅ Cached (db.* procedures)
SHOW INDEXES                              // ✅ Cached

// NOT cached (write operations):
CREATE (n:Person {name: 'Alice'})         // ❌ Not cached
MATCH (n) SET n.updated = timestamp()     // ❌ Not cached
CALL gds.graph.drop('myGraph')            // ❌ Not cached (non-db.* procedure)

Cache Invalidation¶

Write operations invalidate relevant caches:

// Smart invalidation by label:
CREATE (n:Person)                         // Invalidates :Person cache
MATCH (n:Company) DELETE n                // Invalidates :Company cache

// Full invalidation (no label detected):
MATCH (n) DELETE n                        // Invalidates ALL caches

Cache Key Security¶

Cache keys include: - Normalized query string - Parameter values (hashed)

This prevents: - Cache poisoning across different parameters - Cache confusion between similar queries

Procedure Call Security¶

Read-Only Procedures (Cached)¶

Only CALL db.* procedures are marked as read-only:

CALL db.labels()                    -- ✅ Read-only, cached
CALL db.propertyKeys()              -- ✅ Read-only, cached
CALL db.relationshipTypes()         -- ✅ Read-only, cached

Write Procedures (Not Cached)¶

All other procedures are conservatively treated as writes:

CALL gds.graph.project(...)         -- ❌ May be write, not cached
CALL gds.graph.drop(...)            -- ❌ Is a write, not cached
CALL apoc.create.node(...)          -- ❌ Is a write, not cached
CALL custom.myProcedure(...)        -- ❌ Unknown, not cached

Implementation Details¶

Keyword Detection¶

// containsKeyword searches ALL occurrences, checking word boundaries
func containsKeyword(upper, keyword string) bool {
    // Iterates through string finding each occurrence
    // Checks character before: must not be alphanumeric or _
    // Checks character after: must not be alphanumeric or _
    // Returns true if ANY occurrence is a valid keyword
}

Word Boundary Rules¶

A keyword match requires: - Before: Start of string, or non-alphanumeric (except _) - After: End of string, or non-alphanumeric (except _)

"DELETE n"        → DELETE matched (space before, space after)
"n.delete"        → delete matched (. is not alphanumeric) - FALSE POSITIVE
"ToDelete"        → DELETE not matched (o before)
"DELETED"         → DELETE not matched (D after)

Query Types Reference¶

Write Operations (Never Cached)¶

Keyword	Detection	Example
`CREATE`	Word boundary	`CREATE (n:Node)`
`MERGE`	Word boundary	`MERGE (n:Node {id: 1})`
`DELETE`	Word boundary	`MATCH (n) DELETE n`
`DETACH DELETE`	Phrase	`MATCH (n) DETACH DELETE n`
`SET`	Word boundary	`MATCH (n) SET n.x = 1`
`REMOVE`	Word boundary	`MATCH (n) REMOVE n.label`

Read Operations (Cacheable)¶

Pattern	Requirement	Example
`MATCH ... RETURN`	No write keywords	`MATCH (n) RETURN n`
`CALL db.*`	Starts with `CALL db.`	`CALL db.labels()`
`SHOW`	Starts with `SHOW`	`SHOW INDEXES`

Schema Operations (Special Handling)¶

Operation	Cached	Invalidates
`CREATE INDEX`	No	Schema cache
`DROP INDEX`	No	Schema cache
`CREATE CONSTRAINT`	No	Schema cache
`DROP CONSTRAINT`	No	Schema cache

Testing & Verification¶

Security Tests¶

The following attack patterns are verified in tests:

// All write operations detected:
"MATCH (n) DELETE n"                    // HasDelete = true ✅
"MATCH (n) WHERE type = 'safe' DELETE n" // HasDelete = true ✅
"CREATE (n:Node)"                        // HasCreate = true ✅
"MERGE (n:Node)"                         // HasMerge = true ✅
"MATCH (n) SET n.x = 1"                  // HasSet = true ✅
"MATCH (n) REMOVE n.label"               // HasRemove = true ✅

// False positives (safe, not cached):
"RETURN n.delete"                        // HasDelete = true (false positive)
"WHERE name = 'DELETE'"                  // HasDelete = true (false positive)

Running Security Tests¶

# Run all cypher tests including security checks
cd nornicdb
go test ./pkg/cypher/... -v

# Run specific cache/security tests
go test ./pkg/cypher/... -run "Cache|Security" -v

Best Practices¶

For Query Authors¶

Avoid keyword-named properties - Don't name properties delete, create, set, etc.
Use parameters for values - WHERE name = $name instead of WHERE name = 'DELETE'
Explicit transactions for writes - Use BEGIN/COMMIT for critical write operations

For Administrators¶

Monitor cache hit rates - Low hit rates may indicate false positives
Review slow query logs - Uncached queries appear more frequently
Use production mode - Ensures strict security settings

Summary¶

Aspect	Protection Level	Notes
Write ops hidden as reads	✅ Fully Protected	Not possible in valid Cypher
Read ops marked as writes	⚡ Accepted	Performance impact only
False positive data leakage	✅ Protected	Analyzer doesn't affect access control
Cache poisoning	✅ Protected	Keys include parameters
Procedure writes	✅ Protected	Only `db.*` cached
Access control bypass	✅ N/A	Analyzer not used for auth
Cache timing side-channel	✅ Protected	Query must succeed before cache action

See Also: - HTTP Security - Network-level protections - Compliance Guide - Regulatory requirements