MVCC Lifecycle and Background Work Architecture¶
This document explains how NornicDB now treats storage-touching background work so that maintenance and indexing do not compete unnecessarily with query execution.
It focuses on three paths:
- mutation-driven search indexing
- embedding worker wakeups
- MVCC lifecycle maintenance
Design Goals¶
The current design aims to keep background work predictable under write-heavy load.
Requirements behind this behavior:
- do not spawn unbounded background fan-out from database mutation callbacks
- coalesce bursts of database events before search or embedding work runs
- allow long-running lifecycle work to be scheduled explicitly instead of always running at a fixed cadence
- preserve manual control so operators or the UI can choose quieter maintenance windows
Mutation-Driven Search Index Sync¶
Search-index sync is fed from storage node create, update, and delete callbacks.
Previous failure mode:
- one goroutine could be launched per mutation event
- ready search services could index or remove immediately on each callback
- write-heavy workloads could amplify indexing churn and create avoidable contention
Current behavior:
- storage callbacks enqueue search mutations inline
- per-database search services collect pending operations in a mutation map keyed by logical node id
- a short debounce window coalesces bursts of updates before flush
- the flush worker is single-flight per database service, so concurrent callback bursts do not create unbounded worker fan-out
Operational effect:
- repeated updates to the same node are collapsed before indexing work runs
- search services do not process one immediate mutation per write anymore
- database writes stay lightweight because callbacks only enqueue and schedule flush work
Current debounce constant:
- search mutation flush delay defaults to
250ms
Embedding Worker Trigger Path¶
The embedding worker is already pull-based: it scans and processes nodes that still need embeddings rather than embedding directly inside the mutation path.
Important behavior:
Enqueue(nodeID)wakes the worker instead of embedding that node synchronously- worker wakeups are debounced through
TriggerDebounceDelay - repeated trigger bursts collapse into one wakeup
- worker batch delay and scan interval determine how aggressively background embedding work runs
Recent behavior fix:
- the runtime database initialization path now honors configured embedding worker timing values instead of using hardcoded defaults
That matters because queue debounce only protects the system if the configured runtime actually uses the intended timing.
Relevant controls:
NORNICDB_EMBED_TRIGGER_DEBOUNCENORNICDB_EMBED_BATCH_DELAYNORNICDB_EMBED_SCAN_INTERVAL
MVCC Lifecycle Manager Scheduling¶
MVCC lifecycle work is responsible for prune planning, debt tracking, pressure visibility, and enforcing retained-floor semantics.
Previous limitation:
- lifecycle cadence was fixed to the configured interval at startup
- operators could pause or resume, but could not switch to manual-only mode or change the cadence at runtime
Current behavior:
- lifecycle exposes
automaticandcycle_intervalin status - automatic scheduling can be updated at runtime through an optional schedule-control interface
- an interval of
0sdisables automatic runs entirely - manual prune remains available even when automatic scheduling is disabled
This creates two operating modes:
-
Scheduled mode The lifecycle loop runs on a periodic ticker and records debt, pressure, and prune-run summaries.
-
Manual-only mode Automatic runs are disabled. Operators or the UI decide when to call prune-now.
This is the main mechanism for keeping long-running storage maintenance from impacting query paths during known busy periods.
Admin Control Plane¶
The HTTP control surface for lifecycle now supports runtime schedule changes:
GET /admin/databases/{db}/mvccPOST /admin/databases/{db}/mvcc/pausePOST /admin/databases/{db}/mvcc/resumePOST /admin/databases/{db}/mvcc/prunePOST /admin/databases/{db}/mvcc/schedule
These routes are protected by admin permission when security is enabled.
The schedule endpoint is intentionally operator-facing: it allows the UI to slow, disable, or re-enable automatic lifecycle work without server restart.
Query Protection Model¶
The current protection model is pragmatic rather than fully admission-controlled.
What is protected now:
- search mutation callbacks no longer spawn one goroutine per write event
- embedding wakeups are debounced and remain pull-based
- lifecycle automatic work can be paused or disabled entirely
What still matters operationally:
- manual prune can still be expensive on a heavily churned store
- lifecycle work still touches storage and should be scheduled with awareness of ingest and query peaks
- pinned readers can delay the most aggressive tombstone compaction path
The right operator posture is:
- use scheduled mode for steady-state maintenance
- use manual-only mode for large ingest windows or incident mitigation
- monitor reader age, debt bytes, stale-plan skips, and prune duration before making the cadence more aggressive
Why Debounce and Scheduling Are Separate Controls¶
These controls solve different problems.
Debounce solves burst amplification:
- many writes arrive close together
- background work should collapse that burst into fewer units of work
Scheduling solves maintenance placement:
- some work is intrinsically long-running or storage-heavy
- operators need control over when it runs, not just how often it is triggered
NornicDB now uses both:
- debounce for mutation-driven indexing and worker wakeups
- scheduling for MVCC lifecycle maintenance
Stress Testing Strategy¶
The current storage test strategy for this area focuses on two failure classes:
-
ghost-history retention Repeated update, delete, and recreate churn should not leave an unbounded retained MVCC chain after prune.
-
write-path amplification Mutation-triggered background workers should not scale linearly with every event burst.
Real-engine churn tests in the storage package are the preferred guardrail for the first class because they exercise actual Badger MVCC records, not just planner mocks.
Related documents: