# WAL Compaction and Truncation
Managing Write-Ahead Log growth in NornicDB
Last Updated: December 2025
## Overview
NornicDB's Write-Ahead Log (WAL) supports automatic compaction to prevent unbounded growth. Without compaction, the WAL would grow indefinitely in long-running databases, consuming disk space and slowing recovery.
**Problem solved:** The WAL grows forever until a manual snapshot + delete.
**Solution:** Automatic periodic snapshots with WAL truncation.
## Features
### 1. Automatic Compaction (Recommended)
Automatic compaction is the recommended approach for production deployments. Enable it through configuration:
YAML configuration:

database:
  wal_dir: "data/wal"
  wal_sync_mode: "batch"
  wal_snapshot_interval: "1h"         # Create snapshots hourly
  wal_auto_compaction_enabled: true   # Enabled by default
  wal_snapshot_dir: "data/snapshots"
Environment variables (assuming the same `NORNICDB_`-prefixed naming shown for the retention settings below):

export NORNICDB_WAL_SNAPSHOT_INTERVAL=1h
export NORNICDB_WAL_AUTO_COMPACTION_ENABLED=true
export NORNICDB_WAL_SNAPSHOT_DIR=data/snapshots
Behavior:
- Snapshots created at configured interval (default: 1 hour)
- WAL truncated after each successful snapshot
- Failures logged but don't crash the database
- Automatic retry on next interval
- Old snapshots saved to the snapshot directory as `snapshot-<timestamp>.json`
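The scheduler's behavior can be summarized as a short loop. The sketch below is illustrative only; `runAutoCompaction` and its `compact` callback are hypothetical names, not NornicDB source:

```go
package wal

import (
	"context"
	"log"
	"time"
)

// runAutoCompaction mirrors the documented behavior: compact on a fixed
// interval, log failures without crashing, and retry on the next tick.
// Illustrative sketch; compact is a hypothetical callback.
func runAutoCompaction(ctx context.Context, interval time.Duration, compact func() error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := compact(); err != nil {
				log.Printf("wal compaction failed (retrying next interval): %v", err)
			}
		}
	}
}
```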
### 2. Manual WAL Truncation
For development or special cases, you can trigger truncation manually. Create a snapshot and then truncate the WAL to remove all entries before the snapshot point.
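For an embedded deployment, a manual cycle might look like the sketch below. This is a hypothetical illustration: the `Snapshot` and `TruncateBefore` names are stand-ins, not confirmed NornicDB APIs.

```go
package wal

// WAL is a hypothetical interface capturing the two-step manual flow;
// the real NornicDB API may differ.
type WAL interface {
	// Snapshot persists current state and returns the last WAL
	// sequence number covered by the snapshot.
	Snapshot(dir string) (lastSeq uint64, err error)
	// TruncateBefore removes entries at or below seq.
	TruncateBefore(seq uint64) error
}

func manualCompact(w WAL, snapshotDir string) error {
	seq, err := w.Snapshot(snapshotDir)
	if err != nil {
		return err // old WAL is untouched; safe to retry
	}
	// Truncation replaces the WAL via atomic rename, so a failure
	// here leaves the old WAL intact and the call can be retried.
	return w.TruncateBefore(seq)
}
```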
Safety Guarantees:
- Atomic rename (crash-safe)
- Old WAL remains intact until truncation succeeds
- Can retry truncation if it fails
- Recovery works from partial truncations
### 3. Disable Automatic Compaction

If you prefer to manage snapshots yourself, turn auto-compaction off:

database:
  wal_auto_compaction_enabled: false

With auto-compaction disabled, the WAL grows until you create a snapshot and truncate it manually.
### 4. Retention Settings (Immutable Segments)

NornicDB stores the WAL as immutable segments tracked by a manifest. You can retain sealed segments for audit/ledger use cases.
YAML configuration (keys corresponding to the environment variables below):

database:
  wal_retention_max_segments: 24      # Keep at most 24 sealed segments
  wal_retention_max_age: "168h"       # Or segments younger than 7 days
  wal_ledger_retention_defaults: true
Environment variables:
export NORNICDB_WAL_RETENTION_MAX_SEGMENTS=24
export NORNICDB_WAL_RETENTION_MAX_AGE=168h
export NORNICDB_WAL_LEDGER_RETENTION_DEFAULTS=true
These settings retain sealed WAL segments after snapshots. Auto-compaction remains enabled by default to preserve existing behavior; retention is opt-in.
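In effect, sealed segments are pruned by count and age, roughly as in the sketch below. This is illustrative only; `Segment` and `pruneSegments` are hypothetical names, not NornicDB types:

```go
package wal

import "time"

// Segment is a hypothetical stand-in for a sealed WAL segment entry
// in the manifest.
type Segment struct {
	Path     string
	SealedAt time.Time
}

// pruneSegments keeps at most maxSegments sealed segments and drops any
// older than maxAge. Assumes segments are sorted oldest-first.
func pruneSegments(segments []Segment, maxSegments int, maxAge time.Duration, now time.Time) []Segment {
	kept := segments[:0]
	for _, s := range segments {
		if now.Sub(s.SealedAt) <= maxAge {
			kept = append(kept, s)
		}
	}
	if len(kept) > maxSegments {
		kept = kept[len(kept)-maxSegments:] // newest segments win
	}
	return kept
}
```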
### 5. Txlog Query Procedures
You can query WAL entries directly via Cypher:
// Read entries by sequence range
CALL db.txlog.entries(1000, 1200) YIELD sequence, operation, tx_id, timestamp, data
RETURN sequence, operation, tx_id, timestamp, data
ORDER BY sequence;
// Read entries for a specific transaction
CALL db.txlog.byTxId('tx-123', 200) YIELD sequence, operation, tx_id, timestamp, data
RETURN sequence, operation, tx_id, timestamp, data
ORDER BY sequence;
## How It Works

### Compaction Process
When a compaction cycle runs, NornicDB flushes any pending writes, then creates a point-in-time snapshot of the current database state. Once the snapshot is safely persisted, the WAL is rewritten to contain only entries that arrived after the snapshot. The old WAL file is replaced via an atomic rename so the operation is crash-safe — at no point can a crash leave the WAL in a partial or corrupt state.
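The crash-safety hinge is the atomic rename. Below is a minimal sketch of that step, assuming a single active WAL file; `replaceWAL` is a hypothetical helper, not NornicDB's implementation:

```go
package wal

import (
	"os"
	"path/filepath"
)

// replaceWAL atomically swaps the active WAL for a rewritten copy that
// contains only post-snapshot entries. Illustrative sketch of the
// rename step described above.
func replaceWAL(walPath string, postSnapshot []byte) error {
	tmp := walPath + ".tmp"

	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(postSnapshot); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // make the new WAL durable first
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}

	// Atomic on POSIX filesystems: a reader sees either the old WAL
	// or the new one, never a partially written file.
	if err := os.Rename(tmp, walPath); err != nil {
		return err
	}

	// Sync the parent directory so the rename itself survives a crash.
	dir, err := os.Open(filepath.Dir(walPath))
	if err != nil {
		return err
	}
	defer dir.Close()
	return dir.Sync()
}
```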
### Crash Safety
The truncation process is crash-safe at every step:
- Before rename: Old WAL is intact
- During rename: Atomic operation (old or new, never partial)
- After rename: New WAL is complete and synced
If a crash occurs:
- Before rename: Old WAL used on recovery (full history)
- After rename: New WAL used on recovery (snapshot + delta)
### Recovery

With auto-compaction enabled, recovery loads the most recent snapshot and then replays only the WAL entries written after it. Recovery time therefore depends on the snapshot interval, not on total database history.
Example timeline:
T=0: Database starts
T=1h: Snapshot 1 created (100 nodes), WAL truncated
T=2h: Snapshot 2 created (150 nodes), WAL truncated
T=2.5h: Crash occurs (170 nodes in database)
Recovery:
Load Snapshot 2 (150 nodes)
+ Replay WAL since T=2h (20 new nodes)
= 170 nodes recovered
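In code terms, recovery is the same two steps every time. A hypothetical sketch (the function types are illustrative, not NornicDB's API):

```go
package wal

// recoverState loads the latest snapshot and replays newer WAL entries.
// The callbacks are hypothetical stand-ins for the two phases.
func recoverState(
	loadLatestSnapshot func() (nodes int, lastSeq uint64, err error),
	replayWALSince func(seq uint64) (applied int, err error),
) (int, error) {
	nodes, seq, err := loadLatestSnapshot() // e.g. Snapshot 2: 150 nodes
	if err != nil {
		return 0, err
	}
	applied, err := replayWALSince(seq) // e.g. 20 entries since T=2h
	if err != nil {
		return 0, err
	}
	return nodes + applied, nil // 150 + 20 = 170 nodes recovered
}
```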
## Performance Impact

### Disk Space
Before compaction, the WAL grows without bound: disk usage is proportional to every write since the database started.

After compaction (hourly), WAL size is bounded by the snapshot interval:

Maximum size: ~500MB (1 hour of writes)
Average size: ~250MB
Disk savings: 99%+
### Recovery Time
Before compaction, recovery must replay the entire WAL, so recovery time grows with total history.

After compaction:

Recovery time = snapshot load + O(interval writes)
Load snapshot: ~2 seconds
Replay WAL: ~1 second
Total: ~3 seconds (constant, regardless of uptime)
### Runtime Overhead
- Snapshot creation: ~2-5ms per 1,000 nodes (asynchronous; doesn't block writes)
- WAL truncation: ~10-50ms once per interval (negligible amortized cost)
- Total overhead: well under 0.01% of runtime
## Configuration

### WAL Settings
| Setting | Default | Description |
|---|---|---|
| `wal_dir` | `data/wal` | WAL directory |
| `wal_sync_mode` | `batch` | Sync mode: `immediate`, `batch`, or `none` |
| `wal_batch_sync_interval` | `100ms` | Batch sync frequency |
| `wal_max_file_size` | `100MB` | File rotation trigger (size) |
| `wal_max_entries` | `100000` | File rotation trigger (entry count) |
| `wal_snapshot_interval` | `1h` | Auto-compaction frequency |
| `wal_auto_compaction_enabled` | `true` | Enable/disable auto-compaction |
### Tuning Snapshot Interval
Aggressive (every 15 minutes):
- Minimal WAL size
- Faster recovery
- More snapshot overhead
- Good for: High-write, limited disk space
Moderate (every hour — default):
- Balanced disk usage
- Good recovery time
- Low overhead
- Good for: Most use cases
Conservative (every 6 hours):
- Larger WAL size
- Slower recovery
- Minimal overhead
- Good for: Low-write, plenty of disk space
## Monitoring
NornicDB exposes compaction metrics that you can use to monitor WAL health:
- Total snapshots created — number of successful compaction cycles since startup
- Last snapshot time — timestamp of the most recent snapshot
- WAL entry count — current number of entries in the active WAL
- WAL bytes written — total bytes in the active WAL
These metrics are available through the server's diagnostics and can be monitored via the admin UI or log output.
## Troubleshooting

### Issue: WAL still growing despite auto-compaction
Check:

- Verify auto-compaction is enabled in your configuration (`wal_auto_compaction_enabled: true`)
- Check the snapshot directory for recent files: `ls -lt data/snapshots/`
- Check the WAL size: `du -sh data/wal/`
### Issue: Truncation errors
Symptom: Logs show "failed to truncate WAL"
Causes:
- Disk full
- Permission issues
- WAL file locked by another process
Solution:
# Check disk space
df -h
# Check permissions
ls -l data/wal/
chmod 644 data/wal/wal.log
# Check for locks
lsof | grep wal.log
### Issue: Slow recovery after crash

Check the age of the most recent snapshot, e.g. `ls -lt data/snapshots/ | head`.
If snapshot is old, auto-compaction may not be running. Verify your configuration and check server logs for compaction errors.
## Best Practices

- **Always enable auto-compaction in production.** This is the default and should not be disabled unless you have a specific reason.
- **Monitor snapshot creation.** Check server logs or metrics to confirm snapshots are being created at the expected interval.
- **Rotate old snapshots** to avoid filling the disk with historical snapshots, for example: `find data/snapshots -name 'snapshot-*.json' -mtime +7 -delete`
- **Test recovery regularly.** Periodically verify that your latest snapshot can be loaded and that the WAL replays correctly.