# WAL Compaction and Truncation
Managing Write-Ahead Log growth in NornicDB
Last Updated: December 2025
## Overview
NornicDB's Write-Ahead Log (WAL) supports automatic compaction to prevent unbounded growth. Without compaction, the WAL would grow indefinitely in long-running databases, consuming disk space and slowing recovery.
**Problem solved:** The WAL grows forever until a manual snapshot + delete.
**Solution:** Automatic periodic snapshots with WAL truncation.
## Features
### 1. Automatic Compaction (Recommended)
Automatic compaction is the recommended approach for production deployments. Enable it through configuration:
YAML configuration:

database:
  wal_dir: "data/wal"
  wal_sync_mode: "batch"
  wal_snapshot_interval: "1h"         # Create snapshots hourly
  wal_auto_compaction_enabled: true   # Enabled by default
  wal_snapshot_dir: "data/snapshots"
Environment variables (assuming the same `NORNICDB_`-prefixed naming shown for the retention settings below):

export NORNICDB_WAL_SNAPSHOT_INTERVAL=1h
export NORNICDB_WAL_AUTO_COMPACTION_ENABLED=true
export NORNICDB_WAL_SNAPSHOT_DIR=data/snapshots
Behavior:
- Snapshots created at configured interval (default: 1 hour)
- WAL truncated after each successful snapshot
- Failures logged but don't crash the database
- Automatic retry on next interval
- Old snapshots saved to the snapshot directory as `snapshot-<timestamp>.json`
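The scheduler's behavior can be summarized as a short loop. The sketch below is illustrative only; `runAutoCompaction` and its `compact` callback are hypothetical names, not NornicDB source:

```go
package wal

import (
	"context"
	"log"
	"time"
)

// runAutoCompaction mirrors the documented behavior: compact on a fixed
// interval, log failures without crashing, and retry on the next tick.
// Illustrative sketch; compact is a hypothetical callback.
func runAutoCompaction(ctx context.Context, interval time.Duration, compact func() error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := compact(); err != nil {
				log.Printf("wal compaction failed (retrying next interval): %v", err)
			}
		}
	}
}
```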
### 2. Manual WAL Truncation
For development or special cases, you can trigger truncation manually. Create a snapshot and then truncate the WAL to remove all entries before the snapshot point.
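For an embedded deployment, a manual cycle might look like the sketch below. This is a hypothetical illustration: the `Snapshot` and `TruncateBefore` names are stand-ins, not confirmed NornicDB APIs.

```go
package wal

// WAL is a hypothetical interface capturing the two-step manual flow;
// the real NornicDB API may differ.
type WAL interface {
	// Snapshot persists current state and returns the last WAL
	// sequence number covered by the snapshot.
	Snapshot(dir string) (lastSeq uint64, err error)
	// TruncateBefore removes entries at or below seq.
	TruncateBefore(seq uint64) error
}

func manualCompact(w WAL, snapshotDir string) error {
	seq, err := w.Snapshot(snapshotDir)
	if err != nil {
		return err // old WAL is untouched; safe to retry
	}
	// Truncation replaces the WAL via atomic rename, so a failure
	// here leaves the old WAL intact and the call can be retried.
	return w.TruncateBefore(seq)
}
```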
Safety Guarantees:
- Atomic rename (crash-safe)
- Old WAL remains intact until truncation succeeds
- Can retry truncation if it fails
- Recovery works from partial truncations
### 3. Disable Automatic Compaction

If you prefer to manage snapshots yourself, turn auto-compaction off:

database:
  wal_auto_compaction_enabled: false

With auto-compaction disabled, the WAL grows until you create a snapshot and truncate it manually.
### 4. Retention Settings (Immutable Segments)

NornicDB stores the WAL as immutable segments tracked by a manifest. You can retain sealed segments for audit/ledger use cases.
YAML configuration (keys corresponding to the environment variables below):

database:
  wal_retention_max_segments: 24      # Keep at most 24 sealed segments
  wal_retention_max_age: "168h"       # Or segments younger than 7 days
  wal_ledger_retention_defaults: true
Environment variables:
export NORNICDB_WAL_RETENTION_MAX_SEGMENTS=24
export NORNICDB_WAL_RETENTION_MAX_AGE=168h
export NORNICDB_WAL_LEDGER_RETENTION_DEFAULTS=true
These settings retain sealed WAL segments after snapshots. Auto-compaction remains enabled by default to preserve existing behavior; retention is opt-in.
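In effect, sealed segments are pruned by count and age, roughly as in the sketch below. This is illustrative only; `Segment` and `pruneSegments` are hypothetical names, not NornicDB types:

```go
package wal

import "time"

// Segment is a hypothetical stand-in for a sealed WAL segment entry
// in the manifest.
type Segment struct {
	Path     string
	SealedAt time.Time
}

// pruneSegments keeps at most maxSegments sealed segments and drops any
// older than maxAge. Assumes segments are sorted oldest-first.
func pruneSegments(segments []Segment, maxSegments int, maxAge time.Duration, now time.Time) []Segment {
	kept := segments[:0]
	for _, s := range segments {
		if now.Sub(s.SealedAt) <= maxAge {
			kept = append(kept, s)
		}
	}
	if len(kept) > maxSegments {
		kept = kept[len(kept)-maxSegments:] // newest segments win
	}
	return kept
}
```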
### 5. Txlog Query Procedures
You can query WAL entries directly via Cypher:
// Read entries by sequence range
CALL db.txlog.entries(1000, 1200) YIELD sequence, operation, tx_id, timestamp, data
RETURN sequence, operation, tx_id, timestamp, data
ORDER BY sequence;
// Read entries for a specific transaction
CALL db.txlog.byTxId('tx-123', 200) YIELD sequence, operation, tx_id, timestamp, data
RETURN sequence, operation, tx_id, timestamp, data
ORDER BY sequence;
## How It Works

### Compaction Process
When a compaction cycle runs, NornicDB flushes any pending writes, then creates a point-in-time snapshot of the current database state. Once the snapshot is safely persisted, the WAL is rewritten to contain only entries that arrived after the snapshot. The old WAL file is replaced via an atomic rename so the operation is crash-safe — at no point can a crash leave the WAL in a partial or corrupt state.
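The crash-safety hinge is the atomic rename. Below is a minimal sketch of that step, assuming a single active WAL file; `replaceWAL` is a hypothetical helper, not NornicDB's implementation:

```go
package wal

import (
	"os"
	"path/filepath"
)

// replaceWAL atomically swaps the active WAL for a rewritten copy that
// contains only post-snapshot entries. Illustrative sketch of the
// rename step described above.
func replaceWAL(walPath string, postSnapshot []byte) error {
	tmp := walPath + ".tmp"

	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(postSnapshot); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // make the new WAL durable first
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}

	// Atomic on POSIX filesystems: a reader sees either the old WAL
	// or the new one, never a partially written file.
	if err := os.Rename(tmp, walPath); err != nil {
		return err
	}

	// Sync the parent directory so the rename itself survives a crash.
	dir, err := os.Open(filepath.Dir(walPath))
	if err != nil {
		return err
	}
	defer dir.Close()
	return dir.Sync()
}
```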
### Crash Safety
The truncation process is crash-safe at every step:
- Before rename: Old WAL is intact
- During rename: Atomic operation (old or new, never partial)
- After rename: New WAL is complete and synced
If a crash occurs:
- Before rename: Old WAL used on recovery (full history)
- After rename: New WAL used on recovery (snapshot + delta)
### Recovery

With auto-compaction enabled, recovery loads the most recent snapshot and then replays only the WAL entries written after it. Recovery time therefore depends on the snapshot interval, not on total database history.
Example timeline:
T=0: Database starts
T=1h: Snapshot 1 created (100 nodes), WAL truncated
T=2h: Snapshot 2 created (150 nodes), WAL truncated
T=2.5h: Crash occurs (170 nodes in database)
Recovery:
Load Snapshot 2 (150 nodes)
+ Replay WAL since T=2h (20 new nodes)
= 170 nodes recovered
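In code terms, recovery is the same two steps every time. A hypothetical sketch (the function types are illustrative, not NornicDB's API):

```go
package wal

// recoverState loads the latest snapshot and replays newer WAL entries.
// The callbacks are hypothetical stand-ins for the two phases.
func recoverState(
	loadLatestSnapshot func() (nodes int, lastSeq uint64, err error),
	replayWALSince func(seq uint64) (applied int, err error),
) (int, error) {
	nodes, seq, err := loadLatestSnapshot() // e.g. Snapshot 2: 150 nodes
	if err != nil {
		return 0, err
	}
	applied, err := replayWALSince(seq) // e.g. 20 entries since T=2h
	if err != nil {
		return 0, err
	}
	return nodes + applied, nil // 150 + 20 = 170 nodes recovered
}
```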
## Performance Impact

### Disk Space
Before compaction, the WAL grows without bound: disk usage is proportional to every write since the database started.

After compaction (hourly), WAL size is bounded by the snapshot interval:

Maximum size: ~500MB (1 hour of writes)
Average size: ~250MB
Disk savings: 99%+
### Recovery Time
Before compaction, recovery must replay the entire WAL, so recovery time grows with total history.

After compaction:

Recovery time = snapshot load + O(interval writes)
Load snapshot: ~2 seconds
Replay WAL: ~1 second
Total: ~3 seconds (constant, regardless of uptime)
### Runtime Overhead
- Snapshot creation: ~2-5ms per 1,000 nodes (asynchronous; doesn't block writes)
- WAL truncation: ~10-50ms once per interval (negligible amortized cost)
- Total overhead: well under 0.01% of runtime
## Configuration

### WAL Settings
| Setting | Default | Description |
|---|---|---|
| `wal_dir` | `data/wal` | WAL directory |
| `wal_sync_mode` | `batch` | Sync mode: `immediate`, `batch`, or `none` |
| `wal_batch_sync_interval` | `100ms` | Batch sync frequency |
| `wal_max_file_size` | `100MB` | File rotation trigger (size) |
| `wal_max_entries` | `100000` | File rotation trigger (entry count) |
| `wal_snapshot_interval` | `1h` | Auto-compaction frequency |
| `wal_auto_compaction_enabled` | `true` | Enable/disable auto-compaction |
### Tuning Snapshot Interval
Aggressive (every 15 minutes):
- Minimal WAL size
- Faster recovery
- More snapshot overhead
- Good for: High-write, limited disk space
Moderate (every hour — default):
- Balanced disk usage
- Good recovery time
- Low overhead
- Good for: Most use cases
Conservative (every 6 hours):
- Larger WAL size
- Slower recovery
- Minimal overhead
- Good for: Low-write, plenty of disk space
## Monitoring
NornicDB exposes compaction metrics that you can use to monitor WAL health:
- Total snapshots created — number of successful compaction cycles since startup
- Last snapshot time — timestamp of the most recent snapshot
- WAL entry count — current number of entries in the active WAL
- WAL bytes written — total bytes in the active WAL
These metrics are available through the server's diagnostics and can be monitored via the admin UI or log output.
## Troubleshooting

### Issue: WAL still growing despite auto-compaction
Check:

- Verify auto-compaction is enabled in your configuration (`wal_auto_compaction_enabled: true`)
- Check the snapshot directory for recent files: `ls -lt data/snapshots/`
- Check the WAL size: `du -sh data/wal/`
### Issue: Truncation errors
Symptom: Logs show "failed to truncate WAL"
Causes:
- Disk full
- Permission issues
- WAL file locked by another process
Solution:
# Check disk space
df -h
# Check permissions
ls -l data/wal/
chmod 644 data/wal/wal.log
# Check for locks
lsof | grep wal.log
### Issue: Slow recovery after crash

Check the age of the most recent snapshot, e.g. `ls -lt data/snapshots/ | head`.
If snapshot is old, auto-compaction may not be running. Verify your configuration and check server logs for compaction errors.
## Best Practices

- **Always enable auto-compaction in production.** This is the default and should not be disabled unless you have a specific reason.
- **Monitor snapshot creation.** Check server logs or metrics to confirm snapshots are being created at the expected interval.
- **Rotate old snapshots** to avoid filling the disk with historical snapshots, for example: `find data/snapshots -name 'snapshot-*.json' -mtime +7 -delete`
- **Test recovery regularly.** Periodically verify that your latest snapshot can be loaded and that the WAL replays correctly.