Windows CUDA Setup with yzma¶
NornicDB uses yzma for GPU-accelerated inference on Windows. This means:
✅ No CGO required — Just download prebuilt DLLs
✅ Easy CUDA support — One command to download CUDA libraries
✅ Easy updates — Re-run setup when llama.cpp updates
✅ No ABI conflicts — yzma handles Windows DLL loading
Quick Setup (5 minutes)¶
Step 1: Download CUDA Libraries¶
Run the setup script (or manually use yzma):
# Option 1: Run setup script
.\scripts\setup-windows-cuda.bat
# Option 2: Manual setup
go install github.com/hybridgroup/yzma/cmd/yzma@latest
yzma install --lib .\lib\llama --processor cuda
Step 2: Set Environment Variable¶
# Temporary (for current PowerShell session)
$env:NORNICDB_LIB = ".\lib\llama"
# Or permanent (system-wide, Admin required)
[Environment]::SetEnvironmentVariable('NORNICDB_LIB', '.\lib\llama', 'User')
Step 3: Build and Run¶
# Build NornicDB
go build -o nornicdb.exe ./cmd/nornicdb
# Verify GPU detection
.\nornicdb.exe --check-gpu
# Run with GPU acceleration
.\nornicdb.exe
Expected output:
Prerequisites¶
- Windows 10/11 (64-bit)
- NVIDIA GPU with CUDA Compute Capability 6.0 or higher
- CUDA Toolkit 12.x (Download) — drivers/runtime only
- Go 1.21+ (for building from source)
- PowerShell 5.0+ or Git Bash (for setup script)
Setup CUDA Libraries¶
NornicDB uses yzma for GPU acceleration. This provides a modern, no-CGO interface to llama.cpp with prebuilt CUDA support.
One-Command Setup¶
Run the setup script from your NornicDB installation directory:
PowerShell:
Git Bash / WSL:
This script will: 1. Download and install yzma (one-time only) 2. Download CUDA-optimized llama.cpp libraries to ./lib/llama/ 3. Show environment variable configuration 4. Verify GPU detection
Manual Setup (Advanced)¶
If you prefer to set up manually:
# 1. Install yzma
go get github.com/hybridgroup/yzma@latest
# 2. Download CUDA libraries
yzma install --lib ./lib/llama --processor cuda
# 3. Set environment variable (for cross-machine portability)
[Environment]::SetEnvironmentVariable("NORNICDB_LIB", "$(Get-Location)\lib\llama", "User")
# 4. Restart PowerShell for environment variable to take effect
Verification¶
Check GPU Utilization¶
While NornicDB is running, open another terminal:
You should see: - GPU utilization increasing during queries - Memory usage showing loaded models - Temperature rising during inference
Benchmark Performance¶
# Generate 1000 embeddings
.\nornicdb.exe benchmark --embeddings --count 1000
# Expected output:
# Embeddings: 1000/1000
# Time: 2.3s
# Rate: 434 embeddings/sec
# GPU Utilization: 95%
Compare with CPU-only mode:
# Temporarily disable GPU
$env:NORNICDB_NO_CUDA = "1"
# Run benchmark again
.\nornicdb.exe benchmark --embeddings --count 1000
# Expected output:
# Embeddings: 1000/1000
# Time: 45.8s
# Rate: 21 embeddings/sec
# GPU Utilization: 0%
# Re-enable GPU
$env:NORNICDB_NO_CUDA = ""
Expected Speedup: 10-50x faster with GPU vs CPU (depends on model and hardware)
Troubleshooting¶
GPU Not Detected¶
Symptom: NornicDB reports "GPU backend not found, using CPU"
Solutions:
-
Verify yzma libraries are present:
-
Check NORNICDB_LIB environment variable:
-
Verify GPU is CUDA-capable:
-
Check CUDA Toolkit version:
-
Update GPU Drivers:
- Download latest drivers from NVIDIA
- Restart computer after driver update
DLL Load Error¶
Symptom: "The code execution cannot proceed because libllama.dll was not found"
Solution: Ensure yzma libraries are properly installed:
# Re-run setup script
.\scripts\setup-windows-cuda.bat
# Or manually:
yzma install --lib ./lib/llama --processor cuda
Out of Memory Error¶
Symptom: "CUDA out of memory"
Solutions:
- Use a smaller model:
- Try quantized models (Q4_K_M, Q5_K_M instead of F16)
-
Use fewer GPU layers in config
-
Close other GPU applications:
- Chrome/Edge (hardware acceleration)
- Games
-
Other ML applications
-
Check available VRAM:
Slow Performance¶
Symptom: GPU is detected but performance is slower than expected
Solutions:
-
Verify GPU is actually being used:
-
Check Power Mode:
-
NVIDIA Control Panel → Manage 3D Settings → Power Management Mode → "Prefer Maximum Performance"
-
Check thermal throttling:
Configuration¶
Custom Library Path¶
By default, NornicDB looks for libraries in ./lib/llama/. To use a custom location:
# Windows (Persistent)
[Environment]::SetEnvironmentVariable("NORNICDB_LIB", "D:\nvidia-libs", "User")
# Windows (Current session only)
$env:NORNICDB_LIB = "D:\nvidia-libs"
# Linux/macOS
export NORNICDB_LIB="/opt/nvidia-libs"
GPU Layer Configuration¶
Control how many model layers run on GPU:
# nornicdb.yaml
embed:
model: bge-m3-q8_0.gguf
gpu_layers: -1 # -1 = all layers (default)
# 0 = CPU only
# 20 = 20 layers on GPU, rest on CPU
Building from Source (Developers)¶
If you want to build NornicDB from source with CUDA support:
# Ensure CUDA libraries are installed
.\scripts\setup-windows-cuda.bat
# Build with CUDA support
go build -o nornicdb.exe ./cmd/nornicdb
# Or build without CGO (uses yzma automatically)
go build -tags nolocalllm -o nornicdb.exe ./cmd/nornicdb
The build system automatically detects available CUDA libraries via yzma.
More Information¶
- yzma Project: github.com/hybridgroup/yzma
- llama.cpp Project: github.com/ggerganov/llama.cpp
- NVIDIA CUDA Documentation: docs.nvidia.com/cuda
- NornicDB Architecture: See Architecture Guide