Qdrant Collection-to-Database: Architecture Diagrams¶
Overview¶
This document provides detailed visual diagrams comparing the legacy collection-based architecture (removed) with the current database-based architecture (what exists today in code).
Breakage Policy (Explicit)¶
- No migration.
- No backwards compatibility.
- Only databases created via the new CreateCollection flow (i.e., containing _collection_meta) are treated as Qdrant collections.
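The last rule can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which each database is a dictionary of node IDs; the function name `list_qdrant_collections` is invented for illustration and is not the server's API. Only the `_collection_meta` node ID comes from this document.

```python
# Hypothetical model: database name -> {node ID -> node properties}.
META_NODE_ID = "_collection_meta"  # metadata node ID named in this document


def list_qdrant_collections(databases):
    """Return the names of databases that carry the collection metadata node."""
    return sorted(name for name, nodes in databases.items() if META_NODE_ID in nodes)


databases = {
    "bench_col": {"_collection_meta": {"dimensions": 1024, "distance": "Cosine"}},
    "nornic": {"person-1": {"name": "Ada"}},  # default graph database, no metadata node
    "legacy_col": {"point-1": {}},            # points without metadata: not a collection
}
print(list_qdrant_collections(databases))
```

Under this model, a database that merely holds point-like nodes is ignored, which matches the "no migration" stance above.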
Legacy Architecture (Before, removed)¶
High-Level Component Diagram¶
graph TB
subgraph "Client Layer"
QC[Qdrant Client<br/>gRPC]
CC[Cypher Client<br/>Bolt/HTTP]
end
subgraph "Qdrant gRPC Server"
CS[CollectionsService]
PS[PointsService]
CR[CollectionRegistry<br/>In-Memory Map]
end
subgraph "Storage Layer"
SE[Storage Engine<br/>Shared Namespace: 'nornic']
CM[Collection Metadata<br/>Label: _QdrantCollection<br/>ID: _qdrant_collection:bench_col]
PN[Point Nodes<br/>Labels: QdrantPoint, bench_col<br/>Mixed in shared namespace]
end
subgraph "Search Layer"
SS[Search Service<br/>Shared Index<br/>All Collections]
end
QC -->|CreateCollection| CS
QC -->|UpsertPoints| PS
CC -->|Cypher Queries| SE
CS --> CR
CR -->|Load/Save| SE
PS -->|Create/Update| SE
PS -->|Index| SS
SE --> CM
SE --> PN
SS -->|Search| PN
style CR fill:#ffcccc
style CM fill:#ffcccc
style PN fill:#ffcccc
style SE fill:#ffeeee
Data Storage Structure¶
Storage Engine (Namespace: 'nornic')
│
├── Collection Metadata Nodes
│ ├── _qdrant_collection:bench_col
│ │ └── Properties: {name: "bench_col", dimensions: 1024, distance: "Cosine"}
│ ├── _qdrant_collection:documents
│ │ └── Properties: {name: "documents", dimensions: 768, distance: "Cosine"}
│ └── ...
│
└── Point Nodes (All Collections Mixed)
├── point-1
│ ├── Labels: ["QdrantPoint", "bench_col"]
│ ├── NamedEmbeddings: {"default": [0.1, 0.2, ...]}
│ └── Properties: {payload data}
├── point-2
│ ├── Labels: ["QdrantPoint", "bench_col"]
│ └── ...
├── point-1000
│ ├── Labels: ["QdrantPoint", "documents"]
│ └── ...
└── ...
Collection Delete Flow (Legacy - SLOW)¶
sequenceDiagram
participant Client
participant CS as CollectionsService
participant CR as CollectionRegistry
participant SE as Storage Engine
participant SS as Search Service
Client->>CS: DeleteCollection("bench_col")
CS->>CR: DeleteCollection()
CR->>SE: GetNodesByLabel("bench_col")
Note over SE: ⚠️ LOADS ALL 20K NODES<br/>INTO MEMORY (SLOW)
SE-->>CR: [20,000 nodes]
CR->>CR: Filter QdrantPoint nodes
CR->>SE: BulkDeleteNodes([20k IDs])
Note over SE: ⚠️ TRIGGERS 20K NOTIFICATIONS<br/>(BLOCKS)
SE->>SS: notifyNodeDeleted() × 20k
Note over SS: ⚠️ REMOVES FROM INDEX<br/>(BLOCKS)
CR->>SE: DeleteNode("_qdrant_collection:bench_col")
CR->>CR: Remove from cache
CR-->>CS: Success
CS-->>Client: Success
Note over Client,SS: ⏱️ Total Time: 5-30+ seconds<br/>for 20k points
Point Upsert Flow (Legacy)¶
sequenceDiagram
participant Client
participant PS as PointsService
participant CR as CollectionRegistry
participant SE as Storage Engine
participant SS as Search Service
Client->>PS: UpsertPoints("bench_col", points)
PS->>CR: GetCollection("bench_col")
CR-->>PS: CollectionMeta
PS->>SE: CreateNode(point)
Note over SE: Stored in shared namespace<br/>with collection label
SE->>SS: IndexNode(point)
Note over SS: Indexed in shared index
SS-->>PS: Success
PS-->>Client: Success
Current Architecture (After, implemented)¶
High-Level Component Diagram¶
graph TB
subgraph "Client Layer"
QC[Qdrant Client<br/>gRPC]
CC[Cypher Client<br/>Bolt/HTTP]
end
subgraph "Qdrant gRPC Server"
CS[CollectionsService]
PS[PointsService]
DM[DatabaseManager<br/>from nornicdb]
end
subgraph "Database Layer"
DB1[Database: bench_col<br/>Namespace Isolation]
DB2[Database: documents<br/>Namespace Isolation]
DB3[Database: ...]
end
subgraph "Storage Layer"
SE1[Storage: bench_col<br/>Namespaced Engine]
SE2[Storage: documents<br/>Namespaced Engine]
end
subgraph "Search Layer"
SS1[Search Service: bench_col<br/>Isolated Index]
SS2[Search Service: documents<br/>Isolated Index]
end
QC -->|CreateCollection| CS
QC -->|UpsertPoints| PS
CC -->|USE bench_col| DB1
CC -->|USE documents| DB2
CS --> DM
PS --> DM
DM -->|Create/Get| DB1
DM -->|Create/Get| DB2
DM -->|Create/Get| DB3
DB1 --> SE1
DB2 --> SE2
DB1 --> SS1
DB2 --> SS2
style DM fill:#ccffcc
style DB1 fill:#ccffcc
style DB2 fill:#ccffcc
style SE1 fill:#ccffcc
style SE2 fill:#ccffcc
style SS1 fill:#ccffcc
style SS2 fill:#ccffcc
Data Storage Structure¶
Storage Engine (Multi-Database)
│
├── Database: "bench_col" (Collection = Database)
│ ├── _collection_meta (Required)
│ │ ├── Label: _CollectionMeta
│ │ └── Properties: {dimensions: 1024, distance: "Cosine", schema_version: 1}
│ │
│ └── Point Nodes (Isolated in bench_col namespace)
│ ├── point-1
│ │ ├── NamedEmbeddings: {"default": [0.1, 0.2, ...]}
│ │ └── Properties: {payload data}
│ ├── point-2
│ └── ... (20k points, all in bench_col namespace)
│
├── Database: "documents" (Another Collection)
│ ├── _collection_meta
│ └── Point Nodes (Isolated in documents namespace)
│ └── ...
│
└── Database: "nornic" (Default, non-Qdrant)
└── Regular graph nodes
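The layout above can be sketched as a key-prefixing wrapper over one shared store. This is an assumption-laden illustration: `SharedStore` and `NamespacedEngine` are hypothetical names, and a plain dictionary stands in for the real storage engine. The `<database>:` key prefix follows the `bench_col:` prefix shown in the delete flow.

```python
class SharedStore:
    """Stand-in for the underlying key-value store (a plain dict here)."""

    def __init__(self):
        self.data = {}


class NamespacedEngine:
    """View of the shared store that prefixes every key with '<database>:'."""

    def __init__(self, store, database):
        self.store = store
        self.prefix = database + ":"

    def create_node(self, node_id, props):
        self.store.data[self.prefix + node_id] = props

    def get_node(self, node_id):
        return self.store.data.get(self.prefix + node_id)

    def node_ids(self):
        # Only keys inside this namespace are visible, with the prefix stripped.
        n = len(self.prefix)
        return sorted(k[n:] for k in self.store.data if k.startswith(self.prefix))


store = SharedStore()
bench = NamespacedEngine(store, "bench_col")
docs = NamespacedEngine(store, "documents")
bench.create_node("point-1", {"category": "technology"})
docs.create_node("point-1", {"category": "legal"})
print(bench.node_ids(), sorted(store.data))
```

The same raw ID can exist in two collections without clashing, because each engine only ever sees its own prefix.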
Node ID Hygiene (Current)¶
- The database namespace isolates each collection, so point IDs must not embed the collection name.
- Point node IDs use qdrant:point:<raw-id>; the namespace already scopes them to the collection.
- _collection_meta is reserved for the metadata node, and all IDs beginning with _ are reserved for internal nodes.
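These ID rules can be expressed as a few small helpers. The sketch below is illustrative only: the function names are hypothetical, and only the `qdrant:point:` prefix and the leading-underscore convention are taken from this document.

```python
POINT_PREFIX = "qdrant:point:"


def point_node_id(raw_id):
    """Build the node ID for a point; the collection name is deliberately absent."""
    return f"{POINT_PREFIX}{raw_id}"


def is_reserved_id(node_id):
    """IDs beginning with '_' (including '_collection_meta') are internal."""
    return node_id.startswith("_")


def raw_point_id(node_id):
    """Recover the raw point ID, or None if the node is not a point."""
    if node_id.startswith(POINT_PREFIX):
        return node_id[len(POINT_PREFIX):]
    return None


print(point_node_id(42), is_reserved_id("_collection_meta"))
```

Because every point ID carries the fixed prefix, a point can never collide with a reserved internal ID, whatever raw ID the client supplies.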
Collection Delete Flow (Current - FAST)¶
sequenceDiagram
participant Client
participant CS as CollectionsService
participant DM as DatabaseManager
participant SE as Storage Engine (Badger)
Client->>CS: DeleteCollection("bench_col")
CS->>DM: DeleteDatabase("bench_col")
Note over DM: ✅ Drops namespace prefix "bench_col:"<br/>No per-point deletes
DM->>SE: DeleteByPrefix("bench_col:")
Note over SE: ✅ Uses Badger DropPrefix<br/>+ targeted index cleanup
DM-->>CS: Success
CS-->>Client: Success
Note over Client,SE: ⏱️ Total Time: <100ms<br/>regardless of point count
Point Upsert Flow (Current)¶
sequenceDiagram
participant Client
participant PS as PointsService
participant DM as DatabaseManager
participant DB as Database: bench_col
participant SE as Storage Engine
participant SS as Search Service: bench_col
Client->>PS: UpsertPoints("bench_col", points)
PS->>DM: GetDatabase("bench_col")
DM-->>PS: Database: bench_col
PS->>DB: CreateNode(point)
Note over DB: Automatically namespaced<br/>to bench_col
DB->>SE: CreateNode(point)
Note over SE: Stored in bench_col namespace<br/>automatic isolation
DB->>SS: IndexNode(point)
Note over SS: Indexed in bench_col<br/>isolated index
SS-->>PS: Success
PS-->>Client: Success
Comparison Matrix¶
| Aspect | Before (Collection Registry) | After (Database-Based) |
|---|---|---|
| Storage | Shared namespace, label-based | Namespaced per collection |
| Isolation | Label filtering | Namespace isolation |
| Delete Performance | O(n) - scan all nodes | O(1) - delete namespace |
| Collection Management | Separate registry | DatabaseManager |
| Cypher Access | ❌ No direct access | ✅ USE collection MATCH (n) |
| Metadata Storage | Metadata nodes in shared space | Required _collection_meta node in each DB |
| Search Index | Shared index, label filtering | Per-database isolated indexes |
| Scalability | Single shared index | Per-collection indexes |
| Complexity | Registry + Storage | Just DatabaseManager |
Data Flow: Collection Creation¶
Before¶
CreateCollection("bench_col")
↓
CollectionRegistry.CreateCollection()
↓
Storage.CreateNode(metadataNode)
├── ID: "_qdrant_collection:bench_col"
├── Label: "_QdrantCollection"
└── Properties: {name, dimensions, distance}
↓
Registry.collections["bench_col"] = meta
After¶
CreateCollection("bench_col")
↓
DatabaseManager.CreateDatabase("bench_col")
↓
Database created with namespace "bench_col"
↓
Store required metadata node in database
├── ID: "_collection_meta"
├── Label: "_CollectionMeta"
└── Properties: {dimensions, distance, schema_version}
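The "After" creation flow can be sketched as follows. This is a minimal model, not the server's code: the `DatabaseManager` class here is a hypothetical stand-in backed by dictionaries, and `create_collection` is an invented helper. The node ID, label, and property names are the ones shown in this document.

```python
class DatabaseManager:
    """Hypothetical stand-in: maps database name -> {node ID -> node}."""

    def __init__(self):
        self.databases = {}

    def create_database(self, name):
        if name in self.databases:
            raise ValueError(f"database {name!r} already exists")
        self.databases[name] = {}
        return self.databases[name]


def create_collection(manager, name, dimensions, distance):
    """Create the database, then write the metadata node that marks it as a collection."""
    db = manager.create_database(name)
    db["_collection_meta"] = {
        "labels": ["_CollectionMeta"],
        "properties": {
            "dimensions": dimensions,
            "distance": distance,
            "schema_version": 1,
        },
    }
    return db


manager = DatabaseManager()
db = create_collection(manager, "bench_col", 1024, "Cosine")
print(db["_collection_meta"]["properties"])
```

There is no separate registry entry to keep in sync here: the database and its metadata node are the only record of the collection.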
Data Flow: Point Query¶
Before¶
SearchPoints("bench_col", query)
↓
SearchService.Search(query, embedding)
↓
Search in shared index (all collections)
↓
Filter results by label "bench_col"
↓
Return filtered results
After¶
SearchPoints("bench_col", query)
↓
DatabaseManager.GetDatabase("bench_col")
↓
Database.GetSearchService()
↓
SearchService.Search(query, embedding)
↓
Search in bench_col index (already isolated)
↓
Return results (no filtering needed)
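The two query paths can be compared with a brute-force sketch. It assumes cosine similarity and tiny in-memory indexes; `search_before`, `search_after`, and the sample vectors are hypothetical and only mirror the steps listed above.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(index, query, k):
    """Rank every (point_id, vector) pair in the index by similarity to the query."""
    scored = [(pid, cosine(vec, query)) for pid, vec in index.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


# Before: one shared index; each entry carries its collection label.
shared = {
    "point-1": ("bench_col", [1.0, 0.0]),
    "point-2": ("bench_col", [0.6, 0.8]),
    "point-1000": ("documents", [1.0, 0.1]),
}


def search_before(collection, query, k):
    # Search across all collections, then filter the hits by label.
    ranked = top_k({pid: vec for pid, (_, vec) in shared.items()}, query, len(shared))
    return [(pid, s) for pid, s in ranked if shared[pid][0] == collection][:k]


# After: one index per database, so no label filter is needed.
per_db = {
    "bench_col": {"point-1": [1.0, 0.0], "point-2": [0.6, 0.8]},
    "documents": {"point-1000": [1.0, 0.1]},
}


def search_after(collection, query, k):
    return top_k(per_db[collection], query, k)


print(search_before("bench_col", [1.0, 0.0], 2))
print(search_after("bench_col", [1.0, 0.0], 2))
```

Both paths return the same points, but the shared-index path has to rank entries from other collections before discarding them.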
Benefits Visualization¶
Performance Improvement¶
Delete Collection (20,000 points):
Before: ████████████████████████████████ 30+ seconds
After: █ <100ms
Improvement: 300x+ faster
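The difference between the two delete paths comes down to how much per-point work each one triggers. The sketch below counts that work in a hypothetical in-memory `Store`; all names are illustrative. A Python dictionary still scans its keys to drop a prefix, so the sketch shows the absence of per-node notifications, not the speed of a storage-level prefix drop such as the Badger DropPrefix call mentioned in the delete flow.

```python
class Store:
    """Hypothetical key-value store that counts per-node delete notifications."""

    def __init__(self):
        self.data = {}
        self.notifications = 0

    def delete_node(self, key):
        del self.data[key]
        self.notifications += 1  # legacy path: the search index is notified per node

    def delete_by_prefix(self, prefix):
        # Stand-in for a storage-level prefix drop: no per-node notifications.
        for key in [k for k in self.data if k.startswith(prefix)]:
            del self.data[key]


def legacy_delete(store, label):
    """Load every node carrying the label, then delete the nodes one by one."""
    ids = [k for k, node in store.data.items() if label in node["labels"]]
    for key in ids:
        store.delete_node(key)
    return len(ids)


def database_delete(store, database):
    """Drop the whole namespace in a single call."""
    store.delete_by_prefix(database + ":")


legacy = Store()
for i in range(20_000):
    legacy.data[f"point-{i}"] = {"labels": ["QdrantPoint", "bench_col"]}
legacy_delete(legacy, "bench_col")

current = Store()
for i in range(20_000):
    current.data[f"bench_col:qdrant:point:{i}"] = {}
database_delete(current, "bench_col")

print(legacy.notifications, current.notifications)
```

In this model the legacy path issues one notification per point, while the database path issues none.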
Memory Isolation¶
Before:
┌─────────────────────────────────┐
│ Shared Search Index │
│ - All collections mixed │
│ - 20k points from bench_col │
│ - 10k points from documents │
│ - 5k points from other │
└─────────────────────────────────┘
After:
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ bench_col Index │ │ documents Index │ │ other Index │
│ - 20k points │ │ - 10k points │ │ - 5k points │
│ Isolated │ │ Isolated │ │ Isolated │
└──────────────────┘ └──────────────────┘ └──────────────────┘
Query Capabilities¶
Before:
┌─────────────────────────────────────┐
│ Qdrant gRPC API │
│ ✅ Search within collection │
│ ❌ No Cypher access │
│ ❌ No cross-collection queries │
└─────────────────────────────────────┘
After:
┌─────────────────────────────────────┐
│ Qdrant gRPC API │
│ ✅ Search within collection │
│ ✅ Cypher access (NEW!) │
│ ✅ Cross-collection queries (NEW!) │
│ ✅ Graph queries on points (NEW!) │
└─────────────────────────────────────┘
Deployment Strategy¶
Deployment: Clean Break
┌─────────────────────────────────────┐
│ 1. Remove CollectionRegistry │
│ 2. Remove PersistentCollectionRegistry│
│ 3. Update all services to use │
│ DatabaseManager │
│ 4. All collections = databases │
└─────────────────────────────────────┘
Note: Existing collections will not be migrated.
They will need to be recreated as databases.
Example: Querying Collections via Cypher¶
Before: Not Possible¶
After: Full Cypher Support¶
// ✅ Query points in a collection
USE bench_col
MATCH (n)
RETURN n
LIMIT 10
// ✅ Vector search in collection
USE bench_col
CALL db.index.vector.queryNodes('embeddings', 10, 'machine learning')
YIELD node, score
RETURN node, score
// ✅ Filter by properties
USE bench_col
MATCH (n)
WHERE n.category = 'technology'
RETURN n
// ✅ Create relationships between points
USE bench_col
MATCH (p1:Point), (p2:Point)
WHERE p1.id = 'point-1' AND p2.id = 'point-2'
CREATE (p1)-[:SIMILAR_TO {score: 0.95}]->(p2)
// ✅ Cross-collection queries (NEW!)
CALL {
USE bench_col
MATCH (n) RETURN 'bench_col' AS db, count(n) AS c
UNION ALL
USE documents
MATCH (n) RETURN 'documents' AS db, count(n) AS c
}
RETURN db, c
Implementation Sequence¶
graph LR
A[1. Update CollectionsService<br/>Use DatabaseManager] --> B[2. Update PointsService<br/>Use Collection Databases]
B --> C[3. Update DeleteCollection<br/>Use DeleteDatabase]
C --> D[4. Update Search<br/>Per-Database Indexes]
D --> E[5. Remove Old Registry<br/>Cleanup]
E --> F[6. Verify drop performance<br/>and docs]
style A fill:#e1f5ff
style B fill:#e1f5ff
style C fill:#e1f5ff
style D fill:#fff4e1
style E fill:#fff4e1
style F fill:#ffe1e1
Risk Assessment¶
| Risk | Impact | Mitigation |
|---|---|---|
| Breaking Qdrant API compatibility | High | Keep API surface unchanged; update mapping tests and docs |
| Performance regression | Medium | Benchmark + profile; optimize drops and hot search paths |
| Unexpected name validation edge cases | Medium | Validate collection/database names; reject reserved names |
Success Metrics¶
After refactoring, we should see:
- Delete Performance: Collection delete <100ms (vs 5-30+ seconds)
- Query Performance: Same or better (namespace isolation = automatic filtering)
- Memory Usage: Similar or better (better isolation = can free per-collection)
- Code Complexity: Reduced (remove registry, use existing DatabaseManager)
- Feature Completeness: ✅ Cypher access to collections (new capability)
Next Steps¶
- Review this plan with the team
- Prototype database-based collection creation
- Benchmark delete performance improvement
- Implement Phase 1 (foundation)
- Test thoroughly before full rollout