# Deployment Topologies
This page compares local and distributed deployment shapes.
## Local Development Topology
Purpose: fast iteration and debugging.
- App services run via `compose/dev/docker-compose.yml`
- Django, Celery, Redis, Postgres, Node, and the inference service can all run together on the same machine
- Suitable for feature work and integration checks
## Distributed Topology (VPS + GPU Node)
Purpose: production-like separation of concerns.
- **VPS node**: web app, orchestration, API, websocket handling, task queue, database
- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers)
- Request direction is primarily **VPS -> GPU** for model tasks
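The VPS -> GPU call path can be sketched as below. This is a minimal illustration, not the project's actual client: `INFERENCE_USERNAME` and `INFERENCE_PASSWORD` are the variables named under Operational Notes, while `INFERENCE_PROTOCOL`, `INFERENCE_HOST`, `INFERENCE_PORT`, their defaults, and the `/health` path are hypothetical names chosen for the example.

```python
import base64
import os
import urllib.request
from typing import Mapping


def build_inference_request(
    path: str, env: Mapping[str, str] = os.environ
) -> urllib.request.Request:
    """Build (but do not send) an authenticated request from the VPS node to the GPU node."""
    # Hypothetical variable names and defaults; use whatever the runtime container defines.
    protocol = env.get("INFERENCE_PROTOCOL", "http")
    host = env["INFERENCE_HOST"]
    port = env.get("INFERENCE_PORT", "8000")

    # These two names come from the Operational Notes section.
    username = env["INFERENCE_USERNAME"]
    password = env["INFERENCE_PASSWORD"]

    # HTTP Basic Auth: base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")

    request = urllib.request.Request(f"{protocol}://{host}:{port}{path}")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

A caller on the VPS node could then send it with `urllib.request.urlopen(build_inference_request("/health"), timeout=10)`. A missing host or credential raises `KeyError` before any network traffic, which surfaces misconfiguration early.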
## Why Split Nodes?
- Keeps model latency/VRAM pressure away from user/session services
- Allows independent scaling of orchestration and inference
- Improves operational clarity around failures and bottlenecks
## Operational Notes
- Confirm the inference host, port, and protocol values in the runtime container's environment
- Set `INFERENCE_USERNAME` and `INFERENCE_PASSWORD`; the GPU node requires HTTP Basic Auth on all endpoints
- Confirm the pgvector extension is enabled in the target database
- Restrict role flow generation permissions to trusted user types
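The pgvector check above could be scripted roughly as follows. This is a sketch that assumes any DB-API style cursor (for example one from Django's `connection.cursor()`); the helper name `pgvector_version` is hypothetical. pgvector registers itself in Postgres under the extension name `vector`.

```python
from typing import Optional

# pg_extension lists the extensions installed in the current database.
PGVECTOR_CHECK_SQL = "SELECT extversion FROM pg_extension WHERE extname = 'vector'"


def pgvector_version(cursor) -> Optional[str]:
    """Return the installed pgvector version, or None if the extension is not enabled."""
    cursor.execute(PGVECTOR_CHECK_SQL)
    row = cursor.fetchone()
    return row[0] if row else None
```

If this returns `None`, the extension can usually be enabled by a sufficiently privileged role with `CREATE EXTENSION IF NOT EXISTS vector;` in the target database.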
## Navigation
- [Distributed Runtime Flow](distributed-runtime-flow.md)
- [Application Structure (Detailed)](application-structure.md)
- [Project README](../README.md)