# Deployment Topologies

This page compares local and distributed deployment shapes.

## Local Development Topology

Purpose: fast iteration and debugging.

- App services run via `compose/dev/docker-compose.yml`
- Django, Celery, Redis, Postgres, Node, and inference can run together
- Suitable for feature work and integration checks

## Distributed Topology (VPS + GPU Node)

Purpose: production-like separation of concerns.

- **VPS node**: web app, orchestration, API, websocket handling, task queue, database
- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers)
- Request direction is primarily **VPS -> GPU** for model tasks
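
The VPS -> GPU direction can be sketched as a small request builder that runs on the VPS side. This is a minimal illustration, not the project's actual client: `INFERENCE_PROTOCOL`, `INFERENCE_HOST`, and `INFERENCE_PORT` are hypothetical variable names standing in for the host/port/protocol values, and the endpoint path is supplied by the caller. Only `INFERENCE_USERNAME` and `INFERENCE_PASSWORD` come from this page.

```python
import base64
import json
import os
import urllib.request


def build_inference_request(path, payload, env=os.environ):
    """Build (but do not send) a POST request from the VPS to the GPU node."""
    # Hypothetical variable names; check the runtime container env for the real ones.
    protocol = env["INFERENCE_PROTOCOL"]
    host = env["INFERENCE_HOST"]
    port = env["INFERENCE_PORT"]

    # Credentials listed under Operational Notes, encoded as HTTP Basic Auth.
    credentials = f'{env["INFERENCE_USERNAME"]}:{env["INFERENCE_PASSWORD"]}'
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url=f"{protocol}://{host}:{port}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)`. Setting an explicit timeout is a reasonable precaution so that a slow or unreachable GPU node does not block workers on the VPS.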

## Why Split Nodes?

- Keeps model latency and VRAM pressure away from user and session services
- Allows independent scaling of orchestration and inference
- Improves operational clarity around failures and bottlenecks

## Operational Notes

- Confirm the inference host, port, and protocol values in the runtime container's environment
- Set `INFERENCE_USERNAME` and `INFERENCE_PASSWORD`; the GPU node requires HTTP Basic Auth on all endpoints
- Confirm the pgvector extension is enabled in the target database
- Keep role flow generation permissions constrained to trusted user types
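
Two of the checks above lend themselves to a small preflight script. The sketch below is an assumption about how such a check might look, not part of the project: the helper names are hypothetical, and `pgvector_enabled` expects any DB-API style cursor connected to the target database.

```python
REQUIRED_ENV_VARS = ("INFERENCE_USERNAME", "INFERENCE_PASSWORD")


def missing_env_vars(env, required=REQUIRED_ENV_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def pgvector_enabled(cursor):
    """Return True if the pgvector extension is installed in the connected database."""
    # pgvector registers itself under the extension name "vector".
    cursor.execute("SELECT 1 FROM pg_extension WHERE extname = 'vector'")
    return cursor.fetchone() is not None
```

In a Django process, a cursor can be obtained with `from django.db import connection` and `with connection.cursor() as cursor:`, and `missing_env_vars(os.environ)` covers the credentials check. If the extension is missing, `CREATE EXTENSION vector;` enables it, given sufficient database privileges.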

## Navigation

- [Distributed Runtime Flow](distributed-runtime-flow.md)
- [Application Structure (Detailed)](application-structure.md)
- [Project README](../README.md)