Dynavera/docs/deployment-topologies.md
Viswamedha Nalabotu 43618ee3f4 Fixed readme errors
2026-03-22 12:42:33 +00:00

Deployment Topologies

This page compares local and distributed deployment shapes.

Local Development Topology

Purpose: fast iteration and debugging.

  • App services run via compose/dev/docker-compose.yml
  • Django, Celery, Redis, Postgres, Node, and inference can run together
  • Suitable for feature work and integration checks

Distributed Topology (VPS + GPU Node)

Purpose: production-like separation of concerns.

  • VPS node: web app, orchestration, API, websocket handling, task queue, database
  • GPU node: dedicated inference service (chat + embeddings + chunking helpers)
  • Request direction is primarily VPS -> GPU for model tasks
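
The VPS -> GPU direction can be illustrated with a minimal sketch of how VPS-side code might build a request to the inference service. This is not the project's actual client. The variable names `INFERENCE_PROTOCOL`, `INFERENCE_HOST`, and `INFERENCE_PORT`, their defaults, and the `/embed` path used in the usage line are assumptions made for illustration. Only `INFERENCE_USERNAME` and `INFERENCE_PASSWORD` are named in this page.

```python
import base64
import json
import os
import urllib.request


def build_inference_request(path, payload, env=os.environ):
    """Build (but do not send) a POST request from the VPS to the GPU node."""
    # Hypothetical variable names and defaults; check the real container env.
    protocol = env.get("INFERENCE_PROTOCOL", "http")
    host = env["INFERENCE_HOST"]
    port = env.get("INFERENCE_PORT", "8000")
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"

    # HTTP Basic Auth: base64("username:password") in the Authorization header.
    credentials = f"{env['INFERENCE_USERNAME']}:{env['INFERENCE_PASSWORD']}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_inference_request("/embed", {"text": "hello"}), timeout=30)`. Building the request separately from sending it keeps the URL and header logic easy to test without a running GPU node.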

Why Split Nodes?

  • Keeps model latency/VRAM pressure away from user/session services
  • Allows independent scaling of orchestration and inference
  • Improves operational clarity around failures and bottlenecks

Operational Notes

  • Confirm the inference host, port, and protocol values in the runtime container environment
  • Set INFERENCE_USERNAME and INFERENCE_PASSWORD; the GPU node requires HTTP Basic Auth on all endpoints
  • Confirm the pgvector extension is enabled in the target database
  • Restrict role flow generation permissions to trusted user types
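
Two of the checks above lend themselves to a small preflight script. The following is a rough sketch, not part of the project: the function names are hypothetical, and `cursor` is assumed to be any DB-API cursor (for example one obtained from psycopg) connected to the target database. The query relies on the fact that pgvector registers itself in PostgreSQL under the extension name `vector`.

```python
import os

# Credentials named in this page; extend with the host/port/protocol
# variable names actually used in the deployment.
REQUIRED_VARS = ("INFERENCE_USERNAME", "INFERENCE_PASSWORD")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def pgvector_enabled(cursor):
    """Return True if the pgvector extension is installed in the database.

    `cursor` is assumed to be a DB-API cursor on the target database.
    """
    cursor.execute("SELECT 1 FROM pg_extension WHERE extname = 'vector'")
    return cursor.fetchone() is not None
```

If `pgvector_enabled` returns `False`, the extension can usually be enabled by a sufficiently privileged database user with `CREATE EXTENSION vector;`.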
