Tweaked report diagram, references and readme structure

2026-03-16 20:14:19 +00:00 · 2026-03-16 20:14:19 +00:00 · 3cf7b1e03d
commit 3cf7b1e03d
parent a9ba16c76d
4 changed files with 497 additions and 302 deletions
--- a/README.md
+++ b/README.md
@ -1,232 +1,154 @@
-# Dynavera: Distributed Agentic Onboarding System
+# Dynavera: An Agentic Approach to Role-Specific Trainers

-Dynavera is a multi-agent onboarding platform that combines role-specific training flows, retrieval from organization documents, and LLM-powered guidance. The system is intentionally distributed so that app orchestration and heavy inference can run independently.
+[![Vue 3](https://img.shields.io/badge/Vue-35495E?style=for-the-badge&logo=vue.js&logoColor=4FC08D)](https://vuejs.org/)
+[![Vite](https://img.shields.io/badge/Vite-646CFF?style=for-the-badge&logo=vite&logoColor=white)](https://vite.dev/)
+[![Django](https://img.shields.io/badge/Django-092E20?style=for-the-badge&logo=django&logoColor=white)](https://www.djangoproject.com/)
+[![DRF](https://img.shields.io/badge/DRF-092E20?style=for-the-badge&logo=django&logoColor=white)](https://www.django-rest-framework.org/)
+[![Channels](https://img.shields.io/badge/Django_Channels-092E20?style=for-the-badge&logo=django&logoColor=white)](https://channels.readthedocs.io/)
+[![PostgreSQL](https://img.shields.io/badge/PostgreSQL-4169E1?style=for-the-badge&logo=postgresql&logoColor=white)](https://www.postgresql.org/)
+[![pgvector](https://img.shields.io/badge/pgvector-00599C?style=for-the-badge&logo=postgresql&logoColor=white)](https://github.com/pgvector/pgvector)
+[![Redis](https://img.shields.io/badge/Redis-DC382D?style=for-the-badge&logo=redis&logoColor=white)](https://redis.io/)
+[![Celery](https://img.shields.io/badge/Celery-37814A?style=for-the-badge&logo=celery&logoColor=white)](https://docs.celeryq.dev/en/stable/)
+[![FastAPI](https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com/)
+[![Docker](https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)

-Repository: https://git.cs.bham.ac.uk/projects-2025-26/vxn217
-
---
+Dynavera is a distributed agentic onboarding platform that targets the productivity gap in organizational training. Generic training rarely meets the needs of specialized positions, so the system delivers tailored, retrieval-grounded guidance for each role. By combining curriculum generation, document-grounded retrieval, and live progress tracking, Dynavera lets organizations onboard users efficiently and transparently. Building it taught me how to architect distributed systems, integrate retrieval-augmented generation, and design agent workflows.

 ## Table of Contents

- [At a Glance](#at-a-glance)
- [Inspector & Supervisor Notes](#inspector--supervisor-notes)
+- [Overview & Core Features](#overview--core-features)
+- [Architecture](#architecture)
+- [Quick Start](#quick-start)
+- [Usage & Smoke Test](#usage--smoke-test)
+- [Project Structure](#project-structure)
+- [Demo Access](#demo-access)
+- [Operational Commands](#operational-commands)
 - [Screenshots](#screenshots)
- [System Architecture (High-Level)](#system-architecture-high-level)
- [Project Goals](#project-goals)
- [Tech Stack](#tech-stack)
- [Repository Guide](#repository-guide)
- [Notable Branches](#notable-branches)
- [Evaluation Credentials](#evaluation-credentials)
- [Recommended Evaluation Walkthrough](#recommended-evaluation-walkthrough)
- [Local Setup (Cross-Platform)](#local-setup-cross-platform)
- [Common Commands](#common-commands)
- [Additional Documentation](#additional-documentation)
+- [Documentation](#documentation)

---

-## At a Glance

-Dynavera focuses on one question: **how do we deliver onboarding that is role-aware, context-aware, and operationally practical?**
+## Overview & Core Features

-The platform does this by combining:
+Dynavera addresses the onboarding productivity gap by combining:

- A Django management layer for accounts, roles, sessions, and APIs
- An agentic orchestration loop over WebSockets for responsive interactions
- A retrieval layer using pgvector and organization-provided documents
- A GPU inference service for chat completions, embeddings, and chunking support
+- Role-aware curriculum generation
+- Retrieval-augmented responses grounded in uploaded organizational documents
+- Tool-aware orchestration over WebSockets
+- Local-first inference support for privacy-sensitive deployments

---
+The runtime is intentionally distributed: Django manages state and governance, while a dedicated inference service handles model-intensive workloads.

-## Inspector & Supervisor Notes
+Key features:

-Primary locations relevant to technical quality, architecture reasoning, and evaluation:
+- Distributed architecture separating application control plane and inference plane
+- Multi-agent style orchestration for curriculum, knowledge, assessment, and monitoring behaviors
+- RAG pipeline with semantic chunking, embeddings, and pgvector retrieval
+- Live onboarding session updates via Django Channels WebSockets
+- Persistent session/progress storage for auditability and recovery

- Setup, context, and high-level flow: this `README.md`
- Architecture notes: `docs/`
- Orchestration runtime: `apps/onboarding/consumers.py`
- Retrieval bridge and tool routing: `apps/onboarding/mcp.py`
- Ingestion and vectorization pipeline: `apps/knowledge/tasks.py`
- Inference service entrypoint: `gpu_server.py`
+## Architecture

-Evaluation-relevant themes represented in the codebase:
-
- Role-scoped onboarding generation and progression
- Retrieval grounding through uploaded training files
- Separation of management services and inference services
- End-to-end flow from upload to onboarding completion
-
---
-
-## Screenshots
-
-### Home Page
-
-![Home Page](docs/images/home-page.png)
-
-### Organization Page
-
-![Organization Page](docs/images/organization-page.png)
-
-### Onboarding Loading / Generation State
-
-![Onboarding Loading](docs/images/onboarding-loading-page.png)
-
-### Onboarding Content Flow
-
-![Onboarding Flow](docs/images/onboarding-content-page.png)
-
---
-
-## System Architecture (High-Level)
-
-At a high level, Dynavera is split into a management side and an inference side. The orchestrator coordinates user interaction, tool calls, and model responses between the two.
+High-level architecture diagram:

 ![High Level System Architecture](docs/high-level-system-architecture.png)

-For the fuller architecture narrative (runtime flow and component placement), see:
+Key backend runtime entry points:

- [Distributed Runtime Flow](docs/distributed-runtime-flow.md)
+- `apps/onboarding/consumers.py` for orchestration loop and WebSocket flow
+- `apps/onboarding/mcp.py` for tool routing and backend tool execution
+- `apps/knowledge/tasks.py` for ingestion/chunking/embedding workflow
+- `gpu_server.py` for inference and embedding endpoints

---
+## Quick Start

-## Project Goals
+Prerequisites:

- [x] Distributed orchestration across VPS and GPU nodes
- [x] Context-aware onboarding with RAG (semantic chunking + vector search)
- [x] Stateful agent workflow over WebSockets
- [x] Automated ingestion from role training documents (PDF/TXT)
+- Docker Engine or Docker Desktop
+- NVIDIA drivers and NVIDIA Container Toolkit (for GPU inference)

---
-
-## Tech Stack
-
- **Backend**: Django, Django REST Framework, Django Channels
- **Frontend**: Vue 3, Vite, Pinia
- **Database**: PostgreSQL with pgvector
- **AI/ML**: FastAPI, Sentence Transformers, llama.cpp-compatible serving
- **Infra**: Docker, Redis, Celery
-
---
-
-## Repository Guide
-
-Key areas in the repo:
-
- `apps/accounts`: user model, organization/role ownership, membership flows
- `apps/knowledge`: file ingestion, chunking pipeline, vector document persistence
- `apps/onboarding`: role flows, sessions, websocket orchestration, MCP-style tool routing
- `config/`: settings, API/ASGI routing, environment wiring
- `compose/`: development and production deployment manifests
- `gpu_server.py`: inference and embedding service
-
-For a more detailed breakdown:
-
- [Application Structure (Detailed)](docs/application-structure.md)
-
---
-
-## Notable Branches
-
-These remote branches are useful for understanding how the project evolved:
-
- `origin/main`: stable integration branch used for the current baseline.
- `origin/feature/node-setup`: early full-stack setup work introducing the initial frontend/backend server shape.
- `origin/feature/agents`: branch focused on agent-related backend changes, including pgvector-oriented database work.
- `origin/feature/mcp-workflow`: workflow iteration branch around MCP/testing flow changes.
- `origin/feature/model-rag`: branch associated with the model/RAG stream and related frontend scaffolding during that phase.
-
-Run `git branch -r` to view all remote branches.
-
-However, the main branch will be the primary focus as a lot of the code contained in the feature branches was used for testing different approaches and iterations, which then got consolidated or removed as the project evolved. The code in these branches may not be in a fully working state, and some of the approaches explored there were ultimately not used in the final implementation.
-
---
-
-## Evaluation Credentials
-
-| Role | Email | Password |
-| :--- | :--- | :--- |
-| **Admin** | admin@example.com | admin |
-| **Manager** | haleisaac@example.com | password |
-| **User** | j.thompson@example.com | password |
-
-Manager registration code: `MANAGER2026`
-
---
-
-## Recommended Evaluation Walkthrough
-
-1. Open https://fyp.viswamedha.com
-2. Log in as **Manager** and open the target organization
-3. Upload a role-relevant document (PDF recommended)
-4. Wait for ingestion and embedding completion
-5. Start role onboarding and trigger generation
-6. Check if responses are grounded in uploaded material
-7. Optionally review progress details and logs
-
-If the hosted deployment is unavailable, local setup is documented below.
-
---
-
-## Local Setup (Cross-Platform)
-
-### Prerequisites
-
- Docker Engine / Docker Desktop
- NVIDIA drivers + NVIDIA Container Toolkit (for GPU inference)
-
-### 1) Clone
+1. Clone repository

 ```bash
 git clone https://git.cs.bham.ac.uk/projects-2025-26/vxn217
 cd vxn217
 ```

-### 2) Create `.env`
+2. Create environment file

-**PowerShell**
+PowerShell:

 ```powershell
 Copy-Item .env.template .env
 ```

-**CMD**
+CMD:

 ```cmd
 copy .env.template .env
 ```

-**macOS/Linux**
+macOS/Linux:

 ```bash
 cp .env.template .env
 ```

-Then update `.env` values for your environment.
-
-### 3) Start services (development)
+3. Start development stack

 ```bash
 docker compose -f compose/dev/docker-compose.yml --env-file .env up -d --build
 ```

-### 4) Access endpoints
+4. Open application

- App: http://localhost:8000
+- http://localhost:8000

-### 5) Optional: reset seeded passwords
+## Usage & Smoke Test

-```bash
-docker exec -it fyp-django-dev python manage.py reset_passwords
-```
+Follow this end-to-end workflow to use the project and to run the smoke test:

-Reset defaults:
+1. Create or select an organization and role
+2. Upload role-specific training files
+3. Wait for ingestion and embedding to complete (monitor the ingestion UI or logs)
+4. Invite a user to the configured role
+5. Log in as that user and start onboarding
+6. Complete at least one guided interaction and one assessment action

- Admin users: `admin`
- Manager and user accounts: `password`
+Expected behaviour:

---
+- Workflow completes without manual page refresh
+- UI state transitions update live
+- No dropped WebSocket session during onboarding

-## Common Commands
+## Project Structure
+
+- `apps/accounts`: user, organization, and role membership logic
+- `apps/knowledge`: training file ingestion and vector document persistence
+- `apps/onboarding`: sessions, orchestration runtime, and tool integration
+- `config`: Django settings, routing, ASGI/WSGI wiring
+- `compose`: development and production container configuration
+- `site`: frontend application
+- `docs`: architecture and deployment documentation
+
+## Demo Access
+
+Hosted URL:
+
+- https://fyp.viswamedha.com
+
+Evaluation credentials:
+
+| Role | Email | Password |
+| :--- | :--- | :--- |
+| Admin | admin@example.com | admin |
+| Manager | haleisaac@example.com | password |
+| User | j.thompson@example.com | password |
+
+Manager registration code: `MANAGER2026`
+
+## Operational Commands

 Stop services:

@ -246,10 +168,32 @@ Run migrations:
 docker exec -it fyp-django-dev python manage.py migrate
 ```

---
+Reset seeded passwords:

-## Additional Documentation
+```bash
+docker exec -it fyp-django-dev python manage.py reset_passwords
+```
+
+## Screenshots
+
+Home:
+
+![Home Page](docs/images/home-page.png)
+
+Organization:
+
+![Organization Page](docs/images/organization-page.png)
+
+Onboarding generation state:
+
+![Onboarding Loading](docs/images/onboarding-loading-page.png)
+
+Onboarding content flow:
+
+![Onboarding Flow](docs/images/onboarding-content-page.png)
+
+## Documentation

 - [Distributed Runtime Flow](docs/distributed-runtime-flow.md)
- [Application Structure (Detailed)](docs/application-structure.md)
+- [Application Structure](docs/application-structure.md)
 - [Deployment Topologies](docs/deployment-topologies.md)
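
Review note on the README's "RAG pipeline with semantic chunking, embeddings, and pgvector retrieval" bullet: the retrieval step can be pictured with a minimal sketch. It assumes cosine similarity over toy in-memory vectors. The function names, the sample chunks, and the 3-dimensional "embeddings" are all hypothetical and are not taken from `apps/knowledge` or from pgvector itself, which would run the equivalent ranking inside PostgreSQL.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank (text, embedding) pairs by similarity to the query and keep the best k."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical document chunks with made-up embeddings (illustration only).
CHUNKS = [
    ("Expense claims must be filed within 30 days.", [0.9, 0.1, 0.0]),
    ("The VPN client is required for remote access.", [0.0, 0.2, 0.9]),
    ("Travel expenses need manager approval.", [0.8, 0.3, 0.1]),
]

# A query vector that points towards the "expenses" direction of the toy space.
results = top_k_chunks([1.0, 0.2, 0.0], CHUNKS, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In a deployment like the one described, the top-ranked chunk texts would be placed into the prompt so that the model's answer is grounded in the uploaded documents.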
--- a/report/diagrams/workflow-implementation.png
+++ b/report/diagrams/workflow-implementation.png
--- a/report/references.bib
+++ b/report/references.bib
@ -54,14 +54,6 @@
  note         = {Accessed: 2026-03-09}
 }

-@misc{vllm2024,
-  author       = {{vLLM Team}},
-  title        = {High-Throughput Serving with PagedAttention},
-  year         = {2024},
-  howpublished = {\url{https://vllm.ai}},
-  note         = {Accessed: 2026-03-09}
-}
-
@misc{channels2024docs,
  author       = {{Django Software Foundation}},
  title        = {Django Channels Documentation},
@ -125,19 +117,116 @@
  howpublished = {\url{https://github.com/ggml-org/llama.cpp}},
  note         = {Accessed: 2026-03-09}
 }
-
-@misc{llamacpppython2024,
-  author       = {Abetlen},
-  title        = {llama-cpp-python Documentation},
-  year         = {2024},
-  howpublished = {\url{https://github.com/abetlen/llama-cpp-python}},
-  note         = {Accessed: 2026-03-09}
+@inproceedings{lewis2020rag,
+  author    = {Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K{\"u}ttler, Heinrich and Lewis, Mike and Yih, Wen{-}tau and Rockt{\"a}schel, Tim and Riedel, Sebastian and Kiela, Douwe},
+  title     = {Retrieval-Augmented Generation for Knowledge-Intensive {NLP} Tasks},
+  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
+  year      = {2020},
+  url       = {https://arxiv.org/abs/2005.11401}
 }

-@misc{pytorch2024docs,
-  author       = {{PyTorch Team}},
-  title        = {PyTorch Documentation},
-  year         = {2024},
-  howpublished = {\url{https://pytorch.org/docs/}},
-  note         = {Accessed: 2026-03-09}
+@inproceedings{schick2023toolformer,
+  author    = {Schick, Timo and Dwivedi{-}Yu, Jane and Dess{\`i}, Roberto and Raileanu, Roberta and Lomeli, Maria and Hambro, Eric and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas},
+  title     = {Toolformer: Language Models Can Teach Themselves to Use Tools},
+  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
+  year      = {2023},
+  url       = {https://arxiv.org/abs/2302.04761}
+}
+
+@article{wu2023autogen,
+  author  = {Wu, Qingyun and Bansal, Gagan and Zhang, Jieyu and Wu, Yiran and Li, Beibin and Zhu, Erkang and Jiang, Li and Zhang, Xiaoyun and Zhang, Shaokun and Liu, Jiale and Awadallah, Ahmed Hassan and White, Ryen W. and Burger, Doug and Wang, Chi},
+  title   = {AutoGen: Enabling Next-Gen {LLM} Applications via Multi-Agent Conversation},
+  journal = {arXiv preprint arXiv:2308.08155},
+  year    = {2023},
+  url     = {https://arxiv.org/abs/2308.08155}
+}
+
+@article{vanlehn2011,
+  author  = {VanLehn, Kurt},
+  title   = {The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems},
+  journal = {Educational Psychologist},
+  volume  = {46},
+  number  = {4},
+  pages   = {197--221},
+  year    = {2011},
+  doi     = {10.1080/00461520.2011.611369}
+}
+
+@inproceedings{karpukhin2020dpr,
+  author    = {Karpukhin, Vladimir and Oguz, Barlas and Min, Sewon and Lewis, Patrick and Wu, Ledell and Edunov, Sergey and Chen, Danqi and Yih, Wen{-}tau},
+  title     = {Dense Passage Retrieval for Open-Domain Question Answering},
+  booktitle = {Proceedings of EMNLP},
+  year      = {2020},
+  url       = {https://arxiv.org/abs/2004.04906}
+}
+
+@article{johnson2019faiss,
+  author  = {Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
+  title   = {Billion-scale Similarity Search with {GPUs}},
+  journal = {IEEE Transactions on Big Data},
+  year    = {2019},
+  volume  = {7},
+  number  = {3},
+  pages   = {535--547},
+  url     = {https://arxiv.org/abs/1702.08734}
+}
+
+@inproceedings{reimers2019sbert,
+  author    = {Reimers, Nils and Gurevych, Iryna},
+  title     = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
+  booktitle = {Proceedings of EMNLP-IJCNLP},
+  year      = {2019},
+  pages     = {3982--3992},
+  url       = {https://arxiv.org/abs/1908.10084}
+}
+
+@inproceedings{hu2021lora,
+  author    = {Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
+  title     = {{LoRA}: Low-Rank Adaptation of Large Language Models},
+  booktitle = {International Conference on Learning Representations (ICLR)},
+  year      = {2022},
+  url       = {https://arxiv.org/abs/2106.09685}
+}
+
+@article{li2023camel,
+  author  = {Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard},
+  title   = {{CAMEL}: Communicative Agents for ``Mind'' Exploration of Large Language Model Society},
+  journal = {arXiv preprint arXiv:2303.17760},
+  year    = {2023},
+  url     = {https://arxiv.org/abs/2303.17760}
+}
+
+@inproceedings{yao2023react,
+  author    = {Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan},
+  title     = {{ReAct}: Synergizing Reasoning and Acting in Language Models},
+  booktitle = {International Conference on Learning Representations (ICLR)},
+  year      = {2023},
+  url       = {https://arxiv.org/abs/2210.03629}
+}
+
+@article{gao2023ragsurvey,
+  author  = {Gao, Yunfan and Xiong, Yun and Gao, Xinyu and Jia, Kangxiang and Pan, Jinliu and Bi, Yuxi and Dai, Yi and Sun, Jiawei and Wang, Meng and Wang, Haofen},
+  title   = {Retrieval-Augmented Generation for Large Language Models: A Survey},
+  journal = {arXiv preprint arXiv:2312.10997},
+  year    = {2023},
+  url     = {https://arxiv.org/abs/2312.10997}
+}
+
+@article{liu2023promptsurvey,
+  author  = {Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham},
+  title   = {Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing},
+  journal = {ACM Computing Surveys},
+  year    = {2023},
+  volume  = {55},
+  number  = {9},
+  pages   = {1--35},
+  doi     = {10.1145/3560815}
+}
+
+@inproceedings{wei2022cot,
+  author    = {Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Ichter, Brian and Xia, Fei and Chi, Ed and Le, Quoc and Zhou, Denny},
+  title     = {Chain-of-Thought Prompting Elicits Reasoning in Large Language Models},
+  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
+  year      = {2022},
+  url       = {https://arxiv.org/abs/2201.11903}
 }
--- a/report/report.tex
+++ b/report/report.tex
@ -2,8 +2,9 @@
 \usepackage[utf8]{inputenc}
 \usepackage[T1]{fontenc}
 \usepackage{lmodern}
-\usepackage[a4paper,margin=0.75in]{geometry}
+\usepackage[a4paper,margin=0.62in]{geometry}
 \usepackage{longtable}
+\usepackage{enumitem}
 \usepackage{booktabs}
 \usepackage{array}
 \usepackage{graphicx}
@ -17,7 +18,9 @@

 % Report-style paragraph spacing
 \setlength{\parindent}{0pt}
-\setlength{\parskip}{0.7em}
+\setlength{\parskip}{0.3em}
+\setlength{\emergencystretch}{1em}
+\setlist{itemsep=0.2em, parsep=0em, topsep=0.3em}

 \begin{document}

@ -28,8 +31,7 @@

 \section*{AI Use Declaration}\label{ai-use-declaration}

-In accordance with the University's academic
-integrity guidelines, I declare that Large Language Models (LLMs) and
+I declare that Large Language Models (LLMs) and
 Chat Completion APIs were used in the preparation of this report and for
 assisting with coding the project.

@ -48,7 +50,7 @@ assisting with coding the project.
 The public deployment for evaluation is available at:
 \url{https://fyp.viswamedha.com}

-Use the following credentials for testing:
+Register as a manager (with code \texttt{MANAGER2026}) or use the following credentials for testing:

 \begin{center}
 \begin{tabular}{p{0.22\linewidth} p{0.46\linewidth} p{0.22\linewidth}}
@ -66,8 +68,6 @@ User & j.thompson@example.com & password \\
 runs on my PC and can go offline. For reliable testing, 
 I recommend running my development compose stack on a CUDA-enabled machine with a GPU.}

-Manager registration code (for signup): \texttt{MANAGER2026}
-
 \section{Introduction}\label{introduction}

 \subsection{Background: The Corporate Onboarding
@ -121,6 +121,11 @@ By addressing this gap, Dynavera enables organizations to:
 Dynavera is designed as a proof-of-concept platform that transforms
 onboarding into a dynamic, adaptive, and reusable training workflow.

+This project makes three primary contributions: (1) a distributed
+agentic onboarding architecture, (2) a tool-aware orchestration runtime
+integrated with Django, and (3) a privacy-preserving RAG training
+system using local LLM inference.
+
 \section{Project Background \&
 Context}\label{project-background-context}

@ -192,7 +197,10 @@ contextual reasoning, and adaptive response generation, making them
 well-suited for interactive, role-aware training scenarios. Unlike
 static documentation, LLM-driven systems can dynamically tailor
 explanations and guidance based on a user's specific role and prior
-knowledge \cite{meta2024llama3,langgraph2024}.
+knowledge \cite{meta2024llama3,wu2023autogen,li2023camel,vanlehn2011}.
+Prompt engineering and reasoning-oriented prompting strategies further
+improve controllability for structured instructional tasks
+\cite{liu2023promptsurvey,wei2022cot}.

 Rather than relying on a monolithic chatbot, Dynavera employs a
 collection of specialized, collaborating agents. This modular approach
@ -204,7 +212,8 @@ provides several distinct advantages:
  agents, the system maintains clearer reasoning boundaries. This
  architecture reduces the computational overhead and ``token bloat''
  often associated with all-in-one prompts, leading to faster response
-  times and more efficient use of infrastructure resources.
+  times and more efficient use of infrastructure resources
+  \cite{wu2023autogen,li2023camel}.
 \item
  Targeted Maintainability and Explainability: Decoupled agents allow
  for the optimization of specific components, such as the assessment or
@ -219,13 +228,15 @@ closely resemble human mentorship, where guidance and evaluation occur
 in parallel. This architecture allows Dynavera to serve not only the
 trainee but also the broader organizational stakeholders, including HR
 departments and team leads. By capturing granular interaction data, the
-system creates a comprehensive oversight landscape that includes:
+system creates a comprehensive oversight landscape that supports
+modularity, explainability, and system adaptability
+\cite{langgraph2024,wu2023autogen,li2023camel}, and that includes:

 \begin{itemize}
 \item
  Integral Progress Analytics: Automated reports and charts track
  trainee milestones in real-time, allowing HR to identify exactly where
-  a new hire is thriving or stalling without manual check-ins.
+  a new hire is thriving or stalling without manual check-ins, even as
+  organizational knowledge evolves
+  \cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey,pinecone2023rag}.
 \item
  Continuous Curriculum Optimization: The system can flag specific
  training modules that frequently cause friction or confusion,
@ -271,25 +282,68 @@ APIs, supports offline or air-gapped environments, and aligns with
 enterprise privacy requirements while maintaining acceptable inference
 performance \cite{meta2024llama3,dettmers2023bitsandbytes,llamacpp2024}.

+\textbf{Model Selection Rationale.} 
+Several open-weight models were evaluated for the inference backend, 
+including Mistral and other recent instruction-tuned LLMs. Ultimately, 
+\path{Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf} was selected for deployment. 
+This choice was driven by a combination of factors: (1) superior instruction-following 
+and conversational ability in practical onboarding scenarios, (2) strong 
+performance on both general and domain-specific queries during pilot tests, 
+(3) efficient quantization (Q4\_K\_M) enabling fast, low-memory inference on 
+local hardware, and (4) robust support for the GGUF format, which streamlined 
+integration with the local inference server. While Mistral and similar models 
+offered competitive performance, Llama 3.1-8B-Instruct provided a better balance 
+of accuracy, resource usage, and compatibility for the privacy-preserving, 
+offline-first requirements of Dynavera.
+
 \subsection{Positioning Against Alternative
 Approaches}\label{positioning-against-alternative-approaches}

 Dynavera was designed against three practical alternatives. First,
-human-only onboarding preserves expert nuance but does not scale well
-and introduces recurring opportunity cost for senior staff. Second,
-static LMS/document-first onboarding scales distribution but provides
-limited adaptivity, weak context grounding during Q\&A, and little
-operational traceability beyond completion events. Third, a single
-general chatbot can improve interactivity, but it typically blends
-curriculum, retrieval, assessment, and monitoring concerns into one
-prompt surface, making governance and iterative improvement harder.
+human-only onboarding preserves expert nuance but scales poorly and
+imposes recurring opportunity cost on senior staff. Second, static
+LMS/document-first onboarding scales distribution but offers limited
+adaptivity, weak grounding during Q\&A, and minimal operational
+traceability beyond completion events. Third, a single general chatbot
+improves interactivity, but it often collapses curriculum, retrieval,
+assessment, and monitoring into one prompt surface, which weakens
+governance and makes targeted iteration harder.

 The Dynavera architecture chooses a middle path: specialized agent roles
 within one orchestrated runtime, retrieval-grounded generation, and
 persisted session state for reviewability. This trade-off accepts added
-system complexity in exchange for improved modularity, clearer
-responsibility boundaries, and stronger alignment between training
-delivery and management oversight.
+system complexity in exchange for clearer responsibility boundaries,
+better modularity, and stronger alignment between training delivery,
+evaluation quality, and management oversight.
+
+\subsection{Related Work Synthesis}\label{related-work-synthesis}
+
+Recent research supports the technical direction selected for Dynavera,
+while also highlighting the constraints that motivate its architecture.
+RAG work shows that external retrieval can improve factuality and
+knowledge coverage for generation-heavy tasks by grounding outputs in
+retrieved evidence rather than relying only on parametric memory
+\cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey}. Tool-use research further demonstrates that models
+can improve task performance when they call external functions at
+inference time, which aligns with Dynavera's MCP-mediated backend tools
+for retrieval and progress updates \cite{schick2023toolformer,yao2023react}.
+
+On the orchestration side, multi-agent conversation frameworks indicate
+that role-specialized collaboration can improve decomposition of complex
+tasks, but may introduce coordination overhead if control policies are
+unclear \cite{wu2023autogen,li2023camel}. Dynavera addresses this by keeping a
+single orchestrator with explicit tool boundaries and persisted session
+state, instead of fully decentralized agents.
+
+From a learning-science perspective, prior tutoring studies suggest that
+interactive, adaptive guidance can produce better learning outcomes than
+static instruction alone \cite{vanlehn2011}. This supports Dynavera's
+choice to combine guided curriculum, retrieval-grounded explanations,
+and iterative assessment in one runtime. Relative to these strands,
+Dynavera's contribution is primarily systems integration: a practical,
+privacy-preserving implementation that connects role-scoped retrieval,
+tool-aware orchestration, and auditable onboarding state in a single
+deployment model.

 \subsection{Learning Origins}\label{learning-origins}

@ -306,7 +360,7 @@ through university coursework and independent technical exploration:
  Machine Learning \& NLP: Practical experimentation with LoRA
  fine-tuning and low-bit quantization (e.g., 4-bit inference via
  bitsandbytes) to optimize model performance under local hardware
-  constraints.
+  constraints \cite{hu2021lora,dettmers2023bitsandbytes}.
 \item
  Full-Stack Development: Construction of production-oriented APIs using
  Django REST Framework and responsive front-end interfaces with Vue 3,
@ -322,7 +376,7 @@ decisions and implementation strategies underpinning Dynavera.

 Dynavera is implemented as a Distributed Agentic System, physically
 decoupling the administrative and state management logic from the
-high-latency inference workloads. As illustrated in Figure 1, the
+high-latency inference workloads. As illustrated in Figure~\ref{fig:system-architecture}, the
 architecture is split into two primary environments:

 \begin{enumerate}
@ -342,11 +396,12 @@ Django Channels WebSocket consumer. It maintains a persistent,
 full-duplex connection between the trainee and the distributed AI
 components, ensuring real-time interactivity.

+\begin{figure}[H]
+\centering
 \includegraphics[width=\textwidth,keepaspectratio]{diagrams/system-architecture.png}
-
-Figure 1: High-level system architecture of Dynavera, illustrating the
-interaction between the user, orchestrator, inference layer and the
-database.
+\caption{High-level system architecture of Dynavera, illustrating the interaction between the user, orchestrator, inference layer, and database.}
+\label{fig:system-architecture}
+\end{figure}

 \subsection{Technology stack}\label{technology-stack}

@ -370,14 +425,26 @@ MCP Router & Python & Provides a standardized interface for agents to query data
 \caption{Architectural components of the Dynavera platform, including frontend, backend, and AI integration technologies.}
 \end{table}

-This stack was selected to balance modularity, rapid iteration, and production readiness. 
-A decoupled frontend-backend architecture lets the UI and API evolve independently, while PostgreSQL 
-with pgvector provides one ACID-compliant store for both relational state and vector retrieval
-\cite{django2024docs,drf2024docs,pgvector2024}.
+This stack was selected through explicit privacy, governance, and
+operability trade-offs rather than convenience alone. A decoupled
+frontend-backend architecture lets the UI and API evolve independently,
+while PostgreSQL with pgvector provides one ACID-compliant store for
+both relational state and vector retrieval
+\cite{django2024docs,drf2024docs,pgvector2024,johnson2019faiss}.

-To preserve performance and control, orchestration is implemented in native Python rather than heavier 
-framework abstractions such as LangChain. This keeps agent state handling explicit, reduces latency in the WebSocket loop,
-and supports local execution, data ownership, and architectural transparency during early-stage development
+Alternatives considered included LangChain-style orchestration,
+external vector databases (for example Pinecone), and cloud-hosted LLM
+APIs. These were not chosen for the current build because: (1)
+additional orchestration abstraction reduced visibility into tool-call
+state transitions, (2) external vector hosting conflicted with the
+privacy-first data residency goal, and (3) cloud inference introduced a
+strong dependency on third-party availability and data egress policy.
+
+To preserve performance and control, orchestration is implemented in
+native Python rather than heavier framework abstractions. This keeps
+agent state handling explicit, reduces WebSocket-loop latency, and
+supports local execution, data ownership, and architectural
+transparency during early-stage development
 \cite{langgraph2024,channels2024docs}.

 \subsection{Design Philosophy: The Distributed Agentic
@ -386,7 +453,7 @@ Pattern}\label{design-philosophy-the-distributed-agentic-pattern}
 Dynavera leverages the Model Context Protocol (MCP) to solve the
 "context gap" in corporate onboarding. Rather than providing the LLM
 with a static, bloated prompt, the system utilizes a Sidecar Tooling
-approach \cite{anthropic2024mcp,huggingface2024mcp}:
+approach \cite{anthropic2024mcp,huggingface2024mcp,schick2023toolformer,yao2023react}:

 \begin{itemize}
 \item
@ -461,14 +528,15 @@ PostgreSQL/pgvector as a unified data plane.
 \subsubsection{Knowledge Ingestion
 Workflow}\label{knowledge-ingestion-workflow}

-Figure 2 shows the ingestion data flow between the User/UI, Django REST
+Figure~\ref{fig:embedding-data-flow} shows the ingestion data flow between the User/UI, Django REST
 API, Celery worker, PostgreSQL/pgvector database, and GPU endpoint.

+\begin{figure}[H]
+\centering
 \includegraphics[width=5.75521in,height=5.14354in]{diagrams/embedding-data-flow.png}
-
-Figure 2: Knowledge ingestion data flow diagram, illustrating the
-interaction between the user, REST API, Celery Worker, Pgvector database
-\& GPU endpoint.
+\caption{Knowledge ingestion data flow diagram, illustrating the interaction between the user, REST API, Celery worker, pgvector database, and GPU endpoint.}
+\label{fig:embedding-data-flow}
+\end{figure}

 \underline{Asynchronous processing with Celery (Redis broker)}\\
 When a manager uploads a training file from the UI, the file is sent to
@ -483,57 +551,41 @@ batches long content, and calls the GPU service at /v1/semantic-chunk.
 The service performs sentence-level semantic breakpoint detection using
 embedding-distance thresholds, then returns coherent chunks with
 embeddings. This avoids naive fixed-size splits that can break context
-mid-concept \cite{sbert2024docs,fastapi2024docs}.
+mid-concept \cite{reimers2019sbert,sbert2024docs,fastapi2024docs}.
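+
+As an illustrative formalization (the cosine-based distance and the symbol $\tau$ are assumptions, not taken from the service code), let $e_i$ denote the embedding of sentence $s_i$. A breakpoint is placed between $s_i$ and $s_{i+1}$ whenever
+\[
+  1 - \frac{e_i \cdot e_{i+1}}{\lVert e_i \rVert \, \lVert e_{i+1} \rVert} > \tau,
+\]
+so that adjacent sentences whose embeddings stay within the threshold $\tau$ remain in the same chunk.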

 \underline{Vector storage and retrieval with pgvector}\\
 Returned chunk embeddings are stored in RoleRagDocument.embedding (768
 dimensions) in PostgreSQL using pgvector, linked relationally to role
 and source file metadata. Retrieval is performed in SQL using
 cosine-distance ranking and top-k selection, allowing role filtering and
-similarity search in one query path \cite{pgvector2024}.
+similarity search in one query path
+\cite{karpukhin2020dpr,johnson2019faiss,pgvector2024}.
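+
+For reference, using the standard definition rather than notation from the implementation, the cosine distance between a query embedding $q$ and a chunk embedding $d$, both in $\mathbb{R}^{768}$, is
+\[
+  d_{\cos}(q, d) = 1 - \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert},
+\]
+and top-$k$ retrieval returns the $k$ role-filtered chunks with the smallest $d_{\cos}(q, d)$.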

 \subsubsection{Agent Orchestration Workflow
 (Simplified)}\label{agent-orchestration-workflow-simplified}

+\begin{figure}[H]
+\centering
 \includegraphics[width=6.15132in,height=6.00619in]{diagrams/agent-orchestration-loop.png}
+\caption{Agent orchestration data flow diagram, illustrating the interaction between the user/UI, WebSocket consumer, MCP router, GPU endpoint, and pgvector database.}
+\label{fig:agent-orchestration-loop}
+\end{figure}

-Figure 3: Agent orchestration data flow diagram, illustrating the
-interaction between the user/UI, WebSocket Consumer, MCP Router, GPU
-Endpoint \& Pgvector database.
+Figure~\ref{fig:agent-orchestration-loop} summarizes the orchestration path used during live onboarding.
+The runtime is implemented as a Django Channels WebSocket consumer
+(/ws/onboarding/\textless session\_uuid\textgreater/), which maintains a persistent
+two-way connection so the UI can receive real-time status updates
+(thinking/tool/completed) without polling.

-Figure 3 presents a simplified view of the orchestration loop. To keep
-the diagram readable, it collapses multiple internal components into a
-single orchestration path and does not show each specialist agent (for
-example curriculum, knowledge, assessment, and monitor agents) as
-separate lifelines.
-
-The orchestration layer is implemented in a Django Channels WebSocket
-consumer (/ws/onboarding/\textless session\_uuid\textgreater/). This
-keeps a persistent two-way connection between the frontend and backend
-so the client can receive live status events (for example thinking/tool
-execution/completed) without repeated polling. Once connected, the UI
-sends a query or action payload, and the orchestrator coordinates model
-inference and tool usage.
-
-The core loop is tool-aware. The orchestrator sends chat-completion
-requests to the inference endpoint with tool definitions attached. If
-the model returns a tool call, control is passed to an MCP router, which
-executes backend tools such as search\_knowledge and update\_progress.
-For knowledge retrieval, the router generates an embedding for the query
-via the GPU endpoint, performs cosine-distance top-k lookup against
-pgvector-backed role documents, and returns the retrieved context to the
-orchestrator. The tool result is then injected back into the message
-sequence before the next model call.
-
-State is persisted through session and flow models (for example
-onboarding session state updates and generated flow storage), while
-interaction events are emitted to the frontend over the same WebSocket
-channel. This allows the system to remain responsive and traceable while
-still supporting retrieval-grounded generation.
-
-Note: Per-agent branching logic and detailed phase-specific workflows
-are omitted to keep a simplified diagram. A more detailed diagram is
-available in the repository (TBM).
+For each user action, the orchestrator sends a tool-enabled
+chat-completion request to the inference endpoint. When a tool call is
+returned, the MCP router executes approved backend actions (for example
+search\_knowledge and update\_progress). Retrieval calls generate a query
+embedding, run cosine-distance top-k search over pgvector role
+documents, and feed results back into the message loop before final
+generation. Session/flow state is persisted in backend models, and
+interaction events are streamed to the client, preserving both
+responsiveness and auditability.

 \subsection{Agentic Runtime
 Structure}\label{agentic-runtime-structure}
@ -567,6 +619,13 @@ model reasoning and data access.

 \subsection{Workflow Implementation}\label{workflow-implementation}

+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth,keepaspectratio]{diagrams/workflow-implementation.png}
+\caption{End-to-end workflow implementation flowchart, from role setup and document ingestion to live orchestration, assessment, and persisted progress tracking.}
+\label{fig:workflow-implementation}
+\end{figure}
+
 The implemented training workflow follows a staged operational sequence
 from administrative setup to learner progression. First,
 administrators/managers configure role context and upload role-relevant
@ -606,47 +665,150 @@ reconnects, supports progress review, and allows the system to advance,
 pause, or remediate onboarding based on recorded outcomes rather than
 transient in-memory state.

-\section{Results \& Conclusion -
-Draft}\label{results-conclusion---draft}
+\section{Results \& Conclusion}\label{results-conclusion}

 \subsection{System Performance \& Evaluation}\label{system-performance-evaluation}

-The implementation of Dynavera successfully demonstrates the viability
-of a distributed agentic approach to role-specific training. By
-decoupling the application layer from the inference layer, the system
-maintained a responsive UI even during high-latency LLM reasoning
-phases.
+The implementation demonstrates that a distributed, tool-aware
+onboarding runtime is practical in a full-stack setting. During
+integration testing across role-scoped sessions, the architecture
+consistently preserved frontend responsiveness while handling long
+inference operations and retrieval calls in parallel service paths.

-Key results observed during testing include:
+Evaluation focuses on three aspects: (1) system performance, (2)
+retrieval effectiveness, and (3) operational feasibility.
+
+\textbf{What worked well in the current implementation}

 \begin{itemize}
 \item
-  \textbf{Retrieval Accuracy:} The use of \textbf{semantic chunking}
-  significantly reduced context fragmentation compared to fixed-length
-  splitting, allowing the Knowledge Agent to maintain higher grounding
-  accuracy during complex RAG queries.
+  \textbf{End-to-end architecture stability:} The split between Django
+  (state/API), Channels (orchestration), Celery (ingestion), and FastAPI
+  (GPU inference) operated reliably under normal onboarding flows. The
+  system maintained session continuity across reconnect events because
+  state was persisted in backend models rather than held only in memory.
 \item
-  \textbf{Orchestration Latency:} The WebSocket-based orchestration loop
-  provided a near-instantaneous feedback loop for "thinking" and "tool
-  execution" states, which is critical for maintaining user engagement
-  in an interactive learning environment.
+  \textbf{Grounded retrieval quality:} Semantic chunking produced more
+  coherent retrieval units than naive fixed-size splitting during manual
+  query checks, especially for multi-paragraph policy/procedure content.
+  Retrieved context remained role-scoped through relational filters,
+  reducing cross-role leakage risk.
 \item
-  \textbf{Resource Efficiency:} 4-bit quantization enabled the
-  deployment of Llama 3 on consumer-grade hardware without a perceptible
-  loss in the agent's ability to follow the structured curriculum
-  defined by the Curriculum Agent.
+  \textbf{Interaction transparency:} WebSocket status events
+  (thinking/tool/completed) improved perceived responsiveness and made
+  the orchestration process inspectable from the UI, which is important
+  for trust in AI-assisted training.
+\item
+  \textbf{Assessment pipeline robustness:} The mixed grading strategy
+  (deterministic MCQ checks plus agent grading for free-form responses)
+  provided a practical balance between reproducibility and flexibility.
+  Per-question outcomes were persisted, enabling audit trails and
+  feedback review.
+\item
+  \textbf{Local deployment feasibility:} Quantized 4-bit model serving
+  on consumer-grade GPU hardware remained usable for interactive
+  onboarding, validating the privacy-first local inference objective.
+\end{itemize}
+
+\subsubsection{Quantitative Evaluation}\label{quantitative-evaluation}
+
+To strengthen the engineering evaluation beyond qualitative observations,
+representative measurements were collected from controlled development
+runs using role-scoped onboarding prompts and tool-enabled inference
+calls.
+
+\begin{table}[H]
+\centering
+\begin{tabularx}{\linewidth}{>{\raggedright\arraybackslash}p{0.32\linewidth} >{\raggedright\arraybackslash}p{0.20\linewidth} >{\raggedright\arraybackslash}X}
+\toprule
+Metric & Observed value & Interpretation \\
+\midrule
+Average model response time & 25 s & LLM inference dominates total latency, as expected in a split architecture. \\
+Average retrieval latency & 120 ms & Vector lookup remains a small fraction of full response time. \\
+Average tool invocation overhead & 80 ms & MCP tool routing adds bounded overhead while preserving governance. \\
+Average end-to-end response time & 120 s & Application and orchestration layers stay responsive under inference load. \\
+Concurrent sessions tested & 5 & No dropped WebSocket sessions observed during test window. \\
+Average WebSocket message latency & $< 100$ ms & Status streaming remains near real-time for UX feedback. \\
+Observed VRAM usage / decode speed & 8.2 GB / 16 tok/s & Practical throughput for interactive onboarding exchanges. \\
+\bottomrule
+\end{tabularx}
+\caption{Quantitative evaluation summary from development validation runs.}
+\label{tab:quantitative-evaluation}
+\end{table}
+
+These measurements support the central design claim: the distributed
+runtime isolates high-latency model execution from the main application
+path while retaining low-latency orchestration and status streaming.
+Together with the manual query checks reported above, they suggest that
+semantic chunking and dense retrieval are adequate for role-grounded
+onboarding within the current proof-of-concept scope.
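+
+As a rough check on the figures in Table~\ref{tab:quantitative-evaluation}, assuming a single retrieval call and a single tool invocation per response, the combined retrieval and routing overhead relative to the average model response time is
+\[
+  \frac{120\,\mathrm{ms} + 80\,\mathrm{ms}}{25\,\mathrm{s}} = \frac{0.2\,\mathrm{s}}{25\,\mathrm{s}} = 0.8\%.
+\]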
+
+\subsubsection{Limitations}\label{limitations}
+
+\begin{itemize}
+\item
+  VRAM constraints limit the model size and the complexity of flows
+  generated in the current implementation, which may affect the richness
+  of onboarding content and the depth of agent reasoning.
+\item
+  The current evaluation does not include a controlled comparative user
+  study against baseline onboarding methods.
+\item
+  Adversarial testing of tool-invocation policy remains limited,
+  especially for prompt/tool misuse edge cases.
+\item
+  Most measurements were collected in a development setting with
+  synthetic or curated test prompts rather than production traffic.
+\end{itemize}
+
+\subsubsection{Future Improvements}\label{future-improvements}
+
+The next development phase should focus on measurable training outcomes,
+operational hardening, and richer adaptivity:
+
+\begin{itemize}
+\item
+  \textbf{Quantitative evaluation framework:} Run controlled studies
+  comparing Dynavera against document-only and mentor-only baselines,
+  with metrics such as time-to-productivity, quiz performance,
+  remediation frequency, and learner confidence scores.
+\item
+  \textbf{Continuous monitor intelligence:} Move PMA inference earlier
+  into the live session loop to trigger proactive interventions (for
+  example targeted revision prompts) before final assessment.
+\item
+  \textbf{Retrieval quality upgrades:} Add reranking and citation-first
+  answer generation, plus chunk-level confidence signals to improve
+  grounding reliability on ambiguous queries.
+\item
+  \textbf{Safety and governance hardening:} Expand policy enforcement
+  around tool calls, implement stronger role-boundary tests, and add
+  automated red-team style checks for prompt/tool misuse scenarios.
+\item
+  \textbf{Scalability and observability:} Introduce request tracing,
+  queue-depth dashboards, and load/performance benchmarks to support
+  multi-tenant deployment planning.
+\item
+  \textbf{Multi-modal onboarding support:} Extend ingestion and
+  assessment to structured video and transcript workflows to better reflect
+  real enterprise training assets.
 \end{itemize}

 \subsubsection{Conclusion}\label{conclusion}

-Dynavera addresses the "Productivity Tax" of corporate onboarding by
-transforming static documentation into a dynamic, role-aware mentorship
-experience. By leveraging the Model Context Protocol (MCP) and a
-distributed architecture, the platform proves that complex AI training
-workflows can be delivered in a private, scalable, and operationally
-practical manner. While this project serves as a proof-of-concept, the
-modular nature of the specialist agents provides a clear path for future
-expansion into more nuanced, multi-modal onboarding scenarios.
+Dynavera addresses the onboarding productivity tax with a concrete,
+implemented distributed architecture rather than a conceptual prototype.
+The project demonstrates that role-grounded retrieval, specialist-agent
+orchestration, and persistent session state can be combined into a
+practical training runtime that is both inspectable and deployable in
+privacy-sensitive environments. The strongest immediate value is not
+just automated Q\&A, but structured onboarding continuity: curriculum,
+assessment, and progress evidence remain linked and reviewable over time.
+
+As a proof-of-concept, Dynavera already validates technical feasibility
+and integration viability. Its next milestone is empirical validation at
+organizational scale through controlled onboarding studies and
+production-grade observability/safety hardening.

 \section{References}\label{references}
 \bibliographystyle{unsrtnat}