# Dynavera Benchmark Results

**Date:** 2026-03-24 13:28:54

**Inference endpoint:** `http://fyp-inference-dev:8001`

**Repetitions per benchmark:** 5

## 1. GPU Server Health

| Field | Value |
|---|---|
| Status | OK |
| LLM Ready | True |
| Embed Ready | True |
| Health check RTT | 51.0 ms |

## 2. Embedding Latency

| Query type | Chars | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| short | 19 | 95.5 | 25.1 | 378.6 | 23.0 | 378.6 |
| medium | 172 | 25.7 | 24.7 | 29.4 | 24.3 | 29.4 |
| long | 428 | 27.5 | 26.7 | 32.2 | 24.8 | 32.2 |

## 3. Semantic Chunking Latency

| Input size | Chars | Chunks produced | Latency (ms) |
|---|---|---|---|
| small (~200 c) | 200 | 1 | 28.4 |
| medium (~2k c) | 1810 | 1 | 77.0 |
| large (~8k c) | 7740 | 1 | 206.3 |

## 4. LLM Inference Latency

| Prompt type | Elapsed (s) | Prompt tokens | Completion tokens | Tok/s |
|---|---|---|---|---|
| short_qa | 1.5 | 55 | 69 | 46.0 |
| progress_summary | 1.36 | 74 | 71 | 52.3 |
| curriculum_gen | 1.67 | 79 | 82 | 49.0 |
| assessment_gen | 5.03 | 83 | 235 | 46.7 |
| knowledge_explanation | 9.31 | 83 | 496 | 53.3 |

> **Note on end-to-end session time:** A full onboarding session invokes multiple sequential
> inference calls (curriculum generation → knowledge explanation × N modules → assessment generation → progress summary).
> Total wall-clock time accumulates across all turns plus retrieval and tool-call overhead.

## 5. Database Statistics

| Entity | Count |
|---|---|
| Organizations | 3 |
| Roles | 10 |
| Users | 12 |
| Training Files (total) | 0 |
| Training Files (embedded) | 0 |
| Knowledge Chunks (with embeddings) | 0 |
| Onboarding Sessions | 4 |

## Raw JSON

```json
{
  "health": {
    "status": "OK",
    "llm_ready": true,
    "embed_ready": true,
    "latency_ms": 51.0
  },
  "embeddings": {
    "short": {
      "query_chars": 19,
      "mean_ms": 95.5,
      "median_ms": 25.1,
      "p95_ms": 378.6,
      "min_ms": 23.0,
      "max_ms": 378.6
    },
    "medium": {
      "query_chars": 172,
      "mean_ms": 25.7,
      "median_ms": 24.7,
      "p95_ms": 29.4,
      "min_ms": 24.3,
      "max_ms": 29.4
    },
    "long": {
      "query_chars": 428,
      "mean_ms": 27.5,
      "median_ms": 26.7,
      "p95_ms": 32.2,
      "min_ms": 24.8,
      "max_ms": 32.2
    }
  },
  "chunking": {
    "small (~200 c)": {
      "chars": 200,
      "chunks_produced": 1,
      "latency_ms": 28.4
    },
    "medium (~2k c)": {
      "chars": 1810,
      "chunks_produced": 1,
      "latency_ms": 77.0
    },
    "large (~8k c)": {
      "chars": 7740,
      "chunks_produced": 1,
      "latency_ms": 206.3
    }
  },
  "llm": {
    "short_qa": {
      "elapsed_s": 1.5,
      "prompt_tokens": 55,
      "completion_tokens": 69,
      "tokens_per_sec": 46.0,
      "response_preview": "A Kubernetes pod is a logical host for one or more containers, providing a shared network namespace,"
    },
    "progress_summary": {
      "elapsed_s": 1.36,
      "prompt_tokens": 74,
      "completion_tokens": 71,
      "tokens_per_sec": 52.3,
      "response_preview": "The trainee has made significant progress in their onboarding journey, demonstrating a strong founda"
    },
    "curriculum_gen": {
      "elapsed_s": 1.67,
      "prompt_tokens": 79,
      "completion_tokens": 82,
      "tokens_per_sec": 49.0,
      "response_preview": "[ \"Module 1: Introduction to Backend Services and Infrastructure\", \"Module 2: Designing and Impl"
    },
    "assessment_gen": {
      "elapsed_s": 5.03,
      "prompt_tokens": 83,
      "completion_tokens": 235,
      "tokens_per_sec": 46.7,
      "response_preview": "```json [ { \"question\": \"What is the primary purpose of a Continuous Integration (CI) pipeline"
    },
    "knowledge_explanation": {
      "elapsed_s": 9.31,
      "prompt_tokens": 83,
      "completion_tokens": 496,
      "tokens_per_sec": 53.3,
      "response_preview": "**Git Branching Strategy Best Practices** As a new engineer, understanding a Git branching strategy"
    }
  },
  "database": {
    "organizations": 3,
    "roles": 10,
    "users": 12,
    "training_files_total": 0,
    "training_files_embedded": 0,
    "knowledge_chunks_with_embeddings": 0,
    "onboarding_sessions": 4
  },
  "retrieval": {
    "skipped": "No embedded chunks found in database."
  }
}
```
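
The note on end-to-end session time in Section 4 can be turned into a rough lower-bound estimate. The sketch below is illustrative only: it assumes exactly one inference call per stage and one knowledge explanation per module, and it ignores the retrieval and tool-call overhead the note mentions. The function name `estimate_session_seconds` and the constant `LLM_LATENCY_S` are hypothetical and not part of the benchmark tooling.

```python
# Single-call latencies in seconds, copied from the LLM Inference Latency table.
LLM_LATENCY_S = {
    "curriculum_gen": 1.67,
    "knowledge_explanation": 9.31,
    "assessment_gen": 5.03,
    "progress_summary": 1.36,
}


def estimate_session_seconds(n_modules: int, latency_s: dict = LLM_LATENCY_S) -> float:
    """Lower-bound LLM time for one onboarding session.

    Assumption (not confirmed by the report): one call per stage and one
    knowledge explanation per module. Retrieval and tool-call overhead
    are deliberately left out.
    """
    if n_modules < 0:
        raise ValueError("n_modules must be non-negative")
    return (
        latency_s["curriculum_gen"]
        + n_modules * latency_s["knowledge_explanation"]
        + latency_s["assessment_gen"]
        + latency_s["progress_summary"]
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} modules: ~{estimate_session_seconds(n):.1f} s of inference")
```

Under these assumptions the knowledge-explanation step dominates: each additional module adds about 9.3 s, so a three-module session needs roughly 36 s of inference before any overhead is counted.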
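
The `tokens_per_sec` values in the raw JSON appear consistent with completion tokens divided by elapsed seconds. That definition is an assumption inferred from the numbers, not something the report states. The sketch below recomputes the throughput under that assumption; small differences are expected because `elapsed_s` is rounded in the output.

```python
# Values copied from the "llm" object in the raw JSON.
LLM_RESULTS = {
    "short_qa": {"elapsed_s": 1.5, "completion_tokens": 69, "tokens_per_sec": 46.0},
    "progress_summary": {"elapsed_s": 1.36, "completion_tokens": 71, "tokens_per_sec": 52.3},
    "curriculum_gen": {"elapsed_s": 1.67, "completion_tokens": 82, "tokens_per_sec": 49.0},
    "assessment_gen": {"elapsed_s": 5.03, "completion_tokens": 235, "tokens_per_sec": 46.7},
    "knowledge_explanation": {"elapsed_s": 9.31, "completion_tokens": 496, "tokens_per_sec": 53.3},
}


def recomputed_tokens_per_sec(entry: dict) -> float:
    """Assumed definition: completion tokens per second of wall-clock time."""
    return entry["completion_tokens"] / entry["elapsed_s"]


def max_deviation(results: dict) -> float:
    """Largest absolute gap between the reported and recomputed throughput."""
    return max(
        abs(recomputed_tokens_per_sec(e) - e["tokens_per_sec"]) for e in results.values()
    )


for name, entry in LLM_RESULTS.items():
    print(f"{name}: reported {entry['tokens_per_sec']}, "
          f"recomputed {recomputed_tokens_per_sec(entry):.1f}")
```

Throughput stays within a narrow band of roughly 46 to 53 tok/s across prompt types, so elapsed time scales almost linearly with the number of completion tokens.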
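
In the embedding results, the mean for the short query (95.5 ms) is far above its median (25.1 ms), which points to a single slow sample. With five repetitions, the reported P95 coincides with the maximum in every row. The sketch below backs out the mean of the remaining samples once the maximum is removed. It assumes each mean was taken over exactly the five repetitions stated in the header; the helper `mean_without_max` is hypothetical.

```python
# (mean_ms, max_ms) pairs copied from the Embedding Latency table.
EMBEDDING_STATS = {
    "short": (95.5, 378.6),
    "medium": (25.7, 29.4),
    "long": (27.5, 32.2),
}
REPETITIONS = 5  # "Repetitions per benchmark" from the report header


def mean_without_max(mean_ms: float, max_ms: float, n: int) -> float:
    """Mean of the other n - 1 samples after dropping the single largest one."""
    if n < 2:
        raise ValueError("need at least two samples")
    return (mean_ms * n - max_ms) / (n - 1)


for name, (mean_ms, max_ms) in EMBEDDING_STATS.items():
    trimmed = mean_without_max(mean_ms, max_ms, REPETITIONS)
    print(f"{name}: mean without slowest sample is about {trimmed:.1f} ms")
```

With the slowest sample removed, the short query averages about 24.7 ms, in line with the medium and long queries. The 378.6 ms sample is therefore an outlier, and the median is the more representative figure for that row.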