docs: FE cache hit/miss + percentage calculation handoff

Calculation-only courier doc for ../dispatch-web: Usage field semantics, hitRate = cacheReadTokens/inputTokens, where to source it (usage/done events + GET /conversations/:id/metrics, per-step/turn/cumulative), live accumulate + done.usage reconcile, replay seeding, and the cacheWriteTokens-absent caveat. No backend change required; UI design left to the FE.
author: Adam Malczewski <[email protected]> 2026-06-10 09:03:43 +0900
committer: Adam Malczewski <[email protected]> 2026-06-10 09:03:43 +0900
commit: 52ce1bb8b25cc6ba4ba7d2734c35c95e0a08d723 (patch)
tree: 538ebecb3208e9bf5d081c3188557d893122912f
parent: 2e583ea78a7efa6e9a31b7c5b4dfcf792007e418 (diff)
download: dispatch-52ce1bb8b25cc6ba4ba7d2734c35c95e0a08d723.tar.gz
dispatch-52ce1bb8b25cc6ba4ba7d2734c35c95e0a08d723.zip
1 files changed, 100 insertions, 0 deletions
diff --git a/frontend-cache-rate-handoff.md b/frontend-cache-rate-handoff.md
new file mode 100644
index 0000000..158e199
--- /dev/null
+++ b/frontend-cache-rate-handoff.md
@@ -0,0 +1,100 @@
+# FE handoff — cache hit/miss + percentage (calculation guide)
+
+> **Courier doc** (backend → `../dispatch-web`, relayed via the user). Per ORCHESTRATOR §7,
+> the backend does not write to the FE repo. This doc covers ONLY how to compute cache
+> hit/miss and percentages from data the backend ALREADY exposes: **no UI design here**
+> (the look is specified separately) and **no backend change is required**.
+> Contracts: `@dispatch/wire` + `@dispatch/transport-contract` `0.4.0`.
+
+## TL;DR
+The cache hit rate is `cacheReadTokens / inputTokens`. Everything you need is already
+on the `usage` and `done` live events and in `GET /conversations/:id/metrics`. There is
+**no separate cache endpoint or boolean**: the rate is derived from token counts, exactly
+as the old `CacheRatePanel` did.
+
+## The data shape (`Usage`, from `@dispatch/wire`)
+```ts
+interface Usage {
+  inputTokens: number;       // TOTAL prompt tokens this step/turn, INCLUDING cached ones
+  outputTokens: number;
+  cacheReadTokens?: number;  // input tokens served FROM cache (the "hit" count). Optional.
+  cacheWriteTokens?: number; // cache-creation count. Optional; usually ABSENT (see caveats).
+}
+```
+Field semantics that matter for the math:
+- `inputTokens` is the **whole** prompt, so `cacheReadTokens ≤ inputTokens` and the rate is in `[0,1]`.
+- The cache fields are **optional** — treat `undefined` as `0` in all arithmetic.
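A minimal sketch of the "treat `undefined` as `0`" rule. It redeclares the `Usage` shape locally so it compiles on its own. `normalizeUsage` and `NormalizedUsage` are hypothetical names, not exports of `@dispatch/wire`. Normalizing once at the edge means later arithmetic never has to repeat `?? 0`.

```typescript
// Local copy of the `Usage` shape from @dispatch/wire, so this snippet compiles on its own.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Hypothetical helper type: same fields, none optional.
type NormalizedUsage = Required<Usage>;

// Hypothetical helper: fill absent cache fields with 0 once, where data enters the FE.
function normalizeUsage(u: Usage): NormalizedUsage {
  return {
    inputTokens: u.inputTokens,
    outputTokens: u.outputTokens,
    cacheReadTokens: u.cacheReadTokens ?? 0,   // absent → 0 (counts as a miss)
    cacheWriteTokens: u.cacheWriteTokens ?? 0, // usually absent with the current provider
  };
}
```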
+
+## Formulas
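+```ts
+import type { Usage } from "@dispatch/wire";
+
+// `cacheStats` is an illustrative wrapper name; only the formulas inside matter.
+function cacheStats(u: Usage) {
+  const read  = u.cacheReadTokens  ?? 0;   // optional fields: undefined → 0
+  const write = u.cacheWriteTokens ?? 0;
+
+  const isHit   = read > 0;                                                    // hit vs miss
+  const hitRate = u.inputTokens > 0 ? Math.min(1, read / u.inputTokens) : 0;   // 0..1 (guard /0, clamp)
+  const hitPct  = Math.round(hitRate * 100);
+  const fresh   = Math.max(0, u.inputTokens - read - write);                  // uncached input tokens
+  return { isHit, hitRate, hitPct, fresh };
+}
+```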
+(These match the old `CacheRatePanel.svelte` formulas: hit rate = `cacheReadTokens/inputTokens`,
+guarded against `/0` and clamped to `[0,1]`; uncached = `max(0, input − read − write)`.)
+
+## Where to get `Usage` — three granularities, two channels
+
+| Scope | LIVE (WS `chat.delta` / NDJSON) | REPLAY (`GET /conversations/:id/metrics`) |
+|---|---|---|
+| **Per step** | `usage` event (`type:"usage"`, carries `stepId`, `usage`) | `TurnMetrics.steps[].usage` (each has `stepId`) |
+| **Per turn** (authoritative aggregate) | `done` event (`type:"done"`, carries `usage`, `durationMs`) | `TurnMetrics.usage` |
+| **Cumulative** (conversation) | Σ of each turn's `done.usage` | Σ of `turns[].usage` |
+
+Notes:
+- The **per-turn aggregate IS the sum of its steps** (the runtime aggregates the steps into it).
+  So when summing a cumulative figure, pick ONE granularity: sum `done.usage`/`TurnMetrics.usage`
+  per turn, **or** sum all steps, never both (that double-counts).
+- `done.usage` is the authoritative per-turn total. (`turn-sealed` does NOT carry usage in
+  this backend: its payload is just `{conversationId, turnId}`, and the numbers arrive on the
+  `done` event immediately preceding it.)
+- `step-complete` carries timing only (ttft/decode), no token counts; ignore it for cache math.
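To illustrate the "pick ONE granularity" rule, here is a sketch that sums either the per-turn aggregates or the per-step usages. Both give the same totals, and adding them together would count every token twice. The `TurnMetrics`/`StepMetrics` interfaces are assumed minimal shapes (only `usage`, `steps[]`, and `stepId`, as named in the table above); the real `@dispatch/wire` types carry more fields. `addUsage`, `cumulativeFromTurns`, and `cumulativeFromSteps` are hypothetical names.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}
// Assumed minimal shapes: only the fields this sketch reads.
interface StepMetrics { stepId: string; usage: Usage; }
interface TurnMetrics { usage: Usage; steps: StepMetrics[]; }

const ZERO: Required<Usage> = { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 };

// Field-wise sum that treats absent cache fields as 0; returns a new object.
function addUsage(a: Required<Usage>, b: Usage): Required<Usage> {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    cacheReadTokens: a.cacheReadTokens + (b.cacheReadTokens ?? 0),
    cacheWriteTokens: a.cacheWriteTokens + (b.cacheWriteTokens ?? 0),
  };
}

// Option A: one aggregate per turn.
function cumulativeFromTurns(turns: TurnMetrics[]): Required<Usage> {
  return turns.reduce((acc, t) => addUsage(acc, t.usage), ZERO);
}

// Option B: every step of every turn. Same result as A; never add the turn totals on top.
function cumulativeFromSteps(turns: TurnMetrics[]): Required<Usage> {
  return turns.reduce(
    (acc, t) => t.steps.reduce((inner, s) => addUsage(inner, s.usage), acc),
    ZERO,
  );
}
```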
+
+## Live accumulation + reconcile (recommended pattern)
+1. **In-progress turn (optional live counter):** as `usage` events stream, you may sum
+   `read`/`input` across the turn's steps to show a live-updating hit % for the current turn.
+2. **Turn finished:** take that turn's authoritative totals from its `done.usage` and use them
+   as the turn's final value (replacing any live partial for that turn).
+3. **Cumulative (session/conversation):** add each completed turn's `done.usage` to a running
+   total. Compute the cumulative hit % from the running totals (`ΣcacheRead / Σinput`).
+4. **"Last request" rate:** the most recent turn's `done.usage` (or most recent step's `usage`
+   if you want per-round-trip granularity).
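The four steps above can be sketched as a small tracker. This assumes events for one conversation arrive in order, one turn at a time, and that the `usage`/`done` payloads look like the minimal `CacheEvent` union below (the real `TurnUsageEvent`/`TurnDoneEvent` types may carry more fields). `CacheRateTracker` and its methods are hypothetical names. It keeps raw token sums and converts to a percentage only on read, so the cumulative rate is `Σread / Σinput` rather than an average of per-turn percentages.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Assumed minimal event shapes (discriminated on `type`).
type CacheEvent =
  | { type: "usage"; stepId: string; usage: Usage }
  | { type: "done"; usage: Usage; durationMs: number };

interface Totals { read: number; input: number; }

// Guard /0, clamp to 1, round to a whole percent.
function pct(t: Totals): number {
  return t.input > 0 ? Math.round(Math.min(1, t.read / t.input) * 100) : 0;
}

// Hypothetical tracker: one instance per conversation, fed events in arrival order.
class CacheRateTracker {
  private live: Totals = { read: 0, input: 0 };       // step 1: in-progress turn (partial)
  private cumulative: Totals = { read: 0, input: 0 }; // step 3: completed turns only
  private last: Totals | null = null;                 // step 4: most recent finished turn

  handle(ev: CacheEvent): void {
    const read = ev.usage.cacheReadTokens ?? 0;
    const input = ev.usage.inputTokens;
    if (ev.type === "usage") {
      // Live partial: sum the steps of the current turn.
      this.live = { read: this.live.read + read, input: this.live.input + input };
    } else {
      // Step 2: done.usage is authoritative and replaces the live partial.
      this.last = { read, input };
      this.cumulative = { read: this.cumulative.read + read, input: this.cumulative.input + input };
      this.live = { read: 0, input: 0 };
    }
  }

  currentTurnPct(): number { return pct(this.live); }
  lastTurnPct(): number | null { return this.last ? pct(this.last) : null; }
  cumulativePct(): number { return pct(this.cumulative); }
}
```

If a turn ends without a `done` event, this sketch keeps summing into the same live partial until the next `done`; resetting it on some other end-of-turn signal is left to the FE.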
+
+## Replay / reopening a conversation
+On open, `GET /conversations/:id/metrics` → `ConversationMetricsResponse { turns: TurnMetrics[] }`.
+Seed the cumulative totals from `Σ turns[].usage` and the "last request" from `turns.at(-1).usage`
+(guard the empty case: `turns` may have no entries). From the same response you can also render a
+per-turn (and, via `steps[]`, per-step) breakdown: a superset of what the old session-cumulative-only panel could show.
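A sketch of the seeding step, assuming minimal `TurnMetrics`/`ConversationMetricsResponse` shapes with only the fields named above. `seedFromMetrics` and `Seed` are hypothetical names, and fetching the JSON is left to whatever HTTP client the FE already uses. It indexes the last element directly instead of calling `turns.at(-1)`, so an empty conversation yields `null` rather than throwing on `.usage`.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}
// Assumed minimal shapes: only the fields this sketch reads.
interface TurnMetrics { usage: Usage; }
interface ConversationMetricsResponse { turns: TurnMetrics[]; }

interface Seed {
  cumulative: { read: number; input: number };
  last: Usage | null; // null when the conversation has no turns yet
}

function seedFromMetrics(res: ConversationMetricsResponse): Seed {
  let read = 0;
  let input = 0;
  for (const t of res.turns) {
    read += t.usage.cacheReadTokens ?? 0; // absent → 0
    input += t.usage.inputTokens;
  }
  const last = res.turns.length > 0 ? res.turns[res.turns.length - 1].usage : null;
  return { cumulative: { read, input }, last };
}
```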
+
+## Caveats (be honest in the UI)
+- **`cacheWriteTokens` is usually absent.** The current provider is OpenAI-compatible
+  (OpenCode Go): it reports a cache **read** count (`cached_tokens`) but **no cache-creation**
+  count, so the old panel's separate "write" row will be 0/empty. Hit/miss and the read
+  percentage are unaffected. The write row would populate only if an Anthropic-native (or
+  other `cache_write`-reporting) provider is added.
+- **Optional fields:** any of the cache fields can be `undefined` (provider-dependent). Default
+  to 0; never assume presence.
+- **A legitimate 0% is not a bug.** OpenAI-style providers cache automatically (no `cache_control`
+  breakpoints), and prompts shorter than the provider's cache threshold simply won't be cached, so
+  `cacheReadTokens: 0` is a real "miss", not missing data. Cache reads grow once a conversation's
+  resent prefix gets large enough.
+
+## Worked example (real numbers, captured live against OpenCode Go flash)
+| Turn | inputTokens | cacheReadTokens | hit % |
+|---|---|---|---|
+| 1 | 2669 | 384 | 14% |
+| 2 (history resent) | 2737 | 2560 | **94%** |
+
+Cumulative: read `2944` / input `5406` → **54%**. These exact values appear both on the live
+`done.usage` stream and in `GET /conversations/:id/metrics` (`turns[].usage`).
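The table can be reproduced from the raw token counts with the Formulas-section rule (guard `/0`, then `Math.round`; a clamp to 1 is included). This is a checking sketch, not backend code: `hitPct` is a hypothetical helper, and `outputTokens` is set to 0 because the table does not report it. Turn 2 is 2560/2737 ≈ 93.5%, which `Math.round` reports as 94% (truncating would give 93%).

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Guard /0, clamp to 1, round to a whole percent.
function hitPct(read: number, input: number): number {
  return input > 0 ? Math.round(Math.min(1, read / input) * 100) : 0;
}

// Token counts from the worked-example table (outputTokens not reported there, so 0).
const turns: Usage[] = [
  { inputTokens: 2669, outputTokens: 0, cacheReadTokens: 384 },
  { inputTokens: 2737, outputTokens: 0, cacheReadTokens: 2560 },
];

const perTurn = turns.map((u) => hitPct(u.cacheReadTokens ?? 0, u.inputTokens)); // [14, 94]
const totalRead = turns.reduce((s, u) => s + (u.cacheReadTokens ?? 0), 0);          // 2944
const totalInput = turns.reduce((s, u) => s + u.inputTokens, 0);                    // 5406
const cumulative = hitPct(totalRead, totalInput);                                   // 54
```

The cumulative figure is a ratio of sums (2944/5406), not an average of the per-turn percentages; the latter would weight a short turn the same as a long one.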
+
+## Type references
+- `@dispatch/wire`: `Usage`, `TurnUsageEvent` (`usage`), `TurnDoneEvent` (`done`),
+  `TurnMetrics`, `StepMetrics`.
+- `@dispatch/transport-contract`: `ConversationMetricsResponse`, and the WS `chat.delta`
+  envelope carrying each `AgentEvent`.