| author | Adam Malczewski <[email protected]> | 2026-06-10 09:03:43 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-10 09:03:43 +0900 |
| commit | 52ce1bb8b25cc6ba4ba7d2734c35c95e0a08d723 (patch) | |
| tree | 538ebecb3208e9bf5d081c3188557d893122912f | |
| parent | 2e583ea78a7efa6e9a31b7c5b4dfcf792007e418 (diff) | |
docs: FE cache hit/miss + percentage calculation handoff
Calculation-only courier doc for ../dispatch-web: Usage field semantics,
hitRate = cacheReadTokens/inputTokens, where to source it (usage/done events +
GET /conversations/:id/metrics, per-step/turn/cumulative), live accumulate +
done.usage reconcile, replay seeding, and the cacheWriteTokens-absent caveat.
No backend change required; UI design left to the FE.
| -rw-r--r-- | frontend-cache-rate-handoff.md | 100 |
1 files changed, 100 insertions, 0 deletions
diff --git a/frontend-cache-rate-handoff.md b/frontend-cache-rate-handoff.md
new file mode 100644
index 0000000..158e199
--- /dev/null
+++ b/frontend-cache-rate-handoff.md
@@ -0,0 +1,100 @@
+# FE handoff — cache hit/miss + percentage (calculation guide)
+
+> **Courier doc** (backend → `../dispatch-web`, via the user). Per ORCHESTRATOR §7
+> the backend does not write the FE repo. This describes ONLY how to compute cache
+> hit/miss + percentages from data the backend ALREADY exposes — **no UI design here**
+> (the look is specified separately) and **no backend change is required**.
+> Contracts: `@dispatch/wire` + `@dispatch/transport-contract` `0.4.0`.
+
+## TL;DR
+The cache hit rate is `cacheReadTokens / inputTokens`. Everything you need is already
+on the `usage` + `done` live events and in `GET /conversations/:id/metrics`. There is
+**no separate cache endpoint or boolean** — it's derived from token counts, exactly as
+the old `CacheRatePanel` did.
+
+## The data shape (`Usage`, from `@dispatch/wire`)
+```ts
+interface Usage {
+  inputTokens: number; // TOTAL prompt tokens this step/turn, INCLUDING cached ones
+  outputTokens: number;
+  cacheReadTokens?: number; // input tokens served FROM cache (the "hit" count). Optional.
+  cacheWriteTokens?: number; // cache-creation count. Optional; usually ABSENT (see caveats).
+}
+```
+Field semantics that matter for the math:
+- `inputTokens` is the **whole** prompt, so `cacheReadTokens ≤ inputTokens` and the rate is in `[0,1]`.
+- The cache fields are **optional** — treat `undefined` as `0` in all arithmetic.
+
+## Formulas
+```ts
+const read = u.cacheReadTokens ?? 0;
+const write = u.cacheWriteTokens ?? 0;
+
+const isHit = read > 0; // hit vs miss
+const hitRate = u.inputTokens > 0 ? read / u.inputTokens : 0; // 0..1 (guard /0)
+const hitPct = Math.round(hitRate * 100);
+const fresh = Math.max(0, u.inputTokens - read - write); // uncached input tokens
+```
+(These are byte-identical to the old `CacheRatePanel.svelte` formulas: hit rate =
+`cacheReadTokens/inputTokens` clamped; uncached = `max(0, input − read − write)`.)
+
+## Where to get `Usage` — three granularities, two channels
+
+| Scope | LIVE (WS `chat.delta` / NDJSON) | REPLAY (`GET /conversations/:id/metrics`) |
+|---|---|---|
+| **Per step** | `usage` event (`type:"usage"`, carries `stepId`, `usage`) | `TurnMetrics.steps[].usage` (each has `stepId`) |
+| **Per turn** (authoritative aggregate) | `done` event (`type:"done"`, carries `usage`, `durationMs`) | `TurnMetrics.usage` |
+| **Cumulative** (conversation) | Σ of each turn's `done.usage` | Σ of `turns[].usage` |
+
+Notes:
+- The **per-turn aggregate IS the sum of its steps** (the runtime aggregates). So when
+  summing a cumulative figure, pick ONE granularity — sum `done.usage`/`TurnMetrics.usage`
+  per turn, **or** sum all steps — never both (double-count).
+- `done.usage` is the authoritative per-turn total. (`turn-sealed` does NOT carry usage in
+  this backend — it's just `{conversationId, turnId}`; the numbers ride the immediately
+  preceding `done` event.)
+- `step-complete` is timing only (ttft/decode) — no tokens; ignore it for cache.
+
+## Live accumulation + reconcile (recommended pattern)
+1. **In-progress turn (optional live counter):** as `usage` events stream, you may sum
+   `read`/`input` across the turn's steps to show a live-updating hit % for the current turn.
+2. **Turn finished:** take that turn's authoritative totals from its `done.usage`. Use it as
+   the turn's final value (replace any live partial for that turn).
+3. **Cumulative (session/conversation):** add each completed turn's `done.usage` to a running
+   total. Compute the cumulative hit % from the running totals (`ΣcacheRead / Σinput`).
+4. **"Last request" rate:** the most recent turn's `done.usage` (or most recent step's `usage`
+   if you want per-round-trip granularity).
+
+## Replay / reopening a conversation
+On open, `GET /conversations/:id/metrics` → `ConversationMetricsResponse { turns: TurnMetrics[] }`.
+Seed the cumulative totals from `Σ turns[].usage`, the "last request" from `turns.at(-1).usage`,
+and you can render a per-turn (and per-step, via `steps[]`) breakdown — a superset of what the
+old session-cumulative-only panel could show.
+
+## Caveats (be honest in the UI)
+- **`cacheWriteTokens` is usually absent.** The current provider is OpenAI-compatible
+  (OpenCode Go): it reports a cache **read** count (`cached_tokens`) but **no cache-creation**
+  count. So the old panel's separate "write" row will be 0/empty. Hit/miss and the read
+  percentage are unaffected. It would populate only if an Anthropic-native (or
+  `cache_write`-reporting) provider is added.
+- **Optional fields:** any of the cache fields can be `undefined` (provider-dependent). Default
+  to 0; never assume presence.
+- **A legitimate 0% is not a bug.** OpenAI-style providers auto-cache (no `cache_control`
+  breakpoints), and short prompts below the provider's cache threshold simply won't be cached —
+  `cacheReadTokens: 0` is a real "miss", not missing data. Cache reads grow as a conversation's
+  resent prefix gets large enough.
+
+## Worked example (real numbers, captured live against OpenCode Go flash)
+| Turn | inputTokens | cacheReadTokens | hit % |
+|---|---|---|---|
+| 1 | 2669 | 384 | 14% |
+| 2 (history resent) | 2737 | 2560 | **93%** |
+
+Cumulative: read `2944` / input `5406` → **54%**. These exact values appear both on the live
+`done.usage` stream and in `GET /conversations/:id/metrics` (`turns[].usage`).
+
+## Type references
+- `@dispatch/wire`: `Usage`, `TurnUsageEvent` (`usage`), `TurnDoneEvent` (`done`),
+  `TurnMetrics`, `StepMetrics`.
+- `@dispatch/transport-contract`: `ConversationMetricsResponse`, and the WS `chat.delta`
+  envelope carrying each `AgentEvent`.
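
The cumulative and "last request" calculations described in the handoff can be sketched in TypeScript roughly as follows. This is a minimal illustration and not code from either repo: `cacheStats` and `sumUsage` are hypothetical helper names, the `Usage` interface is re-declared locally instead of imported from `@dispatch/wire`, and `outputTokens` is set to `0` as a placeholder because the worked example does not list it. The input values are the per-turn totals from the worked example table.

```typescript
// Local copy of the Usage shape described in the handoff doc
// (an assumption for self-containment; the real type lives in @dispatch/wire).
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Hypothetical helper: applies the doc's formulas to one Usage record.
function cacheStats(u: Usage) {
  const read = u.cacheReadTokens ?? 0; // undefined counts as 0
  const write = u.cacheWriteTokens ?? 0;
  const hitRate = u.inputTokens > 0 ? read / u.inputTokens : 0; // guard against /0
  return {
    isHit: read > 0,
    hitRate,
    hitPct: Math.round(hitRate * 100),
    fresh: Math.max(0, u.inputTokens - read - write), // uncached input tokens
  };
}

// Hypothetical helper: field-wise sum of Usage records at ONE granularity
// (per-turn totals here), which avoids double-counting steps and turns.
function sumUsage(items: Usage[]): Usage {
  return items.reduce<Usage>(
    (acc, u) => ({
      inputTokens: acc.inputTokens + u.inputTokens,
      outputTokens: acc.outputTokens + u.outputTokens,
      cacheReadTokens: (acc.cacheReadTokens ?? 0) + (u.cacheReadTokens ?? 0),
      cacheWriteTokens: (acc.cacheWriteTokens ?? 0) + (u.cacheWriteTokens ?? 0),
    }),
    { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 },
  );
}

// Per-turn totals from the worked example; outputTokens is a placeholder.
const turns: Usage[] = [
  { inputTokens: 2669, outputTokens: 0, cacheReadTokens: 384 },
  { inputTokens: 2737, outputTokens: 0, cacheReadTokens: 2560 },
];

const perTurn = turns.map(cacheStats);
const cumulative = cacheStats(sumUsage(turns));
const lastRequest = cacheStats(turns[turns.length - 1]);

console.log(perTurn.map((s) => s.hitPct), cumulative.hitPct, lastRequest.hitPct);
```

With these inputs the cumulative figure is 2944 / 5406, which rounds to 54%, matching the doc. One detail to be aware of: 2560 / 2737 ≈ 0.9353, so `Math.round` yields 94 for turn 2, whereas the worked example table shows 93%, which is the value truncation would give. If the UI needs to match the table exactly, `Math.floor` may be the intended rounding, but the handoff does not say so explicitly.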
