author: Adam Malczewski <[email protected]> 2026-06-10 09:03:43 +0900
committer: Adam Malczewski <[email protected]> 2026-06-10 09:03:43 +0900
commit: 52ce1bb8b25cc6ba4ba7d2734c35c95e0a08d723
tree: 538ebecb3208e9bf5d081c3188557d893122912f
parent: 2e583ea78a7efa6e9a31b7c5b4dfcf792007e418
docs: FE cache hit/miss + percentage calculation handoff
Calculation-only courier doc for ../dispatch-web: Usage field semantics, hitRate = cacheReadTokens/inputTokens, where to source it (usage/done events + GET /conversations/:id/metrics, per-step/turn/cumulative), live accumulate + done.usage reconcile, replay seeding, and the cacheWriteTokens-absent caveat. No backend change required; UI design left to the FE.
-rw-r--r-- frontend-cache-rate-handoff.md 100
1 files changed, 100 insertions, 0 deletions
diff --git a/frontend-cache-rate-handoff.md b/frontend-cache-rate-handoff.md
new file mode 100644
index 0000000..158e199
--- /dev/null
+++ b/frontend-cache-rate-handoff.md
@@ -0,0 +1,100 @@
+# FE handoff — cache hit/miss + percentage (calculation guide)
+
+> **Courier doc** (backend → `../dispatch-web`, via the user). Per ORCHESTRATOR §7
+> the backend does not write the FE repo. This describes ONLY how to compute cache
+> hit/miss + percentages from data the backend ALREADY exposes — **no UI design here**
+> (the look is specified separately) and **no backend change is required**.
+> Contracts: `@dispatch/wire` + `@dispatch/transport-contract` `0.4.0`.
+
+## TL;DR
+The cache hit rate is `cacheReadTokens / inputTokens`. Everything you need is already
+on the `usage` + `done` live events and in `GET /conversations/:id/metrics`. There is
+**no separate cache endpoint or boolean** — it's derived from token counts, exactly as
+the old `CacheRatePanel` did.
+
+## The data shape (`Usage`, from `@dispatch/wire`)
+```ts
+interface Usage {
+ inputTokens: number; // TOTAL prompt tokens this step/turn, INCLUDING cached ones
+ outputTokens: number;
+ cacheReadTokens?: number; // input tokens served FROM cache (the "hit" count). Optional.
+ cacheWriteTokens?: number; // cache-creation count. Optional; usually ABSENT (see caveats).
+}
+```
+Field semantics that matter for the math:
+- `inputTokens` is the **whole** prompt, so `cacheReadTokens ≤ inputTokens` and the rate is in `[0,1]`.
+- The cache fields are **optional** — treat `undefined` as `0` in all arithmetic.
+
+## Formulas
+```ts
+const read = u.cacheReadTokens ?? 0;
+const write = u.cacheWriteTokens ?? 0;
+
+const isHit = read > 0; // hit vs miss
+const hitRate = u.inputTokens > 0 ? read / u.inputTokens : 0; // 0..1 (guard /0)
+const hitPct = Math.round(hitRate * 100);
+const fresh = Math.max(0, u.inputTokens - read - write); // uncached input tokens
+```
+(These are byte-identical to the old `CacheRatePanel.svelte` formulas: hit rate =
+`cacheReadTokens/inputTokens` clamped; uncached = `max(0, input − read − write)`.)
+
+## Where to get `Usage` — three granularities, two channels
+
+| Scope | LIVE (WS `chat.delta` / NDJSON) | REPLAY (`GET /conversations/:id/metrics`) |
+|---|---|---|
+| **Per step** | `usage` event (`type:"usage"`, carries `stepId`, `usage`) | `TurnMetrics.steps[].usage` (each has `stepId`) |
+| **Per turn** (authoritative aggregate) | `done` event (`type:"done"`, carries `usage`, `durationMs`) | `TurnMetrics.usage` |
+| **Cumulative** (conversation) | Σ of each turn's `done.usage` | Σ of `turns[].usage` |
+
+Notes:
+- The **per-turn aggregate IS the sum of its steps** (the runtime aggregates). So when
+ summing a cumulative figure, pick ONE granularity — sum `done.usage`/`TurnMetrics.usage`
+ per turn, **or** sum all steps — never both (double-count).
+- `done.usage` is the authoritative per-turn total. (`turn-sealed` does NOT carry usage in
+ this backend — it's just `{conversationId, turnId}`; the numbers ride the immediately
+ preceding `done` event.)
+- `step-complete` is timing only (ttft/decode) — no tokens; ignore it for cache.
+
+## Live accumulation + reconcile (recommended pattern)
+1. **In-progress turn (optional live counter):** as `usage` events stream, you may sum
+ `read`/`input` across the turn's steps to show a live-updating hit % for the current turn.
+2. **Turn finished:** take that turn's authoritative totals from its `done.usage`. Use it as
+ the turn's final value (replace any live partial for that turn).
+3. **Cumulative (session/conversation):** add each completed turn's `done.usage` to a running
+ total. Compute the cumulative hit % from the running totals (`ΣcacheRead / Σinput`).
+4. **"Last request" rate:** the most recent turn's `done.usage` (or most recent step's `usage`
+ if you want per-round-trip granularity).
+
+## Replay / reopening a conversation
+On open, `GET /conversations/:id/metrics` → `ConversationMetricsResponse { turns: TurnMetrics[] }`.
+Seed the cumulative totals from `Σ turns[].usage`, the "last request" from `turns.at(-1).usage`,
+and you can render a per-turn (and per-step, via `steps[]`) breakdown — a superset of what the
+old session-cumulative-only panel could show.
+
+## Caveats (be honest in the UI)
+- **`cacheWriteTokens` is usually absent.** The current provider is OpenAI-compatible
+ (OpenCode Go): it reports a cache **read** count (`cached_tokens`) but **no cache-creation**
+ count. So the old panel's separate "write" row will be 0/empty. Hit/miss and the read
+ percentage are unaffected. It would populate only if an Anthropic-native (or
+ `cache_write`-reporting) provider is added.
+- **Optional fields:** any of the cache fields can be `undefined` (provider-dependent). Default
+ to 0; never assume presence.
+- **A legitimate 0% is not a bug.** OpenAI-style providers auto-cache (no `cache_control`
+ breakpoints), and short prompts below the provider's cache threshold simply won't be cached —
+ `cacheReadTokens: 0` is a real "miss", not missing data. Cache reads grow as a conversation's
+ resent prefix gets large enough.
+
+## Worked example (real numbers, captured live against OpenCode Go flash)
+| Turn | inputTokens | cacheReadTokens | hit % |
+|---|---|---|---|
+| 1 | 2669 | 384 | 14% |
+| 2 (history resent) | 2737 | 2560 | **94%** |
+
+Cumulative: read `2944` / input `5406` → **54%**. These exact values appear both on the live
+`done.usage` stream and in `GET /conversations/:id/metrics` (`turns[].usage`).
+
+## Type references
+- `@dispatch/wire`: `Usage`, `TurnUsageEvent` (`usage`), `TurnDoneEvent` (`done`),
+ `TurnMetrics`, `StepMetrics`.
+- `@dispatch/transport-contract`: `ConversationMetricsResponse`, and the WS `chat.delta`
+ envelope carrying each `AgentEvent`.
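
Review note (not part of the patch): the formulas and the cumulative pattern in the handoff doc can be checked against its worked-example token counts with a minimal sketch like the one below. The helper names `cacheStats` and `sumUsage` are hypothetical, not exports of `@dispatch/wire`; the `Usage` interface is copied from the doc, and `outputTokens` is set to `0` only because the worked example does not list it.

```typescript
// Shape copied from the handoff doc's `Usage` interface.
interface Usage {
  inputTokens: number;       // total prompt tokens, including cached ones
  outputTokens: number;
  cacheReadTokens?: number;  // optional: treat undefined as 0
  cacheWriteTokens?: number; // optional: usually absent
}

interface CacheStats {
  isHit: boolean;
  hitRate: number; // 0..1
  hitPct: number;  // whole percent
  fresh: number;   // uncached input tokens
}

// Hypothetical helper: applies the doc's formulas to one Usage record.
function cacheStats(u: Usage): CacheStats {
  const read = u.cacheReadTokens ?? 0;
  const write = u.cacheWriteTokens ?? 0;
  const hitRate = u.inputTokens > 0 ? read / u.inputTokens : 0; // guard /0
  return {
    isHit: read > 0,
    hitRate,
    hitPct: Math.round(hitRate * 100),
    fresh: Math.max(0, u.inputTokens - read - write),
  };
}

// Hypothetical helper: sums ONE granularity (per-turn usages here) into a
// running total, so steps and turns are never double-counted.
function sumUsage(usages: Usage[]): Usage {
  const total: Usage = {
    inputTokens: 0,
    outputTokens: 0,
    cacheReadTokens: 0,
    cacheWriteTokens: 0,
  };
  for (const u of usages) {
    total.inputTokens += u.inputTokens;
    total.outputTokens += u.outputTokens;
    total.cacheReadTokens = (total.cacheReadTokens ?? 0) + (u.cacheReadTokens ?? 0);
    total.cacheWriteTokens = (total.cacheWriteTokens ?? 0) + (u.cacheWriteTokens ?? 0);
  }
  return total;
}

// Token counts from the doc's worked example; outputTokens is a placeholder.
const turns: Usage[] = [
  { inputTokens: 2669, outputTokens: 0, cacheReadTokens: 384 },
  { inputTokens: 2737, outputTokens: 0, cacheReadTokens: 2560 },
];

const perTurn = turns.map(cacheStats);
const total = sumUsage(turns);
const cumulative = cacheStats(total);
const lastRequest = cacheStats(turns[turns.length - 1]);

console.log(perTurn.map((s) => s.hitPct)); // [ 14, 94 ]
console.log(total.cacheReadTokens, total.inputTokens, cumulative.hitPct); // 2944 5406 54
console.log(lastRequest.isHit, lastRequest.fresh); // true 177
```

With `Math.round`, turn 2 comes out as 94 because 2560/2737 ≈ 0.9353. The same `sumUsage` call would serve both channels under this sketch: feed it each completed turn's `done.usage` while live, or `turns[].usage` from `GET /conversations/:id/metrics` on replay.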