| author | Adam Malczewski <[email protected]> | 2026-06-11 14:21:08 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-11 14:21:08 +0900 |
| commit | 58e2ad559cccc8b35c513818e253b04e60af69b8 (patch) | |
| tree | a2b417861d26c41958e71abddccc224f7db6917a | |
| parent | 7ffb6b28f5b6bdbfc53ebed94fc68af557612189 (diff) | |
docs(handoff): prune cache-warming FE handoff to what's unconsumed
Per the FE's backend-handoff.md (2026-06-11) the frontend shipped the NumberField
renderer, conversation-scoped subscriptions, the Cache Warming view, and warmNow().
Removed those sections; kept only the new cache-rate fix + expectedCacheRate
(retention) metric the FE has not yet consumed.
| -rw-r--r-- | frontend-cache-warming-handoff.md | 179 |
1 files changed, 43 insertions, 136 deletions
diff --git a/frontend-cache-warming-handoff.md b/frontend-cache-warming-handoff.md index 64b94d6..dedb13d 100644 --- a/frontend-cache-warming-handoff.md +++ b/frontend-cache-warming-handoff.md @@ -1,143 +1,45 @@ -# FE handoff — cache warming controls + surface protocol (NumberField, per-conversation scoping) +# FE handoff — cache warming: cache-rate fix + "expected cache" metric > **Courier doc** (backend → `../dispatch-web`, via the user). Per ORCHESTRATOR §7 the backend does -> NOT write the FE repo; the FE applies this delta on its side (regenerate the in-repo -> `.dispatch/*.reference.md` surface snapshots + bump the `file:` deps for `@dispatch/ui-contract` / -> `@dispatch/transport-contract`). `lsp references` does not span the two repos. -> Backend commits: `c2b4c05` (warming engine), `27fd0be` (manual `/chat/warm`), `ffbbcf6` (surface -> framework + cache-warming controls surface). +> NOT write the FE repo. `lsp references` does not span the two repos. +> Backend commits: `7ffb6b2` (arch-rewrite), `0e9d118` (`../claude/provider-anthropic`). -## What this delivers (and what the FE must do) -A **prompt-cache warming** feature: the backend periodically re-sends an idle conversation's prefix -to keep the provider cache warm, plus a manual trigger. The FE needs to (1) render a new **`number`** -surface field, (2) make the surface WS protocol **conversation-aware** (send/handle `conversationId`), -(3) render the **cache-warming control surface**, and (4) optionally wire a **"warm now" button** to a -new HTTP endpoint. All backend changes are **additive / backward-compatible** — existing global -surfaces (e.g. `loaded-extensions`) are unchanged. +## Status — most of the original handoff is DONE (removed) +Per the FE's `backend-handoff.md` (2026-06-11), the frontend has already consumed the bulk of the +earlier version of this doc — those sections are **removed**: +- ✅ `NumberField` (`kind:"number"`) renderer. 
+- ✅ Conversation-scoped surface subscriptions (focused `conversationId` on subscribe/invoke + + staleness rule; re-scope on conversation switch). +- ✅ The "Cache Warming" sidebar view: enabled toggle, minutes+seconds interval (`cache-warming/ + set-interval`), `cache-warming/toggle`, manual **Warm now** (`POST /chat/warm`), live countdown, + hit-% history. +- ✅ `warmNow()` posting `/chat/warm` with the conversation's model. ---- +What remains below is the ONE piece the FE has not yet consumed: a cache-rate **correctness fix** and +a new **retention** metric. -## A. `@dispatch/ui-contract` — new `NumberField` (RENDER THIS) -A new variant was added to the `SurfaceField` union: -```ts -export interface NumberField { - readonly kind: "number"; - readonly label: string; - readonly value: number; - readonly min?: number; // semantic lower bound (validate/step) - readonly max?: number; // semantic upper bound (may be absent = free value) - readonly step?: number; - readonly unit?: string; // display hint, e.g. "s" - readonly action: ActionRef;// invoke this with the new number as payload -} -``` -**FE action:** add a renderer case for `field.kind === "number"` (a numeric input/stepper). On -change, send an `invoke` (see §B) with the new number as the payload. It is the free-value -counterpart to `selector`. Until you add the case, your field switch should already gracefully skip -unknown kinds (it does for `custom`) — but the interval control won't show without it. - -## B. `@dispatch/ui-contract` — surface WS protocol is now conversation-aware -A surface can be **global** (one state for everyone, e.g. `loaded-extensions`) or **conversation- -scoped** (state differs per conversation, e.g. cache-warming). To support the latter, an optional -`conversationId` was added to the messages — **all optional, fully backward-compatible**: - -- **Client → server**: `SubscribeMessage`, `UnsubscribeMessage`, `InvokeMessage` each gained - `conversationId?: string`. 
-- **Server → client**: `SurfaceMessage` (the full-spec reply) and `SurfaceUpdate` (live patch) each - gained `conversationId?: string` (echoes which conversation the spec/update is for; absent for - global surfaces). - -**FE action / rules:** -1. When subscribing to a **conversation-scoped** surface, include the **currently-focused - `conversationId`**: `{ type: "subscribe", surfaceId, conversationId }`. The server replies with - `{ type: "surface", spec, conversationId }` and pushes `{ type: "update", update: { surfaceId, - spec, conversationId } }` for that conversation only. -2. **On conversation switch:** unsubscribe the old `(surfaceId, conversationId)` and resubscribe with - the new id (the server keys subscriptions by the pair). For **global** surfaces, just omit - `conversationId` — behaves exactly as today; no resubscribe needed on switch. -3. **Route incoming `surface`/`update` by `conversationId`** so a stale conversation's update doesn't - overwrite the focused one. -4. There is **no `scope` flag** on the catalog — the simplest correct FE policy is: always send the - focused `conversationId` on subscribe/invoke. Global surfaces ignore it; scoped ones use it. (If - no conversation is focused, omit it — a scoped surface then returns a default/empty spec.) - -## C. The cache-warming control surface (RENDER THIS) -- **Catalog entry:** `id: "cache-warming"`, `region: "side"`, `title: "Cache Warming"`. - **Conversation-scoped** → subscribe with the focused `conversationId`. 
-- **Spec fields (per conversation):** - | kind | label | meaning | action | - |---|---|---|---| - | `toggle` | enabled on/off | warming on for this conversation | `cache-warming/toggle` | - | `number` | refresh interval | **seconds** (`unit:"s"`, `min:1`, `step:1`, no `max` = free value) | `cache-warming/set-interval` | - | `stat` | last cache rate | most recent warm's `cachePct` (`"—"` when none yet) | — (read-only) | - | `stat` | cache retention | most recent warm's `expectedCacheRate` — the **health** signal (~100% = cache stayed warm; 0% = it expired) | — (read-only) | -- **Invoke payloads:** - - `cache-warming/toggle` → **flips** the current enabled state. Send `{ type: "invoke", surfaceId: - "cache-warming", actionId: "cache-warming/toggle", conversationId }` (payload is ignored — it - toggles, it does not set). - - `cache-warming/set-interval` → send the new interval **in seconds** as the payload: either a bare - number (`payload: 120`) or `{ value: 120 }`. The backend converts to ms and floors at 1000 ms - (1 s); NaN/non-positive are ignored. -- **Live updates:** the surface pushes an `update` (with `conversationId`) whenever the toggle/interval - changes or a warm completes (so the "last cache %" stat refreshes). Just re-render from the pushed - spec. - -## D. Manual warm trigger — `POST /chat/warm` (the "warm now" button) -For an on-demand warm (e.g. a button) without waiting for the automatic timer: -``` -POST /chat/warm - body WarmRequest { conversationId: string; model?: string; cwd?: string } - 200 WarmResponse { inputTokens; outputTokens; cacheReadTokens; cacheWriteTokens; - cachePct; expectedCacheRate } - 409 { error } // the conversation is currently generating — try again when idle - 400 { error } // missing/invalid conversationId -``` -- Pass the **same `model`** (`<credentialName>/<model>`) the conversation chats with, so the warm - request's prefix matches the real turn (that's what makes the cache hit). 
`cwd` only matters if the - conversation uses cwd-scoped tools. -- `cachePct` = `round(cacheReadTokens / inputTokens * 100)` — the cache RATE of the warm request. -- `expectedCacheRate` = `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` — the - **retention / health** signal: ~**100%** when the cache was still warm (read back, ~nothing - rewritten), **0%** when it had expired (rewrote everything). This is the one to headline for a - "is warming working?" indicator. -- The warm is **never** persisted or streamed and is **never** folded into the conversation's real - usage/cache-rate (keep it visually distinct from the real cache rate in §F / `frontend-cache-rate-handoff.md`). -- Types live in `@dispatch/transport-contract` (`WarmRequest`, `WarmResponse`). - -## E. Behavior model (for the UX) -- Warming is **per-conversation**: each conversation that has had a turn arms its own timer - (default **4 min**, under the provider's ~5-min cache TTL); it cancels while a turn is generating - and re-arms when the turn settles. Default **enabled = true**. -- The toggle/interval in the surface control THIS conversation's automatic warming; the button (§D) - fires one immediately regardless. -- Verified live against Claude (`claude/claude-haiku-4-5-...`): an idle conversation's warm reports - ~100% cache read once its prefix exceeds the provider's min-cacheable size. - -## F. Cache-rate metric — a correctness fix + the "expected cache" metric (READ THIS) +## Cache-rate metric — a correctness fix + the "expected cache" metric (TO CONSUME) A backend bug made the cache-hit % read **100% on Claude whenever anything was cached** (it inflated). Root cause: Anthropic's `input_tokens` is the *uncached remainder*, with cache read/creation reported separately — but the wire `Usage.inputTokens` convention (which the flash/OpenAI-compat provider already follows) is the **TOTAL prompt incl. cached**. 
Fixed in `../claude/provider-anthropic` -(`inputTokens = input + cacheRead + cacheWrite`). **No FE change needed** — your existing -`cacheRead/inputTokens` math (see `frontend-cache-rate-handoff.md`) now yields the *true* rate on -Claude. (Note: that older handoff's caveat "cacheWriteTokens is usually absent" is **not** true for +(`inputTokens = input + cacheRead + cacheWrite`). **No FE change needed for the fix itself** — your +existing `cacheRead/inputTokens` math (in `frontend-cache-rate-handoff.md`) now yields the *true* rate +on Claude. (That older handoff's caveat "cacheWriteTokens is usually absent" is **not** true for Claude — it reports both.) -Two distinct cache numbers — show them as different things: +Show two distinct cache numbers: - **Cache rate** = `cacheReadTokens / inputTokens` — *what fraction of THIS turn's prompt came from - cache*. It legitimately **drops when a turn adds a lot of new content** (e.g. a turn that pastes a - big file reads back the old prefix but also writes the new file → rate < 100%). This is the - per-turn efficiency number, available on every `usage`/`done` event and in persisted metrics. -- **Expected cache (retention)** = *of the cache that existed going into this turn, how much did we - read back* — ideally **~100% every turn after the first** (you re-read the entire prefix you - cached). It is a **cross-turn** derivation: + cache*. Legitimately **drops when a turn adds a lot of new content** (e.g. pasting a big file: reads + the old prefix back but also writes the new file → rate < 100%). Per-turn efficiency; on every + `usage`/`done` event + persisted metrics. +- **Expected cache (retention)** = *of the cache that existed going into this turn, how much we read + back* — ideally **~100% every turn after the first**. 
**<100% = the cache busted/expired.** It is a + **cross-turn** derivation (FE-side, from two consecutive turns' usage you already have): ``` - expectedCacheRate(turn N) = cacheRead_N / (cacheRead_{N-1} + cacheWrite_{N-1}) // clamp [0,1] + expectedCache(turn N) = clamp01( cacheRead_N / (cacheRead_{N-1} + cacheWrite_{N-1}) ) ``` - (denominator = the prior turn's cached prefix = what it read + what it wrote). **<100% means the - cache busted/expired** between turns. The FE derives this from two consecutive turns' usage (which - you already have, live + persisted). For the WARM endpoint/surface this same idea is the single-shot - `expectedCacheRate` (§C/§D) the backend already computes. + (denominator = the prior turn's cached prefix = what it read + what it wrote). **Worked example (live, Claude haiku), one chat, two real turns:** | turn | inputTokens (total) | cacheRead | cacheWrite | cache rate `cr/input` | expected cache (cross-turn) | @@ -145,14 +47,19 @@ Two distinct cache numbers — show them as different things: | 1 (fresh) | 5149 | 0 | 5146 | 0% | — | | 2 (new msg) | 8462 | 5146 | 3313 | **61%** | `5146/(0+5146)` = **100%** | -So on turn 2 the prompt was 61% cache (the rest was the new message), yet you successfully read back -**100%** of what turn 1 cached — two true, complementary signals. (Pre-fix, the rate wrongly showed -100% because the denominator excluded the 5146 cached tokens.) +So on turn 2 the prompt was 61% cache (the rest was the new message), yet you read back **100%** of +what turn 1 cached — two true, complementary signals. (Pre-fix, the rate wrongly showed 100% because +the denominator excluded the 5146 cached tokens.) 
+ +### Warming-specific (already on the wire — small additions) +For the warming feature, the backend now also reports a **single-shot** retention so you don't have to +track cross-turn state there: +- **`WarmResponse.expectedCacheRate`** (new field on `POST /chat/warm`) = + `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` — ~**100%** when the warm + found the cache still warm, **0%** when it had expired (rewrote everything). This is the **"is + warming working?"** signal — headline this for the Warm-now result rather than `cachePct`. +- The conversation-scoped `cache-warming` surface gained a matching **`stat` "cache retention"** field + (alongside the existing "last cache rate" stat). It's a generic `stat`, so your existing renderer + already shows it — just relabel/position as desired. -## Versions / type references -- `@dispatch/ui-contract`: `NumberField` (new `SurfaceField` variant); `conversationId?` on - `SubscribeMessage`/`UnsubscribeMessage`/`InvokeMessage`/`SurfaceMessage`/`SurfaceUpdate`. -- `@dispatch/transport-contract`: `WarmRequest`, `WarmResponse` (now incl. `expectedCacheRate`). -- Cache-% fix: `../claude/provider-anthropic` now reports `inputTokens` as the total prompt — the - real (non-warming) cache rate in `frontend-cache-rate-handoff.md` becomes accurate on Claude with - no FE change; ignore that doc's "cacheWriteTokens usually absent" caveat for Claude. +Types: `@dispatch/transport-contract` `WarmResponse` now carries `expectedCacheRate` (additive). |
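The two metrics the handoff asks the FE to show, plus the single-shot warm retention, could be derived roughly as below. This is a minimal sketch under stated assumptions, not the frontend's actual code: `TurnUsage`, `cacheRate`, `expectedCache`, `warmRetentionPct`, and `toPct` are hypothetical names, and returning `null` (rendered as `"n/a"`) when a denominator is zero is an assumption the handoff does not specify. The sample values are the two turns from the worked example in the diff.

```typescript
// Hypothetical shape: only the three token counts the handoff discusses.
interface TurnUsage {
  inputTokens: number;      // TOTAL prompt tokens, cached part included (post-fix convention)
  cacheReadTokens: number;  // tokens read back from the provider cache
  cacheWriteTokens: number; // tokens newly written to the provider cache
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Cache rate: fraction of THIS turn's prompt that came from cache.
function cacheRate(turn: TurnUsage): number | null {
  if (turn.inputTokens <= 0) return null; // assumption: nothing to show for an empty prompt
  return turn.cacheReadTokens / turn.inputTokens;
}

// Expected cache (retention): cacheRead_N / (cacheRead_{N-1} + cacheWrite_{N-1}), clamped to [0, 1].
function expectedCache(prev: TurnUsage | undefined, curr: TurnUsage): number | null {
  if (prev === undefined) return null; // first turn: no prior cached prefix
  const cachedPrefix = prev.cacheReadTokens + prev.cacheWriteTokens;
  if (cachedPrefix <= 0) return null;  // assumption: prior turn cached nothing
  return clamp01(curr.cacheReadTokens / cachedPrefix);
}

// Single-shot warm retention, mirroring the formula given for WarmResponse.expectedCacheRate.
function warmRetentionPct(cacheReadTokens: number, cacheWriteTokens: number): number | null {
  const total = cacheReadTokens + cacheWriteTokens;
  if (total <= 0) return null;         // assumption: the zero case is not specified
  return Math.round((cacheReadTokens / total) * 100);
}

// Display helper: a ratio as a whole percent, or a placeholder when undefined.
const toPct = (ratio: number | null): string =>
  ratio === null ? "n/a" : `${Math.round(ratio * 100)}%`;

// The two turns from the worked example.
const turn1: TurnUsage = { inputTokens: 5149, cacheReadTokens: 0, cacheWriteTokens: 5146 };
const turn2: TurnUsage = { inputTokens: 8462, cacheReadTokens: 5146, cacheWriteTokens: 3313 };

console.log(toPct(cacheRate(turn2)));            // 61%
console.log(toPct(expectedCache(turn1, turn2))); // 100%
```

Keeping the two functions separate follows the handoff's point that they answer different questions: `cacheRate` needs only the current turn, while `expectedCache` needs the previous turn's usage as well.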
