| -rw-r--r-- | frontend-cache-warming-handoff.md | 59 |
| -rw-r--r-- | packages/cache-warming/src/extension.ts | 7 |
| -rw-r--r-- | packages/cache-warming/src/index.ts | 1 |
| -rw-r--r-- | packages/cache-warming/src/pure.test.ts | 44 |
| -rw-r--r-- | packages/cache-warming/src/pure.ts | 25 |
| -rw-r--r-- | packages/cache-warming/src/warmer.test.ts | 47 |
| -rw-r--r-- | packages/cache-warming/src/warmer.ts | 6 |
| -rw-r--r-- | packages/transport-contract/src/index.ts | 15 |
| -rw-r--r-- | packages/transport-http/src/app.test.ts | 52 |
| -rw-r--r-- | packages/transport-http/src/app.ts | 2 |
| -rw-r--r-- | packages/transport-http/src/logic.test.ts | 24 |
| -rw-r--r-- | packages/transport-http/src/logic.ts | 9 |
| -rw-r--r-- | tasks.md | 8 |
13 files changed, 280 insertions, 19 deletions
diff --git a/frontend-cache-warming-handoff.md b/frontend-cache-warming-handoff.md index e5d50b3..64b94d6 100644 --- a/frontend-cache-warming-handoff.md +++ b/frontend-cache-warming-handoff.md @@ -69,7 +69,8 @@ scoped** (state differs per conversation, e.g. cache-warming). To support the la |---|---|---|---| | `toggle` | enabled on/off | warming on for this conversation | `cache-warming/toggle` | | `number` | refresh interval | **seconds** (`unit:"s"`, `min:1`, `step:1`, no `max` = free value) | `cache-warming/set-interval` | - | `stat` | last cache % | most recent warm's hit % (`"—"` when none yet) | — (read-only) | + | `stat` | last cache rate | most recent warm's `cachePct` (`"—"` when none yet) | — (read-only) | + | `stat` | cache retention | most recent warm's `expectedCacheRate` — the **health** signal (~100% = cache stayed warm; 0% = it expired) | — (read-only) | - **Invoke payloads:** - `cache-warming/toggle` → **flips** the current enabled state. Send `{ type: "invoke", surfaceId: "cache-warming", actionId: "cache-warming/toggle", conversationId }` (payload is ignored — it @@ -86,17 +87,21 @@ For an on-demand warm (e.g. a button) without waiting for the automatic timer: ``` POST /chat/warm body WarmRequest { conversationId: string; model?: string; cwd?: string } - 200 WarmResponse { inputTokens; outputTokens; cacheReadTokens; cacheWriteTokens; cachePct } + 200 WarmResponse { inputTokens; outputTokens; cacheReadTokens; cacheWriteTokens; + cachePct; expectedCacheRate } 409 { error } // the conversation is currently generating — try again when idle 400 { error } // missing/invalid conversationId ``` - Pass the **same `model`** (`<credentialName>/<model>`) the conversation chats with, so the warm request's prefix matches the real turn (that's what makes the cache hit). `cwd` only matters if the conversation uses cwd-scoped tools. -- `cachePct` = `round(clamp(cacheReadTokens / inputTokens, 0, 1) * 100)` — show it as the "last - warming" hit indicator. 
The warm is **never** persisted or streamed and is **never** folded into - the conversation's real usage/cache-rate (keep it visually distinct from the real cache rate in - §`frontend-cache-rate-handoff.md`). +- `cachePct` = `round(cacheReadTokens / inputTokens * 100)` — the cache RATE of the warm request. +- `expectedCacheRate` = `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` — the + **retention / health** signal: ~**100%** when the cache was still warm (read back, ~nothing + rewritten), **0%** when it had expired (rewrote everything). This is the one to headline for a + "is warming working?" indicator. +- The warm is **never** persisted or streamed and is **never** folded into the conversation's real + usage/cache-rate (keep it visually distinct from the real cache rate in §F / `frontend-cache-rate-handoff.md`). - Types live in `@dispatch/transport-contract` (`WarmRequest`, `WarmResponse`). ## E. Behavior model (for the UX) @@ -108,8 +113,46 @@ POST /chat/warm - Verified live against Claude (`claude/claude-haiku-4-5-...`): an idle conversation's warm reports ~100% cache read once its prefix exceeds the provider's min-cacheable size. +## F. Cache-rate metric — a correctness fix + the "expected cache" metric (READ THIS) +A backend bug made the cache-hit % read **100% on Claude whenever anything was cached** (it inflated). +Root cause: Anthropic's `input_tokens` is the *uncached remainder*, with cache read/creation reported +separately — but the wire `Usage.inputTokens` convention (which the flash/OpenAI-compat provider +already follows) is the **TOTAL prompt incl. cached**. Fixed in `../claude/provider-anthropic` +(`inputTokens = input + cacheRead + cacheWrite`). **No FE change needed** — your existing +`cacheRead/inputTokens` math (see `frontend-cache-rate-handoff.md`) now yields the *true* rate on +Claude. (Note: that older handoff's caveat "cacheWriteTokens is usually absent" is **not** true for +Claude — it reports both.) 
+ +Two distinct cache numbers — show them as different things: +- **Cache rate** = `cacheReadTokens / inputTokens` — *what fraction of THIS turn's prompt came from + cache*. It legitimately **drops when a turn adds a lot of new content** (e.g. a turn that pastes a + big file reads back the old prefix but also writes the new file → rate < 100%). This is the + per-turn efficiency number, available on every `usage`/`done` event and in persisted metrics. +- **Expected cache (retention)** = *of the cache that existed going into this turn, how much did we + read back* — ideally **~100% every turn after the first** (you re-read the entire prefix you + cached). It is a **cross-turn** derivation: + ``` + expectedCacheRate(turn N) = cacheRead_N / (cacheRead_{N-1} + cacheWrite_{N-1}) // clamp [0,1] + ``` + (denominator = the prior turn's cached prefix = what it read + what it wrote). **<100% means the + cache busted/expired** between turns. The FE derives this from two consecutive turns' usage (which + you already have, live + persisted). For the WARM endpoint/surface this same idea is the single-shot + `expectedCacheRate` (§C/§D) the backend already computes. + +**Worked example (live, Claude haiku), one chat, two real turns:** +| turn | inputTokens (total) | cacheRead | cacheWrite | cache rate `cr/input` | expected cache (cross-turn) | +|---|---|---|---|---|---| +| 1 (fresh) | 5149 | 0 | 5146 | 0% | — | +| 2 (new msg) | 8462 | 5146 | 3313 | **61%** | `5146/(0+5146)` = **100%** | + +So on turn 2 the prompt was 61% cache (the rest was the new message), yet you successfully read back +**100%** of what turn 1 cached — two true, complementary signals. (Pre-fix, the rate wrongly showed +100% because the denominator excluded the 5146 cached tokens.) + ## Versions / type references - `@dispatch/ui-contract`: `NumberField` (new `SurfaceField` variant); `conversationId?` on `SubscribeMessage`/`UnsubscribeMessage`/`InvokeMessage`/`SurfaceMessage`/`SurfaceUpdate`. 
-- `@dispatch/transport-contract`: `WarmRequest`, `WarmResponse`. -- Cache-% math + the real (non-warming) cache rate: see `frontend-cache-rate-handoff.md` (unchanged). +- `@dispatch/transport-contract`: `WarmRequest`, `WarmResponse` (now incl. `expectedCacheRate`). +- Cache-% fix: `../claude/provider-anthropic` now reports `inputTokens` as the total prompt — the + real (non-warming) cache rate in `frontend-cache-rate-handoff.md` becomes accurate on Claude with + no FE change; ignore that doc's "cacheWriteTokens usually absent" caveat for Claude. diff --git a/packages/cache-warming/src/extension.ts b/packages/cache-warming/src/extension.ts index 26d429b..802618a 100644 --- a/packages/cache-warming/src/extension.ts +++ b/packages/cache-warming/src/extension.ts @@ -77,7 +77,12 @@ export function activate(host: HostAPI): void { return buildDefaultSpec(); } const state = warmer.getState(convId); - return buildConversationSpec(state.enabled, state.intervalMs, state.lastPct); + return buildConversationSpec( + state.enabled, + state.intervalMs, + state.lastPct, + state.lastExpectedPct, + ); } async function invoke( diff --git a/packages/cache-warming/src/index.ts b/packages/cache-warming/src/index.ts index d77f4ec..88cab3b 100644 --- a/packages/cache-warming/src/index.ts +++ b/packages/cache-warming/src/index.ts @@ -5,6 +5,7 @@ export { type ConversationSettings, type ConversationState, computeCachePct, + computeExpectedCacheRate, DEFAULT_INTERVAL_MS, isTokenCurrent, MIN_INTERVAL_MS, diff --git a/packages/cache-warming/src/pure.test.ts b/packages/cache-warming/src/pure.test.ts index 1c912f2..f5e2f1d 100644 --- a/packages/cache-warming/src/pure.test.ts +++ b/packages/cache-warming/src/pure.test.ts @@ -4,6 +4,7 @@ import { buildConversationSpec, buildDefaultSpec, computeCachePct, + computeExpectedCacheRate, isTokenCurrent, MIN_INTERVAL_MS, msToSeconds, @@ -29,6 +30,20 @@ describe("computeCachePct", () => { }); }); +describe("computeExpectedCacheRate", () => { + 
it("cacheRead/(cacheRead+cacheWrite) rounded", () => { + expect(computeExpectedCacheRate(800, 200)).toBe(80); + expect(computeExpectedCacheRate(500, 500)).toBe(50); + expect(computeExpectedCacheRate(1000, 0)).toBe(100); + expect(computeExpectedCacheRate(0, 1000)).toBe(0); + expect(computeExpectedCacheRate(333, 667)).toBe(33); + }); + + it("0 when cacheRead+cacheWrite is 0", () => { + expect(computeExpectedCacheRate(0, 0)).toBe(0); + }); +}); + describe("shouldWarm", () => { it("returns true when enabled, idle, and token matches", () => { const state: ConversationState = { @@ -36,6 +51,7 @@ describe("shouldWarm", () => { intervalMs: 240_000, active: false, lastPct: null, + lastExpectedPct: null, token: 5, }; expect(shouldWarm(state, 5)).toBe(true); @@ -47,6 +63,7 @@ describe("shouldWarm", () => { intervalMs: 240_000, active: false, lastPct: null, + lastExpectedPct: null, token: 5, }; expect(shouldWarm(state, 5)).toBe(false); @@ -58,6 +75,7 @@ describe("shouldWarm", () => { intervalMs: 240_000, active: true, lastPct: null, + lastExpectedPct: null, token: 5, }; expect(shouldWarm(state, 5)).toBe(false); @@ -69,6 +87,7 @@ describe("shouldWarm", () => { intervalMs: 240_000, active: false, lastPct: null, + lastExpectedPct: null, token: 5, }; expect(shouldWarm(state, 6)).toBe(false); @@ -162,12 +181,12 @@ describe("parseIntervalPayload", () => { }); describe("buildConversationSpec", () => { - it("builds a per-conversation spec with toggle + number(interval) + last-% fields", () => { - const spec = buildConversationSpec(true, 240_000, 80); + it("builds a per-conversation spec with toggle + number(interval) + last-% + retention fields", () => { + const spec = buildConversationSpec(true, 240_000, 80, 95); expect(spec.id).toBe("cache-warming"); expect(spec.region).toBe("side"); expect(spec.title).toBe("Cache Warming"); - expect(spec.fields).toHaveLength(3); + expect(spec.fields).toHaveLength(4); const toggle = spec.fields[0]; expect(toggle).toEqual({ @@ -194,20 +213,33 @@ 
describe("buildConversationSpec", () => { label: "Last Cache %", value: "80%", }); + + const retention = spec.fields[3]; + expect(retention).toEqual({ + kind: "stat", + label: "Cache retention", + value: "95%", + }); }); - it("shows — when lastPct is null", () => { - const spec = buildConversationSpec(true, 240_000, null); + it("shows — when lastPct and lastExpectedPct are null", () => { + const spec = buildConversationSpec(true, 240_000, null, null); const stat = spec.fields[2]; expect(stat).toEqual({ kind: "stat", label: "Last Cache %", value: "—", }); + const retention = spec.fields[3]; + expect(retention).toEqual({ + kind: "stat", + label: "Cache retention", + value: "—", + }); }); it("reflects disabled state", () => { - const spec = buildConversationSpec(false, 120_000, 50); + const spec = buildConversationSpec(false, 120_000, 50, 75); const toggle = spec.fields[0]; expect(toggle).toEqual({ kind: "toggle", diff --git a/packages/cache-warming/src/pure.ts b/packages/cache-warming/src/pure.ts index 7b91b11..ab6fc79 100644 --- a/packages/cache-warming/src/pure.ts +++ b/packages/cache-warming/src/pure.ts @@ -17,6 +17,7 @@ export interface ConversationSettings { export interface ConversationState extends ConversationSettings { readonly active: boolean; readonly lastPct: number | null; + readonly lastExpectedPct: number | null; readonly token: number; } @@ -43,6 +44,21 @@ export function computeCachePct(inputTokens: number, cacheReadTokens: number): n } /** + * Compute expected cache retention rate from token counts. + * Of the cacheable prefix the warm touched, how much was still warm (read back) + * vs. had to be (re)written. + * Returns an integer in [0, 100]. cacheRead + cacheWrite ≤ 0 → 0. 
+ */ +export function computeExpectedCacheRate( + cacheReadTokens: number, + cacheWriteTokens: number, +): number { + const total = cacheReadTokens + cacheWriteTokens; + if (total <= 0) return 0; + return Math.round((cacheReadTokens / total) * 100); +} + +/** * Decide whether a conversation should be warmed right now. * Requires: enabled, idle (not active), and the token is current (not superseded). */ @@ -120,8 +136,10 @@ export function buildConversationSpec( enabled: boolean, intervalMs: number, lastPct: number | null, + lastExpectedPct: number | null, ): SurfaceSpec { const pctDisplay = lastPct === null ? "—" : `${lastPct}%`; + const retentionDisplay = lastExpectedPct === null ? "—" : `${lastExpectedPct}%`; const toggle: ToggleField = { kind: "toggle", label: "Enabled", @@ -142,11 +160,16 @@ export function buildConversationSpec( label: "Last Cache %", value: pctDisplay, }; + const retentionStat: StatField = { + kind: "stat", + label: "Cache retention", + value: retentionDisplay, + }; return { id: "cache-warming", region: "side", title: "Cache Warming", - fields: [toggle, interval, stat], + fields: [toggle, interval, stat, retentionStat], }; } diff --git a/packages/cache-warming/src/warmer.test.ts b/packages/cache-warming/src/warmer.test.ts index 9865877..86908a2 100644 --- a/packages/cache-warming/src/warmer.test.ts +++ b/packages/cache-warming/src/warmer.test.ts @@ -182,6 +182,30 @@ describe("CacheWarmer", () => { expect(state.lastPct).toBe(80); }); + it("a completed warm stores both lastPct (rate) and lastExpectedPct (retention)", async () => { + const timers = fakeTimers(); + const warmer = createCacheWarmer({ + warm: async () => ({ + inputTokens: 1000, + outputTokens: 10, + cacheReadTokens: 700, + cacheWriteTokens: 300, + }), + storage: memStorage(), + logger: makeLogger(), + timers, + onSurfaceChange: () => {}, + }); + + warmer.onTurnSettled("conv-1", {}); + timers.flush(); + + await new Promise((r) => setTimeout(r, 10)); + const state = 
warmer.getState("conv-1"); + expect(state.lastPct).toBe(70); + expect(state.lastExpectedPct).toBe(70); + }); + it("re-arms timer after warm completes", async () => { const timers = fakeTimers(); let warmCount = 0; @@ -316,4 +340,27 @@ describe("CacheWarmer", () => { await warmer.setIntervalMs("conv-1", 30_000); expect(changeCount).toBe(2); }); + + it("the per-conversation spec includes a cache-retention stat", async () => { + const timers = fakeTimers(); + const warmer = createCacheWarmer({ + warm: async () => ({ + inputTokens: 1000, + outputTokens: 10, + cacheReadTokens: 900, + cacheWriteTokens: 100, + }), + storage: memStorage(), + logger: makeLogger(), + timers, + onSurfaceChange: () => {}, + }); + + warmer.onTurnSettled("conv-1", {}); + timers.flush(); + await new Promise((r) => setTimeout(r, 10)); + + const state = warmer.getState("conv-1"); + expect(state.lastExpectedPct).toBe(90); + }); }); diff --git a/packages/cache-warming/src/warmer.ts b/packages/cache-warming/src/warmer.ts index 31dd41e..f50f346 100644 --- a/packages/cache-warming/src/warmer.ts +++ b/packages/cache-warming/src/warmer.ts @@ -5,6 +5,7 @@ import { type ConversationSettings, type ConversationState, computeCachePct, + computeExpectedCacheRate, DEFAULT_INTERVAL_MS, isTokenCurrent, MIN_INTERVAL_MS, @@ -63,6 +64,7 @@ const DEFAULT_STATE: ConversationState = { intervalMs: DEFAULT_INTERVAL_MS, active: false, lastPct: null, + lastExpectedPct: null, token: 0, }; @@ -145,11 +147,13 @@ export function createCacheWarmer(deps: CacheWarmerDeps): CacheWarmer { }); } else { const pct = computeCachePct(result.inputTokens, result.cacheReadTokens); - setState(conversationId, { ...currentState, lastPct: pct }); + const expectedPct = computeExpectedCacheRate(result.cacheReadTokens, result.cacheWriteTokens); + setState(conversationId, { ...currentState, lastPct: pct, lastExpectedPct: expectedPct }); deps.onSurfaceChange(); deps.logger.debug("cache-warming: warm complete", { conversationId, pct, + expectedPct, 
}); } diff --git a/packages/transport-contract/src/index.ts b/packages/transport-contract/src/index.ts index fbb61fc..95111ae 100644 --- a/packages/transport-contract/src/index.ts +++ b/packages/transport-contract/src/index.ts @@ -192,10 +192,21 @@ export interface WarmResponse { readonly cacheReadTokens: number; readonly cacheWriteTokens: number; /** - * Cache-hit percent: `round(clamp(cacheReadTokens / inputTokens, 0, 1) * 100)` - * (0 when `inputTokens <= 0`). + * **Cache rate** — what fraction of THIS request's prompt was served from cache: + * `round(cacheReadTokens / inputTokens * 100)` (0 when `inputTokens <= 0`). + * (`inputTokens` is the TOTAL prompt incl. cached, so this is in [0,100].) */ readonly cachePct: number; + /** + * **Expected cache (retention)** — of the cacheable prefix this warm touched, how + * much was still warm and read back vs. had to be (re)written: + * `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` (0 when the + * sum is 0). For a healthy warm this is ~**100%** (the whole prefix was still + * cached); it drops toward 0 as the cache expires/busts and the warm has to rewrite + * it. This is the warming HEALTH signal — distinct from `cachePct` (which a warm's + * tiny fresh probe makes ~equal, but which on a real turn reflects new content). 
+ */ + readonly expectedCacheRate: number; } // ─── WebSocket chat ops ─────────────────────────────────────────────────────── diff --git a/packages/transport-http/src/app.test.ts b/packages/transport-http/src/app.test.ts index 7352b5d..22b26fc 100644 --- a/packages/transport-http/src/app.test.ts +++ b/packages/transport-http/src/app.test.ts @@ -449,12 +449,64 @@ describe("POST /chat/warm", () => { cacheReadTokens: number; cacheWriteTokens: number; cachePct: number; + expectedCacheRate: number; }; expect(body.inputTokens).toBe(1000); expect(body.outputTokens).toBe(200); expect(body.cacheReadTokens).toBe(800); expect(body.cacheWriteTokens).toBe(100); expect(body.cachePct).toBe(80); + expect(body.expectedCacheRate).toBe(89); + }); + + it("POST /chat/warm returns expectedCacheRate = round(cacheRead/(cacheRead+cacheWrite)*100)", async () => { + const app = createApp({ + conversationStore: createFakeConversationStore(), + orchestrator: createFakeOrchestrator([]), + credentialStore: createFakeCredentialStore([]), + warmService: createFakeWarmService({ + inputTokens: 500, + outputTokens: 100, + cacheReadTokens: 400, + cacheWriteTokens: 100, + }), + logger: noopLogger, + }); + + const res = await app.request("/chat/warm", { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ conversationId: "conv1" }), + }); + + expect(res.status).toBe(200); + const body = (await res.json()) as { expectedCacheRate: number }; + expect(body.expectedCacheRate).toBe(80); + }); + + it("POST /chat/warm returns expectedCacheRate = 0 when cacheRead+cacheWrite is 0", async () => { + const app = createApp({ + conversationStore: createFakeConversationStore(), + orchestrator: createFakeOrchestrator([]), + credentialStore: createFakeCredentialStore([]), + warmService: createFakeWarmService({ + inputTokens: 100, + outputTokens: 50, + cacheReadTokens: 0, + cacheWriteTokens: 0, + }), + logger: noopLogger, + }); + + const res = await app.request("/chat/warm", { + 
method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ conversationId: "conv1" }), + }); + + expect(res.status).toBe(200); + const body = (await res.json()) as { expectedCacheRate: number }; + expect(body.expectedCacheRate).toBe(0); }); it("POST /chat/warm returns 409 when the warm service reports the conversation is generating", async () => { diff --git a/packages/transport-http/src/app.ts b/packages/transport-http/src/app.ts index a8cef51..84c7d20 100644 --- a/packages/transport-http/src/app.ts +++ b/packages/transport-http/src/app.ts @@ -10,6 +10,7 @@ import { Hono } from "hono"; import { cors } from "hono/cors"; import { computeCachePct, + computeExpectedCacheRate, isParseError, isSinceSeqError, parseChatBody, @@ -284,6 +285,7 @@ export function createApp(opts: CreateServerOptions): Hono { cacheReadTokens: result.cacheReadTokens, cacheWriteTokens: result.cacheWriteTokens, cachePct: computeCachePct(result.inputTokens, result.cacheReadTokens), + expectedCacheRate: computeExpectedCacheRate(result.cacheReadTokens, result.cacheWriteTokens), }; return c.json(response, 200); }); diff --git a/packages/transport-http/src/logic.test.ts b/packages/transport-http/src/logic.test.ts index 1e33f40..19b47ef 100644 --- a/packages/transport-http/src/logic.test.ts +++ b/packages/transport-http/src/logic.test.ts @@ -1,6 +1,7 @@ import type { AgentEvent } from "@dispatch/kernel"; import { describe, expect, it } from "vitest"; import { + computeExpectedCacheRate, isParseError, isSinceSeqError, parseChatBody, @@ -197,3 +198,26 @@ describe("serializeEventLine", () => { expect(parsed.reason).toBe("stop"); }); }); + +describe("computeExpectedCacheRate", () => { + it("returns round(cacheRead/(cacheRead+cacheWrite)*100)", () => { + expect(computeExpectedCacheRate(800, 200)).toBe(80); + }); + + it("returns 0 when cacheRead+cacheWrite is 0", () => { + expect(computeExpectedCacheRate(0, 0)).toBe(0); + }); + + it("returns 100 when all tokens are 
cacheRead", () => { + expect(computeExpectedCacheRate(500, 0)).toBe(100); + }); + + it("returns 0 when all tokens are cacheWrite", () => { + expect(computeExpectedCacheRate(0, 500)).toBe(0); + }); + + it("rounds to nearest integer", () => { + expect(computeExpectedCacheRate(1, 2)).toBe(33); + expect(computeExpectedCacheRate(2, 1)).toBe(67); + }); +}); diff --git a/packages/transport-http/src/logic.ts b/packages/transport-http/src/logic.ts index bb827e2..bddedf0 100644 --- a/packages/transport-http/src/logic.ts +++ b/packages/transport-http/src/logic.ts @@ -113,3 +113,12 @@ export function computeCachePct(inputTokens: number, cacheReadTokens: number): n if (inputTokens <= 0) return 0; return Math.round(Math.max(0, Math.min(1, cacheReadTokens / inputTokens)) * 100); } + +export function computeExpectedCacheRate( + cacheReadTokens: number, + cacheWriteTokens: number, +): number { + const denom = cacheReadTokens + cacheWriteTokens; + if (denom <= 0) return 0; + return Math.round((cacheReadTokens / denom) * 100); +} @@ -162,6 +162,14 @@ arm-on-settle/cancel-on-start; `pct = round(clamp(cacheRead/input,0,1)*100)`). - **LIVE-VERIFIED against Claude haiku:** automatic timer warm → journal `warm complete pct:100`; manual `POST /chat/warm` → `cacheReadTokens:6799, cachePct:100` (100% hit), HTTP 200. The external `../claude` provider-anthropic is loaded via `bin/up` (`DISPATCH_EXTERNAL_EXTENSIONS`). +- **Cache-metric fix + retention metric:** `provider-anthropic` (in `../claude`, commit `0e9d118`) + now reports `Usage.inputTokens` as the TOTAL prompt (was the uncached remainder → the cache rate + inflated/clamped to 100% on Claude). So `cacheRead/inputTokens` is now the true rate (live: a turn + adding new content reads 61%, not 100%). Added **`expectedCacheRate`** = `cacheRead/(cacheRead+ + cacheWrite)` (retention/health, ~100% when warm, 0% when the cache expired) to `WarmResponse` + + `POST /chat/warm` + the cache-warming surface (a "cache retention" stat). 
Live-verified: warm + within TTL → 100%; warm after >5 min idle → 0% (cache expired). FE handoff updated with both + metrics + the cross-turn real-turn `expectedCache = cacheRead_N/(cacheRead_{N-1}+cacheWrite_{N-1})`. - **Surface framework extended (DONE):** added `NumberField` to `ui-contract` + per-conversation surface scoping (optional `conversationId` on subscribe/unsubscribe/invoke + surface/update; new `SurfaceContext` on `SurfaceProvider.getSpec/invoke`; transport-ws keys subscriptions by |
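
The handoff's section F says the frontend derives the cross-turn "expected cache" number from two consecutive turns' usage, but the diff only ships the single-shot backend helper. The following is a minimal sketch of how that derivation might look, not the project's implementation. The `TurnUsage` interface and both function names are hypothetical, and returning `null` when there is no prior cached prefix is an assumption made here so the UI can show "—" for the first turn (the backend's single-shot helper returns 0 in its analogous empty case).

```typescript
// Hypothetical shape: only the usage fields this sketch needs.
interface TurnUsage {
  inputTokens: number; // TOTAL prompt, including cached tokens
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Per-turn cache rate: share of this turn's prompt served from cache,
// as an integer percent. Mirrors the cacheRead / inputTokens formula.
function cacheRatePct(turn: TurnUsage): number {
  if (turn.inputTokens <= 0) return 0;
  const ratio = turn.cacheReadTokens / turn.inputTokens;
  return Math.round(Math.max(0, Math.min(1, ratio)) * 100);
}

// Cross-turn retention: share of the previous turn's cached prefix
// (what it read + what it wrote) that the current turn read back.
// Assumption: null means "no prior cached prefix to compare against".
function expectedCachePct(prev: TurnUsage | null, current: TurnUsage): number | null {
  if (prev === null) return null;
  const priorPrefix = prev.cacheReadTokens + prev.cacheWriteTokens;
  if (priorPrefix <= 0) return null;
  const ratio = current.cacheReadTokens / priorPrefix;
  return Math.round(Math.max(0, Math.min(1, ratio)) * 100);
}

// The two turns from the handoff's worked example.
const turn1: TurnUsage = { inputTokens: 5149, cacheReadTokens: 0, cacheWriteTokens: 5146 };
const turn2: TurnUsage = { inputTokens: 8462, cacheReadTokens: 5146, cacheWriteTokens: 3313 };

console.log(cacheRatePct(turn2), expectedCachePct(turn1, turn2));
```

With the worked-example numbers, turn 2's cache rate comes out at 61 and its cross-turn retention at 100, matching the table in the handoff. Clamping the retention ratio to [0, 1] follows the `// clamp [0,1]` note next to the formula.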
