author Adam Malczewski <[email protected]> 2026-06-11 14:11:13 +0900
committer Adam Malczewski <[email protected]> 2026-06-11 14:11:13 +0900
commit 7ffb6b28f5b6bdbfc53ebed94fc68af557612189
tree e66d9ea9d326ef771cc473d81ca5716ff78b08a8
parent 763e5fb1c7fbfb4c7bbd43ffb935e42e5f5b5a42
fix(cache-warming): accurate cache rate + expectedCacheRate (retention) metric
The Claude cache % read 100% whenever anything was cached, because the
metric's denominator (inputTokens) excluded cached tokens on Anthropic.
Fixed upstream in ../claude/provider-anthropic (inputTokens = total prompt);
this commit adds the companion retention metric and exposes it:

- transport-contract: WarmResponse += expectedCacheRate
- transport-http: POST /chat/warm returns expectedCacheRate = cacheRead/(cacheRead+cacheWrite)
- cache-warming: computeExpectedCacheRate + a per-conversation 'cache retention' surface stat
- handoff: documents the fix + cache-rate vs expected-cache (cross-turn) for the FE

Live-verified vs claude haiku: real turn cache rate 61% (was inflated 100%);
warm within TTL expectedCacheRate=100%, after expiry=0%.
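To make the denominator problem concrete, here is a minimal TypeScript sketch of the inflation and the fix. It is an illustration under assumptions, not the code in `../claude/provider-anthropic`: `normalizeUsage` and `cacheRatePct` are hypothetical names, the `AnthropicUsage` field names follow the usage object of Anthropic's Messages API, and the raw `input_tokens` value of 3 is inferred from the live turn-2 figures in the handoff (8462 - 5146 - 3313).

```typescript
// Anthropic-style usage: input_tokens is only the UNCACHED remainder of the prompt.
interface AnthropicUsage {
  input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Wire-level usage, where inputTokens is the TOTAL prompt including cached tokens.
interface Usage {
  inputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Hypothetical normalizer mirroring the described fix:
// inputTokens = input + cacheRead + cacheWrite.
function normalizeUsage(u: AnthropicUsage): Usage {
  return {
    inputTokens:
      u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens,
    cacheReadTokens: u.cache_read_input_tokens,
    cacheWriteTokens: u.cache_creation_input_tokens,
  };
}

// Same shape as computeCachePct in this commit: clamp to [0, 1], then round to a percent.
function cacheRatePct(inputTokens: number, cacheReadTokens: number): number {
  if (inputTokens <= 0) return 0;
  return Math.round(Math.max(0, Math.min(1, cacheReadTokens / inputTokens)) * 100);
}

// Turn 2 of the live example; input_tokens = 3 is inferred, not measured.
const raw: AnthropicUsage = {
  input_tokens: 3,
  cache_read_input_tokens: 5146,
  cache_creation_input_tokens: 3313,
};

// Pre-fix: the uncached remainder is the denominator, so 5146 / 3 clamps to 100.
const preFix = cacheRatePct(raw.input_tokens, raw.cache_read_input_tokens);

// Post-fix: the total prompt is the denominator, so 5146 / 8462 rounds to 61.
const fixed = normalizeUsage(raw);
const postFix = cacheRatePct(fixed.inputTokens, fixed.cacheReadTokens);

console.log({ preFix, postFix }); // { preFix: 100, postFix: 61 }
```

The clamp is what hid the bug: any ratio above 1 was silently reported as 100% instead of surfacing as an out-of-range value.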
-rw-r--r-- frontend-cache-warming-handoff.md | 59
-rw-r--r-- packages/cache-warming/src/extension.ts | 7
-rw-r--r-- packages/cache-warming/src/index.ts | 1
-rw-r--r-- packages/cache-warming/src/pure.test.ts | 44
-rw-r--r-- packages/cache-warming/src/pure.ts | 25
-rw-r--r-- packages/cache-warming/src/warmer.test.ts | 47
-rw-r--r-- packages/cache-warming/src/warmer.ts | 6
-rw-r--r-- packages/transport-contract/src/index.ts | 15
-rw-r--r-- packages/transport-http/src/app.test.ts | 52
-rw-r--r-- packages/transport-http/src/app.ts | 2
-rw-r--r-- packages/transport-http/src/logic.test.ts | 24
-rw-r--r-- packages/transport-http/src/logic.ts | 9
-rw-r--r-- tasks.md | 8
13 files changed, 280 insertions, 19 deletions
diff --git a/frontend-cache-warming-handoff.md b/frontend-cache-warming-handoff.md
index e5d50b3..64b94d6 100644
--- a/frontend-cache-warming-handoff.md
+++ b/frontend-cache-warming-handoff.md
@@ -69,7 +69,8 @@ scoped** (state differs per conversation, e.g. cache-warming). To support the la
|---|---|---|---|
| `toggle` | enabled on/off | warming on for this conversation | `cache-warming/toggle` |
| `number` | refresh interval | **seconds** (`unit:"s"`, `min:1`, `step:1`, no `max` = free value) | `cache-warming/set-interval` |
- | `stat` | last cache % | most recent warm's hit % (`"—"` when none yet) | — (read-only) |
+ | `stat` | last cache rate | most recent warm's `cachePct` (`"—"` when none yet) | — (read-only) |
+ | `stat` | cache retention | most recent warm's `expectedCacheRate` — the **health** signal (~100% = cache stayed warm; 0% = it expired) | — (read-only) |
- **Invoke payloads:**
- `cache-warming/toggle` → **flips** the current enabled state. Send `{ type: "invoke", surfaceId:
"cache-warming", actionId: "cache-warming/toggle", conversationId }` (payload is ignored — it
@@ -86,17 +87,21 @@ For an on-demand warm (e.g. a button) without waiting for the automatic timer:
```
POST /chat/warm
body WarmRequest { conversationId: string; model?: string; cwd?: string }
- 200 WarmResponse { inputTokens; outputTokens; cacheReadTokens; cacheWriteTokens; cachePct }
+ 200 WarmResponse { inputTokens; outputTokens; cacheReadTokens; cacheWriteTokens;
+ cachePct; expectedCacheRate }
409 { error } // the conversation is currently generating — try again when idle
400 { error } // missing/invalid conversationId
```
- Pass the **same `model`** (`<credentialName>/<model>`) the conversation chats with, so the warm
request's prefix matches the real turn (that's what makes the cache hit). `cwd` only matters if the
conversation uses cwd-scoped tools.
-- `cachePct` = `round(clamp(cacheReadTokens / inputTokens, 0, 1) * 100)` — show it as the "last
- warming" hit indicator. The warm is **never** persisted or streamed and is **never** folded into
- the conversation's real usage/cache-rate (keep it visually distinct from the real cache rate in
- §`frontend-cache-rate-handoff.md`).
+- `cachePct` = `round(cacheReadTokens / inputTokens * 100)` — the cache RATE of the warm request.
+- `expectedCacheRate` = `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` — the
+ **retention / health** signal: ~**100%** when the cache was still warm (read back, ~nothing
+ rewritten), **0%** when it had expired (rewrote everything). This is the one to headline for a
+ "is warming working?" indicator.
+- The warm is **never** persisted or streamed and is **never** folded into the conversation's real
+ usage/cache-rate (keep it visually distinct from the real cache rate in §F / `frontend-cache-rate-handoff.md`).
- Types live in `@dispatch/transport-contract` (`WarmRequest`, `WarmResponse`).
## E. Behavior model (for the UX)
@@ -108,8 +113,46 @@ POST /chat/warm
- Verified live against Claude (`claude/claude-haiku-4-5-...`): an idle conversation's warm reports
~100% cache read once its prefix exceeds the provider's min-cacheable size.
+## F. Cache-rate metric — a correctness fix + the "expected cache" metric (READ THIS)
+A backend bug made the cache-hit % read **100% on Claude whenever anything was cached** (it inflated).
+Root cause: Anthropic's `input_tokens` is the *uncached remainder*, with cache read/creation reported
+separately — but the wire `Usage.inputTokens` convention (which the flash/OpenAI-compat provider
+already follows) is the **TOTAL prompt incl. cached**. Fixed in `../claude/provider-anthropic`
+(`inputTokens = input + cacheRead + cacheWrite`). **No FE change needed** — your existing
+`cacheRead/inputTokens` math (see `frontend-cache-rate-handoff.md`) now yields the *true* rate on
+Claude. (Note: that older handoff's caveat "cacheWriteTokens is usually absent" is **not** true for
+Claude — it reports both.)
+
+Two distinct cache numbers — show them as different things:
+- **Cache rate** = `cacheReadTokens / inputTokens` — *what fraction of THIS turn's prompt came from
+ cache*. It legitimately **drops when a turn adds a lot of new content** (e.g. a turn that pastes a
+ big file reads back the old prefix but also writes the new file → rate < 100%). This is the
+ per-turn efficiency number, available on every `usage`/`done` event and in persisted metrics.
+- **Expected cache (retention)** = *of the cache that existed going into this turn, how much did we
+ read back* — ideally **~100% every turn after the first** (you re-read the entire prefix you
+ cached). It is a **cross-turn** derivation:
+ ```
+ expectedCacheRate(turn N) = cacheRead_N / (cacheRead_{N-1} + cacheWrite_{N-1}) // clamp [0,1]
+ ```
+ (denominator = the prior turn's cached prefix = what it read + what it wrote). **<100% means the
+ cache busted/expired** between turns. The FE derives this from two consecutive turns' usage (which
+ you already have, live + persisted). For the WARM endpoint/surface this same idea is the single-shot
+ `expectedCacheRate` (§C/§D) the backend already computes.
+
+**Worked example (live, Claude haiku), one chat, two real turns:**
+| turn | inputTokens (total) | cacheRead | cacheWrite | cache rate `cr/input` | expected cache (cross-turn) |
+|---|---|---|---|---|---|
+| 1 (fresh) | 5149 | 0 | 5146 | 0% | — |
+| 2 (new msg) | 8462 | 5146 | 3313 | **61%** | `5146/(0+5146)` = **100%** |
+
+So on turn 2 the prompt was 61% cache (the rest was the new message), yet you successfully read back
+**100%** of what turn 1 cached — two true, complementary signals. (Pre-fix, the rate wrongly showed
+100% because the denominator excluded the 5146 cached tokens.)
+
## Versions / type references
- `@dispatch/ui-contract`: `NumberField` (new `SurfaceField` variant); `conversationId?` on
`SubscribeMessage`/`UnsubscribeMessage`/`InvokeMessage`/`SurfaceMessage`/`SurfaceUpdate`.
-- `@dispatch/transport-contract`: `WarmRequest`, `WarmResponse`.
-- Cache-% math + the real (non-warming) cache rate: see `frontend-cache-rate-handoff.md` (unchanged).
+- `@dispatch/transport-contract`: `WarmRequest`, `WarmResponse` (now incl. `expectedCacheRate`).
+- Cache-% fix: `../claude/provider-anthropic` now reports `inputTokens` as the total prompt — the
+ real (non-warming) cache rate in `frontend-cache-rate-handoff.md` becomes accurate on Claude with
+ no FE change; ignore that doc's "cacheWriteTokens usually absent" caveat for Claude.
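The cross-turn derivation in section F of the handoff can be sketched on the frontend side as follows. This is a minimal sketch, not code from this commit: `TurnUsage` and `expectedCachePct` are hypothetical names, and returning `null` when the prior turn cached nothing is an assumed convention (so the UI can show a placeholder instead of a misleading 0%).

```typescript
// Per-turn usage as the FE already receives it; inputTokens is the TOTAL prompt.
interface TurnUsage {
  inputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Hypothetical helper: retention for turn N given turn N-1, as an integer percent.
// Denominator = the prior turn's cached prefix (what it read + what it wrote).
function expectedCachePct(prev: TurnUsage, cur: TurnUsage): number | null {
  const priorPrefix = prev.cacheReadTokens + prev.cacheWriteTokens;
  if (priorPrefix <= 0) return null; // assumption: nothing was cached, so nothing to retain
  const ratio = cur.cacheReadTokens / priorPrefix;
  return Math.round(Math.max(0, Math.min(1, ratio)) * 100); // clamp to [0, 1]
}

// The two live turns from the worked example.
const turn1: TurnUsage = { inputTokens: 5149, cacheReadTokens: 0, cacheWriteTokens: 5146 };
const turn2: TurnUsage = { inputTokens: 8462, cacheReadTokens: 5146, cacheWriteTokens: 3313 };

// Retention: 5146 / (0 + 5146), i.e. everything turn 1 cached was read back.
const retention = expectedCachePct(turn1, turn2);

// Cache rate for the same turn: 5146 / 8462, lower because the turn added new content.
const cacheRate = Math.round((turn2.cacheReadTokens / turn2.inputTokens) * 100);

// Hypothetical expired case: the same prompt, but nothing was read back.
const expired = expectedCachePct(turn1, { ...turn2, cacheReadTokens: 0 });

console.log({ retention, cacheRate, expired }); // { retention: 100, cacheRate: 61, expired: 0 }
```

Keeping the two numbers as separate values, rather than folding them into one indicator, matches the handoff's point that a 61% cache rate and 100% retention can both be true of the same turn.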
diff --git a/packages/cache-warming/src/extension.ts b/packages/cache-warming/src/extension.ts
index 26d429b..802618a 100644
--- a/packages/cache-warming/src/extension.ts
+++ b/packages/cache-warming/src/extension.ts
@@ -77,7 +77,12 @@ export function activate(host: HostAPI): void {
return buildDefaultSpec();
}
const state = warmer.getState(convId);
- return buildConversationSpec(state.enabled, state.intervalMs, state.lastPct);
+ return buildConversationSpec(
+ state.enabled,
+ state.intervalMs,
+ state.lastPct,
+ state.lastExpectedPct,
+ );
}
async function invoke(
diff --git a/packages/cache-warming/src/index.ts b/packages/cache-warming/src/index.ts
index d77f4ec..88cab3b 100644
--- a/packages/cache-warming/src/index.ts
+++ b/packages/cache-warming/src/index.ts
@@ -5,6 +5,7 @@ export {
type ConversationSettings,
type ConversationState,
computeCachePct,
+ computeExpectedCacheRate,
DEFAULT_INTERVAL_MS,
isTokenCurrent,
MIN_INTERVAL_MS,
diff --git a/packages/cache-warming/src/pure.test.ts b/packages/cache-warming/src/pure.test.ts
index 1c912f2..f5e2f1d 100644
--- a/packages/cache-warming/src/pure.test.ts
+++ b/packages/cache-warming/src/pure.test.ts
@@ -4,6 +4,7 @@ import {
buildConversationSpec,
buildDefaultSpec,
computeCachePct,
+ computeExpectedCacheRate,
isTokenCurrent,
MIN_INTERVAL_MS,
msToSeconds,
@@ -29,6 +30,20 @@ describe("computeCachePct", () => {
});
});
+describe("computeExpectedCacheRate", () => {
+ it("cacheRead/(cacheRead+cacheWrite) rounded", () => {
+ expect(computeExpectedCacheRate(800, 200)).toBe(80);
+ expect(computeExpectedCacheRate(500, 500)).toBe(50);
+ expect(computeExpectedCacheRate(1000, 0)).toBe(100);
+ expect(computeExpectedCacheRate(0, 1000)).toBe(0);
+ expect(computeExpectedCacheRate(333, 667)).toBe(33);
+ });
+
+ it("0 when cacheRead+cacheWrite is 0", () => {
+ expect(computeExpectedCacheRate(0, 0)).toBe(0);
+ });
+});
+
describe("shouldWarm", () => {
it("returns true when enabled, idle, and token matches", () => {
const state: ConversationState = {
@@ -36,6 +51,7 @@ describe("shouldWarm", () => {
intervalMs: 240_000,
active: false,
lastPct: null,
+ lastExpectedPct: null,
token: 5,
};
expect(shouldWarm(state, 5)).toBe(true);
@@ -47,6 +63,7 @@ describe("shouldWarm", () => {
intervalMs: 240_000,
active: false,
lastPct: null,
+ lastExpectedPct: null,
token: 5,
};
expect(shouldWarm(state, 5)).toBe(false);
@@ -58,6 +75,7 @@ describe("shouldWarm", () => {
intervalMs: 240_000,
active: true,
lastPct: null,
+ lastExpectedPct: null,
token: 5,
};
expect(shouldWarm(state, 5)).toBe(false);
@@ -69,6 +87,7 @@ describe("shouldWarm", () => {
intervalMs: 240_000,
active: false,
lastPct: null,
+ lastExpectedPct: null,
token: 5,
};
expect(shouldWarm(state, 6)).toBe(false);
@@ -162,12 +181,12 @@ describe("parseIntervalPayload", () => {
});
describe("buildConversationSpec", () => {
- it("builds a per-conversation spec with toggle + number(interval) + last-% fields", () => {
- const spec = buildConversationSpec(true, 240_000, 80);
+ it("builds a per-conversation spec with toggle + number(interval) + last-% + retention fields", () => {
+ const spec = buildConversationSpec(true, 240_000, 80, 95);
expect(spec.id).toBe("cache-warming");
expect(spec.region).toBe("side");
expect(spec.title).toBe("Cache Warming");
- expect(spec.fields).toHaveLength(3);
+ expect(spec.fields).toHaveLength(4);
const toggle = spec.fields[0];
expect(toggle).toEqual({
@@ -194,20 +213,33 @@ describe("buildConversationSpec", () => {
label: "Last Cache %",
value: "80%",
});
+
+ const retention = spec.fields[3];
+ expect(retention).toEqual({
+ kind: "stat",
+ label: "Cache retention",
+ value: "95%",
+ });
});
- it("shows — when lastPct is null", () => {
- const spec = buildConversationSpec(true, 240_000, null);
+ it("shows — when lastPct and lastExpectedPct are null", () => {
+ const spec = buildConversationSpec(true, 240_000, null, null);
const stat = spec.fields[2];
expect(stat).toEqual({
kind: "stat",
label: "Last Cache %",
value: "—",
});
+ const retention = spec.fields[3];
+ expect(retention).toEqual({
+ kind: "stat",
+ label: "Cache retention",
+ value: "—",
+ });
});
it("reflects disabled state", () => {
- const spec = buildConversationSpec(false, 120_000, 50);
+ const spec = buildConversationSpec(false, 120_000, 50, 75);
const toggle = spec.fields[0];
expect(toggle).toEqual({
kind: "toggle",
diff --git a/packages/cache-warming/src/pure.ts b/packages/cache-warming/src/pure.ts
index 7b91b11..ab6fc79 100644
--- a/packages/cache-warming/src/pure.ts
+++ b/packages/cache-warming/src/pure.ts
@@ -17,6 +17,7 @@ export interface ConversationSettings {
export interface ConversationState extends ConversationSettings {
readonly active: boolean;
readonly lastPct: number | null;
+ readonly lastExpectedPct: number | null;
readonly token: number;
}
@@ -43,6 +44,21 @@ export function computeCachePct(inputTokens: number, cacheReadTokens: number): n
}
/**
+ * Compute expected cache retention rate from token counts.
+ * Of the cacheable prefix the warm touched, how much was still warm (read back)
+ * vs. had to be (re)written.
+ * Returns an integer in [0, 100]. cacheRead + cacheWrite ≤ 0 → 0.
+ */
+export function computeExpectedCacheRate(
+ cacheReadTokens: number,
+ cacheWriteTokens: number,
+): number {
+ const total = cacheReadTokens + cacheWriteTokens;
+ if (total <= 0) return 0;
+ return Math.round((cacheReadTokens / total) * 100);
+}
+
+/**
* Decide whether a conversation should be warmed right now.
* Requires: enabled, idle (not active), and the token is current (not superseded).
*/
@@ -120,8 +136,10 @@ export function buildConversationSpec(
enabled: boolean,
intervalMs: number,
lastPct: number | null,
+ lastExpectedPct: number | null,
): SurfaceSpec {
const pctDisplay = lastPct === null ? "—" : `${lastPct}%`;
+ const retentionDisplay = lastExpectedPct === null ? "—" : `${lastExpectedPct}%`;
const toggle: ToggleField = {
kind: "toggle",
label: "Enabled",
@@ -142,11 +160,16 @@ export function buildConversationSpec(
label: "Last Cache %",
value: pctDisplay,
};
+ const retentionStat: StatField = {
+ kind: "stat",
+ label: "Cache retention",
+ value: retentionDisplay,
+ };
return {
id: "cache-warming",
region: "side",
title: "Cache Warming",
- fields: [toggle, interval, stat],
+ fields: [toggle, interval, stat, retentionStat],
};
}
diff --git a/packages/cache-warming/src/warmer.test.ts b/packages/cache-warming/src/warmer.test.ts
index 9865877..86908a2 100644
--- a/packages/cache-warming/src/warmer.test.ts
+++ b/packages/cache-warming/src/warmer.test.ts
@@ -182,6 +182,30 @@ describe("CacheWarmer", () => {
expect(state.lastPct).toBe(80);
});
+ it("a completed warm stores both lastPct (rate) and lastExpectedPct (retention)", async () => {
+ const timers = fakeTimers();
+ const warmer = createCacheWarmer({
+ warm: async () => ({
+ inputTokens: 1000,
+ outputTokens: 10,
+ cacheReadTokens: 700,
+ cacheWriteTokens: 300,
+ }),
+ storage: memStorage(),
+ logger: makeLogger(),
+ timers,
+ onSurfaceChange: () => {},
+ });
+
+ warmer.onTurnSettled("conv-1", {});
+ timers.flush();
+
+ await new Promise((r) => setTimeout(r, 10));
+ const state = warmer.getState("conv-1");
+ expect(state.lastPct).toBe(70);
+ expect(state.lastExpectedPct).toBe(70);
+ });
+
it("re-arms timer after warm completes", async () => {
const timers = fakeTimers();
let warmCount = 0;
@@ -316,4 +340,27 @@ describe("CacheWarmer", () => {
await warmer.setIntervalMs("conv-1", 30_000);
expect(changeCount).toBe(2);
});
+
+ it("the per-conversation spec includes a cache-retention stat", async () => {
+ const timers = fakeTimers();
+ const warmer = createCacheWarmer({
+ warm: async () => ({
+ inputTokens: 1000,
+ outputTokens: 10,
+ cacheReadTokens: 900,
+ cacheWriteTokens: 100,
+ }),
+ storage: memStorage(),
+ logger: makeLogger(),
+ timers,
+ onSurfaceChange: () => {},
+ });
+
+ warmer.onTurnSettled("conv-1", {});
+ timers.flush();
+ await new Promise((r) => setTimeout(r, 10));
+
+ const state = warmer.getState("conv-1");
+ expect(state.lastExpectedPct).toBe(90);
+ });
});
diff --git a/packages/cache-warming/src/warmer.ts b/packages/cache-warming/src/warmer.ts
index 31dd41e..f50f346 100644
--- a/packages/cache-warming/src/warmer.ts
+++ b/packages/cache-warming/src/warmer.ts
@@ -5,6 +5,7 @@ import {
type ConversationSettings,
type ConversationState,
computeCachePct,
+ computeExpectedCacheRate,
DEFAULT_INTERVAL_MS,
isTokenCurrent,
MIN_INTERVAL_MS,
@@ -63,6 +64,7 @@ const DEFAULT_STATE: ConversationState = {
intervalMs: DEFAULT_INTERVAL_MS,
active: false,
lastPct: null,
+ lastExpectedPct: null,
token: 0,
};
@@ -145,11 +147,13 @@ export function createCacheWarmer(deps: CacheWarmerDeps): CacheWarmer {
});
} else {
const pct = computeCachePct(result.inputTokens, result.cacheReadTokens);
- setState(conversationId, { ...currentState, lastPct: pct });
+ const expectedPct = computeExpectedCacheRate(result.cacheReadTokens, result.cacheWriteTokens);
+ setState(conversationId, { ...currentState, lastPct: pct, lastExpectedPct: expectedPct });
deps.onSurfaceChange();
deps.logger.debug("cache-warming: warm complete", {
conversationId,
pct,
+ expectedPct,
});
}
diff --git a/packages/transport-contract/src/index.ts b/packages/transport-contract/src/index.ts
index fbb61fc..95111ae 100644
--- a/packages/transport-contract/src/index.ts
+++ b/packages/transport-contract/src/index.ts
@@ -192,10 +192,21 @@ export interface WarmResponse {
readonly cacheReadTokens: number;
readonly cacheWriteTokens: number;
/**
- * Cache-hit percent: `round(clamp(cacheReadTokens / inputTokens, 0, 1) * 100)`
- * (0 when `inputTokens <= 0`).
+ * **Cache rate** — what fraction of THIS request's prompt was served from cache:
+ * `round(cacheReadTokens / inputTokens * 100)` (0 when `inputTokens <= 0`).
+ * (`inputTokens` is the TOTAL prompt incl. cached, so this is in [0,100].)
*/
readonly cachePct: number;
+ /**
+ * **Expected cache (retention)** — of the cacheable prefix this warm touched, how
+ * much was still warm and read back vs. had to be (re)written:
+ * `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` (0 when the
+ * sum is 0). For a healthy warm this is ~**100%** (the whole prefix was still
+ * cached); it drops toward 0 as the cache expires/busts and the warm has to rewrite
+ * it. This is the warming HEALTH signal — distinct from `cachePct` (which a warm's
+ * tiny fresh probe makes ~equal, but which on a real turn reflects new content).
+ */
+ readonly expectedCacheRate: number;
}
// ─── WebSocket chat ops ───────────────────────────────────────────────────────
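As a rough illustration of how the two `WarmResponse` percentages relate, the sketch below derives both from one set of token counts. `WarmUsage`, `pct`, and `toWarmResponse` are hypothetical names used only here; the healthy fixture reuses the numbers from the `POST /chat/warm` test in this commit, while the expired fixture is invented for contrast. The shared `pct` helper clamps both ratios, which for non-negative token counts gives the same result as the unclamped `computeExpectedCacheRate`.

```typescript
interface WarmUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface WarmResponse extends WarmUsage {
  cachePct: number; // share of this request's prompt served from cache
  expectedCacheRate: number; // share of the cacheable prefix that was still warm
}

// Integer percent in [0, 100]; a non-positive denominator yields 0.
function pct(numerator: number, denominator: number): number {
  if (denominator <= 0) return 0;
  return Math.round(Math.max(0, Math.min(1, numerator / denominator)) * 100);
}

// Hypothetical builder: attaches both metrics to the raw token counts.
function toWarmResponse(u: WarmUsage): WarmResponse {
  return {
    ...u,
    cachePct: pct(u.cacheReadTokens, u.inputTokens),
    expectedCacheRate: pct(u.cacheReadTokens, u.cacheReadTokens + u.cacheWriteTokens),
  };
}

// Fixture from the app.test.ts case: 800 / 1000 = 80%, 800 / (800 + 100) rounds to 89%.
const healthy = toWarmResponse({
  inputTokens: 1000,
  outputTokens: 200,
  cacheReadTokens: 800,
  cacheWriteTokens: 100,
});

// Invented fixture for an expired cache: nothing read back, the prefix rewritten.
const expiredWarm = toWarmResponse({
  inputTokens: 1000,
  outputTokens: 200,
  cacheReadTokens: 0,
  cacheWriteTokens: 900,
});

console.log(healthy.cachePct, healthy.expectedCacheRate); // 80 89
console.log(expiredWarm.cachePct, expiredWarm.expectedCacheRate); // 0 0
```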
diff --git a/packages/transport-http/src/app.test.ts b/packages/transport-http/src/app.test.ts
index 7352b5d..22b26fc 100644
--- a/packages/transport-http/src/app.test.ts
+++ b/packages/transport-http/src/app.test.ts
@@ -449,12 +449,64 @@ describe("POST /chat/warm", () => {
cacheReadTokens: number;
cacheWriteTokens: number;
cachePct: number;
+ expectedCacheRate: number;
};
expect(body.inputTokens).toBe(1000);
expect(body.outputTokens).toBe(200);
expect(body.cacheReadTokens).toBe(800);
expect(body.cacheWriteTokens).toBe(100);
expect(body.cachePct).toBe(80);
+ expect(body.expectedCacheRate).toBe(89);
+ });
+
+ it("POST /chat/warm returns expectedCacheRate = round(cacheRead/(cacheRead+cacheWrite)*100)", async () => {
+ const app = createApp({
+ conversationStore: createFakeConversationStore(),
+ orchestrator: createFakeOrchestrator([]),
+ credentialStore: createFakeCredentialStore([]),
+ warmService: createFakeWarmService({
+ inputTokens: 500,
+ outputTokens: 100,
+ cacheReadTokens: 400,
+ cacheWriteTokens: 100,
+ }),
+ logger: noopLogger,
+ });
+
+ const res = await app.request("/chat/warm", {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify({ conversationId: "conv1" }),
+ });
+
+ expect(res.status).toBe(200);
+ const body = (await res.json()) as { expectedCacheRate: number };
+ expect(body.expectedCacheRate).toBe(80);
+ });
+
+ it("POST /chat/warm returns expectedCacheRate = 0 when cacheRead+cacheWrite is 0", async () => {
+ const app = createApp({
+ conversationStore: createFakeConversationStore(),
+ orchestrator: createFakeOrchestrator([]),
+ credentialStore: createFakeCredentialStore([]),
+ warmService: createFakeWarmService({
+ inputTokens: 100,
+ outputTokens: 50,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 0,
+ }),
+ logger: noopLogger,
+ });
+
+ const res = await app.request("/chat/warm", {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify({ conversationId: "conv1" }),
+ });
+
+ expect(res.status).toBe(200);
+ const body = (await res.json()) as { expectedCacheRate: number };
+ expect(body.expectedCacheRate).toBe(0);
});
it("POST /chat/warm returns 409 when the warm service reports the conversation is generating", async () => {
diff --git a/packages/transport-http/src/app.ts b/packages/transport-http/src/app.ts
index a8cef51..84c7d20 100644
--- a/packages/transport-http/src/app.ts
+++ b/packages/transport-http/src/app.ts
@@ -10,6 +10,7 @@ import { Hono } from "hono";
import { cors } from "hono/cors";
import {
computeCachePct,
+ computeExpectedCacheRate,
isParseError,
isSinceSeqError,
parseChatBody,
@@ -284,6 +285,7 @@ export function createApp(opts: CreateServerOptions): Hono {
cacheReadTokens: result.cacheReadTokens,
cacheWriteTokens: result.cacheWriteTokens,
cachePct: computeCachePct(result.inputTokens, result.cacheReadTokens),
+ expectedCacheRate: computeExpectedCacheRate(result.cacheReadTokens, result.cacheWriteTokens),
};
return c.json(response, 200);
});
diff --git a/packages/transport-http/src/logic.test.ts b/packages/transport-http/src/logic.test.ts
index 1e33f40..19b47ef 100644
--- a/packages/transport-http/src/logic.test.ts
+++ b/packages/transport-http/src/logic.test.ts
@@ -1,6 +1,7 @@
import type { AgentEvent } from "@dispatch/kernel";
import { describe, expect, it } from "vitest";
import {
+ computeExpectedCacheRate,
isParseError,
isSinceSeqError,
parseChatBody,
@@ -197,3 +198,26 @@ describe("serializeEventLine", () => {
expect(parsed.reason).toBe("stop");
});
});
+
+describe("computeExpectedCacheRate", () => {
+ it("returns round(cacheRead/(cacheRead+cacheWrite)*100)", () => {
+ expect(computeExpectedCacheRate(800, 200)).toBe(80);
+ });
+
+ it("returns 0 when cacheRead+cacheWrite is 0", () => {
+ expect(computeExpectedCacheRate(0, 0)).toBe(0);
+ });
+
+ it("returns 100 when all tokens are cacheRead", () => {
+ expect(computeExpectedCacheRate(500, 0)).toBe(100);
+ });
+
+ it("returns 0 when all tokens are cacheWrite", () => {
+ expect(computeExpectedCacheRate(0, 500)).toBe(0);
+ });
+
+ it("rounds to nearest integer", () => {
+ expect(computeExpectedCacheRate(1, 2)).toBe(33);
+ expect(computeExpectedCacheRate(2, 1)).toBe(67);
+ });
+});
diff --git a/packages/transport-http/src/logic.ts b/packages/transport-http/src/logic.ts
index bb827e2..bddedf0 100644
--- a/packages/transport-http/src/logic.ts
+++ b/packages/transport-http/src/logic.ts
@@ -113,3 +113,12 @@ export function computeCachePct(inputTokens: number, cacheReadTokens: number): n
if (inputTokens <= 0) return 0;
return Math.round(Math.max(0, Math.min(1, cacheReadTokens / inputTokens)) * 100);
}
+
+export function computeExpectedCacheRate(
+ cacheReadTokens: number,
+ cacheWriteTokens: number,
+): number {
+ const denom = cacheReadTokens + cacheWriteTokens;
+ if (denom <= 0) return 0;
+ return Math.round((cacheReadTokens / denom) * 100);
+}
diff --git a/tasks.md b/tasks.md
index c94b156..6fd3676 100644
--- a/tasks.md
+++ b/tasks.md
@@ -162,6 +162,14 @@ arm-on-settle/cancel-on-start; `pct = round(clamp(cacheRead/input,0,1)*100)`).
- **LIVE-VERIFIED against Claude haiku:** automatic timer warm → journal `warm complete pct:100`;
manual `POST /chat/warm` → `cacheReadTokens:6799, cachePct:100` (100% hit), HTTP 200. The external
`../claude` provider-anthropic is loaded via `bin/up` (`DISPATCH_EXTERNAL_EXTENSIONS`).
+- **Cache-metric fix + retention metric:** `provider-anthropic` (in `../claude`, commit `0e9d118`)
+ now reports `Usage.inputTokens` as the TOTAL prompt (was the uncached remainder → the cache rate
+ inflated/clamped to 100% on Claude). So `cacheRead/inputTokens` is now the true rate (live: a turn
+ adding new content reads 61%, not 100%). Added **`expectedCacheRate`** = `cacheRead/(cacheRead+
+ cacheWrite)` (retention/health, ~100% when warm, 0% when the cache expired) to `WarmResponse` +
+ `POST /chat/warm` + the cache-warming surface (a "cache retention" stat). Live-verified: warm
+ within TTL → 100%; warm after >5 min idle → 0% (cache expired). FE handoff updated with both
+ metrics + the cross-turn real-turn `expectedCache = cacheRead_N/(cacheRead_{N-1}+cacheWrite_{N-1})`.
- **Surface framework extended (DONE):** added `NumberField` to `ui-contract` + per-conversation
surface scoping (optional `conversationId` on subscribe/unsubscribe/invoke + surface/update; new
`SurfaceContext` on `SurfaceProvider.getSpec/invoke`; transport-ws keys subscriptions by