fix(cache-warming): accurate cache rate + expectedCacheRate (retention) metric

The Claude cache % read 100% whenever anything was cached, because the metric's denominator (inputTokens) excluded cached tokens on Anthropic. This is fixed upstream in ../claude/provider-anthropic (inputTokens = total prompt); this commit adds the companion retention metric and exposes it:

- transport-contract: WarmResponse += expectedCacheRate
- transport-http: POST /chat/warm returns expectedCacheRate = cacheRead/(cacheRead+cacheWrite)
- cache-warming: computeExpectedCacheRate + a per-conversation 'cache retention' surface stat
- handoff: documents the fix + cache-rate vs expected-cache (cross-turn) for the FE

Live-verified vs claude haiku: real turn cache rate 61% (was inflated 100%); warm within TTL expectedCacheRate=100%, after expiry=0%.
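The metrics described above can be sketched as follows. This is a minimal TypeScript sketch, not the project's code. It assumes a usage record with `inputTokens`, `cacheReadTokens` and `cacheWriteTokens`. The helper names (`totalPromptTokens`, `cacheRatePct`, `expectedCacheRate`, `crossTurnExpectedCache`) are hypothetical and are not the real `computeExpectedCacheRate`. The token counts are illustrative, not measurements.

```typescript
// HYPOTHETICAL usage record: field names follow the commit text
// (inputTokens, cacheRead, cacheWrite); the real Usage type may differ.
interface Usage {
  inputTokens: number; // TOTAL prompt tokens (post-fix semantics)
  cacheReadTokens: number; // prompt tokens served from the cache
  cacheWriteTokens: number; // prompt tokens written to the cache
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Anthropic reports uncached input, cache-creation and cache-read tokens as
// separate counts; the full prompt is their sum. Illustrative helper only.
function totalPromptTokens(uncached: number, cacheWrite: number, cacheRead: number): number {
  return uncached + cacheWrite + cacheRead;
}

// Per-request cache rate, mirroring pct = round(clamp(cacheRead/input,0,1)*100).
function cacheRatePct(u: Usage): number {
  if (u.inputTokens <= 0) return 0;
  return Math.round(clamp01(u.cacheReadTokens / u.inputTokens) * 100);
}

// Retention: cacheRead / (cacheRead + cacheWrite), as a fraction in [0, 1].
// Returning 0 when nothing was read or written is an ASSUMPTION.
function expectedCacheRate(u: Usage): number {
  const denom = u.cacheReadTokens + u.cacheWriteTokens;
  return denom > 0 ? u.cacheReadTokens / denom : 0;
}

// Cross-turn: cacheRead_N / (cacheRead_{N-1} + cacheWrite_{N-1}).
// Turn N-1's read + write approximates the prefix that should still be cached.
// Left unclamped, matching the formula as written.
function crossTurnExpectedCache(prev: Usage, curr: Usage): number {
  const denom = prev.cacheReadTokens + prev.cacheWriteTokens;
  return denom > 0 ? curr.cacheReadTokens / denom : 0;
}

// Pre-fix: inputTokens held only the uncached remainder. The ratio exceeds 1
// and is clamped to 100%.
const legacy: Usage = { inputTokens: 200, cacheReadTokens: 6799, cacheWriteTokens: 0 };
// Post-fix: inputTokens is the whole prompt.
const fixed: Usage = { ...legacy, inputTokens: totalPromptTokens(200, 0, 6799) };
console.log(cacheRatePct(legacy)); // 100
console.log(cacheRatePct(fixed)); // 97
```

The per-request rate measures how much of the current prompt came from cache, so it drops whenever a turn adds new content. The retention rate compares only reads against writes. For a warm request, which adds no new content, it stays near 100% while the cache is alive and falls to 0% once the cache expires. For real turns, the cross-turn form compares the current read against what the previous turn left cached.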
author: Adam Malczewski <[email protected]> 2026-06-11 14:11:13 +0900
committer: Adam Malczewski <[email protected]> 2026-06-11 14:11:13 +0900
commit: 7ffb6b28f5b6bdbfc53ebed94fc68af557612189 (patch)
tree: e66d9ea9d326ef771cc473d81ca5716ff78b08a8 /tasks.md
parent: 763e5fb1c7fbfb4c7bbd43ffb935e42e5f5b5a42 (diff)
download: dispatch-7ffb6b28f5b6bdbfc53ebed94fc68af557612189.tar.gz
dispatch-7ffb6b28f5b6bdbfc53ebed94fc68af557612189.zip
1 files changed, 8 insertions, 0 deletions
diff --git a/tasks.md b/tasks.md
index c94b156..6fd3676 100644
--- a/tasks.md
+++ b/tasks.md
@@ -162,6 +162,14 @@ arm-on-settle/cancel-on-start; `pct = round(clamp(cacheRead/input,0,1)*100)`).
 - **LIVE-VERIFIED against Claude haiku:** automatic timer warm → journal `warm complete pct:100`;
   manual `POST /chat/warm` → `cacheReadTokens:6799, cachePct:100` (100% hit), HTTP 200. The external
   `../claude` provider-anthropic is loaded via `bin/up` (`DISPATCH_EXTERNAL_EXTENSIONS`).
+- **Cache-metric fix + retention metric:** `provider-anthropic` (in `../claude`, commit `0e9d118`)
+  now reports `Usage.inputTokens` as the TOTAL prompt (was the uncached remainder → the cache rate
+  inflated/clamped to 100% on Claude). So `cacheRead/inputTokens` is now the true rate (live: a turn
+  adding new content reads 61%, not 100%). Added **`expectedCacheRate`** = `cacheRead/(cacheRead+
+  cacheWrite)` (retention/health, ~100% when warm, 0% when the cache expired) to `WarmResponse` +
+  `POST /chat/warm` + the cache-warming surface (a "cache retention" stat). Live-verified: warm
+  within TTL → 100%; warm after >5 min idle → 0% (cache expired). FE handoff updated with both
+  metrics + the cross-turn real-turn `expectedCache = cacheRead_N/(cacheRead_{N-1}+cacheWrite_{N-1})`.
 - **Surface framework extended (DONE):** added `NumberField` to `ui-contract` + per-conversation
   surface scoping (optional `conversationId` on subscribe/unsubscribe/invoke + surface/update; new
   `SurfaceContext` on `SurfaceProvider.getSpec/invoke`; transport-ws keys subscriptions by