diff options
| author | Adam Malczewski <[email protected]> | 2026-06-11 14:11:13 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-11 14:11:13 +0900 |
| commit | 7ffb6b28f5b6bdbfc53ebed94fc68af557612189 (patch) | |
| tree | e66d9ea9d326ef771cc473d81ca5716ff78b08a8 /tasks.md | |
| parent | 763e5fb1c7fbfb4c7bbd43ffb935e42e5f5b5a42 (diff) | |
| download | dispatch-7ffb6b28f5b6bdbfc53ebed94fc68af557612189.tar.gz dispatch-7ffb6b28f5b6bdbfc53ebed94fc68af557612189.zip | |
fix(cache-warming): accurate cache rate + expectedCacheRate (retention) metric
The Claude cache % read 100% whenever anything was cached, because the metric's
denominator (inputTokens) excluded cached tokens on Anthropic. Fixed upstream in
../claude/provider-anthropic (inputTokens = total prompt); this commit adds the
companion retention metric and exposes it:
- transport-contract: WarmResponse += expectedCacheRate
- transport-http: POST /chat/warm returns expectedCacheRate = cacheRead/(cacheRead+cacheWrite)
- cache-warming: computeExpectedCacheRate + a per-conversation 'cache retention' surface stat
- handoff: documents the fix + cache-rate vs expected-cache (cross-turn) for the FE
Live-verified vs claude haiku: real turn cache rate 61% (was inflated 100%);
warm within TTL expectedCacheRate=100%, after expiry=0%.
Diffstat (limited to 'tasks.md')
| -rw-r--r-- | tasks.md | 8 |
1 files changed, 8 insertions, 0 deletions
@@ -162,6 +162,14 @@ arm-on-settle/cancel-on-start; `pct = round(clamp(cacheRead/input,0,1)*100)`). - **LIVE-VERIFIED against Claude haiku:** automatic timer warm → journal `warm complete pct:100`; manual `POST /chat/warm` → `cacheReadTokens:6799, cachePct:100` (100% hit), HTTP 200. The external `../claude` provider-anthropic is loaded via `bin/up` (`DISPATCH_EXTERNAL_EXTENSIONS`). +- **Cache-metric fix + retention metric:** `provider-anthropic` (in `../claude`, commit `0e9d118`) + now reports `Usage.inputTokens` as the TOTAL prompt (was the uncached remainder → the cache rate + inflated/clamped to 100% on Claude). So `cacheRead/inputTokens` is now the true rate (live: a turn + adding new content reads 61%, not 100%). Added **`expectedCacheRate`** = `cacheRead/(cacheRead+ + cacheWrite)` (retention/health, ~100% when warm, 0% when the cache expired) to `WarmResponse` + + `POST /chat/warm` + the cache-warming surface (a "cache retention" stat). Live-verified: warm + within TTL → 100%; warm after >5 min idle → 0% (cache expired). FE handoff updated with both + metrics + the cross-turn real-turn `expectedCache = cacheRead_N/(cacheRead_{N-1}+cacheWrite_{N-1})`. - **Surface framework extended (DONE):** added `NumberField` to `ui-contract` + per-conversation surface scoping (optional `conversationId` on subscribe/unsubscribe/invoke + surface/update; new `SurfaceContext` on `SurfaceProvider.getSpec/invoke`; transport-ws keys subscriptions by |
