author Adam Malczewski <[email protected]> 2026-06-11 14:11:13 +0900
committer Adam Malczewski <[email protected]> 2026-06-11 14:11:13 +0900
commit 7ffb6b28f5b6bdbfc53ebed94fc68af557612189
tree e66d9ea9d326ef771cc473d81ca5716ff78b08a8 /tasks.md
parent 763e5fb1c7fbfb4c7bbd43ffb935e42e5f5b5a42
fix(cache-warming): accurate cache rate + expectedCacheRate (retention) metric
The Claude cache % read 100% whenever anything was cached, because the
metric's denominator (inputTokens) excluded cached tokens on Anthropic.
Fixed upstream in ../claude/provider-anthropic (inputTokens = total
prompt); this commit adds the companion retention metric and exposes it:

- transport-contract: WarmResponse += expectedCacheRate
- transport-http: POST /chat/warm returns expectedCacheRate = cacheRead/(cacheRead+cacheWrite)
- cache-warming: computeExpectedCacheRate + a per-conversation 'cache retention' surface stat
- handoff: documents the fix + cache-rate vs expected-cache (cross-turn) for the FE

Live-verified vs claude haiku: real turn cache rate 61% (was inflated
100%); warm within TTL expectedCacheRate=100%, after expiry=0%.
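
The two ratios described in the message can be sketched as below. This is a minimal illustration, not the project's code: the `UsageSketch` shape, the function names `cacheRatePct` and `expectedCacheRatePct`, the choice to return whole percentages, and the zero-denominator handling are all assumptions. Only the formulas `cacheRead/inputTokens` and `cacheRead/(cacheRead+cacheWrite)` come from the commit itself.

```typescript
// Hypothetical usage shape; field names are assumptions, not the project's types.
interface UsageSketch {
  inputTokens: number;      // total prompt tokens (cached + uncached), as after the fix
  cacheReadTokens: number;  // prompt tokens served from the cache
  cacheWriteTokens: number; // prompt tokens newly written to the cache
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Cache rate: share of the total prompt that was read from cache.
function cacheRatePct(u: UsageSketch): number {
  if (u.inputTokens <= 0) return 0; // assumption: report 0 for an empty prompt
  return Math.round(clamp(u.cacheReadTokens / u.inputTokens, 0, 1) * 100);
}

// Retention: of the tokens that touched the cache, how many were reads.
function expectedCacheRatePct(u: UsageSketch): number {
  const touched = u.cacheReadTokens + u.cacheWriteTokens;
  if (touched <= 0) return 0; // assumption: report 0 when the cache was not used
  return Math.round(clamp(u.cacheReadTokens / touched, 0, 1) * 100);
}

// Illustrative numbers only (not measured data).
const fixedTurn: UsageSketch = { inputTokens: 10000, cacheReadTokens: 6100, cacheWriteTokens: 3900 };
// Same turn with the old denominator (uncached remainder only): the ratio
// exceeds 1 and is clamped, which is how the rate appeared as 100%.
const oldTurn: UsageSketch = { inputTokens: 3900, cacheReadTokens: 6100, cacheWriteTokens: 3900 };
const warmInTtl: UsageSketch = { inputTokens: 6799, cacheReadTokens: 6799, cacheWriteTokens: 0 };
const warmExpired: UsageSketch = { inputTokens: 6799, cacheReadTokens: 0, cacheWriteTokens: 6799 };

console.log(cacheRatePct(fixedTurn), cacheRatePct(oldTurn));
console.log(expectedCacheRatePct(warmInTtl), expectedCacheRatePct(warmExpired));
```

With the denominator set to the total prompt, the cache rate can fall below 100% on a turn that adds new content, while the retention ratio stays independent of how much uncached content the turn carries.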
Diffstat (limited to 'tasks.md')
-rw-r--r-- tasks.md 8
1 files changed, 8 insertions, 0 deletions
diff --git a/tasks.md b/tasks.md
index c94b156..6fd3676 100644
--- a/tasks.md
+++ b/tasks.md
@@ -162,6 +162,14 @@ arm-on-settle/cancel-on-start; `pct = round(clamp(cacheRead/input,0,1)*100)`).
- **LIVE-VERIFIED against Claude haiku:** automatic timer warm → journal `warm complete pct:100`;
manual `POST /chat/warm` → `cacheReadTokens:6799, cachePct:100` (100% hit), HTTP 200. The external
`../claude` provider-anthropic is loaded via `bin/up` (`DISPATCH_EXTERNAL_EXTENSIONS`).
+- **Cache-metric fix + retention metric:** `provider-anthropic` (in `../claude`, commit `0e9d118`)
+ now reports `Usage.inputTokens` as the TOTAL prompt (was the uncached remainder → the cache rate
+ inflated/clamped to 100% on Claude). So `cacheRead/inputTokens` is now the true rate (live: a turn
+ adding new content reads 61%, not 100%). Added **`expectedCacheRate`** = `cacheRead/(cacheRead+
+ cacheWrite)` (retention/health, ~100% when warm, 0% when the cache expired) to `WarmResponse` +
+ `POST /chat/warm` + the cache-warming surface (a "cache retention" stat). Live-verified: warm
+ within TTL → 100%; warm after >5 min idle → 0% (cache expired). FE handoff updated with both
+ metrics + the cross-turn real-turn `expectedCache = cacheRead_N/(cacheRead_{N-1}+cacheWrite_{N-1})`.
- **Surface framework extended (DONE):** added `NumberField` to `ui-contract` + per-conversation
surface scoping (optional `conversationId` on subscribe/unsubscribe/invoke + surface/update; new
`SurfaceContext` on `SurfaceProvider.getSpec/invoke`; transport-ws keys subscriptions by