| author | Adam Malczewski <[email protected]> | 2026-06-07 17:47:58 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-07 17:47:58 +0900 |
| commit | 4248cd1d546a4c1fb4e68940c11b5e309c2c2736 (patch) | |
| tree | 50bbfda39bb1c52f154f44c7656342241527a2ab | |
| parent | 7c459c7d919d1e08a228e8abc56129be174d8abe (diff) | |
docs(tasks): record live metrics Pass 1 done + live-verified
| -rw-r--r-- | tasks.md | 31 |
1 files changed, 16 insertions, 15 deletions
@@ -580,7 +580,7 @@ measured** (provider-agnostic, self-consistent); **first token = first text OR r
 spans with valid `durationMs` (ttft 1090ms, decode 1673ms) + `firstToken:true`. GLOSSARY: TTFT,
 decode time. NOT on the wire (clients don't receive it) — a future wire+FE step if desired.
 
-### Expose backend metrics to clients (timing/tokens) — IN PROGRESS
+### Expose backend metrics to clients (timing/tokens) — Pass 1 DONE + verified live (`7c459c7`)
 User ask: surface the backend's authoritative metrics (tokens, TTFT/decode, TPS, tool-exec +
 turn durations) to clients (CLI/web FE). Decisions (user, §5.2):
 - **Delivery = inline in the existing chat stream**, as distinct `AgentEvent` types/fields —
@@ -589,7 +589,7 @@ turn durations) to clients (CLI/web FE). Decisions (user, §5.2):
 - **Carrier = a new `step-complete` event** (per-step end, ordering-safe timing) + additive fields.
 - **Scope = LIVE now**; persisted-for-replay is a documented fast-follow (below).
 
-#### Pass 1 — LIVE stream metrics [~] IN PROGRESS
+#### Pass 1 — LIVE stream metrics [x] DONE + verified live
 Metric set (all wire fields additive/optional): per-step tokens (`usage` += `stepId`), per-step
 `ttftMs`/`decodeMs`/`genTotalMs` (new `step-complete` event), tool-exec `durationMs` (`tool-result`),
 turn `durationMs` + aggregate `usage` (`done`). TPS derived FE-side (`outputTokens/decodeMs`);
@@ -600,19 +600,20 @@ context-size proxy = `usage.inputTokens` (already present).
 number` (additive optional clock — runtime has no clock today; needed to put real numbers on the
 wire). **Whole-graph typecheck clean** → new variant breaks NO consumer (no exhaustive switches;
 cli/transport unaffected). GLOSSARY: TTFT/decode already added.
-- [ ] **Build wave (2 disjoint owner-agents, parallel, mimo-v2.5-pro):**
-  - **kernel-runtime:** when `now` provided, measure + emit timing — `step-complete` per step
-    (ttft/decode/genTotal, reusing the just-built first-token detection), `stepId` on `usage`,
-    `durationMs` on `tool-result`, `durationMs`+`usage` on `done`. Keep the trace spans (may unify
-    span timing with the numeric measurement). Omit timing gracefully when `now` absent. +tests.
-  - **session-orchestrator:** add `now?` to `SessionOrchestratorDeps`, thread into the
-    `RunTurnInput` it builds; provide `() => Date.now()` from the extension's `activate` (shell
-    edge). +test asserting forwarding.
-  - **NOT touched (typecheck-confirmed):** transport-http/ws (verbatim pass-through), cli (optional
-    fields, non-exhaustive switch), host-bin (orchestrator self-injects the clock in activate).
-- [ ] **Post-wave (orchestrator):** full typecheck/test/biome; bump `wire`+`transport-contract`
-  minor (`0.2.0→0.3.0`); regen FE `.dispatch/{wire,transport-contract}.reference.md`; courier reply.
-  Live: confirm `step-complete`/durations/`usage.stepId` on the real stream.
+- [x] **Build wave (2 disjoint owner-agents, parallel, mimo-v2.5-pro):**
+  - **kernel-runtime:** emits `step-complete` per step (ttft/decode/genTotal), `stepId` on `usage`,
+    `durationMs` on `tool-result`, `durationMs`+`usage` on `done`; gated on injected `now`; trace
+    spans kept. +6 tests. reports/kernel-runtime.md.
+  - **session-orchestrator:** `now?` on `SessionOrchestratorDeps`, threaded into `RunTurnInput`;
+    extension `activate` injects `() => Date.now()`. +2 tests. reports/session-orchestrator.md.
+  - **NOT touched (typecheck-confirmed):** transport-http/ws, cli, host-bin.
+- [x] **Post-wave (orchestrator):** typecheck EXIT 0, **520 vitest** (512→+8) + 89 bun, biome 0/0,
+  in-lane, no internal mocks. Bumped `wire`+`transport-contract` `0.2.0→0.3.0`; regen FE
+  `.dispatch/{wire,transport-contract}.reference.md`; wrote **`frontend-metrics-handoff.md`** (in
+  arch-rewrite root, for the user to courier — full FE consumption guide). **Live (host-bin :24235,
+  real flash, tool turn):** 2 `step-complete` (genTotal==ttft+decode ✓), both `usage` carry
+  `stepId`, `tool-result.durationMs=3`, `done.durationMs=3294`+turn `usage` (cacheRead 768),
+  derived TPS≈107 tok/s. All green.
 
 #### Pass 2 — DEFERRED: persisted metrics for REPLAY
 Today usage/timing are NOT persisted (conversation-store stores chunks only), so reopening a past
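
The task notes say TPS is derived FE-side as `outputTokens/decodeMs` and that the live check confirmed `genTotal==ttft+decode`. A minimal sketch of how a client might compute both is below. Only the field names (`stepId`, `ttftMs`, `decodeMs`, `genTotalMs`, `outputTokens`, `inputTokens`) and the `step-complete`/`usage` event names come from the notes; the `type` discriminant, the presence of `stepId` on `step-complete`, the interface shapes, and the `deriveStepMetrics` helper are assumptions for illustration, not the project's actual wire types.

```typescript
// Hypothetical event shapes. Field names follow the task notes; the `type`
// discriminant and `stepId` on step-complete are assumptions.
interface StepCompleteEvent {
  type: "step-complete";
  stepId: string;
  ttftMs?: number; // time to first token
  decodeMs?: number; // time spent generating after the first token
  genTotalMs?: number; // expected to equal ttftMs + decodeMs
}

interface UsageEvent {
  type: "usage";
  stepId?: string; // additive/optional per the notes
  inputTokens: number;
  outputTokens: number;
}

type MetricEvent = StepCompleteEvent | UsageEvent;

interface StepMetrics {
  stepId: string;
  tokensPerSecond?: number;
  timingConsistent?: boolean;
}

// Hypothetical helper: joins usage and step-complete events by stepId.
function deriveStepMetrics(events: MetricEvent[]): StepMetrics[] {
  // Collect output token counts for every usage event that names its step.
  const outputTokensByStep = new Map<string, number>();
  for (const ev of events) {
    if (ev.type === "usage" && ev.stepId !== undefined) {
      outputTokensByStep.set(ev.stepId, ev.outputTokens);
    }
  }

  const result: StepMetrics[] = [];
  for (const ev of events) {
    if (ev.type !== "step-complete") continue;
    const metrics: StepMetrics = { stepId: ev.stepId };

    // TPS = outputTokens / decode time, converted from milliseconds to seconds.
    // Skipped when timing is absent (no clock injected) or decodeMs is zero.
    const tokens = outputTokensByStep.get(ev.stepId);
    if (tokens !== undefined && ev.decodeMs !== undefined && ev.decodeMs > 0) {
      metrics.tokensPerSecond = tokens / (ev.decodeMs / 1000);
    }

    // Self-consistency check mirroring the live verification in the notes.
    if (
      ev.ttftMs !== undefined &&
      ev.decodeMs !== undefined &&
      ev.genTotalMs !== undefined
    ) {
      metrics.timingConsistent = ev.genTotalMs === ev.ttftMs + ev.decodeMs;
    }
    result.push(metrics);
  }
  return result;
}
```

Because every timing field is optional, the helper leaves `tokensPerSecond` and `timingConsistent` undefined when the backend ran without an injected `now`, rather than reporting zeros. With illustrative inputs of 200 output tokens and `decodeMs` of 2000, the derived rate is 100 tok/s.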
