author Adam Malczewski <[email protected]> 2026-06-07 17:09:03 +0900
committer Adam Malczewski <[email protected]> 2026-06-07 17:09:03 +0900
commit 5746cf4e545cd5b0d7faf0595554f273f236f3a9
tree 316e4c9ceff0e817767c8712eef9190949aa06b0
parent 3ecc9778fe278d6665b1e9a918f44c16f6992b87
docs(tasks): record per-step TTFT+decode timing done + live-verified
-rw-r--r-- tasks.md 19
1 files changed, 19 insertions, 0 deletions
diff --git a/tasks.md b/tasks.md
index 599833d..89662b9 100644
--- a/tasks.md
+++ b/tasks.md
@@ -561,6 +561,25 @@ on-chunk provenance (distinct concepts — doc'd in wire + GLOSSARY).
all sharing `stepId` `turn-…#0`; 4 persisted tool chunks all carry it; live↔persisted cross-match ✓.
The FE's exact use case (group a parallel batch by `stepId`) proven end-to-end.
+### Per-step TTFT + decode timing (observability) [x] DONE + verified live (`3ecc977`)
+User ask (granularity follow-up): isolate time-to-first-token from generation time. Decisions
+(user, §5.2): **observability-only** (trace-store, NO wire/contract change); **kernel/step
+measured** (provider-agnostic, self-consistent); **first token = first text OR reasoning delta**.
+- **kernel-runtime (owner, mimo-v2.5-pro):** in `executeStep`, opens a `ttft` child span of the
+ step at stream start, ends it on the first text/reasoning delta (`firstToken:true`) and opens a
+ `decode` child span (first token → stream end). `decode = generation total − TTFT`; both
+ `durationMs` retrievable from the trace-store. No-content step (tool-call-only / pre-first-token
+ error) ends `ttft` with `firstToken:false` and emits no misleading `decode`. **Span-based — no
+ clock injection, no contract change** (runtime has no direct clock; logger's injected `now`
+ drives span durations → deterministic tests). +3 runtime tests. (Agent left reports/ stale; impl
+ independently verified.) prompts/ttft-kernel-runtime.md.
+- TTFT is inherently per-step (each round-trip re-prefills): step 0's `ttft` = the turn's
+ user-visible first-token latency; Σ`decode` = generation minus all prefill; Σ`ttft` = total prefill.
+- **Verified (orchestrator):** typecheck EXIT 0, **512 vitest** (509→+3), biome 0/0, in-lane, no
+ internal mocks. **Live (host-bin :24234, real flash):** journal carries balanced `ttft`/`decode`
+ spans with valid `durationMs` (ttft 1090ms, decode 1673ms) + `firstToken:true`. GLOSSARY: TTFT,
+ decode time. NOT on the wire (clients don't receive it) — a future wire+FE step if desired.
+
### 3. dedup / storage growth (after frontend)
The deferred trace-body de-duplication + rotation/compression (D5 volume-control +
`prefix.fingerprint` + §6 retention strategy) — already designed in