| field | value | date |
|---|---|---|
| author | Adam Malczewski <[email protected]> | 2026-06-12 00:43:55 +0900 |
| committer | Adam Malczewski <[email protected]> | 2026-06-12 00:43:55 +0900 |
| commit | e2646ad2b0c28bd64ad4efd6b154f97f8a35e0ad (patch) | |
| tree | 0f739576bbabf19ebb0e254776188b0950c4f489 | |
| parent | e7eada4802ceebd86c83bcd6e3eca70152e7f331 (diff) | |
| download | dispatch-e2646ad2b0c28bd64ad4efd6b154f97f8a35e0ad.tar.gz dispatch-e2646ad2b0c28bd64ad4efd6b154f97f8a35e0ad.zip | |

feat(metrics): expose current context size to the frontend

contextSize = the turn's FINAL step inputTokens+outputTokens (true context
occupancy; NOT the aggregate usage, which sums per-step prompts and overcounts
multi-step turns). Stamped on both the live done event (kernel) and persisted
TurnMetrics (session-orchestrator); a client reads the latest turn's value.

- @dispatch/wire 0.4.0->0.5.0: optional contextSize on TurnDoneEvent + TurnMetrics
- @dispatch/transport-contract 0.5.0->0.6.0 (re-export only)
- glossary: context size (reserve 'context window' for the model limit, later)
- FE courier: frontend-context-size-handoff.md

881 vitest pass; tsc -b EXIT 0; biome clean.

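
The difference between the aggregate usage and the context size described in the commit message can be illustrated with a small sketch. It is not the kernel's actual code: `StepUsage`, `aggregateUsage`, and `contextSizeOf` are hypothetical names, and the sketch ignores the cache-token fields and the `hasUsage` guard that the real `runTurn` applies. The step values mirror the multi-step test case in this commit.

```typescript
// Hypothetical, simplified shape; the real Usage type lives in @dispatch/wire.
interface StepUsage {
  inputTokens: number;
  outputTokens: number;
}

// Aggregate usage: sums every step, so prompt tokens that are re-sent on each
// step are counted once per step.
function aggregateUsage(steps: readonly StepUsage[]): StepUsage {
  return steps.reduce(
    (acc, s) => ({
      inputTokens: acc.inputTokens + s.inputTokens,
      outputTokens: acc.outputTokens + s.outputTokens,
    }),
    { inputTokens: 0, outputTokens: 0 },
  );
}

// Context size: only the final step's input + output.
// Returns undefined when no step reported usage.
function contextSizeOf(steps: readonly StepUsage[]): number | undefined {
  const last = steps[steps.length - 1];
  return last === undefined ? undefined : last.inputTokens + last.outputTokens;
}

// A tool-calling turn: step 1 calls a tool, step 2 re-sends the grown prompt.
const steps: StepUsage[] = [
  { inputTokens: 100, outputTokens: 20 },
  { inputTokens: 300, outputTokens: 80 },
];

const total = aggregateUsage(steps);
console.log(total.inputTokens + total.outputTokens); // 500
console.log(contextSizeOf(steps)); // 380
```

The aggregate (500) counts the first step's prompt again inside the second step's 300 input tokens, while the final-step figure (380) reflects what the conversation actually occupies.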
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | GLOSSARY.md | 1 |
| -rw-r--r-- | frontend-context-size-handoff.md | 47 |
| -rw-r--r-- | packages/kernel/src/runtime/events.ts | 13 |
| -rw-r--r-- | packages/kernel/src/runtime/run-turn.test.ts | 97 |
| -rw-r--r-- | packages/kernel/src/runtime/run-turn.ts | 7 |
| -rw-r--r-- | packages/session-orchestrator/src/metrics.test.ts | 105 |
| -rw-r--r-- | packages/session-orchestrator/src/metrics.ts | 14 |
| -rw-r--r-- | packages/session-orchestrator/src/orchestrator.test.ts | 60 |
| -rw-r--r-- | packages/transport-contract/package.json | 2 |
| -rw-r--r-- | packages/wire/package.json | 2 |
| -rw-r--r-- | packages/wire/src/index.ts | 25 |
| -rw-r--r-- | tasks.md | 22 |

12 files changed, 392 insertions, 3 deletions
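
The handoff note added by this commit asks a client to display the latest turn's `contextSize` and to treat a missing value as unknown, not as zero. A minimal client-side sketch of that rule follows. It rests on assumptions: `TurnMetricsLike` is a trimmed-down stand-in for the real `TurnMetrics` type, and `latestContextSize` and `formatContextSize` are hypothetical helper names, as is the placeholder text.

```typescript
// Trimmed-down stand-in for TurnMetrics; only the field this sketch needs.
interface TurnMetricsLike {
  readonly contextSize?: number;
}

// Walk backwards so the most recent turn that reported a value wins.
function latestContextSize(turns: readonly TurnMetricsLike[]): number | undefined {
  for (let i = turns.length - 1; i >= 0; i--) {
    const size = turns[i]?.contextSize;
    if (size !== undefined) return size;
  }
  return undefined;
}

// Absent means "unknown", so render a placeholder and never 0.
function formatContextSize(size: number | undefined): string {
  return size === undefined ? "unknown" : `${size} tokens in context`;
}

// Hypothetical hydrated turns: the last turn reported no per-step usage.
const turns: TurnMetricsLike[] = [{ contextSize: 150 }, { contextSize: 380 }, {}];

console.log(formatContextSize(latestContextSize(turns))); // "380 tokens in context"
console.log(formatContextSize(latestContextSize([]))); // "unknown"
```

The same `formatContextSize` call could serve the live path: when a `done` event carries a defined `contextSize`, pass it straight through and skip the lookup.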
diff --git a/GLOSSARY.md b/GLOSSARY.md
index 63b7ea8..7564276 100644
--- a/GLOSSARY.md
+++ b/GLOSSARY.md
@@ -51,6 +51,7 @@
 | **diagnostics** | The errors/warnings/hints a `language server` reports for a file — received both push (`textDocument/publishDiagnostics`) and pull (`textDocument/diagnostic`), then merged + deduped. | lints (when meaning LSP diagnostics) |
 | **workspace root** | The directory a `language server` is rooted at (its `rootUri` and spawn cwd): the nearest root-marker ancestor of a file, bounded above by the conversation's `working directory`. | project root (when meaning the per-server root) |
 | **working directory** | The per-conversation filesystem directory that tools and `language server`s operate within (`ToolExecuteContext.cwd`). Persisted per conversation by `conversation-store`; gettable/settable via the cwd endpoint; defaults a turn's cwd when `/chat` omits it. | cwd (spell out on first use), workdir (when meaning the conversation's directory) |
+| **context size** | The number of tokens a conversation currently occupies: the most recent turn's FINAL step `inputTokens + outputTokens` (NOT the aggregate per-turn `usage`, which sums per-step prompts and overcounts a multi-step turn). Stamped on `TurnDoneEvent.contextSize` (live) + `TurnMetrics.contextSize` (persisted); a client reads the LATEST turn's value as current usage. Distinct from the model's **context window** (its max token limit — a later feature). | context window (when meaning current usage), context length, tokens used, context usage |
 
 ## Known vocabulary drift
 
diff --git a/frontend-context-size-handoff.md b/frontend-context-size-handoff.md
new file mode 100644
index 0000000..ed4cacd
--- /dev/null
+++ b/frontend-context-size-handoff.md
@@ -0,0 +1,47 @@
+# FE handoff — context size (current context-window usage)
+
+Courier this to `../dispatch-web` (cross-repo contract change; `lsp references` does not
+span repos — ORCHESTRATOR §7). Backend commit adds an optional `contextSize` field; no
+breaking change.
+
+## What shipped (backend)
+
+A new optional field **`contextSize`** (a token count) now flows to the frontend on two
+existing carriers. Both are computed identically and are EQUAL for the same turn:
+
+1. **Live** — `TurnDoneEvent.contextSize?: number` (the `done` AgentEvent, arriving in a
+   `chat.delta` WS message / the NDJSON stream).
+2. **Persisted** — `TurnMetrics.contextSize?: number`, served by
+   `GET /conversations/:id/metrics` (`ConversationMetricsResponse.turns[].contextSize`).
+
+Types: `@dispatch/wire` (`0.4.0 → 0.5.0`), re-exported by
+`@dispatch/transport-contract` (`0.5.0 → 0.6.0`). Bump the pinned `file:` deps.
+
+## Definition (read this — it's subtle)
+
+`contextSize` = **the turn's FINAL step `inputTokens + outputTokens`** — the tokens the
+conversation occupies right now.
+
+It is deliberately **NOT** the aggregate `usage` already on `done` / `TurnMetrics`.
+`usage.inputTokens` is the SUM across steps, which **overcounts** a multi-step / tool-calling
+turn (each step re-prefills the growing prompt). The final step's input already contains all
+prior context, so `finalStep.input + finalStep.output` is the true occupancy. Do not derive
+context size from `usage` yourself — read `contextSize`.
+
+## How to render it
+
+- **Current value = the LATEST turn's `contextSize`.** The chat's "current context usage" is
+  whatever the most recent turn reported.
+- **Live update:** when a `done` event arrives, if `event.contextSize !== undefined`, set the
+  displayed context size to it.
+- **On (re)hydrate:** call `GET /conversations/:id/metrics`, take the LAST element of `turns`
+  that has a defined `contextSize`, and show its value. (Turns appear only after they seal.)
+- **Optionality:** `contextSize` may be `undefined` (provider reported no per-step usage).
+  Treat absent as "unknown" — render a placeholder, NOT `0`.
+
+## Not included yet (next step)
+
+The model's **max context-window limit** is a SEPARATE, later field — so a UI like
+`contextSize / limit` (e.g. `34,102 / 200,000`) can't show the denominator yet. For now show
+only the current size (e.g. "34,102 tokens in context"). "context size" = current usage;
+"context window" = the future limit (see GLOSSARY).
diff --git a/packages/kernel/src/runtime/events.ts b/packages/kernel/src/runtime/events.ts
index 300e711..b194577 100644
--- a/packages/kernel/src/runtime/events.ts
+++ b/packages/kernel/src/runtime/events.ts
@@ -127,16 +127,29 @@ export function doneEvent(
   reason: string,
   durationMs?: number,
   usage?: Usage,
+  contextSize?: number,
 ): AgentEvent {
+  if (durationMs !== undefined && usage !== undefined && contextSize !== undefined) {
+    return { type: "done", conversationId, turnId, reason, durationMs, usage, contextSize };
+  }
   if (durationMs !== undefined && usage !== undefined) {
     return { type: "done", conversationId, turnId, reason, durationMs, usage };
   }
+  if (durationMs !== undefined && contextSize !== undefined) {
+    return { type: "done", conversationId, turnId, reason, durationMs, contextSize };
+  }
+  if (usage !== undefined && contextSize !== undefined) {
+    return { type: "done", conversationId, turnId, reason, usage, contextSize };
+  }
   if (durationMs !== undefined) {
     return { type: "done", conversationId, turnId, reason, durationMs };
   }
   if (usage !== undefined) {
     return { type: "done", conversationId, turnId, reason, usage };
   }
+  if (contextSize !== undefined) {
+    return { type: "done", conversationId, turnId, reason, contextSize };
+  }
   return { type: "done", conversationId, turnId, reason };
 }
 
diff --git a/packages/kernel/src/runtime/run-turn.test.ts b/packages/kernel/src/runtime/run-turn.test.ts
index fa2aba4..dcaea7f 100644
--- a/packages/kernel/src/runtime/run-turn.test.ts
+++ b/packages/kernel/src/runtime/run-turn.test.ts
@@ -2480,4 +2480,101 @@ describe("runTurn", () => {
       }
     });
   });
+
+  describe("contextSize", () => {
+    it("single-step turn: contextSize equals step inputTokens + outputTokens", async () => {
+      const provider = createFakeProvider([
+        [
+          { type: "text-delta", delta: "Hello" },
+          { type: "usage", usage: { inputTokens: 100, outputTokens: 50 } },
+          { type: "finish", reason: "stop" },
+        ],
+      ]);
+
+      const { events, emit } = createCollectingEmit();
+
+      await runTurn({
+        provider,
+        messages: [userMessage],
+        tools: [],
+        dispatch: { maxConcurrent: 1, eager: false },
+        conversationId: "conv-1",
+        turnId: "turn-1",
+        emit,
+      });
+
+      const doneEvt = events.find((e) => e.type === "done");
+      expect(doneEvt).toBeDefined();
+      if (doneEvt?.type === "done") {
+        expect(doneEvt.contextSize).toBe(150);
+      }
+    });
+
+    it("multi-step turn: contextSize equals ONLY the last step's inputTokens + outputTokens", async () => {
+      const tool = createFakeTool("echo", async () => ({ content: "echoed" }));
+
+      const provider = createFakeProvider([
+        [
+          { type: "tool-call", toolCallId: "tc1", toolName: "echo", input: {} },
+          { type: "usage", usage: { inputTokens: 100, outputTokens: 20 } },
+          { type: "finish", reason: "tool-calls" },
+        ],
+        [
+          { type: "text-delta", delta: "done" },
+          { type: "usage", usage: { inputTokens: 300, outputTokens: 80 } },
+          { type: "finish", reason: "stop" },
+        ],
+      ]);
+
+      const { events, emit } = createCollectingEmit();
+
+      await runTurn({
+        provider,
+        messages: [userMessage],
+        tools: [tool],
+        dispatch: { maxConcurrent: 1, eager: false },
+        conversationId: "conv-1",
+        turnId: "turn-1",
+        emit,
+      });
+
+      const doneEvt = events.find((e) => e.type === "done");
+      expect(doneEvt).toBeDefined();
+      if (doneEvt?.type === "done") {
+        expect(doneEvt.contextSize).toBe(380);
+        expect(doneEvt.usage).toBeDefined();
+        if (doneEvt.usage !== undefined) {
+          expect(doneEvt.contextSize).not.toBe(doneEvt.usage.inputTokens);
+        }
+      }
+    });
+
+    it("no usage reported: contextSize is undefined", async () => {
+      const provider = createFakeProvider([
+        [
+          { type: "text-delta", delta: "Hello" },
+          { type: "finish", reason: "stop" },
+        ],
+      ]);
+
+      const { events, emit } = createCollectingEmit();
+
+      await runTurn({
+        provider,
+        messages: [userMessage],
+        tools: [],
+        dispatch: { maxConcurrent: 1, eager: false },
+        conversationId: "conv-1",
+        turnId: "turn-1",
+        emit,
+      });
+
+      const doneEvt = events.find((e) => e.type === "done");
+      expect(doneEvt).toBeDefined();
+      if (doneEvt?.type === "done") {
+        expect(doneEvt.contextSize).toBeUndefined();
+        expect(doneEvt.usage).toBeUndefined();
+      }
+    });
+  });
 });
diff --git a/packages/kernel/src/runtime/run-turn.ts b/packages/kernel/src/runtime/run-turn.ts
index b50e8ee..bf57854 100644
--- a/packages/kernel/src/runtime/run-turn.ts
+++ b/packages/kernel/src/runtime/run-turn.ts
@@ -449,6 +449,7 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
   const messages: ChatMessage[] = [...input.messages];
   const resultMessages: ChatMessage[] = [];
   let totalUsage = zeroUsage();
+  let lastStepUsage: Usage | undefined;
   let finishReason = "stop";
 
   const toolMap = new Map<string, ToolContract>();
@@ -513,6 +514,7 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
     });
 
     totalUsage = addUsage(totalUsage, stepResult.usage);
+    lastStepUsage = stepResult.usage;
 
     if (stepResult.assistantMessage !== undefined) {
       messages.push(stepResult.assistantMessage);
@@ -571,6 +573,10 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
     totalUsage.outputTokens > 0 ||
     totalUsage.cacheReadTokens !== undefined ||
     totalUsage.cacheWriteTokens !== undefined;
+  const contextSize =
+    hasUsage && lastStepUsage !== undefined
+      ? lastStepUsage.inputTokens + lastStepUsage.outputTokens
+      : undefined;
   input.emit(
     doneEvent(
       conversationId,
@@ -578,6 +584,7 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
       finishReason,
       turnDurationMs,
       hasUsage ? totalUsage : undefined,
+      contextSize,
     ),
   );
 
diff --git a/packages/session-orchestrator/src/metrics.test.ts b/packages/session-orchestrator/src/metrics.test.ts
index c123dba..1920fc0 100644
--- a/packages/session-orchestrator/src/metrics.test.ts
+++ b/packages/session-orchestrator/src/metrics.test.ts
@@ -261,4 +261,109 @@ describe("createMetricsAccumulator", () => {
     expect(tm.usage.inputTokens).toBe(0);
     expect(tm.usage.outputTokens).toBe(0);
   });
+
+  it("contextSize equals inputTokens + outputTokens for a single-step turn", () => {
+    const acc = createMetricsAccumulator();
+
+    acc.ingest({
+      type: "usage",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#0"),
+      usage: { inputTokens: 10, outputTokens: 5 },
+    });
+    acc.ingest({
+      type: "step-complete",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#0"),
+    });
+    acc.ingest({
+      type: "done",
+      conversationId: "c1",
+      turnId: "t1",
+      reason: "stop",
+      usage: { inputTokens: 10, outputTokens: 5 },
+    });
+
+    const tm = acc.build("t1");
+    expect(tm.contextSize).toBe(15);
+  });
+
+  it("contextSize equals ONLY the last step's inputTokens + outputTokens for a multi-step turn", () => {
+    const acc = createMetricsAccumulator();
+
+    acc.ingest({
+      type: "usage",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#0"),
+      usage: { inputTokens: 10, outputTokens: 5 },
+    });
+    acc.ingest({
+      type: "step-complete",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#0"),
+    });
+    acc.ingest({
+      type: "usage",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#1"),
+      usage: { inputTokens: 20, outputTokens: 10 },
+    });
+    acc.ingest({
+      type: "step-complete",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#1"),
+    });
+    acc.ingest({
+      type: "done",
+      conversationId: "c1",
+      turnId: "t1",
+      reason: "stop",
+      usage: { inputTokens: 100, outputTokens: 50 },
+    });
+
+    const tm = acc.build("t1");
+    expect(tm.contextSize).toBe(30);
+    expect(tm.contextSize).not.toBe(tm.usage.inputTokens);
+  });
+
+  it("contextSize is undefined when the turn has no steps", () => {
+    const acc = createMetricsAccumulator();
+
+    acc.ingest({
+      type: "done",
+      conversationId: "c1",
+      turnId: "t1",
+      reason: "stop",
+    });
+
+    const tm = acc.build("t1");
+    expect(tm.contextSize).toBeUndefined();
+  });
+
+  it("contextSize is undefined when the last step has no usable per-step usage", () => {
+    const acc = createMetricsAccumulator();
+
+    acc.ingest({
+      type: "step-complete",
+      conversationId: "c1",
+      turnId: "t1",
+      stepId: stepId("t1#0"),
+      genTotalMs: 200,
+    });
+    acc.ingest({
+      type: "done",
+      conversationId: "c1",
+      turnId: "t1",
+      reason: "stop",
+    });
+
+    const tm = acc.build("t1");
+    expect(tm.contextSize).toBeUndefined();
+  });
 });
diff --git a/packages/session-orchestrator/src/metrics.ts b/packages/session-orchestrator/src/metrics.ts
index e953bd9..2dfa533 100644
--- a/packages/session-orchestrator/src/metrics.ts
+++ b/packages/session-orchestrator/src/metrics.ts
@@ -100,6 +100,20 @@ export function createMetricsAccumulator(): MetricsAccumulator {
       if (doneDurationMs !== undefined) {
         (tm as { durationMs?: number }).durationMs = doneDurationMs;
       }
+
+      // contextSize = final step's inputTokens + outputTokens (true context occupancy).
+      // Omit when no steps or the last step had no usable per-step usage event.
+      if (stepMetrics.length > 0) {
+        const lastStep = stepMetrics[stepMetrics.length - 1];
+        if (lastStep !== undefined) {
+          const lastAcc = steps.get(lastStep.stepId);
+          if (lastAcc?.usage !== undefined) {
+            (tm as { contextSize?: number }).contextSize =
+              lastStep.usage.inputTokens + lastStep.usage.outputTokens;
+          }
+        }
+      }
+
+
       return tm;
     }
 
diff --git a/packages/session-orchestrator/src/orchestrator.test.ts b/packages/session-orchestrator/src/orchestrator.test.ts
index 33deb15..b33bdcc 100644
--- a/packages/session-orchestrator/src/orchestrator.test.ts
+++ b/packages/session-orchestrator/src/orchestrator.test.ts
@@ -834,6 +834,66 @@ describe("turn metrics persistence", () => {
     expect(tm.usage.outputTokens).toBe(15);
   });
 
+  it("persists contextSize as the last step's inputTokens + outputTokens", async () => {
+    const store = createInMemoryStore();
+    const tool = createFakeTool("echo", async () => ({ content: "echoed" }));
+
+    let callIndex = 0;
+    const provider: ProviderContract = {
+      id: "fake",
+      stream() {
+        const idx = callIndex++;
+        return (async function* () {
+          if (idx === 0) {
+            yield {
+              type: "tool-call",
+              toolCallId: "tc1",
+              toolName: "echo",
+              input: {},
+            } as ProviderEvent;
+            yield {
+              type: "usage",
+              usage: { inputTokens: 10, outputTokens: 5 },
+            } as ProviderEvent;
+            yield { type: "finish", reason: "tool-calls" } as ProviderEvent;
+          } else {
+            yield { type: "text-delta", delta: "Step2" } as ProviderEvent;
+            yield {
+              type: "usage",
+              usage: { inputTokens: 20, outputTokens: 10 },
+            } as ProviderEvent;
+            yield { type: "finish", reason: "stop" } as ProviderEvent;
+          }
+        })();
+      },
+    };
+
+    const { orchestrator } = createSessionOrchestrator({
+      conversationStore: store,
+      resolveProvider: () => provider,
+      resolveTools: () => [tool],
+      applyToolsFilter: identityApplyToolsFilter,
+      runTurn,
+      now: () => 1000,
+    });
+
+    await orchestrator.handleMessage({
+      conversationId: "conv-context-size",
+      text: "test",
+      onEvent: () => {},
+    });
+
+    const metrics = store.metricsData.get("conv-context-size");
+    expect(metrics).toBeDefined();
+    expect(metrics).toHaveLength(1);
+
+    const tm = metrics?.[0];
+    if (tm === undefined) throw new Error("expected metrics");
+
+    expect(tm.steps.length).toBeGreaterThanOrEqual(2);
+    expect(tm.contextSize).toBe(30);
+  });
+
   it("does not persist metrics nor emit turn-sealed when chunk append fails", async () => {
     const provider = createFakeProvider([
       [
diff --git a/packages/transport-contract/package.json b/packages/transport-contract/package.json
index 7ebd2c2..5a2a61f 100644
--- a/packages/transport-contract/package.json
+++ b/packages/transport-contract/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@dispatch/transport-contract",
-  "version": "0.5.0",
+  "version": "0.6.0",
   "type": "module",
   "private": true,
   "main": "dist/index.js",
diff --git a/packages/wire/package.json b/packages/wire/package.json
index 790c7e1..80671cf 100644
--- a/packages/wire/package.json
+++ b/packages/wire/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@dispatch/wire",
-  "version": "0.4.0",
+  "version": "0.5.0",
   "type": "module",
   "private": true,
   "main": "dist/index.js",
diff --git a/packages/wire/src/index.ts b/packages/wire/src/index.ts
index aa6f9d0..52662ef 100644
--- a/packages/wire/src/index.ts
+++ b/packages/wire/src/index.ts
@@ -192,6 +192,16 @@ export interface TurnMetrics {
   readonly durationMs?: number;
   /** Per-step metrics in step order. */
   readonly steps: readonly StepMetrics[];
+  /**
+   * **Context size** — tokens the conversation occupies as of this turn: the
+   * turn's FINAL step `inputTokens + outputTokens` (the last entry of `steps`),
+   * NOT the aggregate `usage` (which sums per-step prompts and overcounts a
+   * multi-step turn). The persisted, replayable counterpart of
+   * `TurnDoneEvent.contextSize` and equal to it for the same turn. A client
+   * reopening a past conversation reads the LAST turn's `contextSize` as the
+   * current context usage. Optional: absent when no per-step usage was available.
+   */
+  readonly contextSize?: number;
 }
 
 // ─── Outward events ─────────────────────────────────────────────────────────
@@ -364,6 +374,21 @@ export interface TurnDoneEvent {
    * provider reported no usage).
    */
   readonly usage?: Usage;
+  /**
+   * **Context size** — the number of tokens the conversation now occupies: this
+   * (the most recent) turn's FINAL step `inputTokens + outputTokens` (the full
+   * prompt sent into the last LLM round-trip plus that round-trip's output). This
+   * is the "tokens in context" figure a client renders as the chat's current
+   * context usage, and a client treats the LATEST turn's value as the live total.
+   *
+   * Deliberately NOT the aggregate `usage` above: `usage` SUMS each step's
+   * `inputTokens`, which overcounts a multi-step / tool-calling turn because every
+   * step re-prefills the growing prompt — the final step's input already includes
+   * all prior context, so its input+output is the true occupancy. Optional: absent
+   * when no per-step usage was observed this turn (mirrors `usage`). A later field
+   * will carry the model's max context-window LIMIT; this is only the current size.
+   */
+  readonly contextSize?: number;
 }
 
 /**
@@ -5,7 +5,7 @@
 > Keep this lean and current; do not let it re-accrete a step-by-step changelog.
 
 ## Status (current)
-`tsc -b` EXIT 0 · biome clean · **865 vitest + 135 bun = 1000 tests**.
+`tsc -b` EXIT 0 · biome clean · **881 vitest + 135 bun = 1016 tests**.
 
 Built and verified live (full-fidelity: every feature is a manifest-loaded extension through the
 host):
@@ -231,6 +231,26 @@ workspace root, working directory.
   finished it directly; also fixed a real design bug the agent missed: the manager read config
   statically instead of per-cwd (would have broken Roblox).
 
+## Context size — current context-window usage (DONE)
+User-gated decisions: term = **context size** (current usage; reserve "context window" for the
+model's max LIMIT, a later feature); definition = the turn's **FINAL step `inputTokens +
+outputTokens`** (NOT the aggregate `usage`, which sums per-step prompts and overcounts a
+multi-step turn); delivery = a backend-computed field on BOTH the live `done` event and the
+persisted `TurnMetrics`.
+- [x] **Contract (orchestrator):** optional `contextSize?: number` added to `TurnDoneEvent` +
+  `TurnMetrics` in `@dispatch/wire` (`0.4.0→0.5.0`); `@dispatch/transport-contract`
+  `0.5.0→0.6.0` (re-exports both — no other change). Glossary: added **context size**.
+- [x] **Wave (parallel, disjoint pkgs):**
+  - [x] **kernel** — `run-turn.ts` tracks the last step's `Usage`; `doneEvent()` stamps
+    `done.contextSize = lastStep.input + lastStep.output` (omitted when no usage). +3 tests.
+  - [x] **session-orchestrator** — `metrics.ts build()` stamps `TurnMetrics.contextSize` from
+    the final per-step metrics (same definition; equals the live value). +5 tests.
+- [x] Verified: `tsc -b` EXIT 0, biome clean, 881 vitest pass; both owners stayed in-lane.
+  `conversation-store` (JSON passthrough) + `transport-http` (forwards/serves) unchanged.
+- [x] **FE courier handoff:** `frontend-context-size-handoff.md` (user couriers to
+  `../dispatch-web`). Not yet exercised end-to-end against a live LLM (unit tests cover both
+  producers); optional live-verify deferred.
+
 ## Open items
 - **`prefix.fingerprint` / `warm|real` cache-bust attributes (deferred):** decoupled from dedup
   by the content-addressed decision; also gated on cache-warming being
