diff options
| author | Adam Malczewski <[email protected]> | 2026-06-12 16:28:07 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-12 16:28:07 +0900 |
| commit | 4001274e3ba25a3946df1e9f2dc82ca6781cd2bf (patch) | |
| tree | 24af95e69bda5c38ab7eefd6b71d55b4c247040a | |
| parent | e6f6bd86eab07954d8f06e740659969c3dfecc7f (diff) | |
| download | dispatch-web-4001274e3ba25a3946df1e9f2dc82ca6781cd2bf.tar.gz dispatch-web-4001274e3ba25a3946df1e9f2dc82ca6781cd2bf.zip | |
feat(cache-warming): consume CR-4 lifecycle — tab-close cancel + scope-aware subscriptions
- closeTab now POSTs /conversations/:id/close (abort in-flight turn + stop/disable
warming server-side); disconnect still leaves both running ([email protected])
- syncSubscriptions honors catalog scope ([email protected]): global surfaces are
not re-subscribed on conversation switch
- fix(ws): the surface-message parser dropped the conversationId echo (CR-4d was
ours, not the backend's) — preserved + unit-tested
- secondsUntilNext: 3s stale guard — a past nextWarmAt renders as waiting, not 0s
- re-pinned + re-mirrored [email protected] / [email protected]
- scripts/probe-cache-warming.ts: live CR-4 probe (default-off, future nextWarmAt,
repeated warms, mid-turn close abort, idempotent re-close) — 17/17 against bin/up
| -rw-r--r-- | .dispatch/transport-contract.reference.md | 294 | ||||
| -rw-r--r-- | .dispatch/ui-contract.reference.md | 87 | ||||
| -rw-r--r-- | backend-handoff-cache-warming.md | 102 | ||||
| -rw-r--r-- | backend-handoff.md | 143 | ||||
| -rw-r--r-- | scripts/probe-cache-warming.ts | 277 | ||||
| -rw-r--r-- | src/adapters/ws/logic.test.ts | 23 | ||||
| -rw-r--r-- | src/adapters/ws/logic.ts | 10 | ||||
| -rw-r--r-- | src/app/store.svelte.ts | 29 | ||||
| -rw-r--r-- | src/app/store.test.ts | 72 | ||||
| -rw-r--r-- | src/features/cache-warming/logic/view-model.test.ts | 10 | ||||
| -rw-r--r-- | src/features/cache-warming/logic/view-model.ts | 14 |
11 files changed, 883 insertions, 178 deletions
diff --git a/.dispatch/transport-contract.reference.md b/.dispatch/transport-contract.reference.md index 86eac50..e6ab799 100644 --- a/.dispatch/transport-contract.reference.md +++ b/.dispatch/transport-contract.reference.md @@ -5,10 +5,25 @@ > hangs on a permission prompt). Your CODE still imports `@dispatch/transport-contract` normally — > this file is for READING only. > -> **Orchestrator:** SNAPSHOT of `[email protected]` (CR-3 user-message shipped). Depends on -> `@dispatch/[email protected]` (see `wire.reference.md`) + `@dispatch/[email protected]` (see +> **Orchestrator:** SNAPSHOT of `[email protected]` (CR-4 cache-warming lifecycle shipped). +> Depends on `@dispatch/[email protected]` (see `wire.reference.md`) + `@dispatch/[email protected]` (see > `ui-contract.reference.md`). > +> **2026-06-12 delta (CR-4 cache-warming lifecycle — package bumped `0.8.0` → `0.9.0`):** adds +> `POST /conversations/:id/close` (`CloseConversationResponse`) — the EXPLICIT "user closed this +> conversation's tab" affordance, distinct from a socket disconnect / `chat.unsubscribe` (which +> still NEVER touch the turn or the warming schedule). Closing (1) aborts any in-flight turn — the +> kernel stops at the next event boundary, partial messages are PERSISTED, and the turn SEALS +> normally with `finishReason: "aborted"` (watchers receive `done` then `turn-sealed`, so a +> stream-derived "generating" flag clears with no special-casing) — and (2) stops + DISABLES +> cache-warming for the conversation (persisted OFF; reopening does not resume warming). Idempotent: +> closing an idle/unknown conversation is `200` with `abortedTurn: false`. Backend behavior fixes +> riding EXISTING shapes (no other contract change): warming now defaults OFF for a new conversation +> (240s interval default kept; re-enable restores the persisted interval); post-warm surface updates +> now carry the FUTURE `nextWarmAt` (notify-before-reschedule fixed); `nextWarmAt: null` is pushed on +> `turn-start` (nothing scheduled while generating) and when warming is/became disabled. Caveat: the +> warming opt-in is NOT yet re-hydrated across a backend restart (reads disabled until toggled again). +> > **2026-06-12 delta (CR-3 user-message handoff — package bumped `0.7.0` → `0.8.0`):** NO transport > shape change — it re-exports `AgentEvent` (which `chat.delta` / `/chat` NDJSON carry), and that union > gained the additive `TurnInputEvent` (`{ type: "user-message"; conversationId; turnId; text }`), the @@ -29,7 +44,7 @@ > persist across turns. FE consumes via the `chat` feature + app store (re-subscribe every open > conversation on (re)connect + page load; derive a "running" state structurally from > `turn-start`…no-`done`/`turn-sealed`-yet). OUT of scope: per-step crash-resume, concurrent-send -> arbitration, explicit stop. +> arbitration. > > **2026-06-12 delta (context-size handoff — package bumped `0.5.0` → `0.6.0`, depends on > `[email protected]`):** no NEW transport shape — the optional `contextSize?: number` rides the @@ -85,6 +100,9 @@ - `POST /chat/warm` — body `WarmRequest` (JSON) → `200 WarmResponse` (cache-warm usage incl. `cachePct`); `409 { error }` when the conversation is currently generating; `400 { error }` on a missing/invalid `conversationId`. The warm is NEVER persisted/streamed/folded into real usage. +- `POST /conversations/:id/close` — no body → `200 CloseConversationResponse`. The EXPLICIT tab-close + affordance: aborts any in-flight turn (persists the partial; seals with `finishReason: "aborted"`) + AND stops + disables cache-warming (persisted OFF). Idempotent (`abortedTurn: false` when idle/unknown). - `GET /metrics/throughput?period=day|week|month&date=<...>` — `ThroughputResponse` (token-weighted tokens/sec per model over the window). Not part of cache-warming; listed for completeness. - `GET /conversations/:id/cwd` — `CwdResponse` (`cwd` is `null` until set). @@ -97,18 +115,23 @@ (`@dispatch/ui-contract`) + chat ops (below). Open once, send `WsClientMessage`, receive `WsServerMessage`. Live `AgentEvent` deltas carry `conversationId`+`turnId` but **no `seq`** (seq lives only on `StoredChunk`, obtained via the `sinceSeq` sync after `turn-sealed`). -- DEFERRED (not built; do not depend on): `GET /conversations` (list), `POST /conversations/:id/cancel`. +- DEFERRED (not built; do not depend on): `GET /conversations` (list). (The former deferred + `POST /conversations/:id/cancel` is superseded by `POST /conversations/:id/close`.) ```ts /** * Transport contract — the typed description of Dispatch's client–server API - * (HTTP + WebSocket). Types-only (zero runtime). Each side owns its own - * (de)serialization — the contract is the SHAPES, not the codec. + * (HTTP + WebSocket). + * + * This package is types-only (zero runtime). It is the single shared surface + * every client imports to know how to talk to the backend. Each side owns its + * OWN (de)serialization: the contract is the SHAPES, not the codec. The + * streaming response payload is the kernel's `AgentEvent` union, re-exported + * here so a client has one import for the whole wire. * - * The WebSocket carries BOTH chat ops (here) and surface ops (in + * The WebSocket carries BOTH chat ops (defined here) and surface ops (defined in * `@dispatch/ui-contract`) over one connection; the unified `WsClientMessage` / - * `WsServerMessage` unions below compose them. Chat ops are new, non-colliding - * `type` variants (`chat.*`) — the shipped surface protocol is unchanged. + * `WsServerMessage` unions below compose them. */ import type { SurfaceClientMessage, SurfaceServerMessage } from "@dispatch/ui-contract"; @@ -124,19 +147,35 @@ export type { AgentEvent, StepMetrics, StoredChunk, TurnMetrics } from "@dispatc * response header (useful when `conversationId` was omitted). */ export interface ChatRequest { - /** The conversation to continue. Omit to start fresh — server mints an id (X-Conversation-Id). */ + /** + * The conversation to continue. Omit to start a fresh conversation — the + * server mints an id and returns it via the `X-Conversation-Id` header. + */ readonly conversationId?: string; + /** The user's message text for this turn. */ readonly message: string; - /** Model name in `<credentialName>/<model>` form (one of `GET /models`). Omit = server default. */ + + /** + * The model to use, as a model name in `<credentialName>/<model>` form — one + * of the exact strings returned by `GET /models`. Omit to use the server's + * default credential + model. + */ readonly model?: string; - /** Working directory for this turn's tool execution. Defaults server-side. Not part of the prompt. */ + + /** + * Working directory for this turn's tool execution. Defaults server-side when + * omitted. Forwarded to tools for path resolution; never part of the model + * prompt (so it does not affect prompt caching). + */ readonly cwd?: string; } /** - * Response body for `GET /models` — the model catalog. Each entry is a model - * name in `<credentialName>/<model>` form (exactly `ChatRequest.model`). + * Response body for `GET /models` — the model catalog. + * + * Each entry is a model name in `<credentialName>/<model>` form: exactly the + * string a client passes back as `ChatRequest.model`. */ export interface ModelsResponse { readonly models: readonly string[]; @@ -144,14 +183,22 @@ export interface ModelsResponse { /** * Response body for `GET /conversations/:id?sinceSeq=<n>` — the incremental - * read-side history endpoint a long-lived client uses to (re)hydrate cheaply. + * read-side history endpoint a long-lived client uses to (re)hydrate a + * conversation cheaply. + * + * `chunks` is the RAW, append-order, seq-ordered slice of the conversation log + * with `seq > sinceSeq` (or the whole log when `sinceSeq` is omitted/0). It is + * NOT reconciled: a dangling tool-call is returned as-is (rendered as an + * interrupted call). Reconciliation is a turn-path concern — the server repairs + * history only when it feeds a provider, never on this read path — which is what + * preserves the per-chunk `seq` cursor invariant (a synthesized repair chunk + * would have no seq). * - * `chunks` is the RAW, append-order, seq-ordered slice with `seq > sinceSeq` - * (or the whole log when `sinceSeq` is omitted/0). NOT reconciled: a dangling - * tool-call is returned as-is. `latestSeq` is the `seq` of the LAST chunk, or — - * when the slice is empty (caught up) — the requested `sinceSeq` (0 for a full - * read of an empty conversation). After applying, the client's new cursor is - * always `latestSeq`; empty `chunks` means "nothing new past your cursor". + * `latestSeq` is the `seq` of the LAST chunk in this response, or — when the + * slice is empty (the client is already caught up) — the requested `sinceSeq` + * (0 for a full read of an empty conversation). So after applying the response a + * client's new cursor is always `latestSeq`, and an empty `chunks` means + * "nothing new past your cursor". */ export interface ConversationHistoryResponse { readonly chunks: readonly StoredChunk[]; @@ -163,12 +210,6 @@ export interface ConversationHistoryResponse { * (and per-step) token + timing metrics for a conversation, for a client * reopening a past conversation to render historical usage/latency. * - * This is a SEPARATE axis from the two other read concerns and is deliberately - * its own endpoint: the live `usage`/`step-complete`/`done` events are transient - * (not persisted), and `ConversationHistoryResponse` carries seq-cursor chunk - * CONTENT. Metrics are keyed per TURN (not per chunk) and so are not seq-filtered - * — hence a sibling route rather than a field on the history response. - * * `turns` is every SEALED turn's `TurnMetrics` in turn order. A turn appears only * after its metrics were persisted (post-seal); an in-flight or unsealed turn is * absent until then. @@ -180,63 +221,46 @@ export interface ConversationMetricsResponse { /** The aggregation window for `GET /metrics/throughput`. */ export type ThroughputPeriod = "day" | "week" | "month"; -/** One model's token-weighted throughput over a period. */ +/** + * One model's throughput over a period. `tokensPerSecond` is the TOKEN-WEIGHTED + * average — `Σ(output tokens) / Σ(generation seconds)` across the period's + * turns — so larger turns count proportionally more than smaller ones. + * Generation time is the model's pure decode time (it excludes tool-execution + * waits). + */ export interface ThroughputModelStat { + /** The model name in `<credentialName>/<model>` form (as selected). */ readonly model: string; + /** Token-weighted average tokens/second over the period. */ readonly tokensPerSecond: number; + /** Total output tokens generated across the period's turns. */ readonly totalOutputTokens: number; + /** Total pure generation time across the period's turns, in milliseconds. */ readonly totalGenMs: number; + /** Number of turns that contributed. */ readonly turns: number; } -/** Response body for `GET /metrics/throughput?period=...&date=...`. */ +/** + * Response body for + * `GET /metrics/throughput?period=day|week|month&date=<...>`. + * + * `date` is `YYYY-MM-DD` for day/week (week = the ISO Mon–Sun week containing + * that date) and `YYYY-MM` for month. Boundaries are computed in the server's + * local timezone; `start`/`end` are the resolved half-open `[start, end)` range + * in epoch-ms. `models` lists every model active in the window, sorted by + * `tokensPerSecond` descending. + */ export interface ThroughputResponse { readonly period: ThroughputPeriod; readonly date: string; - readonly start: number; // inclusive window start, epoch-ms - readonly end: number; // exclusive window end, epoch-ms + /** Inclusive start of the window, epoch-ms. */ + readonly start: number; + /** Exclusive end of the window, epoch-ms. */ + readonly end: number; readonly models: readonly ThroughputModelStat[]; } -/** - * Request body for `POST /chat/warm` — manually trigger a prompt-cache WARMING - * request for a conversation (e.g. a "warm now" button). The warm replays the - * conversation's existing prefix to refresh the provider cache; it is NEVER - * persisted and NEVER streamed. Pass the SAME `model`/`cwd` the conversation - * chats with so the prefix is byte-identical to a real turn (that's the cache hit). - */ -export interface WarmRequest { - readonly conversationId: string; - readonly model?: string; // `<credentialName>/<model>`; omit = server default - readonly cwd?: string; -} - -/** - * Response body for `POST /chat/warm` (HTTP 200). The warm's usage — never folded - * into the conversation's real usage. A client surfaces `cachePct` as the "last - * warming" cache-hit indicator. A 409 (currently generating) returns `{ error }` instead. - */ -export interface WarmResponse { - readonly inputTokens: number; - readonly outputTokens: number; - readonly cacheReadTokens: number; - readonly cacheWriteTokens: number; - /** - * **Cache rate** — what fraction of THIS request's prompt was served from cache: - * `round(cacheReadTokens / inputTokens * 100)` (0 when `inputTokens <= 0`). - * (`inputTokens` is the TOTAL prompt incl. cached, so this is in [0,100].) - */ - readonly cachePct: number; - /** - * **Expected cache (retention)** — of the cacheable prefix this warm touched, how - * much was still warm and read back vs. had to be (re)written: - * `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` (0 when the - * sum is 0). For a healthy warm this is ~**100%**; it drops toward 0 as the cache - * expires/busts. This is the warming HEALTH signal — headline it for "Warm now". - */ - readonly expectedCacheRate: number; -} - // ─── Per-conversation working directory (cwd) ───────────────────────────────── /** Response of `GET /conversations/:id/cwd`. `cwd` is null when never set. */ @@ -250,6 +274,28 @@ export interface SetCwdRequest { readonly cwd: string; } +// ─── Conversation close (explicit tab close) ────────────────────────────────── + +/** + * Response of `POST /conversations/:id/close` (no request body). + * + * The EXPLICIT "the user closed this conversation's tab" affordance — distinct + * from a socket disconnect or `chat.unsubscribe`, which deliberately never touch + * the turn or the warming schedule. Closing: + * 1. aborts any in-flight turn (the kernel stops at the next event boundary, + * partial messages are persisted, and the turn SEALS normally with + * `finishReason: "aborted"` — watchers see `done` + `turn-sealed`), and + * 2. stops + disables cache-warming for the conversation (persisted OFF, so a + * reopened conversation stays opt-in). + * Idempotent: closing an idle or unknown conversation succeeds with + * `abortedTurn: false`. + */ +export interface CloseConversationResponse { + readonly conversationId: string; + /** True when an in-flight turn existed and was aborted by this close. */ + readonly abortedTurn: boolean; +} + // ─── Per-conversation LSP status ────────────────────────────────────────────── /** The connection state of a single language server for a workspace. */ @@ -280,15 +326,71 @@ export interface LspStatusResponse { readonly servers: readonly LspServerInfo[]; } +/** + * Request body for `POST /chat/warm` — manually trigger a prompt-cache WARMING + * request for a conversation (e.g. a frontend "warm now" button, or fast tests + * that don't want to wait for the automatic warming timer). + * + * The warm replays the conversation's existing prefix to the provider to refresh + * its prompt cache; it is NEVER persisted and NEVER streamed (no `AgentEvent`s). + * Pass the same `model`/`cwd` the conversation chats with so the warm request's + * prefix is byte-identical to a real turn (which is what makes the cache hit). + */ +export interface WarmRequest { + /** The conversation whose prompt cache to warm. */ + readonly conversationId: string; + + /** + * The model name in `<credentialName>/<model>` form the conversation uses, so + * the warm resolves the same provider + prefix. Omit to use the server default. + */ + readonly model?: string; + + /** Working directory matching the conversation's turns (for cwd-aware tool assembly). */ + readonly cwd?: string; +} + +/** + * Response body for `POST /chat/warm` (HTTP 200). The warm request's usage — + * never folded into the conversation's real usage. A client surfaces `cachePct` + * as the "last warming" cache-hit indicator. + * + * When warming cannot run because the conversation is currently generating, the + * server responds `409` with `{ error }` instead of this body. + */ +export interface WarmResponse { + readonly inputTokens: number; + readonly outputTokens: number; + readonly cacheReadTokens: number; + readonly cacheWriteTokens: number; + /** + * **Cache rate** — what fraction of THIS request's prompt was served from cache: + * `round(cacheReadTokens / inputTokens * 100)` (0 when `inputTokens <= 0`). + * (`inputTokens` is the TOTAL prompt incl. cached, so this is in [0,100].) + */ + readonly cachePct: number; + /** + * **Expected cache (retention)** — of the cacheable prefix this warm touched, how + * much was still warm and read back vs. had to be (re)written: + * `round(cacheReadTokens / (cacheReadTokens + cacheWriteTokens) * 100)` (0 when the + * sum is 0). For a healthy warm this is ~**100%** (the whole prefix was still + * cached); it drops toward 0 as the cache expires/busts and the warm has to rewrite + * it. This is the warming HEALTH signal — distinct from `cachePct` (which a warm's + * tiny fresh probe makes ~equal, but which on a real turn reflects new content). + */ + readonly expectedCacheRate: number; +} + // ─── WebSocket chat ops ─────────────────────────────────────────────────────── // The persistent WS connection multiplexes chat ops (below) with surface ops -// (`@dispatch/ui-contract`). Chat `type`s are namespaced (`chat.*`) so they -// never collide with surface ones. +// (`@dispatch/ui-contract`). The unified unions at the bottom compose both. Chat +// `type`s are namespaced (`chat.*`) so they never collide with surface ones. /** - * Client → server: start or continue a turn over the WS connection. Same fields - * as the HTTP `ChatRequest`; omit `conversationId` to start fresh — the resolved - * id arrives on the streamed `AgentEvent`s (each carries `conversationId`). + * Client → server: start or continue a turn over the WS connection. Carries the + * same fields as the HTTP `ChatRequest` (so one shape drives both transports); + * omit `conversationId` to start fresh — the resolved id arrives on the streamed + * `AgentEvent`s (each carries `conversationId`). */ export interface ChatSendMessage extends ChatRequest { readonly type: "chat.send"; @@ -296,8 +398,9 @@ export interface ChatSendMessage extends ChatRequest { /** * Server → client: one `AgentEvent` from an in-flight turn (text-delta, - * tool-call, usage, done, turn-sealed, …). Fold these into the transcript - * exactly as the HTTP NDJSON stream — same events, different carrier. + * tool-call, usage, done, turn-sealed, …). The client folds these into its + * transcript exactly as it folds the HTTP NDJSON stream — same events, different + * carrier. */ export interface ChatDeltaMessage { readonly type: "chat.delta"; @@ -316,13 +419,20 @@ export interface ChatErrorMessage { } /** - * Client → server: start WATCHING a conversation's live turn events WITHOUT sending. - * On subscribe the server REPLAYS the current in-flight turn so far (from its - * `turn-start`) as `chat.delta`, then streams live; nothing replayed if idle (rely - * on `GET /conversations/:id` history). Infer "generating" from a replayed - * `turn-start` with no matching `done`/`turn-sealed` yet. `chat.send` already - * auto-subscribes the sender, so this is for conversations you VIEW but didn't send to - * (a 2nd device, or a reloaded/reconnected client). Idempotent per (connection, id). + * Client → server: start WATCHING a conversation's live turn events WITHOUT + * sending a message. This is what makes a turn viewable independently of who + * started it — a second device (multi-client handoff) or a client that reloaded + * mid-turn subscribes to receive the in-flight turn. + * + * On subscribe the server replays the CURRENT in-flight turn's events so far as + * `chat.delta` messages (so a late-joiner sees the whole running turn from its + * `turn-start`), then streams subsequent live events. If no turn is in-flight, + * nothing is replayed (the client relies on `GET /conversations/:id` history). + * A client infers "generating" from a replayed `turn-start` with no matching + * `done`/`turn-sealed` yet. Idempotent per `(connection, conversationId)`. + * + * NOTE: `chat.send` auto-subscribes the sending connection, so a client only needs + * `chat.subscribe` for conversations it is viewing but did not send to. */ export interface ChatSubscribeMessage { readonly type: "chat.subscribe"; @@ -331,22 +441,28 @@ export interface ChatSubscribeMessage { /** * Client → server: stop watching a conversation's turn events on this connection. - * Does NOT stop/affect the turn (it runs to completion regardless of subscribers). - * Socket close drops all of a connection's subscriptions the same way — again - * WITHOUT aborting any in-flight turn. + * Does NOT stop or affect the turn itself (the turn runs to completion regardless + * of subscribers). The server also drops all of a connection's subscriptions when + * the socket closes — again WITHOUT aborting any in-flight turn. */ export interface ChatUnsubscribeMessage { readonly type: "chat.unsubscribe"; readonly conversationId: string; } -/** Every client → server WS message: surface ops + chat ops. Discriminate on `type`. */ +/** + * Every client → server WS message: surface ops (`@dispatch/ui-contract`) + chat + * ops. A server discriminates on `type`. + */ export type WsClientMessage = | SurfaceClientMessage | ChatSendMessage | ChatSubscribeMessage | ChatUnsubscribeMessage; -/** Every server → client WS message: surface ops + chat ops. Discriminate on `type`. */ +/** + * Every server → client WS message: surface ops (`@dispatch/ui-contract`) + chat + * ops. A client discriminates on `type`. + */ export type WsServerMessage = SurfaceServerMessage | ChatDeltaMessage | ChatErrorMessage; ``` diff --git a/.dispatch/ui-contract.reference.md b/.dispatch/ui-contract.reference.md index 00d354f..d751af8 100644 --- a/.dispatch/ui-contract.reference.md +++ b/.dispatch/ui-contract.reference.md @@ -5,29 +5,49 @@ > hangs on a permission prompt). Your CODE still imports `@dispatch/ui-contract` normally — this > file is for READING only. > -> **Orchestrator:** this is a SNAPSHOT — regenerate it whenever `ui-contract` changes. +> **Orchestrator:** this is a SNAPSHOT of `[email protected]` — regenerate it whenever +> `ui-contract` changes. +> +> **2026-06-12 delta (CR-2/CR-4 handoff — package bumped `0.1.0` → `0.2.0`):** adds the optional +> `scope?: "global" | "conversation"` to `SurfaceCatalogEntry` so a client can skip re-subscribing +> GLOBAL surfaces on a conversation switch. ABSENT means assume conversation-scoped (the +> conservative always-send-conversationId policy remains correct for both). Emitted today: +> `loaded-extensions` → `"global"`, `cache-warming` → `"conversation"`. Also (CR-4d, no shape +> change): the initial `surface` reply to a conversation-scoped subscribe ECHOES `conversationId` +> as documented (was already on backend HEAD; verify with a freshly-booted backend). > > **2026-06 delta (cache-warming handoff):** adds the `NumberField` variant (`kind:"number"`) to > the `SurfaceField` union, and an OPTIONAL `conversationId?` to `SubscribeMessage` / > `UnsubscribeMessage` / `InvokeMessage` / `SurfaceMessage` / `SurfaceUpdate` so a surface can be > CONVERSATION-SCOPED (state differs per conversation, e.g. `cache-warming`) vs GLOBAL (one state for > all, e.g. `loaded-extensions`). All additive / backward-compatible: a global surface omits -> `conversationId` and behaves exactly as before. (Backend left the package version at `0.1.0`.) +> `conversationId` and behaves exactly as before. ```ts /** * UI contract — the frontend-agnostic vocabulary for backend-declared "surfaces". * * A SURFACE is a "data transportation surface": a typed description of what data an - * extension exposes, its semantics, and the actions that can act on it — NOT UI. - * Any client renders a surface in its own idiom (web/Svelte, CLI, future TUI/mobile). - * Types-only, zero runtime, zero `@dispatch/*` deps. + * extension exposes, its semantics, and the actions that can act on it — NOT UI. It + * carries STRUCTURE + SEMANTICS + ACTIONS, never styling and never a rendering- + * framework token. Any client (web/Svelte, CLI, future TUI/mobile) renders a surface + * in its own idiom, so swapping or adding a client is a zero-backend-change event. + * + * This package is types-only (zero runtime) and has ZERO `@dispatch/*` dependencies, + * so a separate client repo can depend on JUST this contract. */ -/** Where a surface mounts — a coarse, semantic placement hint, NOT layout/CSS. Open string. */ +/** + * Where a surface mounts — a coarse, semantic placement hint, NOT a layout/CSS + * instruction. A client maps a region to its own idiom; an unknown region falls back + * to the client's default placement. Deliberately left open (a `string`). + */ export type Region = string; -/** A typed reference to a backend action a field can invoke (client posts payload back). */ +/** + * A typed reference to a backend action a field can invoke. The client posts it back + * (with a payload); the surface id comes from context. + */ export interface ActionRef { readonly actionId: string; } @@ -38,7 +58,10 @@ export interface SurfaceOption { readonly label: string; } -/** A field within a surface — a SEMANTIC value, not a widget. `kind` is the discriminant. */ +/** + * A field within a surface — a SEMANTIC value, not a widget. `kind` is the + * discriminant a client switches on to pick a renderer. + */ export type SurfaceField = | ToggleField | ProgressField @@ -56,7 +79,7 @@ export interface ToggleField { readonly action: ActionRef; } -/** A bounded ratio in [0, 1] with a label. Read-only. */ +/** A bounded ratio in [0, 1] with a label (e.g. a cache-hit rate). Read-only. */ export interface ProgressField { readonly kind: "progress"; readonly label: string; @@ -83,7 +106,7 @@ export interface StatField { * A settable numeric value plus the action that sets it — the free-value * counterpart to `selector` (which is a fixed enum). Optional `min`/`max`/`step` * are SEMANTIC bounds a client may use to validate/step input; `unit` is a - * display hint (e.g. "ms", "s"). The client posts the new number as the action + * display hint (e.g. "ms", "min"). The client posts the new number as the action * payload. Unlike `progress`/`stat` (read-only), this field is interactive. */ export interface NumberField { @@ -105,8 +128,10 @@ export interface ButtonField { } /** - * The escape hatch: data that fits no semantic field kind. Opaque `payload` + a - * `rendererId`; clients WITH a renderer for that id show it, others GRACEFULLY SKIP. + * The escape hatch: data that fits no semantic field kind. Carries an opaque + * `payload` + a `rendererId`; clients WITH a renderer for that id show it, others + * GRACEFULLY SKIP. Keep rare — and the owning extension should export a typed + * payload type so its bespoke renderer narrows `payload` via a typed symbol. */ export interface CustomField { readonly kind: "custom"; @@ -114,7 +139,10 @@ export interface CustomField { readonly payload: unknown; } -/** A surface: an ordered set of fields mounted in a region, with a title. */ +/** + * A surface: an ordered set of fields mounted in a region, with a title. The atomic + * unit a backend extension contributes and a client renders. + */ export interface SurfaceSpec { readonly id: string; readonly region: Region; @@ -122,20 +150,33 @@ export interface SurfaceSpec { readonly fields: readonly SurfaceField[]; } -/** A surface-catalog entry — discovery metadata only (no field data). */ +/** + * A surface-catalog entry — discovery metadata only (no field data). + */ export interface SurfaceCatalogEntry { readonly id: string; readonly region: Region; readonly title: string; + /** + * Whether the surface's spec/values differ per conversation ("conversation") + * or are app-wide ("global"). A client may skip re-subscribing GLOBAL surfaces + * on a conversation switch (they ignore `conversationId`). Optional + additive: + * when absent, a client should assume conversation-scoped (the conservative + * "always send the focused conversationId" policy still works for both). + */ + readonly scope?: "global" | "conversation"; } /** The surface catalog: the list of available surfaces a client can choose to show. */ export type SurfaceCatalog = readonly SurfaceCatalogEntry[]; /** - * A live update for a subscribed surface. v1 carries the full new spec. - * `conversationId` is present only for a CONVERSATION-SCOPED surface (tells the - * client which conversation this update is for); a global surface omits it. + * A live update for a subscribed surface (pushed over the WS channel). v1 carries + * the full new spec (the simplest "patch"). + * + * `conversationId` is present only for a CONVERSATION-SCOPED surface (one whose + * spec/values differ per conversation, e.g. cache-warming controls): it tells the + * client which conversation this update pertains to. A global surface omits it. */ export interface SurfaceUpdate { readonly surfaceId: string; @@ -143,14 +184,18 @@ export interface SurfaceUpdate { readonly conversationId?: string; } -// ── Surface WebSocket protocol (slice 1: surfaces only) ────────────────────── +// ── Surface WebSocket protocol ──────────────────────────────────────────────── /** A client → server message on the surface channel. */ export type SurfaceClientMessage = SubscribeMessage | UnsubscribeMessage | InvokeMessage; /** - * Begin receiving live updates for a surface. For a CONVERSATION-SCOPED surface, - * include the `conversationId` whose state you want; omit it for a global surface. + * Begin receiving live updates for a surface (server replies with `surface`, then `update`s). + * + * For a CONVERSATION-SCOPED surface, include the `conversationId` whose state you + * want — the server resolves the spec for that conversation and pushes its updates. + * Omit it for a global surface (or to view a conversation-scoped surface with no + * conversation in focus → the surface decides its default/empty state). */ export interface SubscribeMessage { readonly type: "subscribe"; @@ -208,7 +253,7 @@ export interface SurfaceUpdateMessage { readonly update: SurfaceUpdate; } -/** A surface-scoped error. */ +/** A surface-scoped error (e.g. unknown surface id, invoke failed). */ export interface SurfaceErrorMessage { readonly type: "error"; readonly surfaceId?: string; diff --git a/backend-handoff-cache-warming.md b/backend-handoff-cache-warming.md new file mode 100644 index 0000000..a0019f9 --- /dev/null +++ b/backend-handoff-cache-warming.md @@ -0,0 +1,102 @@ +# Cache-warming lifecycle handoff (FE → backend) — CR-4 — **RESOLVED ✅ 2026-06-12** + +> **Closed.** Backend reply: `../arch-rewrite/frontend-cache-warming-lifecycle-handoff.md` +> (`[email protected]` + `[email protected]`). All asks shipped; FE consumed + live-probed +> 17/17 (`scripts/probe-cache-warming.ts` against `bin/up`). CR-4d turned out to be an FE bug (our +> WS parser dropped the `conversationId` echo on the initial `surface` message) — fixed FE-side. +> Current status lives in `backend-handoff.md` §2. Original report kept below for history. + +> **From:** dispatch-web · **To:** arch-rewrite · **Courier:** the user. +> User-reported symptoms, investigated FE-side with a live probe against a running backend +> (`bin/up2` stack, HTTP :25203 / surface WS :25205, 2026-06-12). Repro tool: +> `dispatch-web/scripts/probe-cache-warming.ts` (drives the FE's real WS adapter + the +> `cache-warming` surface; safe to re-run to verify fixes). +> +> **Verdict up front:** the FE renders the surface data faithfully — symptoms 1 and 2 are +> backend data/behavior; symptom 3 needs a new backend affordance (FE will wire it on arrival). + +## User-reported symptoms + +1. Warming is **ON by default** for a new conversation — the user has to manually turn it off. + Wanted: default OFF, opt-in per conversation. +2. With warming enabled, **no usable countdown** to the next refresh — the user can't tell + whether refreshes are happening at all. +3. Wanted lifecycle: refreshes **keep running when the browser window closes** (✅ already true, + verified — see below), but **closing the conversation's tab in the app should stop the + refreshes AND abort any in-flight generation** (closing the tab = "done with this chat for now"). + +## Probe evidence (verbatim observations) + +Fresh conversation (first turn sealed), then `subscribe {surfaceId:"cache-warming", conversationId}`: + +- **Initial spec:** `toggle value: true`, `number value: 240` (s), timer payload + `{ nextWarmAt: <now+240s>, lastWarmAt: null }` → **enabled by default, warm already scheduled**. + Confirms symptom 1 is backend default state. +- `invoke cache-warming/set-interval payload:20` → update with a FUTURE `nextWarmAt` (+20s). ✅ +- **Automatic warms DO repeat and DO push updates** — 3 warms observed at ~21s spacing + (interval 20s), each pushing an `update` with fresh `Last Cache %` / `Cache retention` stats. + So the engine itself works. +- **BUG (symptom 2 root cause): every post-warm `update` carries a STALE `nextWarmAt` — the fire + time of the warm that JUST completed (i.e. in the past), never the next scheduled one.** + Observed sequence (epoch ms): + + | update after | nextWarmAt | lastWarmAt | note | + |---|---|---|---| + | warm #1 | 1781246273405 | 1781246274299 | nextWarmAt < lastWarmAt (past) | + | warm #2 | 1781246294299 | 1781246295269 | = warm#1.lastWarmAt + 20 000 → still past | + | warm #3 | 1781246315269 | 1781246315998 | = warm#2.lastWarmAt + 20 000 → still past | + + The pattern shows the reschedule math exists (`next = lastWarm + interval`) but the surface + update is emitted with the PRE-warm snapshot; the post-reschedule (future) `nextWarmAt` is + never pushed. The FE countdown is authoritative off `nextWarmAt` (per the cache-warming + handoff design), so after the FIRST automatic warm the UI shows "Next warm in 0s" forever — + exactly the user's "I can't tell if it's working". +- Same staleness after a real chat turn while subscribed: last update after `turn-sealed` still + carried a past `nextWarmAt` (−10s and counting), even though a warm was presumably scheduled. +- **Browser-closed continuity ✅:** the schedule is fully server-side — warms fired with no + browser attached (only the headless probe socket). Symptom 3's "keep running when the window + closes" half already works; do not regress it. +- **Contract deviation (minor):** the initial `surface` reply to a conversation-scoped subscribe + does NOT echo `conversationId` (updates do). `ui-contract` says the echo should be present + ("echoes the subscribe's conversation … so the client routes it"). The FE currently tolerates + the missing echo (treats no-echo as current), but that weakens stale-scope filtering on fast + conversation switches — please echo it. + +## Asks + +### CR-4a — default warming to OFF for a new conversation +New conversations currently start `enabled: true`, interval 240s, first warm scheduled +immediately. Make the default `enabled: false` (no warm scheduled until the user opts in). +No contract change — it's the initial state of the existing surface. + +### CR-4b — push the refreshed (future) `nextWarmAt` after each automatic warm +After a warm completes + the next one is scheduled, the emitted surface `update`'s +`cache-warming-timer` payload must carry the NEW future `nextWarmAt` (and the new `lastWarmAt`). +Either emit the update after rescheduling or emit a second update — FE is indifferent; it just +renders the authoritative timestamp. (Same applies to the post-`turn-sealed` reschedule path.) +No contract change — it's the payload of the existing custom field. + +### CR-4c — a "conversation closed" affordance (stop warming + abort generation) +The FE needs to tell the backend "the user closed this conversation's tab": that should +(1) disable/stop cache-warming for the conversation and (2) abort any in-flight turn. +Today there is no path: +- `chat.unsubscribe` / socket close explicitly never stops the turn (by design — keep that); +- surface `unsubscribe` doesn't touch the warming schedule (correct for mere disconnects); +- `POST /conversations/:id/cancel` is DEFERRED in `transport-contract`; +- programmatically invoking `cache-warming/toggle` is unsuitable: it FLIPS with no payload, so + it's racy as an explicit "disable" (and doesn't abort generation). + +Preferred shape (backend's call): a single explicit `POST /conversations/:id/close` (or WS +message) that does both, OR un-defer `/cancel` + accept an optional explicit boolean payload on +`cache-warming/toggle`. Whatever ships, the FE wires it into its tab-close path. Note the +asymmetry the user wants: browser/socket disconnect ⇒ warming continues; explicit tab close ⇒ +warming + generation stop. + +### CR-4d (minor) — echo `conversationId` on the initial `surface` message +Per the `ui-contract` doc comment on `SurfaceMessage` (see deviation above). + +## FE-side follow-ups (ours, queued behind the above) +- Harden the countdown display: a past `nextWarmAt` renders as "waiting…" instead of a stuck + "0s" (cosmetic guard; CR-4b is the real fix). +- On CR-4c shipping: call the close affordance from `store.closeTab()`; re-pin + re-mirror the + contract; extend `scripts/probe-cache-warming.ts` to verify default-off + post-warm countdown. diff --git a/backend-handoff.md b/backend-handoff.md index b5cf1eb..30a1d64 100644 --- a/backend-handoff.md +++ b/backend-handoff.md @@ -5,18 +5,35 @@ > **From:** dispatch-web orchestrator · **To:** arch-rewrite orchestrator · **Courier:** the user. > `lsp` does NOT span the repos (AGENTS.md § Backend seam) — every cross-repo ask flows through here. -_Last updated: 2026-06-12. **FE is current on `[email protected]` / `[email protected]`.** All handoffs -to date are consumed: surfaces + WS, conversation transcript/metrics, tabs + model selector, -cache-warming (incl. authoritative timer + retention + cache-rate fix), **per-conversation cwd + LSP -status**, **context size** (the `contextSize` field — `done` live + `TurnMetrics` persisted — -rendered as a current-usage readout above the composer), and **turn continuity + multi-client live -view** (`chat.subscribe`/`chat.unsubscribe`; re-attach to a running turn on reconnect/reload/second -device; stream-derived "generating…" state). -**Open asks:** CR-1 (Loaded Extensions as a real table) + CR-2 (optional catalog `scope` flag) below. +_Last updated: 2026-06-12 (CR-4 consumed). **FE is current on `[email protected]` / +`[email protected]` / `[email protected]`.** All handoffs to date are consumed: surfaces + WS, +conversation transcript/metrics, tabs + model selector, cache-warming (incl. authoritative timer + +retention + cache-rate fix + the CR-4 lifecycle below), **per-conversation cwd + LSP status**, +**context size**, and **turn continuity + multi-client live view**. +**Open asks: NONE.** CR-1/CR-2/CR-4 all RESOLVED ✅ (see §2); §3 lists likely next asks. **CR-3 (watcher couldn't see the USER prompt until seal) → RESOLVED ✅** — backend shipped the -`user-message` turn event (`[email protected]` / `[email protected]`); FE re-pinned + consumption live. -The cwd/LSP draft-path verification (`backend-handoff-cwd-lsp.md`) came back **all ✅ confirmed** by the -backend (answers in their `frontend-lsp-cwd-handoff.md`) — see §2._ +`user-message` turn event; FE re-pinned + consumption live. +The cwd/LSP draft-path verification (`backend-handoff-cwd-lsp.md`) came back **all ✅ confirmed**._ + +**CR-4 cache-warming lifecycle (`frontend-cache-warming-lifecycle-handoff.md`) → CONSUMED ✅ +(live-probed 17/17 against `bin/up`).** Re-pinned `[email protected]→0.2.0` + +`[email protected]→0.9.0` (`wire` unchanged); re-mirrored both `.dispatch/*.reference.md`. FE +work: `store.closeTab()` now POSTs `POST /conversations/:id/close` (fire-and-forget, idempotent) — +the explicit "done with this chat" affordance that aborts an in-flight turn + stops/disables that +conversation's warming, while disconnect/`chat.unsubscribe` still leave both running; +`syncSubscriptions` honors the new catalog `scope` flag (a `scope:"global"` surface is no longer +re-subscribed on every conversation switch; absent = conversation-scoped, conservative); +`secondsUntilNext` gained a 3s belt-and-braces stale guard (a past `nextWarmAt` renders "waiting…", +should never trigger now). **CR-4d correction: the missing `conversationId` echo on the initial +`surface` message was OURS** — the backend was right, HEAD echoes it; our WS parser +(`adapters/ws/logic.ts` `case "surface"`) rebuilt the message and DROPPED the field. Fixed + unit +tests; the protocol reducer's stale-scope filtering now actually bites on the initial reply too. +Probe verified live: default OFF + nothing scheduled on a fresh conversation; toggle-on/interval +updates carry FUTURE `nextWarmAt`; repeated automatic warms each push a FUTURE `nextWarmAt`; +`nextWarmAt: null` pushed on `turn-start`; close mid-turn → 200 `abortedTurn:true`, watcher gets +`done` `reason:"aborted"` + `turn-sealed`, surface flips `enabled:false`/`nextWarmAt:null`; second +close idempotent (`abortedTurn:false`). CR-1 table payload also verified arriving (FE renderer +pre-existing). 568 tests green._ **Turn-continuity handoff (`frontend-turn-continuity-handoff.md`) → CONSUMED ✅.** Re-pinned `[email protected]→0.7.0` (additive; `wire` unchanged at `0.5.0`); re-mirrored @@ -43,7 +60,7 @@ backend ask — but the max-limit denominator is now a live FE need; see §3. ## 1. Pinned backend contracts (consumed by the FE) | Package | Used for | |---|---| @@ -54,62 +71,58 @@ Pinned as `file:` deps: **`[email protected]`; `[email protected]`; `transport-contract Endpoints in use (HTTP **24203**, WS **24205**, CORS `*` incl. `PUT`): `POST /chat` (NDJSON) · `GET /models` · `GET /conversations/:id?sinceSeq=<n>` · `GET /conversations/:id/metrics` · `GET`/`PUT /conversations/:id/cwd` · -`GET /conversations/:id/lsp` · `POST /chat/warm` · WS `chat.send`→`chat.delta` · +`GET /conversations/:id/lsp` · `POST /chat/warm` · `POST /conversations/:id/close` (explicit +tab-close: abort turn + stop/disable warming) · WS `chat.send`→`chat.delta` · WS `chat.subscribe`/`chat.unsubscribe` (watch a conversation's turns without sending; replay + live). Mirrored in-repo for headless agents: `.dispatch/{ui-contract,wire,transport-contract}.reference.md` -(regenerate on any contract bump; all current as of `[email protected]` / `[email protected]`). +(regenerate on any contract bump; all current as of `[email protected]` / +`[email protected]` / `[email protected]`). ## 2. Open asks FOR THE BACKEND -### CR-1 — emit the **Loaded Extensions** surface as a true table - -The user wants the Loaded Extensions surface rendered as a nice multi-column -table (e.g. `Name | Version | Trust | Scope`), listing **all** loaded extensions. - -**Already covered — do NOT redo these (no contract change needed):** -- The `custom` field kind + `rendererId` + graceful-skip already exist in - `[email protected]`. CR-1 uses that escape hatch — no `@dispatch/ui-contract` bump. -- The FE renderer is **done and shipped**: `SurfaceView` → `SurfaceTable` → - shared `Table`, dispatched on `rendererId === "table"`. It renders the moment - the surface emits the field below. -- The FE already groups consecutive `stat` fields into an aligned 2-column - (label → value) table, so the current surface (one `stat` per extension: - name → version) is **already readable as a table today**. CR-1 is the upgrade - to real columns, not a fix for something broken. -- The "frontend modules" half of the Extensions view is **100% FE-owned** - (aggregated from each FE feature's `manifest`) — backend has nothing to provide there. - -**What I NEED from the backend to finish it:** replace the N per-extension -`stat` fields with a SINGLE `custom` field: -```ts -{ - kind: "custom", - rendererId: "table", - payload: { - columns: string[], // header labels - rows: (string | number | boolean)[][], // each row aligns cell-for-cell to columns - }, -} -``` -- Cells are coerced to strings; a malformed payload renders nothing (safe skip). -- `rows` should enumerate **every** loaded extension (all trust tiers / kinds), - so "show all" is satisfied from this one surface. - -**Optional (data quality, not a blocker):** extension manifest `version`s all -read `0.0.0` (unversioned). If real versions should appear in the table column, -bump each extension's manifest `version` — otherwise the column is all `0.0.0`. - -### CR-2 (optional, low priority) — a `scope` flag on the surface catalog entry - -The catalog (`SurfaceCatalogEntry`) carries no hint of whether a surface is GLOBAL or -CONVERSATION-SCOPED, so the FE follows the handoff's "always send the focused `conversationId`" -policy. That works (global surfaces ignore it; the FE's routing accepts the no-echo global reply), -but it means the FE **re-subscribes every surface — including global ones like `loaded-extensions` — -on every conversation switch**, which is needless churn (one redundant unsubscribe+subscribe round -trip per global surface per switch; no user-visible bug, the old spec is retained so there's no -flicker). An optional `scope?: "global" | "conversation"` on `SurfaceCatalogEntry` would let the FE -skip re-subscribing globals on switch. **Not blocking** — only raise if cheap. +**None open.** Resolved history below. + +### CR-1 — Loaded Extensions as a true table → **RESOLVED ✅** (shipped + consumed) + +Backend now emits the "Loaded" count stat plus ONE +`{ kind: "custom", rendererId: "table", payload: { columns, rows } }` field +(`columns: ["Name", "Version", "Trust", "Activation"]`, one row per loaded extension, all trust +tiers). Verified arriving live; the FE's pre-existing `SurfaceTable` renderer (dispatch on +`rendererId === "table"`) shows it with no FE change. A typed `TablePayload` (+ `TABLE_RENDERER_ID`) +is exported from `@dispatch/surface-loaded-extensions` if we ever want to narrow instead of +duck-typing — not consumed (would add a dep for no behavior change). Data-quality note stands: +`Version` cells all read `0.0.0` (manifests genuinely unversioned; optional backend cleanup). + +### CR-2 — catalog `scope` flag → **RESOLVED ✅** (`[email protected]`, consumed) + +`SurfaceCatalogEntry.scope?: "global" | "conversation"` shipped (emitted: `loaded-extensions` → +global, `cache-warming` → conversation). FE consumed: `syncSubscriptions` subscribes a +`scope:"global"` surface WITHOUT a conversationId, so a conversation switch no longer churns a +redundant unsubscribe+subscribe per global surface. ABSENT scope = assume conversation-scoped +(conservative, per contract). + +### CR-4 — cache-warming lifecycle → **RESOLVED ✅** (courier `backend-handoff-cache-warming.md`; reply `frontend-cache-warming-lifecycle-handoff.md`; live-probed 17/17) + +All four asks shipped + consumed (`[email protected]`): +- **(a) default OFF** for a new conversation (interval default still 240s; re-enable restores the + persisted interval). Verified live. +- **(b) FUTURE `nextWarmAt`** pushed after every automatic warm + after `turn-sealed`; + `nextWarmAt: null` pushed on `turn-start` (FE renders "waiting…" while generating) and when + disabled. Verified live (2 automatic warms @10s, both future). +- **(c) `POST /conversations/:id/close`** (`CloseConversationResponse { conversationId, + abortedTurn }`): aborts an in-flight turn (partial persisted, seals with `reason: "aborted"` → + watchers' `generating` clears structurally) + stops/disables warming (persisted OFF), idempotent; + disconnect/`chat.unsubscribe` still never touch either. FE wires it in `store.closeTab()` + (fire-and-forget). Verified live incl. mid-turn abort + idempotent re-close. +- **(d) `conversationId` echo on the initial `surface` message — was an FE BUG, not backend.** + The backend's frame carries it (raw-frame verified); our WS parser + (`adapters/ws/logic.ts` `case "surface"`) rebuilt the message and dropped the field. Fixed FE-side + + unit-tested; stale-scope filtering now applies to the initial reply too. Backend owes nothing. + +**Known caveat (accepted, fail-safe):** the warming opt-in is NOT re-hydrated across a backend +RESTART — a conversation reads disabled until toggled again. Flag to the backend if persistence +across restarts becomes a product need (they offered boot hydration). ### cwd + LSP draft path → **VERIFIED ✅ (all 6 asks confirmed; courier `backend-handoff-cwd-lsp.md`)** @@ -184,7 +197,11 @@ errored turn would leave a persisted prompt with no reply). source the model's advertised window. - `GET /conversations` — conversation list / sidebar (history explorer / switcher); could also expose a per-conversation "last model" so a reopened tab seeds its model from the server instead of localStorage. -- `POST /conversations/:id/cancel` — "stop generating". +- ~~`POST /conversations/:id/cancel`~~ — **superseded by `POST /conversations/:id/close` + (CR-4c, shipped)**. A standalone "stop generating WITHOUT closing/disabling warming" button would + still need a separate affordance if the product ever wants it. +- **Warming opt-in persistence across backend restarts** — currently fail-safe-off after a restart; + backend offered boot hydration if it becomes a need (see CR-4 caveat in §2). - **LSP status over WS** (push) — today the FE HTTP-polls `GET /conversations/:id/lsp` on panel mount / cwd change + a manual refresh; a live surface/WS push would remove the manual refresh and reflect a server flipping to `error`/`connected` without a reload. (Backend flagged this as a future option.) diff --git a/scripts/probe-cache-warming.ts b/scripts/probe-cache-warming.ts new file mode 100644 index 0000000..470e43b --- /dev/null +++ b/scripts/probe-cache-warming.ts @@ -0,0 +1,277 @@ +/** + * scripts/probe-cache-warming.ts — LIVE probe of the `cache-warming` surface + + * conversation-close lifecycle against a RUNNING backend (bin/up: HTTP :24203 + + * surface WS :24205; override with PROBE_HTTP / PROBE_WS for bin/up2's +1000 + * ports). NOT part of `bun run test`. Verifies the CR-4 handoff end-to-end: + * + * A. draft subscribe (no conversationId) → degenerate "no conversation" spec + * B. fresh conversation → warming defaults OFF, nothing scheduled (CR-4a) + * C. toggle on + 10s interval → repeated automatic warms, each update carrying + * a FUTURE nextWarmAt (CR-4b), initial `surface` echoes conversationId (CR-4d) + * D. POST /conversations/:id/close mid-turn → abortedTurn, done.reason + * "aborted", turn-sealed, warming disabled + unscheduled (CR-4c) + * + * bun scripts/probe-cache-warming.ts + */ +import type { + ChatDeltaMessage, + ChatErrorMessage, + CloseConversationResponse, +} from "@dispatch/transport-contract"; +import type { SurfaceServerMessage, SurfaceSpec } from "@dispatch/ui-contract"; +import { createSurfaceSocket } from "../src/adapters/ws/index.ts"; +import { parseControls } from "../src/features/cache-warming/logic/view-model.ts"; + +const WS_URL = process.env.PROBE_WS ?? "ws://localhost:24205"; +const HTTP_BASE = process.env.PROBE_HTTP ?? "http://localhost:24203"; +const SURFACE_ID = "cache-warming"; + +const checks: { name: string; ok: boolean }[] = []; +const record = (name: string, ok: boolean, detail?: string) => { + checks.push({ name, ok }); + console.log(` ${ok ? "✅" : "❌"} ${name}${detail ? ` — ${detail}` : ""}`); +}; +const log = (msg: string) => console.log(`[${new Date().toISOString().slice(11, 19)}] ${msg}`); +const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms)); + +function summarize(spec: SurfaceSpec | null): string { + const c = parseControls(spec); + const next = + c.nextWarmAt === null ? "null" : `${Math.round((c.nextWarmAt - Date.now()) / 1000)}s`; + return `enabled=${c.enabled} interval=${c.intervalSeconds}s lastPct=${c.lastPct} next=${next} lastWarmAt=${c.lastWarmAt}`; +} + +let catalog: { id: string; scope?: string }[] = []; +let latestSpec: SurfaceSpec | null = null; +let latestSpecConv: string | undefined; +let specWaiter: (() => void) | null = null; + +const chatHandlers = new Map<string, (msg: ChatDeltaMessage | ChatErrorMessage) => void>(); + +const socket = createSurfaceSocket({ + url: WS_URL, + onMessage: (m: SurfaceServerMessage) => { + if (m.type === "catalog") { + catalog = [...m.catalog]; + log(`catalog: ${m.catalog.map((e) => `${e.id}(scope=${e.scope ?? "—"})`).join(", ")}`); + } else if (m.type === "surface") { + latestSpec = m.spec; + latestSpecConv = m.conversationId; + log(`surface(initial) conv=${m.conversationId ?? "—"}: ${summarize(m.spec)}`); + specWaiter?.(); + } else if (m.type === "update") { + if (m.update.surfaceId !== SURFACE_ID) return; + latestSpec = m.update.spec; + latestSpecConv = m.update.conversationId; + log(`update conv=${m.update.conversationId ?? "—"}: ${summarize(m.update.spec)}`); + specWaiter?.(); + } else if (m.type === "error") { + log(`surface ERROR: ${m.surfaceId ?? "—"}: ${m.message}`); + } + }, + onChat: (msg) => { + const id = msg.type === "chat.error" ? msg.conversationId : msg.event.conversationId; + if (id !== undefined) chatHandlers.get(id)?.(msg); + }, +}); + +/** Wait for the next surface/update message (or time out). */ +function nextSpec(timeoutMs: number): Promise<boolean> { + return new Promise((resolve) => { + const t = setTimeout(() => { + specWaiter = null; + resolve(false); + }, timeoutMs); + specWaiter = () => { + clearTimeout(t); + specWaiter = null; + resolve(true); + }; + }); +} + +async function runTinyTurn(conversationId: string, prompt: string): Promise<boolean> { + const done = Promise.withResolvers<boolean>(); + chatHandlers.set(conversationId, (msg) => { + if (msg.type === "chat.error") { + log(`chat.error: ${msg.message}`); + done.resolve(false); + } else if (msg.event.type === "turn-sealed") { + done.resolve(true); + } + }); + socket.send({ type: "chat.send", conversationId, message: prompt }); + const t = setTimeout(() => done.resolve(false), 90_000); + const ok = await done.promise; + clearTimeout(t); + chatHandlers.delete(conversationId); + return ok; +} + +function invoke(actionId: string, conversationId: string, payload?: unknown): void { + socket.send( + payload === undefined + ? { type: "invoke", surfaceId: SURFACE_ID, actionId, conversationId } + : { type: "invoke", surfaceId: SURFACE_ID, actionId, payload, conversationId }, + ); +} + +async function main() { + await sleep(600); + record( + "catalog includes cache-warming with scope=conversation", + catalog.some((e) => e.id === SURFACE_ID && e.scope === "conversation"), + ); + + // ── A: the DRAFT/new-tab path — subscribe with NO conversationId ─────────── + log("PHASE A: subscribe with NO conversationId (draft / new tab)"); + socket.send({ type: "subscribe", surfaceId: SURFACE_ID }); + await nextSpec(3000); + record( + "draft subscribe → degenerate spec (no toggle parsed)", + !parseControls(latestSpec).enabled, + ); + socket.send({ type: "unsubscribe", surfaceId: SURFACE_ID }); + await sleep(300); + + // ── B: a FRESH conversation defaults OFF (CR-4a) + echo (CR-4d) ──────────── + const conv = crypto.randomUUID(); + log(`PHASE B: creating conversation ${conv}`); + if (!(await runTinyTurn(conv, "Reply with exactly: ok"))) { + log("FATAL: could not create a conversation"); + process.exit(1); + } + socket.send({ type: "subscribe", surfaceId: SURFACE_ID, conversationId: conv }); + await nextSpec(3000); + const fresh = parseControls(latestSpec); + record("CR-4d: initial surface message echoes conversationId", latestSpecConv === conv); + record("CR-4a: fresh conversation defaults to warming OFF", fresh.enabled === false); + record("CR-4a: nothing scheduled while off (nextWarmAt null)", fresh.nextWarmAt === null); + + // ── C: opt in + 10s interval → repeated warms, FUTURE nextWarmAt (CR-4b) ─── + log("PHASE C: toggling warming ON"); + const toggleId = fresh.toggleActionId; + if (toggleId === null) { + record("toggle action present", false); + process.exit(1); + } + invoke(toggleId, conv); + await nextSpec(3000); + let c = parseControls(latestSpec); + record("toggle-on update arrived (enabled)", c.enabled === true); + record( + "CR-4b: enable schedules a FUTURE nextWarmAt", + c.nextWarmAt !== null && c.nextWarmAt > Date.now(), + ); + + const setIntervalId = c.setIntervalActionId; + if (setIntervalId !== null) { + log("PHASE C: set-interval = 10s"); + invoke(setIntervalId, conv, 10); + await nextSpec(3000); + c = parseControls(latestSpec); + record( + "set-interval update: interval=10 + FUTURE nextWarmAt", + c.intervalSeconds === 10 && c.nextWarmAt !== null && c.nextWarmAt > Date.now(), + ); + } + + log("PHASE C: waiting up to 45s for 2 automatic warms…"); + const deadline = Date.now() + 45_000; + let lastSeen = c.lastWarmAt; + let warms = 0; + let allFuture = true; + while (Date.now() < deadline && warms < 2) { + await nextSpec(Math.max(1, deadline - Date.now())); + const now = parseControls(latestSpec); + if (now.lastWarmAt !== null && now.lastWarmAt !== lastSeen) { + lastSeen = now.lastWarmAt; + warms++; + const future = now.nextWarmAt !== null && now.nextWarmAt > Date.now() - 1000; + if (!future) allFuture = false; + log( + ` automatic warm #${warms}: pct=${now.lastPct} retention=${now.retentionPct} ` + + `nextWarmAt ${future ? "FUTURE" : "STALE/PAST"}`, + ); + } + } + record("automatic warms repeat (2 observed @10s)", warms >= 2, `${warms} warm(s)`); + record("CR-4b: every post-warm update carries a FUTURE nextWarmAt", warms >= 2 && allFuture); + + // ── D: close mid-turn → abort + warming disabled (CR-4c) ─────────────────── + log("PHASE D: starting a long turn, then closing the conversation mid-turn…"); + const seenDone = Promise.withResolvers<string>(); // resolves with done.reason + const seenSealed = Promise.withResolvers<void>(); + let turnStarted = false; + const started = Promise.withResolvers<void>(); + chatHandlers.set(conv, (msg) => { + if (msg.type === "chat.error") { + log(`chat.error: ${msg.message}`); + return; + } + const ev = msg.event; + if (ev.type === "turn-start") { + turnStarted = true; + started.resolve(); + } else if (ev.type === "done") { + seenDone.resolve(ev.reason); + } else if (ev.type === "turn-sealed") { + seenSealed.resolve(); + } + }); + socket.send({ + type: "chat.send", + conversationId: conv, + message: + "Write a detailed 1000-word essay about the history of computing. Take your time and be thorough.", + }); + const startTimeout = setTimeout(() => started.resolve(), 15_000); + await started.promise; + clearTimeout(startTimeout); + record("turn started (watcher saw turn-start)", turnStarted); + await sleep(1000); // let it generate a moment + + const res = await fetch(`${HTTP_BASE}/conversations/${encodeURIComponent(conv)}/close`, { + method: "POST", + headers: { Origin: "http://localhost:24204" }, + }); + record("POST /conversations/:id/close → 200", res.ok, `HTTP ${res.status}`); + const body = (await res.json()) as CloseConversationResponse; + record("close aborted the in-flight turn (abortedTurn)", body.abortedTurn === true); + + const doneReason = await Promise.race([seenDone.promise, sleep(15_000).then(() => "(timeout)")]); + record('watcher received done with reason "aborted"', doneReason === "aborted", doneReason); + const sealed = await Promise.race([ + seenSealed.promise.then(() => true), + sleep(15_000).then(() => false), + ]); + record("turn sealed normally after abort", sealed); + chatHandlers.delete(conv); + + // The close also pushed a surface update: warming disabled + unscheduled. + await sleep(1500); + const closed = parseControls(latestSpec); + record( + "CR-4c: close disabled warming + cleared the schedule", + closed.enabled === false && closed.nextWarmAt === null, + summarize(latestSpec), + ); + + // Idempotency: closing again (now idle) succeeds with abortedTurn false. + const res2 = await fetch(`${HTTP_BASE}/conversations/${encodeURIComponent(conv)}/close`, { + method: "POST", + headers: { Origin: "http://localhost:24204" }, + }); + const body2 = (await res2.json()) as CloseConversationResponse; + record("close is idempotent (200 + abortedTurn:false)", res2.ok && body2.abortedTurn === false); + + socket.close(); + const passed = checks.filter((x) => x.ok).length; + console.log(`\n[probe-cache-warming] ${passed}/${checks.length} checks passed`); + process.exit(passed === checks.length ? 0 : 1); +} + +main().catch((e) => { + console.error(`[probe] FATAL: ${e}`); + process.exit(1); +}); diff --git a/src/adapters/ws/logic.test.ts b/src/adapters/ws/logic.test.ts index 546afe1..2784295 100644 --- a/src/adapters/ws/logic.test.ts +++ b/src/adapters/ws/logic.test.ts @@ -64,6 +64,29 @@ describe("parseServerMessage", () => { }); }); + it("preserves the conversationId echo on a scoped surface message", () => { + const data = JSON.stringify({ + type: "surface", + spec: { id: "s1", region: "r", title: "S1", fields: [] }, + conversationId: "c1", + }); + const result = parseServerMessage(data); + expect(result).toEqual({ + type: "surface", + spec: { id: "s1", region: "r", title: "S1", fields: [] }, + conversationId: "c1", + }); + }); + + it("rejects a surface message with a non-string conversationId", () => { + const data = JSON.stringify({ + type: "surface", + spec: { id: "s1", region: "r", title: "S1", fields: [] }, + conversationId: 42, + }); + expect(parseServerMessage(data)).toBeNull(); + }); + it("parses an update message", () => { const data = JSON.stringify({ type: "update", diff --git a/src/adapters/ws/logic.ts b/src/adapters/ws/logic.ts index 6592f1b..17e3951 100644 --- a/src/adapters/ws/logic.ts +++ b/src/adapters/ws/logic.ts @@ -59,7 +59,15 @@ export function parseServerMessage(data: string): WsServerMessage | null { if (typeof spec.region !== "string") return null; if (typeof spec.title !== "string") return null; if (!Array.isArray(spec.fields)) return null; - return { type: "surface", spec: spec as unknown as SurfaceMessage["spec"] }; + // Preserve the conversationId echo (a conversation-scoped surface's initial + // reply carries it) — dropping it would defeat the protocol reducer's + // stale-scope filtering on a fast conversation switch. + const conversationId = parsed.conversationId; + if (conversationId !== undefined && typeof conversationId !== "string") return null; + const surfaceSpec = spec as unknown as SurfaceMessage["spec"]; + return conversationId !== undefined + ? { type: "surface", spec: surfaceSpec, conversationId } + : { type: "surface", spec: surfaceSpec }; } case "update": { const update = parsed.update; diff --git a/src/app/store.svelte.ts b/src/app/store.svelte.ts index df92b31..2837bb5 100644 --- a/src/app/store.svelte.ts +++ b/src/app/store.svelte.ts @@ -256,6 +256,23 @@ export function createAppStore(opts?: CreateAppStoreOptions): AppStore { socket?.send({ type: "chat.unsubscribe", conversationId }); } + /** + * Tell the backend the user EXPLICITLY closed this conversation's tab + * (`POST /conversations/:id/close`): aborts any in-flight turn (it seals with + * `reason: "aborted"`) and stops + DISABLES its cache-warming (persisted OFF). + * Distinct from a disconnect / `chat.unsubscribe`, which deliberately leave + * both running. Fire-and-forget: a failure is non-fatal (worst case the + * warming keeps running until a later close/toggle), and the endpoint is + * idempotent server-side. + */ + function closeConversation(conversationId: string): void { + void fetchImpl(`${httpBase}/conversations/${encodeURIComponent(conversationId)}/close`, { + method: "POST", + }).catch(() => { + // Non-fatal — see doc comment. + }); + } + /** The conversation the surfaces should scope to (undefined for a draft). */ function focusedConversationId(): string | undefined { return tabsStore.activeConversationId ?? undefined; @@ -289,7 +306,12 @@ export function createAppStore(opts?: CreateAppStoreOptions): AppStore { function syncSubscriptions(): void { const cid = focusedConversationId(); for (const entry of protocol.catalog) { - const result = protocolSubscribe(protocol, entry.id, cid); + // A GLOBAL surface ignores conversation scope — subscribe it WITHOUT an id + // so a conversation switch doesn't churn a redundant unsubscribe+subscribe + // round trip ([email protected] catalog `scope`; ABSENT = assume + // conversation-scoped, the conservative pre-0.2.0 policy). + const scoped = entry.scope === "global" ? undefined : cid; + const result = protocolSubscribe(protocol, entry.id, scoped); protocol = result.state; for (const msg of result.outgoing) { socket?.send(msg); @@ -489,7 +511,10 @@ export function createAppStore(opts?: CreateAppStoreOptions): AppStore { closeTab(conversationId: string): void { tabsStore.closeTab(conversationId); - // Stop watching the closed conversation's turns (does NOT stop the turn). + // The user is DONE with this chat for now: abort any in-flight turn and + // stop + disable its cache-warming, server-side. + closeConversation(conversationId); + // Stop watching the closed conversation's turns. unsubscribeChat(conversationId); const store = chatStores.get(conversationId); if (store !== undefined) { diff --git a/src/app/store.test.ts b/src/app/store.test.ts index 803d7dc..f4b5a0f 100644 --- a/src/app/store.test.ts +++ b/src/app/store.test.ts @@ -674,6 +674,78 @@ describe("createAppStore", () => { store.dispose(); }); + it("closing a tab POSTs /conversations/:id/close (abort turn + stop warming)", async () => { + const calls: { url: string; method: string }[] = []; + const base = fakeFetchImpl(); + const fetchImpl: typeof fetch = async (input, init) => { + const url = typeof input === "string" ? input : input instanceof URL ? input.href : input.url; + calls.push({ url, method: init?.method ?? "GET" }); + if (url.endsWith("/close")) { + return new Response( + JSON.stringify({ conversationId: url.split("/").at(-2), abortedTurn: false }), + { status: 200 }, + ); + } + return base(input, init); + }; + const ws = fakeSocket(); + const store = createAppStore({ + socketFactory: () => ws, + fetchImpl, + localStorage: createFakeStorage(), + }); + ws.resolveOpen(); + + store.send("first"); + const convId = activeConversationId(store); + store.closeTab(convId); + await Promise.resolve(); // flush the fire-and-forget fetch + + const close = calls.find((c) => c.url.endsWith(`/conversations/${convId}/close`)); + expect(close).toBeDefined(); + expect(close?.method).toBe("POST"); + + store.dispose(); + }); + + it("does NOT re-scope a scope:'global' surface on conversation switch (no churn)", () => { + const ws = fakeSocket(); + const store = createAppStore({ + socketFactory: () => ws, + fetchImpl: fakeFetchImpl(), + localStorage: createFakeStorage(), + }); + ws.resolveOpen(); + + ws.feedSurfaceMessage({ + type: "catalog", + catalog: [ + { id: "s-global", region: "side", title: "Global", scope: "global" }, + { id: "s-conv", region: "side", title: "Scoped", scope: "conversation" }, + ], + }); + + ws.sent.length = 0; + store.send("promote the draft"); // draft → real conversation: surfaces re-scope + const convId = activeConversationId(store); + + const surfaceMsgs = parseSent(ws).filter( + (p): p is { type: string; surfaceId: string; conversationId?: string } => + (p as { type: string }).type === "subscribe" || + (p as { type: string }).type === "unsubscribe", + ); + // The conversation-scoped surface re-scopes: unsubscribe old + subscribe new id. + expect( + surfaceMsgs.some( + (m) => m.type === "subscribe" && m.surfaceId === "s-conv" && m.conversationId === convId, + ), + ).toBe(true); + // The global surface is untouched — no redundant unsubscribe+subscribe round trip. + expect(surfaceMsgs.some((m) => m.surfaceId === "s-global")).toBe(false); + + store.dispose(); + }); + it("tabs persist to the injected storage and restore on a new store", () => { const ws = fakeSocket(); const storage = createFakeStorage(); diff --git a/src/features/cache-warming/logic/view-model.test.ts b/src/features/cache-warming/logic/view-model.test.ts index 3d6f6d0..d5ea901 100644 --- a/src/features/cache-warming/logic/view-model.test.ts +++ b/src/features/cache-warming/logic/view-model.test.ts @@ -215,6 +215,14 @@ describe("secondsUntilNext (authoritative, from nextWarmAt)", () => { expect(secondsUntilNext(10_000, 10_000)).toBe(0); expect(secondsUntilNext(250_000, 10_000)).toBe(240); expect(secondsUntilNext(70_000, 10_000)).toBe(60); - expect(secondsUntilNext(5_000, 999_999)).toBe(0); // already past + }); + + it("treats a nextWarmAt past the stale grace as not scheduled (belt-and-braces)", () => { + // Within the 3s grace an on-time warm may briefly read "0s"… + expect(secondsUntilNext(10_000, 11_000)).toBe(0); + expect(secondsUntilNext(10_000, 13_000)).toBe(0); + // …but beyond it the value is stale → null (the "waiting…" state). + expect(secondsUntilNext(10_000, 13_001)).toBeNull(); + expect(secondsUntilNext(5_000, 999_999)).toBeNull(); }); }); diff --git a/src/features/cache-warming/logic/view-model.ts b/src/features/cache-warming/logic/view-model.ts index f7740d7..eb105f6 100644 --- a/src/features/cache-warming/logic/view-model.ts +++ b/src/features/cache-warming/logic/view-model.ts @@ -232,11 +232,23 @@ export function observeWarm( } /** + * Grace before a PAST `nextWarmAt` is treated as "not scheduled" (→ the + * "waiting…" state instead of a perpetual "0s"). The backend pushes the FUTURE + * `nextWarmAt` after every warm (CR-4b) and `null` while generating/disabled, so + * this is a belt-and-braces guard that should never trigger — it only matters + * against a stale/buggy emitter, and the small window lets an on-time warm show + * "0s" for the second it takes to complete. + */ +const STALE_NEXT_WARM_MS = 3000; + +/** * Seconds until the next automatic warm, AUTHORITATIVE: derived straight from the * backend's `nextWarmAt` epoch-ms (never FE-anchored/guessed). `null` when nothing - * is scheduled (disabled, or a turn is generating so the timer is cancelled). + * is scheduled (disabled, or a turn is generating so the timer is cancelled) — or + * when `nextWarmAt` is stale (further than the grace into the past). */ export function secondsUntilNext(nextWarmAt: number | null, now: number): number | null { if (nextWarmAt === null) return null; + if (now - nextWarmAt > STALE_NEXT_WARM_MS) return null; return Math.max(0, Math.ceil((nextWarmAt - now) / 1000)); } |
