From baa6f6c9d21de2f6ffc60e00f53c61d026155933 Mon Sep 17 00:00:00 2001 From: Adam Malczewski Date: Fri, 12 Jun 2026 20:38:57 +0900 Subject: feat(chat): reasoning-effort selector — sticky per-conversation thinking-depth knob MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Consume the backend's reasoning-effort handoff (wire@0.7.0 ReasoningEffort + transport-contract@0.11.0 GET/PUT /conversations/:id/reasoning-effort, ChatRequest.reasoningEffort): a 5-level selector in the sidebar Model view, under the provider + model dropdowns. null renders as 'high (default)' per the server-owned resolution chain; PUT on change (effective next turn); error + revert on 400; per-conversation re-mount incl. drafts (the draft id survives promotion, so an effort set on a draft applies from turn 1). Re-mirrored .dispatch references; GLOSSARY 'reasoning effort'; handoff updated. 616 tests green; live curl probe passed. --- .dispatch/transport-contract.reference.md | 66 ++++++++++++++- .dispatch/wire.reference.md | 28 ++++++- AGENTS.md | 4 +- GLOSSARY.md | 1 + backend-handoff.md | 41 ++++++--- src/app/App.svelte | 21 ++++- src/app/store.svelte.ts | 82 ++++++++++++++++++ src/app/store.test.ts | 97 ++++++++++++++++++++++ src/features/chat/index.ts | 13 +++ src/features/chat/reasoning-effort.test.ts | 45 ++++++++++ src/features/chat/reasoning-effort.ts | 66 +++++++++++++++ src/features/chat/ui.test.ts | 74 +++++++++++++++++ .../chat/ui/ReasoningEffortSelector.svelte | 75 +++++++++++++++++ 13 files changed, 593 insertions(+), 20 deletions(-) create mode 100644 src/features/chat/reasoning-effort.test.ts create mode 100644 src/features/chat/reasoning-effort.ts create mode 100644 src/features/chat/ui/ReasoningEffortSelector.svelte diff --git a/.dispatch/transport-contract.reference.md b/.dispatch/transport-contract.reference.md index 774cfb0..1c3d993 100644 --- a/.dispatch/transport-contract.reference.md +++ b/.dispatch/transport-contract.reference.md @@ 
-5,10 +5,27 @@ > hangs on a permission prompt). Your CODE still imports `@dispatch/transport-contract` normally — > this file is for READING only. > -> **Orchestrator:** SNAPSHOT of `transport-contract@0.10.0` (CR-5 history windowing shipped). -> Depends on `@dispatch/wire@0.6.1` (see `wire.reference.md`) + `@dispatch/ui-contract@0.2.0` (see +> **Orchestrator:** SNAPSHOT of `transport-contract@0.11.0` (reasoning effort shipped). +> Depends on `@dispatch/wire@0.7.0` (see `wire.reference.md`) + `@dispatch/ui-contract@0.2.0` (see > `ui-contract.reference.md`). > +> **2026-06-12 delta (reasoning-effort handoff — package bumped `0.10.0` → `0.11.0`, ADDITIVE):** +> the thinking-depth knob (`ReasoningEffort`, re-exported from `wire@0.7.0`) lands in TWO scopes, +> resolved server-side per turn (per-turn override → persisted conversation value → default +> `"high"`; do NOT re-implement the chain client-side): +> 1. **Per-turn override** — optional `reasoningEffort?: ReasoningEffort` on `ChatRequest` (and +> therefore on WS `chat.send`, which extends it). Applies to THAT turn only; never persists. +> OMIT the key for "no override" (never send `null`/`""`). +> 2. **Persisted per-conversation setting** — `GET /conversations/:id/reasoning-effort` → +> `ReasoningEffortResponse { conversationId, reasoningEffort: ReasoningEffort | null }` +> (`null` = never set ⇒ the default `"high"` applies, NOT "off") and +> `PUT /conversations/:id/reasoning-effort` body `SetReasoningEffortRequest +> { reasoningEffort }`. Takes effect from the NEXT turn. +> Validation: an unrecognized level → HTTP 400 `{ error }` listing the valid levels (same for the +> WS path via the standard `chat.send` error reply). Cache note: CHANGING the level changes the +> provider request shape and can bust the prompt cache for the next turn (one-time re-prefill); +> a stable setting stays cache-safe (warming uses the same resolved effort). 
+> > **2026-06-12 delta (CR-5 history windowing — package bumped `0.9.0` → `0.10.0`):** NO type-shape > change — `GET /conversations/:id` gains two OPTIONAL query params alongside `sinceSeq`: > **`limit=`** (the NEWEST `k` chunks of the selection, still ASCENDING; a selection with ≤ `k` @@ -126,6 +143,11 @@ - `GET /conversations/:id/lsp` — `LspStatusResponse`. LAZILY spawns+initializes the configured servers on the first call per cwd (can take a moment; cached after); returns once each settles to `connected`/`error`. `servers` is `[]` when `cwd` is null. +- `GET /conversations/:id/reasoning-effort` — `ReasoningEffortResponse` (`reasoningEffort` is `null` + when never set ⇒ default `"high"` applies). Works for an unseen/draft id. +- `PUT /conversations/:id/reasoning-effort` — body `SetReasoningEffortRequest` → + `200 ReasoningEffortResponse`; `400 { error }` on an unrecognized level (the message lists the + valid levels). Persists the conversation's sticky level; effective from the NEXT turn. - WebSocket on :24205 — ONE path-agnostic socket multiplexes surface ops (`@dispatch/ui-contract`) + chat ops (below). Open once, send `WsClientMessage`, receive `WsServerMessage`. Live `AgentEvent` deltas carry `conversationId`+`turnId` but **no `seq`** @@ -150,9 +172,15 @@ */ import type { SurfaceClientMessage, SurfaceServerMessage } from "@dispatch/ui-contract"; -import type { AgentEvent, StoredChunk, TurnMetrics } from "@dispatch/wire"; +import type { AgentEvent, ReasoningEffort, StoredChunk, TurnMetrics } from "@dispatch/wire"; -export type { AgentEvent, StepMetrics, StoredChunk, TurnMetrics } from "@dispatch/wire"; +export type { + AgentEvent, + ReasoningEffort, + StepMetrics, + StoredChunk, + TurnMetrics, +} from "@dispatch/wire"; /** * Request body for `POST /chat` (sent as JSON). @@ -184,6 +212,14 @@ export interface ChatRequest { * prompt (so it does not affect prompt caching). 
*/ readonly cwd?: string; + + /** + * Reasoning-effort override for THIS turn only (does not persist). When + * omitted, the server resolves the conversation's persisted value, falling + * back to `"high"`. Must be one of the `ReasoningEffort` levels; an + * unrecognized value → HTTP 400 `{ error }`. + */ + readonly reasoningEffort?: ReasoningEffort; } /** @@ -315,6 +351,28 @@ export interface SetCwdRequest { readonly cwd: string; } +// ─── Per-conversation reasoning effort ──────────────────────────────────────── + +/** + * Response of `GET /conversations/:id/reasoning-effort`. `reasoningEffort` is + * null when never set (the server then resolves turns at the default, + * `"high"`). + */ +export interface ReasoningEffortResponse { + readonly conversationId: string; + readonly reasoningEffort: ReasoningEffort | null; +} + +/** + * Body of `PUT /conversations/:id/reasoning-effort` — persists the + * conversation's sticky reasoning-effort level (used for every later turn that + * does not carry a per-turn `ChatRequest.reasoningEffort` override). An + * unrecognized level → HTTP 400 `{ error }`. + */ +export interface SetReasoningEffortRequest { + readonly reasoningEffort: ReasoningEffort; +} + // ─── Conversation close (explicit tab close) ────────────────────────────────── /** diff --git a/.dispatch/wire.reference.md b/.dispatch/wire.reference.md index 1d761bf..34984d2 100644 --- a/.dispatch/wire.reference.md +++ b/.dispatch/wire.reference.md @@ -4,8 +4,18 @@ > types WITHOUT following the `file:` dep symlink out of this repo (which hangs on a permission > prompt). Your CODE still imports `@dispatch/wire` normally — this file is for READING only. > -> **Orchestrator:** SNAPSHOT of `wire@0.6.1` (doc-only bump: the 1-based gap-free seq guarantee -> codified on `StoredChunk`). Regenerate whenever `@dispatch/wire` changes. +> **Orchestrator:** SNAPSHOT of `wire@0.7.0` (reasoning effort — the thinking-depth knob). +> Regenerate whenever `@dispatch/wire` changes. 
+> +> **2026-06-12 delta (reasoning-effort handoff — package bumped `0.6.1` → `0.7.0`, ADDITIVE):** +> adds the **`ReasoningEffort`** type — the per-request thinking-depth ladder +> `"low" | "medium" | "high" | "xhigh" | "max"`. Provider-agnostic; the Anthropic provider maps +> levels to extended-thinking token budgets (low 4096 · medium 10240 · high 16384 · xhigh 32768 · +> max 65536); providers without a thinking knob ignore it. Resolution is SERVER-owned (do not +> re-implement): per-turn `ChatRequest.reasoningEffort` override → persisted per-conversation value +> (`GET`/`PUT /conversations/:id/reasoning-effort`, see `transport-contract@0.11.0`) → default +> `"high"`. Higher levels mean longer runs of `reasoning-delta` events before the first text delta. +> See the `ReasoningEffort` definition below. > > **2026-06-12 delta (CR-5 history windowing — package bumped `0.6.0` → `0.6.1`, DOC-ONLY):** the > per-conversation `seq` numbering is now a WRITTEN CONTRACTUAL GUARANTEE on `StoredChunk`: @@ -196,6 +206,20 @@ export interface StoredChunk { readonly chunk: Chunk; } +// ─── Reasoning effort ─────────────────────────────────────────────────────── + +/** + * The per-request thinking-depth knob: how much extended thinking / reasoning + * the model should spend before answering. Provider-agnostic ladder; each + * provider maps a level to its native knob in its own code (e.g. an Anthropic + * provider maps it to a `thinking.budget_tokens` value) and MAY ignore levels + * (or the field entirely) that its backend cannot express. + * + * Resolution (owned by the session-orchestrator): per-turn request value → + * persisted per-conversation value → default `"high"`. 
+ */ +export type ReasoningEffort = "low" | "medium" | "high" | "xhigh" | "max"; + // ─── Usage ────────────────────────────────────────────────────────────────── /** diff --git a/AGENTS.md b/AGENTS.md index bc16ef5..4c9f3dd 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -140,7 +140,9 @@ context size, cache-warming (+ retention/timer), markdown, smart auto-scroll, mu live view (subscribe/reconnect + the user prompt on the event stream), and the chat limit (bulk quarter-unload past `dispatch.chatLimit`, 75% fresh-load window, show-earlier page-in; `core/chunks/trim.ts`; CR-5 `?limit=`/`?beforeSeq=` CONSUMED — server-windowed cold loads + -show-earlier server backfill; `hasOlder` from the 1-based gap-free seq contract). Plan in +show-earlier server backfill; `hasOlder` from the 1-based gap-free seq contract), and the +reasoning-effort selector (Model view, under the provider/model dropdowns; sticky per-conversation +`GET`/`PUT /reasoning-effort`, `null` ⇒ "high (default)"). Plan in `../arch-rewrite/notes/frontend-design.md` §10. ## Reports diff --git a/GLOSSARY.md b/GLOSSARY.md index a9c7017..90acdd8 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -20,6 +20,7 @@ | **TTFT** (time to first token) | Per-step latency: generation stream start → first content token (text or reasoning). One per step (each step re-prefills). On the wire as `step-complete.ttftMs` / `StepMetrics.ttftMs` (optional). | time-to-first-byte | | **decode time** | Per-step generation time after the first token (first token → stream end = `genTotalMs − ttftMs`). On the wire as `step-complete.decodeMs` / `StepMetrics.decodeMs` (optional). | — | | **context size** | The tokens a conversation currently occupies: the most recent turn's FINAL step `inputTokens + outputTokens` (NOT the aggregate per-turn `usage`, which sums per-step prompts and overcounts a multi-step turn). 
On the wire as `TurnDoneEvent.contextSize` (live `done`) + `TurnMetrics.contextSize` (persisted); the FE reads the LATEST turn's value as current usage, and treats `undefined` as "unknown" (renders a placeholder, never `0`). Mirrors the backend GLOSSARY. | context usage, context length, tokens used (and do NOT call it "context window" — that's the limit) | +| **reasoning effort** | The per-request thinking-depth knob: how much extended thinking the model spends before answering. Canonical ladder `ReasoningEffort = "low" \| "medium" \| "high" \| "xhigh" \| "max"` (`wire@0.7.0`). Resolution is SERVER-owned (never re-implement): per-turn `ChatRequest.reasoningEffort` override → persisted per-conversation value (`GET`/`PUT /conversations/:id/reasoning-effort`) → default `"high"` — so `null` from the GET means "default (`high`) applies", not "off". Changing the level can bust the prompt cache for the next turn (one-time re-prefill); a stable setting stays cache-safe. | thinking setting, thinking level, effort level, thinking budget | | **context window** | The model's MAXIMUM token capacity (the limit a **context size** is measured against). A FUTURE backend field — not on the wire yet. **Placeholder:** the composer status bar currently HARDCODES a `1,000,000`-token window for the `size / limit · pct%` readout + fill bar; swap to the real per-model value when the backend ships it (see `backend-handoff.md` §3). | max context, token limit (distinct from **context size**, the current usage) | ## Frontend-specific diff --git a/backend-handoff.md b/backend-handoff.md index 17b907b..7c7da05 100644 --- a/backend-handoff.md +++ b/backend-handoff.md @@ -5,17 +5,37 @@ > **From:** dispatch-web orchestrator · **To:** arch-rewrite orchestrator · **Courier:** the user. > `lsp` does NOT span the repos (AGENTS.md § Backend seam) — every cross-repo ask flows through here. -_Last updated: 2026-06-12 (CR-5 consumed). 
**FE is current on `ui-contract@0.2.0` / -`transport-contract@0.10.0` / `wire@0.6.1`.** All handoffs to date are consumed: surfaces + WS, -conversation transcript/metrics, tabs + model selector, cache-warming (incl. authoritative timer -+ retention + cache-rate fix + the CR-4 lifecycle below), **per-conversation cwd + LSP status**, -**context size**, **turn continuity + multi-client live view**, and the **chat limit + CR-5 -history windowing** (below). +_Last updated: 2026-06-12 (reasoning-effort handoff consumed). **FE is current on +`ui-contract@0.2.0` / `transport-contract@0.11.0` / `wire@0.7.0`.** All handoffs to date are +consumed: surfaces + WS, conversation transcript/metrics, tabs + model selector, cache-warming +(incl. authoritative timer + retention + cache-rate fix + the CR-4 lifecycle below), +**per-conversation cwd + LSP status**, **context size**, **turn continuity + multi-client live +view**, the **chat limit + CR-5 history windowing**, and the **reasoning effort +(thinking-depth knob)** (below). **Open asks: NONE.** CR-1/CR-2/CR-4/CR-5 all RESOLVED ✅ (see §2); §3 lists likely next asks. **CR-3 (watcher couldn't see the USER prompt until seal) → RESOLVED ✅** — backend shipped the `user-message` turn event; FE re-pinned + consumption live. The cwd/LSP draft-path verification (`backend-handoff-cwd-lsp.md`) came back **all ✅ confirmed**._ +**Reasoning-effort handoff (`frontend-reasoning-effort-handoff.md`) → CONSUMED ✅ +(curl-probed live: GET null on unseen id · PUT `xhigh` → echo + sticky GET · bad level → 400 +listing the ladder · CORS preflight allows PUT).** Re-pinned `wire@0.6.1→0.7.0` + +`transport-contract@0.10.0→0.11.0`; re-mirrored both `.dispatch/*.reference.md`; added +"reasoning effort" to FE `GLOSSARY.md`. 
FE work: a **per-conversation effort selector** in the +sidebar's **Model view**, under the provider + model dropdowns +(`features/chat/ui/ReasoningEffortSelector.svelte`, pure helpers in +`features/chat/reasoning-effort.ts`): renders `null` as "high (default)" per the server-owned +resolution chain, PUTs on change (effective next turn), shows the save error + reverts on 400, +disables while in flight; re-mounted per conversation (incl. drafts — the draft id survives +promotion, so an effort set on a draft applies from turn 1, same pattern as cwd). The app store +seeds it on every focus change via `GET /conversations/:id/reasoning-effort` (cleared first so a +switch never flashes the previous conversation's level) and exposes +`reasoningEffort`/`setReasoningEffort`. The optional per-turn `chat.send` override is NOT built +(no composer affordance yet — `chat.send` still omits the key, which the contract specifies as +"no override"). The "expect more thinking" note needs no change: the transcript already renders +arbitrary runs of reasoning deltas, and `generating` is structural (not timer-based). 616 tests +green. NO new backend ask._ + **CR-4 cache-warming lifecycle (`frontend-cache-warming-lifecycle-handoff.md`) → CONSUMED ✅ (live-probed 17/17 against `bin/up`).** Re-pinned `ui-contract@0.1.0→0.2.0` + `transport-contract@0.8.0→0.9.0` (`wire` unchanged); re-mirrored both `.dispatch/*.reference.md`. FE @@ -61,25 +81,26 @@ backend ask — but the max-limit denominator is now a live FE need; see §3. ## 1. Pinned backend contracts (consumed by the FE) -Pinned as `file:` deps: **`ui-contract@0.2.0`; `wire@0.6.1`; `transport-contract@0.10.0`**. +Pinned as `file:` deps: **`ui-contract@0.2.0`; `wire@0.7.0`; `transport-contract@0.11.0`**. 
| Package | Used for | |---|---| | `@dispatch/ui-contract` | surfaces + surface WS protocol | -| `@dispatch/wire` | `Chunk`/`StoredChunk`(+`seq`)/`ChatMessage`/`AgentEvent`/`TurnSealedEvent`/`Usage`/`StepId` + metrics: `StepMetrics`/`TurnMetrics`, `usage.stepId`, `step-complete`, `done.durationMs`/`done.usage`, `tool-result.durationMs`, **`done.contextSize`/`TurnMetrics.contextSize`** | -| `@dispatch/transport-contract` | `ChatRequest`/`ModelsResponse`/`ConversationHistoryResponse`/`ConversationMetricsResponse` + `WarmRequest`/`WarmResponse` + `CwdResponse`/`SetCwdRequest` + LSP (`LspStatusResponse`/`LspServerInfo`/`LspServerState`) + WS chat ops + `WsClientMessage`/`WsServerMessage` | +| `@dispatch/wire` | `Chunk`/`StoredChunk`(+`seq`)/`ChatMessage`/`AgentEvent`/`TurnSealedEvent`/`Usage`/`StepId` + metrics: `StepMetrics`/`TurnMetrics`, `usage.stepId`, `step-complete`, `done.durationMs`/`done.usage`, `tool-result.durationMs`, **`done.contextSize`/`TurnMetrics.contextSize`**, **`ReasoningEffort`** | +| `@dispatch/transport-contract` | `ChatRequest`(+`reasoningEffort`)/`ModelsResponse`/`ConversationHistoryResponse`/`ConversationMetricsResponse` + `WarmRequest`/`WarmResponse` + `CwdResponse`/`SetCwdRequest` + `ReasoningEffortResponse`/`SetReasoningEffortRequest` + LSP (`LspStatusResponse`/`LspServerInfo`/`LspServerState`) + WS chat ops + `WsClientMessage`/`WsServerMessage` | Endpoints in use (HTTP **24203**, WS **24205**, CORS `*` incl. 
`PUT`): `POST /chat` (NDJSON) · `GET /models` · `GET /conversations/:id?sinceSeq=&beforeSeq=&limit=` (CR-5 windowing) · `GET /conversations/:id/metrics` · `GET`/`PUT /conversations/:id/cwd` · +`GET`/`PUT /conversations/:id/reasoning-effort` (sticky thinking-depth; `null` ⇒ default `high`) · `GET /conversations/:id/lsp` · `POST /chat/warm` · `POST /conversations/:id/close` (explicit tab-close: abort turn + stop/disable warming) · WS `chat.send`→`chat.delta` · WS `chat.subscribe`/`chat.unsubscribe` (watch a conversation's turns without sending; replay + live). Mirrored in-repo for headless agents: `.dispatch/{ui-contract,wire,transport-contract}.reference.md` (regenerate on any contract bump; all current as of `ui-contract@0.2.0` / -`transport-contract@0.10.0` / `wire@0.6.1`). +`transport-contract@0.11.0` / `wire@0.7.0`). ## 2. Open asks FOR THE BACKEND diff --git a/src/app/App.svelte b/src/app/App.svelte index 4c5a82b..dffa937 100644 --- a/src/app/App.svelte +++ b/src/app/App.svelte @@ -1,4 +1,5 @@ + +
+ Reasoning effort
+ {#if saving}
+ {/if}
+ {#if error}
+ {error}
+ {:else if justSaved}
+ Saved — applies from the next turn.
+ {:else}
+ How long the model thinks before answering. Changing it can re-prefill the prompt cache once.
+ {/if}
-- cgit v1.2.3
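
Illustrative note on the `null` handling described in the commit message: `null` from the GET renders as 'high (default)', and an unrecognized level is rejected by the server with a 400. A minimal TypeScript sketch of pure helpers in that spirit follows. The names `REASONING_EFFORT_LEVELS`, `isReasoningEffort`, and `effortLabel` are hypothetical and are not taken from `src/features/chat/reasoning-effort.ts`, whose contents are not shown here; only the `ReasoningEffort` union and the 'high (default)' label come from the patch. The sketch only labels and validates; it does not re-implement the server-owned resolution chain.

```typescript
// Hypothetical helpers. Only the ReasoningEffort union and the
// "high (default)" label are taken from the patch; the names are assumptions.
type ReasoningEffort = "low" | "medium" | "high" | "xhigh" | "max";

// The ladder in ascending order, as listed in the wire@0.7.0 reference.
const REASONING_EFFORT_LEVELS: readonly ReasoningEffort[] = [
  "low",
  "medium",
  "high",
  "xhigh",
  "max",
];

// Type guard: true only for a recognized level. Useful for checking a
// value client-side before a PUT; the server still validates on its own.
function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return (
    typeof value === "string" &&
    REASONING_EFFORT_LEVELS.some((level) => level === value)
  );
}

// Display label for the persisted value. null means "never set", in which
// case the server applies its default, so the label is "high (default)"
// rather than anything suggesting "off".
function effortLabel(persisted: ReasoningEffort | null): string {
  return persisted === null ? "high (default)" : persisted;
}
```

Under these assumptions, a selector would show `effortLabel(null)` for a conversation whose GET returned `null` and the plain level name otherwise, leaving the per-turn resolution to the server.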