feat(chat): reasoning-effort selector — sticky per-conversation thinking-depth knob

Consume the backend's reasoning-effort handoff ([email protected] ReasoningEffort + [email protected] GET/PUT /conversations/:id/reasoning-effort, ChatRequest.reasoningEffort): a 5-level selector in the sidebar Model view, under the provider + model dropdowns. null renders as 'high (default)' per the server-owned resolution chain; PUT on change (effective next turn); error + revert on 400; per-conversation re-mount incl. drafts (the draft id survives promotion, so an effort set on a draft applies from turn 1). Re-mirrored .dispatch references; GLOSSARY 'reasoning effort'; handoff updated. 616 tests green; live curl probe passed.
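
The selector behaviour described above (label `null` as the default, PUT on change, error + revert on rejection) can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `effortLabel`, `changeEffort`, `SelectorState`, and the `PutEffort` transport callback are hypothetical names, and the PUT request body is deliberately left out because the message does not specify it. Only the `ReasoningEffort` ladder and the `'high (default)'` label come from the commit itself.

```typescript
// Canonical ladder as quoted in the GLOSSARY entry added by this commit.
type ReasoningEffort = "low" | "medium" | "high" | "xhigh" | "max";

// The five selector options, in ladder order.
const EFFORT_LEVELS: readonly ReasoningEffort[] = ["low", "medium", "high", "xhigh", "max"];

// null from the GET means "the server default applies". The client only
// labels that case; it does not re-implement the server's resolution chain.
function effortLabel(persisted: ReasoningEffort | null): string {
  return persisted === null ? "high (default)" : persisted;
}

// Hypothetical transport: performs the PUT and resolves with the HTTP status.
type PutEffort = (conversationId: string, effort: ReasoningEffort) => Promise<number>;

// Hypothetical view state for the selector.
interface SelectorState {
  value: ReasoningEffort | null; // last value known to be persisted
  error: string | null;          // message to show next to the selector
}

// Try to persist `next`. On a non-2xx response, keep the previous value and
// surface an error. The commit message only names 400; treating every
// non-2xx status the same way is an assumption of this sketch.
async function changeEffort(
  state: SelectorState,
  conversationId: string,
  next: ReasoningEffort,
  put: PutEffort,
): Promise<SelectorState> {
  const status = await put(conversationId, next);
  if (status < 200 || status >= 300) {
    return { value: state.value, error: `Could not set reasoning effort to "${next}" (HTTP ${status})` };
  }
  // Accepted: per the commit message, the new level takes effect on the next turn.
  return { value: next, error: null };
}
```

Passing the transport in as a callback keeps the revert logic testable without a network, and keeps the sketch from guessing at the endpoint's payload shape.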
author: Adam Malczewski <[email protected]> 2026-06-12 20:38:57 +0900
committer: Adam Malczewski <[email protected]> 2026-06-12 20:38:57 +0900
commit: baa6f6c9d21de2f6ffc60e00f53c61d026155933 (patch)
tree: fecae91d99d906a7b5054b398e4d3d90894567a0 /GLOSSARY.md
parent: 7dcc06eecb5b691b0c0daec26db9d5e407d0a60e (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/GLOSSARY.md b/GLOSSARY.md
index a9c7017..90acdd8 100644
--- a/GLOSSARY.md
+++ b/GLOSSARY.md
@@ -20,6 +20,7 @@
 | **TTFT** (time to first token) | Per-step latency: generation stream start → first content token (text or reasoning). One per step (each step re-prefills). On the wire as `step-complete.ttftMs` / `StepMetrics.ttftMs` (optional). | time-to-first-byte |
 | **decode time** | Per-step generation time after the first token (first token → stream end = `genTotalMs − ttftMs`). On the wire as `step-complete.decodeMs` / `StepMetrics.decodeMs` (optional). | — |
 | **context size** | The tokens a conversation currently occupies: the most recent turn's FINAL step `inputTokens + outputTokens` (NOT the aggregate per-turn `usage`, which sums per-step prompts and overcounts a multi-step turn). On the wire as `TurnDoneEvent.contextSize` (live `done`) + `TurnMetrics.contextSize` (persisted); the FE reads the LATEST turn's value as current usage, and treats `undefined` as "unknown" (renders a placeholder, never `0`). Mirrors the backend GLOSSARY. | context usage, context length, tokens used (and do NOT call it "context window" — that's the limit) |
+| **reasoning effort** | The per-request thinking-depth knob: how much extended thinking the model spends before answering. Canonical ladder `ReasoningEffort = "low" \| "medium" \| "high" \| "xhigh" \| "max"` (`[email protected]`). Resolution is SERVER-owned (never re-implement): per-turn `ChatRequest.reasoningEffort` override → persisted per-conversation value (`GET`/`PUT /conversations/:id/reasoning-effort`) → default `"high"` — so `null` from the GET means "default (`high`) applies", not "off". Changing the level can bust the prompt cache for the next turn (one-time re-prefill); a stable setting stays cache-safe. | thinking setting, thinking level, effort level, thinking budget |
 | **context window** | The model's MAXIMUM token capacity (the limit a **context size** is measured against). A FUTURE backend field — not on the wire yet. **Placeholder:** the composer status bar currently HARDCODES a `1,000,000`-token window for the `size / limit · pct%` readout + fill bar; swap to the real per-model value when the backend ships it (see `backend-handoff.md` §3). | max context, token limit (distinct from **context size**, the current usage) |
 
 ## Frontend-specific