diff options
| author | Adam Malczewski <[email protected]> | 2026-06-12 20:38:57 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-12 20:38:57 +0900 |
| commit | baa6f6c9d21de2f6ffc60e00f53c61d026155933 (patch) | |
| tree | fecae91d99d906a7b5054b398e4d3d90894567a0 /GLOSSARY.md | |
| parent | 7dcc06eecb5b691b0c0daec26db9d5e407d0a60e (diff) | |
| download | dispatch-web-baa6f6c9d21de2f6ffc60e00f53c61d026155933.tar.gz dispatch-web-baa6f6c9d21de2f6ffc60e00f53c61d026155933.zip | |
feat(chat): reasoning-effort selector — sticky per-conversation thinking-depth knob
Consume the backend's reasoning-effort handoff ([email protected] ReasoningEffort +
[email protected] GET/PUT /conversations/:id/reasoning-effort,
ChatRequest.reasoningEffort): a 5-level selector in the sidebar Model view,
under the provider + model dropdowns. null renders as 'high (default)' per
the server-owned resolution chain; PUT on change (effective next turn);
error + revert on 400; per-conversation re-mount incl. drafts (the draft id
survives promotion, so an effort set on a draft applies from turn 1).
Re-mirrored .dispatch references; GLOSSARY 'reasoning effort'; handoff
updated. 616 tests green; live curl probe passed.
Diffstat (limited to 'GLOSSARY.md')
| -rw-r--r-- | GLOSSARY.md | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/GLOSSARY.md b/GLOSSARY.md index a9c7017..90acdd8 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -20,6 +20,7 @@ | **TTFT** (time to first token) | Per-step latency: generation stream start → first content token (text or reasoning). One per step (each step re-prefills). On the wire as `step-complete.ttftMs` / `StepMetrics.ttftMs` (optional). | time-to-first-byte | | **decode time** | Per-step generation time after the first token (first token → stream end = `genTotalMs − ttftMs`). On the wire as `step-complete.decodeMs` / `StepMetrics.decodeMs` (optional). | — | | **context size** | The tokens a conversation currently occupies: the most recent turn's FINAL step `inputTokens + outputTokens` (NOT the aggregate per-turn `usage`, which sums per-step prompts and overcounts a multi-step turn). On the wire as `TurnDoneEvent.contextSize` (live `done`) + `TurnMetrics.contextSize` (persisted); the FE reads the LATEST turn's value as current usage, and treats `undefined` as "unknown" (renders a placeholder, never `0`). Mirrors the backend GLOSSARY. | context usage, context length, tokens used (and do NOT call it "context window" — that's the limit) | +| **reasoning effort** | The per-request thinking-depth knob: how much extended thinking the model spends before answering. Canonical ladder `ReasoningEffort = "low" \| "medium" \| "high" \| "xhigh" \| "max"` (`[email protected]`). Resolution is SERVER-owned (never re-implement): per-turn `ChatRequest.reasoningEffort` override → persisted per-conversation value (`GET`/`PUT /conversations/:id/reasoning-effort`) → default `"high"` — so `null` from the GET means "default (`high`) applies", not "off". Changing the level can bust the prompt cache for the next turn (one-time re-prefill); a stable setting stays cache-safe. | thinking setting, thinking level, effort level, thinking budget | | **context window** | The model's MAXIMUM token capacity (the limit a **context size** is measured against). A FUTURE backend field — not on the wire yet. **Placeholder:** the composer status bar currently HARDCODES a `1,000,000`-token window for the `size / limit · pct%` readout + fill bar; swap to the real per-model value when the backend ships it (see `backend-handoff.md` §3). | max context, token limit (distinct from **context size**, the current usage) | ## Frontend-specific |
