1 files changed, 3 insertions, 1 deletions
diff --git a/GLOSSARY.md b/GLOSSARY.md
index 90acdd8..e350ffa 100644
--- a/GLOSSARY.md
+++ b/GLOSSARY.md
@@ -22,6 +22,8 @@
 | **context size** | The tokens a conversation currently occupies: the most recent turn's FINAL step `inputTokens + outputTokens` (NOT the aggregate per-turn `usage`, which sums per-step prompts and overcounts a multi-step turn). On the wire as `TurnDoneEvent.contextSize` (live `done`) + `TurnMetrics.contextSize` (persisted); the FE reads the LATEST turn's value as current usage, and treats `undefined` as "unknown" (renders a placeholder, never `0`). Mirrors the backend GLOSSARY. | context usage, context length, tokens used (and do NOT call it "context window" — that's the limit) |
 | **reasoning effort** | The per-request thinking-depth knob: how much extended thinking the model spends before answering. Canonical ladder `ReasoningEffort = "low" \| "medium" \| "high" \| "xhigh" \| "max"` (`[email protected]`). Resolution is SERVER-owned (never re-implement): per-turn `ChatRequest.reasoningEffort` override → persisted per-conversation value (`GET`/`PUT /conversations/:id/reasoning-effort`) → default `"high"` — so `null` from the GET means "default (`high`) applies", not "off". Changing the level can bust the prompt cache for the next turn (one-time re-prefill); a stable setting stays cache-safe. | thinking setting, thinking level, effort level, thinking budget |
 | **context window** | The model's MAXIMUM token capacity (the limit a **context size** is measured against). A FUTURE backend field — not on the wire yet. **Placeholder:** the composer status bar currently HARDCODES a `1,000,000`-token window for the `size / limit · pct%` readout + fill bar; swap to the real per-model value when the backend ships it (see `backend-handoff.md` §3). | max context, token limit (distinct from **context size**, the current usage) |
+| **message queue** | A per-conversation buffer of user messages awaiting mid-turn **steering** delivery (owned by the `message-queue` backend extension). Transient + in-memory; exposed to the FE as a per-conversation **surface** (`message-queue`, scope `conversation`; one `custom` field, `rendererId: "message-queue"`, `payload: QueuePayload`). NOT on the chat stream — it is control/state. Enqueued via `chat.queue` (WS) or `POST /conversations/:id/queue` (HTTP); auto-starts a turn if idle. `[email protected]`. | pending messages, steering queue |
+| **steering** | A user message injected into an in-flight turn at the tool-result boundary (drawn from the **message queue**): the model sees it alongside the tool results and may adjust course. Emitted on the chat stream as a `steering` `AgentEvent` (`TurnSteeringEvent`); the queue surface clears on drain (move, don't duplicate). If the turn ends with a non-empty queue (no tool call fired), the queue carries into a NEW turn as its opening prompt (no `steering` event). `[email protected]`. | mid-turn injection, course correction, interruption |
 
 ## Frontend-specific
 | Term | Meaning | Aliases to avoid |
@@ -38,6 +40,6 @@
 | **surface interpreter** | The generic renderer: field kind → component. Knows kinds, never surface ids. | — |
 | **metrics bubble** | The FE chat element that renders a turn's **turn metrics** (one per-turn total) and **step metrics** (one per step) as muted system-style bubbles at a turn's tail. UI presentation of `TurnMetrics`/`StepMetrics`; never a surface. | telemetry bubble, usage bubble, stats bubble |
 | **TPS** (tokens per second) | A FE-DERIVED decode rate: `outputTokens / (decodeMs / 1000)` (per step; per turn over Σ `decodeMs`), falling back to `genTotalMs` when `decodeMs` is absent. The backend-recommended basis (excludes first-token latency). Not carried on the wire; omitted when timing is absent. | throughput |
-| **chat limit** | The max LOADED chunks per conversation (default 256; localStorage `dispatch.chatLimit`, no UI yet) before the oldest quarter is unloaded. Counts **chunks** (committed + provisional + accumulating). Policy in `core/chunks/trim.ts`. | chunk limit, message limit, history limit |
+| **chat limit** | The max LOADED chunks per conversation (default 256; persisted at localStorage `dispatch.chatLimit`, settable via the sidebar's Settings view) before the oldest quarter is unloaded. Counts **chunks** (committed + provisional + accumulating). Policy in `core/chunks/trim.ts`. | chunk limit, message limit, history limit |
 | **unload** | Drop the oldest COMMITTED chunks from the in-memory transcript (and DOM) past the **chat limit** — in BULK (`ceil(limit/4)` per pass, deferred while the reader is scrolled up), never one-per-delta (old Dispatch's scroll-jump bug). Purely local: the IndexedDB cache and the server keep everything; `TranscriptState.hiddenBeforeSeq` is the watermark. Distinct from the conversation-cache's cross-conversation **eviction**. | evict (reserved for the cross-conversation cache), prune, drop |
 | **show earlier** | The affordance at the top of a transcript with unloaded history ("Show earlier messages"): pages one unload-unit back in — local cache first, then the server (CR-5 `?beforeSeq=&limit=`) when the cache doesn't reach far enough back — preserving the reader's scroll position. Offered whenever the loaded window starts above seq 1 (the [email protected] 1-based gap-free seq contract). | load more, pagination |