author Adam Malczewski <[email protected]> 2026-06-02 13:57:41 +0900
committer Adam Malczewski <[email protected]> 2026-06-02 13:57:41 +0900
commit d27d97bb3aa0c13f4032bab54703ebb9e1c84c81
tree b5bcfed65be5a4d27a7bbe6b46e338dcd489c2e0
parent 3671b82cc624117476e30b95eaf7d2bc3b34ae28
parent c1439ea8c677ddfd11c219de39c3e77c7e297a9b
Merge branch 'dev' into u3/agent-effort-level
# Conflicts:
#	packages/api/tests/agent-manager.test.ts
-rw-r--r--  notes/wishlist.md  88
-rw-r--r--  packages/api/src/agent-manager.ts  40
-rw-r--r--  packages/api/src/routes/models.ts  16
-rw-r--r--  packages/api/src/routes/tabs.ts  6
-rw-r--r--  packages/api/tests/agent-manager.test.ts  229
-rw-r--r--  packages/api/tests/routes.test.ts  114
-rw-r--r--  packages/core/src/chunks/transform.ts  5
-rw-r--r--  packages/core/src/db/chunks.ts  85
-rw-r--r--  packages/core/src/index.ts  7
-rw-r--r--  packages/core/src/models/catalog.ts  179
-rw-r--r--  packages/core/src/models/index.ts  4
-rw-r--r--  packages/core/src/types/index.ts  61
-rw-r--r--  packages/core/tests/db/chunks.db.test.ts  336
-rw-r--r--  packages/core/tests/models/catalog.test.ts  158
-rw-r--r--  packages/frontend/src/App.svelte  57
-rw-r--r--  packages/frontend/src/lib/components/ContextWindowPanel.svelte  85
-rw-r--r--  packages/frontend/src/lib/components/SidebarPanel.svelte  11
-rw-r--r--  packages/frontend/src/lib/context-window.ts  37
-rw-r--r--  packages/frontend/src/lib/tabs.svelte.ts  17
-rw-r--r--  packages/frontend/src/lib/types.ts  7
-rw-r--r--  packages/frontend/tests/chat-store.test.ts  333
-rw-r--r--  packages/frontend/tests/context-window.test.ts  84
22 files changed, 1916 insertions, 43 deletions
diff --git a/notes/wishlist.md b/notes/wishlist.md
index f4ecbeb..203f42a 100644
--- a/notes/wishlist.md
+++ b/notes/wishlist.md
@@ -1,45 +1,83 @@
# Wishlist
-- **Persist dashboard layout and chat history across sessions and devices.**
- - Restore any tabs that were left open when revisiting the page, including their order and active state.
- - If a chat was mid-generation (AI actively calling tools and streaming responses), automatically resume and continue from where it left off — even if the page was closed.
- - Chats continue processing server-side even when the frontend is entirely closed, meaning the AI keeps generating responses and calling tools without any browser open.
- - Start a chat on one device (e.g. desktop) and seamlessly pick it up later on another (e.g. phone).
- - Sidebar remembers which views were open and in what order, restoring them exactly as they were.
+## Session Persistence & Cross-Device Continuity
-- **Update the way tools appear in the chat UI.** Improve the visual presentation of tool calls and their results — make them more readable, compact, and scannable.
+- `[partial]` **Resume mid-generation after page close.** If a chat was mid-generation (AI actively calling tools and streaming responses), automatically resume and continue from where it left off — even if the page was closed. *(Currently catches up via TabStatusSnapshot over WebSocket, but in-flight chunks are in-memory only and lost on server restart.)*
-- **Show git diffs for edited files.** When the AI edits a file (write_file tool call), display a git diff in the UI rather than just the raw file content.
+- `[pending]` **Auto-close subtabs when parent tab is closed.** When the user closes a tab, automatically close all its subtabs first (cancelling any in-progress generation), then close the parent tab.
-- **Show live shell output in a collapsible block.** When a shell command is running, show live stdout/stderr in a collapsible shell block (similar to the thinking block), instead of requiring the user to expand the tool call and read raw JSON.
+- `[pending]` **Tab forking.** Allow the user to go to any message in a tab and click "fork" to create a new tab that branches the conversation from just before that message. The forked tab must correctly inherit and continue the caching context so that cache hits are preserved across the fork. Additionally, support agent-initiated forking: agents (both user agents and subagents) can fork a tab by receiving a message along with a tab ID, causing the system to fork from that point instead of starting fresh. The system should automatically resolve the correct agent, model, key, and tool set for the forked tab based on the source tab's configuration.
-- **ntfy push notifications.** Configurable ntfy.sh notifications — ping on chat completion, errors, permission prompts, and other events. Configure topic URL and which events trigger notifications.
+## Tool Call & Output Display
-- **Fix the todo system.** The current task list tool and its UI have bugs or limitations that need addressing.
+- `[partial]` **Update the way tools appear in the chat UI.** Improve the visual presentation of tool calls and their results — make them more readable, compact, and scannable. *(Collapsible blocks with status badges exist, but args/results are raw JSON in `<pre>` tags with no tool-type-specific visualizations.)*
-- **Track token usage in a tab.** Display token usage (e.g. prompt/completion/total tokens) for the chat within each tab.
+- `[pending]` **Show git diffs for edited files.** When the AI edits a file (write_file tool call), display a git diff in the UI rather than just the raw file content.
-- **Compaction tool.** A tool to compact/summarize the conversation history to reduce context size while preserving important information.
+- `[partial]` **Show live shell output in a collapsible block.** When a shell command is running, show live stdout/stderr in a collapsible shell block (similar to the thinking block), instead of requiring the user to expand the tool call and read raw JSON. *(Backend streams shell-output events in real-time and frontend shows them, but inside the tool call collapse — not a separate auto-expanding/auto-scrolling block.)*
-- **Make the plus button on tabs always on top and to the left.** The "+" button for creating new tabs is currently mixed in with the scrollable tab list. It should be fixed/absolute positioned at the top-left of the tab bar so it's always visible regardless of horizontal scrolling.
+## Context & Token Management
-- **Add a status bar under the chatbox with the send button.** Move the send button into a status bar that sits below the chat input/textarea. The status bar could show generation status, token counts, etc. Also consider whether we even need a send button at all — pressing Enter already sends the message, so the button may be redundant.
+- `[partial]` **Track token usage in a tab.** Display token usage (e.g. prompt/completion/total tokens) for the chat within each tab. Also track and display cache hit rate alongside it. Cache hit rate data should be loaded in the frontend on every turn regardless of whether the CacheRatePanel sidebar view is open, so it's always available at a glance. *(Backend emits usage events and a CacheRatePanel sidebar view exists, but nothing in the tab bar or chat panel itself, and stats are only populated when the panel is mounted.)*
-- **Move the copy button into a new "Debug" sidebar view.** The "Copy" button in the header copies the full conversation to clipboard. Move it into a new "Debug" sidebar panel/View that groups dev-facing actions.
+- `[pending]` **Compaction tool.** A tool to compact/summarize the conversation history to reduce context size while preserving important information.
-- **Move the theme picker into the Settings panel.** The "Theme" button in the header currently opens a modal ThemeSwitcher. Move theme selection into the existing "Settings" sidebar panel so there's one place for all settings, decluttering the header.
+## UI / UX Polish & Reorganization
-- **Update the sidebar button to a hamburger icon.** The sidebar toggle button currently just says "Sidebar" text. Replace it with a proper hamburger/three-line icon (☰) for a cleaner, more standard UI.
+- `[pending]` **Per-model/key effort level setting in agents page.** Allow setting the effort level (e.g. low, medium, high) for each model/key set directly in the agents configuration page. Display the configured effort level in the agents view so it is visible at a glance alongside the model and key info.
-- **Adopt Phosphor icons.** Start using the Phosphor icon set throughout the UI to replace text labels and ad-hoc SVG icons with a consistent, high-quality icon library.
+- `[pending]` **Per-tab chat input state.** Each tab should have its own chat input box. When switching tabs, the unsent text in the current tab should be saved and the text for the newly selected tab should be restored — so draft messages are never lost or clobbered by tab switching.
-- **Fix the Claude reset system.** The "Claude Wake Schedule" panel (ClaudeReset.svelte) allows scheduling model wake/reset times. There are bugs or limitations in the current implementation that need fixing — get it working reliably.
+- `[pending]` **Image attachments for supported models.** Allow attaching and uploading images in the chat input for models that support vision/multimodal input. Before sending, check (e.g. via a capabilities ping or metadata lookup) whether the current model supports image input — if it does not, show a clear message instead of silently failing.
-- **Tab-to-tab agent communication via visible IDs.** Each tab gets a short, human-readable unique ID visible somewhere in the UI (e.g. in the tab bar or a sidebar view). Agents get a tool that lets them send a user message to another agent by its tab ID, and another tool to retrieve the last turn's response from that tab. When an agent sends a message this way:
- - If the target agent is mid-turn, the message is queued (same as a user message).
- - If the target agent is idle, the message wakes it up and starts a new turn.
- This enables workflows like giving an agent a tab ID and asking it to steer another AI, chain agents together, or have one agent delegate subtasks to another.
+- `[pending]` **Better tab controls.** Add tab drag-and-drop to reorder tabs and double-click tab title to rename (click away or press Enter to confirm the new name).
-- **"User agents" — summon counterpart to subagents.** Currently agents can summon subagents which are owned by a parent tab (they appear indented under the parent in the tab bar). Add a "user agent" summon variant that spawns a standard top-level tab owned by the user rather than by another tab. This gives agents the ability to open new independent tabs (like a user would), enabling more complex multi-agent workflows where spawned agents persist as first-class tabs.
+### Layout & Positioning
+
+- `[pending]` **Make the plus button on tabs always on top and to the left.** The "+" button for creating new tabs is currently mixed in with the scrollable tab list. It should be fixed/absolute positioned at the top-left of the tab bar so it's always visible regardless of horizontal scrolling.
+
+- `[pending]` **Add a status bar under the chatbox with the send button.** Move the send button into a status bar that sits below the chat input/textarea. The status bar could show generation status, token counts, etc. Also consider whether we even need a send button at all — pressing Enter already sends the message, so the button may be redundant.
+
+## PWA
+
+- `[pending]` **PWA support with cache busting.** Add Progressive Web App support with a proper cache busting solution. The frontend should have a static `version.json` file that can be fetched at any time to check whether the current PWA version is out of date. Cache the current PWA version locally so we can compare against the remote `version.json` and know exactly when to unregister the service worker and reload the new version.
+
+## New Tools
+
+- `[pending]` **Implement a search code tool utilizing [cs](https://github.com/boyter/cs).** Add a dedicated tool that lets the agent search through the codebase using [cs](https://github.com/boyter/cs) — a fast code search utility. This would provide more efficient and targeted code search than relying on generic shell commands like `grep` or `find`.
+
+- `[pending]` **Key usage levels tool.** Add a tool that lets the agent read the current usage levels of API keys — including request counts, token consumption, rate limit proximity, and any other relevant metrics. This would allow the agent to make informed decisions about key selection, proactively warn about approaching limits, and help troubleshoot when requests start failing due to exhausted keys.
+
+## Workspaces
+
+- `[pending]` **Workspaces feature.** Allow users to organize tabs into separate workspaces. The homepage (`/`) shows a workspaces dashboard with all existing workspaces listed, where the user can click to open one. Any other sub-path (e.g., `/my_project`) acts as a new workspace — if the user visits a path that doesn't exist yet, prompt them to create it. Each workspace maintains its own set of open tabs, independent of other workspaces.
+ - `[pending]` **Workspace-scoped agents.** There is a general global agents configuration, but each workspace can also define agents scoped specifically to that workspace, overriding or extending the global set.
+
+## Reliability & Bug Fixes
+
+
+- `[partial]` **Fix the todo system.** The current task list tool and its UI have bugs or limitations that need addressing. *(The TaskList class and todo tool work with clean validation, but there's no dedicated frontend UI panel for todos beyond sidebar references.)*
+
+- `[partial]` **Fix the Claude reset system.** The "Claude Wake Schedule" panel (ClaudeReset.svelte) allows scheduling model wake/reset times. There are bugs or limitations in the current implementation that need fixing — get it working reliably. *(Major improvements made — SnapshotSequencer, global mutation lock, 4-probe coalescing, boot recovery — but server-side request reordering can still desync UI, and toggle endpoint ignores client intent.)*
- **Fix key switching not migrating context correctly.** When switching API keys (e.g. hitting usage limits on one key and switching to another), the new agent appears to receive only the initial system prompt — all subsequent thinking, tool calls, and conversation history are lost. The full chat context including all turns needs to be properly passed to the new key/model so the conversation continues seamlessly.
+
+- `[pending]` **Fix Mimo incorrect thinking levels.** Mimo doesn't have a "max" thinking level — the current hardcoded options are wrong. Explore dynamically obtaining the available thinking levels from the provider (e.g. via API metadata or model capabilities) rather than relying on static assumptions.
+
+- `[pending]` **Fix AI automatic tab naming.** The AI-powered automatic tab naming feature doesn't appear to be doing anything — tabs aren't being automatically renamed based on conversation content. Investigate and fix so that tabs get meaningful auto-generated names.
+
+- `[pending]` **Fix Chat Settings vs agent setting conflict.** Manual settings in the Chat Settings panel don't properly take effect — the agent-level setting is secretly overriding them, creating a confusing conflict where the user's explicit settings are silently ignored.
+
+- `[pending]` **Fix agent tools leaking across tabs.** Changing an agent in one tab causes its tools to persist globally across all tabs — switching tabs doesn't restore the correct per-tab tools. Tools should be loaded and persisted per-tab from the backend, not stored in shared frontend state. Investigate to determine the best solution for per-tab tool isolation.
+
+- `[pending]` **Fix agent and manual model setting changing on tab switch.** When switching tabs, the current agent selection and manual model override appear to change unexpectedly — possibly due to state leaking between tabs similar to the tools issue above. Investigate alongside the tools isolation fix.
+
+- `[pending]` **Backgrounding is too aggressive.** Agents sometimes background shell calls or subagents unnecessarily and then invoke shell calls with `sleep` to wait for them to finish. If the agent is just going to wait for results anyway, it should not background the calls in the first place — avoid the wasteful pattern of backgrounding then sleeping to await completion.
+
+## Minor Fixes
+
+- `[pending]` **Cache rate view requests bubble text wrapping.** The requests count bubble (e.g. "36 req") in the Cache Rate panel wraps when it shouldn't — should stay on one line with `whitespace-nowrap`.
+
+- `[pending]` **Remove cache cost explanation from Cache Rate panel.** Remove the "Cache reads cost ~10% of fresh input; writes cost ~25% more..." paragraph from CacheRatePanel.svelte.
+
+- `[pending]` **Key usage bar coloring.** In the key usage view, bar color should be: green if less than the time dot, orange if to the right of the time dot, red if greater than 90% in any case.
\ No newline at end of file
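Reviewer note: the "Key usage bar coloring" wishlist item above states a small decision rule that may be easier to check as code. The following is a minimal sketch, not part of this commit. The helper name `usageBarColor`, the use of fractions in the range 0 to 1, and the treatment of a tie with the time dot as green are all assumptions made for illustration.

```typescript
type BarColor = "green" | "orange" | "red";

// Hypothetical helper; both arguments are fractions in [0, 1].
// usedFraction: share of the key's quota consumed so far.
// elapsedFraction: position of the "time dot" (share of the window elapsed).
function usageBarColor(usedFraction: number, elapsedFraction: number): BarColor {
  // Above 90% is red regardless of where the time dot sits.
  if (usedFraction > 0.9) return "red";
  // The bar extends to the right of the time dot: usage is ahead of schedule.
  if (usedFraction > elapsedFraction) return "orange";
  // At or behind the time dot (a tie counts as green here, by assumption).
  return "green";
}
```

Checking the red threshold first is what makes "in any case" hold: a bar at 95% is red even when the time dot is at 99%.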
diff --git a/packages/api/src/agent-manager.ts b/packages/api/src/agent-manager.ts
index d03e696..d339fbd 100644
--- a/packages/api/src/agent-manager.ts
+++ b/packages/api/src/agent-manager.ts
@@ -36,6 +36,7 @@ import {
getMessagesForTab,
getSetting,
getTab,
+ getUsageStatsForTab,
listOpenTabs,
loadAgent,
loadAgents,
@@ -56,6 +57,8 @@ import {
TaskList,
toAvailableSubagents,
toAvailableUserAgents,
+ type UsageData,
+ type UsageStats,
validateConfig,
} from "@dispatch/core";
import type { PermissionManager } from "./permission-manager.js";
@@ -1483,6 +1486,10 @@ export class AgentManager {
// turn (text / thinking / tool-batch / error / system), folded from
// the stream via the shared `appendEventToChunks` helper.
const chunks: Chunk[] = [];
+ // Per-attempt usage accumulator. Reset each fallback attempt so a
+ // superseded (rate-limited) attempt's usage is discarded alongside its
+ // `chunks`. One `usage` event → one UsageData row.
+ const usageRows: UsageData[] = [];
const assistantId = crypto.randomUUID();
let assistantPersisted = false;
tabAgent.currentChunks = chunks;
@@ -1493,8 +1500,17 @@ export class AgentManager {
// `tool-batch` into separate `tool_call` + `tool_result` rows and
// tags every row with `turn_id` + derived `step`.
const flushAssistant = (): void => {
- if (assistantPersisted || chunks.length === 0) return;
- appendChunks(tabId, explodeTurn(turnId, chunks));
+ if (assistantPersisted) return;
+ // Append usage as extra drafts in the SAME appendChunks call as the
+ // turn's content rows: one atomic write, one fsync, contiguous seqs.
+ // Usage rows are an invisible side channel (excluded from
+ // getChunksForTab); `step` is cosmetic for usage (never grouped).
+ const drafts = explodeTurn(turnId, chunks);
+ for (const u of usageRows) {
+ drafts.push({ turnId, step: 0, role: "assistant", type: "usage", data: u });
+ }
+ if (drafts.length === 0) return;
+ appendChunks(tabId, drafts);
assistantPersisted = true;
};
@@ -1548,6 +1564,15 @@ export class AgentManager {
allOutput += event.delta;
}
+ // Capture per-step usage as a side-channel row to persist with the
+ // turn (one row per `usage` event). The live `this.emit(event)`
+ // above still drives in-session accumulation; this is the reload-
+ // persistence path. `appendEventToChunks` intentionally ignores
+ // `usage`, so it never becomes message content.
+ if (event.type === "usage") {
+ usageRows.push({ ...event.usage });
+ }
+
// Route every content-bearing event through the shared helper.
// `appendEventToChunks` ignores lifecycle events (status / done
// / task-list-update / tab-created / message-* / etc), so it's
@@ -1622,7 +1647,16 @@ export class AgentManager {
// above). Signal the frontend that the turn's rows — with real seqs — are
// durable so it can fold its live representation into the sealed log.
// Emitted AFTER status:idle/error (which fire before the DB write).
- this.emit({ type: "turn-sealed", turnId }, tabId);
+ // Carry the authoritative usage aggregate (read AFTER the usage rows were
+ // persisted) so the frontend reconciles its live cacheStats to the DB truth
+ // — self-healing the live overshoot from a discarded rate-limited attempt.
+ let usageStats: UsageStats | null = null;
+ try {
+ usageStats = getUsageStatsForTab(tabId);
+ } catch {
+ // DB read failed — omit reconciliation rather than crash the turn.
+ }
+ this.emit({ type: "turn-sealed", turnId, usageStats }, tabId);
// Turn fully settled — clear the shared turn id.
tabAgent.currentTurnId = null;
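Reviewer note: the `flushAssistant` change above can be read in isolation as the sketch below. It is a simplified illustration under stated assumptions, not the code in this commit: `appendChunks` is replaced by an in-memory recorder, `contentDrafts` stands in for the result of `explodeTurn(turnId, chunks)`, and `makeFlush` is a hypothetical wrapper name.

```typescript
interface UsageData {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface Draft {
  turnId: string;
  step: number;
  role: string;
  type: string;
  data: unknown;
}

// Hypothetical stand-in for the real appendChunks: records each call's drafts.
const writes: Draft[][] = [];
function appendChunks(_tabId: string, drafts: Draft[]): void {
  writes.push([...drafts]);
}

// Builds the flush closure for one attempt.
function makeFlush(
  tabId: string,
  turnId: string,
  contentDrafts: Draft[],
  usageRows: UsageData[],
): () => void {
  let persisted = false;
  return (): void => {
    if (persisted) return;
    const drafts = [...contentDrafts];
    for (const u of usageRows) {
      // step is cosmetic for usage rows, so it is fixed at 0.
      drafts.push({ turnId, step: 0, role: "assistant", type: "usage", data: u });
    }
    if (drafts.length === 0) return;
    appendChunks(tabId, drafts); // content and usage land in a single write
    persisted = true;
  };
}

// A turn with usage but no content still flushes; a second call is a no-op.
const usageRows: UsageData[] = [
  { inputTokens: 5, outputTokens: 1, cacheReadTokens: 2, cacheWriteTokens: 3 },
];
const flush = makeFlush("tab-1", "turn-1", [], usageRows);
flush();
flush();
```

One behavioral consequence worth noting: the emptiness check moved from `chunks` to the combined drafts, so a turn that produced only `usage` events is now persisted rather than skipped.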
diff --git a/packages/api/src/routes/models.ts b/packages/api/src/routes/models.ts
index 03c079a..6a0f5dc 100644
--- a/packages/api/src/routes/models.ts
+++ b/packages/api/src/routes/models.ts
@@ -17,6 +17,7 @@ import {
listStoredCredentials,
refreshAccountCredentialsAsync,
resolveApiKey,
+ resolveContextLimit,
setApiKey,
validateAccountCredentials,
} from "@dispatch/core";
@@ -161,6 +162,21 @@ modelsRoutes.get("/available", async (c) => {
return c.json({ models });
});
+// Resolve a model's MAXIMUM context window (in tokens) from the models.dev
+// catalog. Returns `{ contextLimit: number | null }`; `null` means the model's
+// limit is unknown (unsupported provider, unknown model, or catalog offline),
+// which the frontend renders without a denominator/percentage.
+modelsRoutes.get("/context-limit", async (c) => {
+ const provider = c.req.query("provider");
+ const modelId = c.req.query("modelId");
+ if (!provider || !modelId) {
+ return c.json({ error: "provider and modelId query parameters are required" }, 400);
+ }
+
+ const contextLimit = await resolveContextLimit(provider, modelId);
+ return c.json({ contextLimit });
+});
+
// List available Claude accounts with validated credentials
modelsRoutes.get("/claude-accounts", async (c) => {
const candidates = resolveClaudeAccounts();
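Reviewer note: the `/context-limit` route above returns `{ contextLimit: number | null }`, and its comment says a `null` limit is rendered without a denominator or percentage. A minimal sketch of that rendering rule follows. The function name `formatContextUsage` and the exact label format are assumptions; the real rendering lives in frontend files whose contents are not shown in this part of the diff.

```typescript
// Hypothetical formatter; `limit` is the contextLimit value from the route.
function formatContextUsage(usedTokens: number, limit: number | null): string {
  // Unknown limit (unsupported provider, unknown model, or catalog offline):
  // show the raw count only, with no denominator and no percentage.
  if (limit === null || limit <= 0) return `${usedTokens} tokens`;
  const pct = Math.round((usedTokens / limit) * 100);
  return `${usedTokens} / ${limit} tokens (${pct}%)`;
}
```

For example, `formatContextUsage(50000, 200000)` yields `50000 / 200000 tokens (25%)`, while `formatContextUsage(50000, null)` yields `50000 tokens`.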
diff --git a/packages/api/src/routes/tabs.ts b/packages/api/src/routes/tabs.ts
index b1e9659..f52ee99 100644
--- a/packages/api/src/routes/tabs.ts
+++ b/packages/api/src/routes/tabs.ts
@@ -6,6 +6,7 @@ import {
getSetting,
getTab,
getTotalChunkCount,
+ getUsageStatsForTab,
groupRowsToMessages,
listOpenTabs,
setSetting,
@@ -27,7 +28,10 @@ export function setTabsAgentManager(
}
tabsRoutes.get("/", (c) => {
- const tabs = listOpenTabs();
+ // Enrich each tab with its persisted usage aggregate so the frontend can
+ // seed `cacheStats` on reload without an extra round-trip. N small indexed
+ // queries — fine for tab counts.
+ const tabs = listOpenTabs().map((t) => ({ ...t, usageStats: getUsageStatsForTab(t.id) }));
return c.json({ tabs });
});
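Reviewer note: `getUsageStatsForTab` is implemented in `packages/core/src/db/chunks.ts`, outside the portion of the diff shown here. Judging only by the fixtures in the tests (per-field sums, a `requests` count, and a `last` entry), the aggregate could be computed roughly as in the sketch below. The function name `aggregateUsage`, its in-memory input, and returning `null` for a tab with no usage rows are assumptions, not the actual implementation.

```typescript
interface UsageData {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface UsageStats extends UsageData {
  requests: number; // number of usage rows folded in
  last: UsageData; // the most recent row
}

// Hypothetical in-memory equivalent of the per-tab aggregate.
function aggregateUsage(rows: UsageData[]): UsageStats | null {
  let last: UsageData | null = null;
  const total: UsageData = {
    inputTokens: 0,
    outputTokens: 0,
    cacheReadTokens: 0,
    cacheWriteTokens: 0,
  };
  for (const r of rows) {
    total.inputTokens += r.inputTokens;
    total.outputTokens += r.outputTokens;
    total.cacheReadTokens += r.cacheReadTokens;
    total.cacheWriteTokens += r.cacheWriteTokens;
    last = { ...r };
  }
  if (last === null) return null; // no usage rows for this tab (assumed behavior)
  return { ...total, requests: rows.length, last };
}
```

Feeding it the two usage rows from the "writes one usage row per usage event" test reproduces the aggregate seeded in the `GET /tabs` test: 2200 input, 100 output, 1000 cache-read and 1000 cache-write tokens over 2 requests.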
diff --git a/packages/api/tests/agent-manager.test.ts b/packages/api/tests/agent-manager.test.ts
index ffd87b5..9da6a70 100644
--- a/packages/api/tests/agent-manager.test.ts
+++ b/packages/api/tests/agent-manager.test.ts
@@ -98,6 +98,26 @@ function setFakeSetting(key: string, value: string): void {
fakeSettings.set(key, value);
}
+// Capture every appendChunks(tabId, drafts) call so tests can assert what got
+// persisted (e.g. usage side-channel rows). The real explodeTurn is mocked to
+// return [], so content drafts are empty here; usage rows are pushed directly
+// by processMessage's flushAssistant, making them the visible drafts.
+interface AppendChunksCall {
+ tabId: string;
+ drafts: Array<{ turnId: string; step: number; role: string; type: string; data: unknown }>;
+}
+const appendChunksCalls: AppendChunksCall[] = [];
+function resetAppendChunksCalls(): void {
+ appendChunksCalls.length = 0;
+}
+
+// Seedable return value for the mocked getUsageStatsForTab — what the backend
+// reads (post-write) to attach to the `turn-sealed` event.
+const fakeUsageStatsByTab = new Map<string, unknown>();
+function resetFakeUsageStats(): void {
+ fakeUsageStatsByTab.clear();
+}
+
// Allow tests to swap in a custom `run` generator (e.g. to simulate
// a fallback failure mid-stream). Returning to undefined restores
// the default.
@@ -358,7 +378,8 @@ vi.mock("@dispatch/core", () => ({
typeof value === "string" && ["none", "low", "medium", "high", "xhigh", "max"].includes(value)
);
},
- appendChunks() {
+ appendChunks(tabId: string, drafts: AppendChunksCall["drafts"]) {
+ appendChunksCalls.push({ tabId, drafts: [...drafts] });
return [];
},
explodeUserText() {
@@ -370,6 +391,9 @@ vi.mock("@dispatch/core", () => ({
getMessagesForTab(tabId: string) {
return fakeMessagesByTab.get(tabId) ?? [];
},
+ getUsageStatsForTab(tabId: string) {
+ return fakeUsageStatsByTab.get(tabId) ?? null;
+ },
appendEventToChunks: appendEventToChunksSpy,
applySystemEvent(_messages: unknown[], _event: unknown) {
return { messageId: "mock-system-msg" };
@@ -420,6 +444,8 @@ describe("AgentManager", () => {
resetFakeSettings();
setRunImpl(null);
appendEventToChunksSpy.mockClear();
+ resetAppendChunksCalls();
+ resetFakeUsageStats();
});
it("initial status is idle", () => {
@@ -1393,4 +1419,205 @@ describe("AgentManager", () => {
expect(tools).not.toContain("read_tab");
});
});
+
+ // ─── Usage side-channel persistence ──────────────────────────────
+ //
+ // `usage` AgentEvents (one per LLM round-trip) are persisted as invisible
+ // `type:"usage"` chunk rows so per-tab token/cache telemetry survives a
+ // reload. They ride the SAME atomic appendChunks call as the turn's content
+ // rows (one fsync, contiguous seqs). A superseded fallback attempt's usage is
+ // discarded with its `chunks` (per-attempt accumulator).
+ describe("usage persistence", () => {
+ it("writes one usage row per usage event emitted during a turn", async () => {
+ const manager = new AgentManager();
+ setRunImpl(async function* () {
+ yield { type: "status", status: "running" } as const;
+ yield {
+ type: "usage",
+ usage: { inputTokens: 1000, outputTokens: 40, cacheReadTokens: 0, cacheWriteTokens: 900 },
+ } as const;
+ yield { type: "text-delta", delta: "step two" } as const;
+ yield {
+ type: "usage",
+ usage: {
+ inputTokens: 1200,
+ outputTokens: 60,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 100,
+ },
+ } as const;
+ yield {
+ type: "done",
+ message: { role: "assistant", chunks: [{ type: "text", text: "step two" }] },
+ } as const;
+ yield { type: "status", status: "idle" } as const;
+ });
+
+ await manager.processMessage("tab-usage-rows", "go");
+
+ const usageDrafts = appendChunksCalls
+ .flatMap((c) => c.drafts)
+ .filter((d) => d.type === "usage");
+ expect(usageDrafts).toHaveLength(2);
+ // One row per event, role=assistant, step cosmetic (0).
+ expect(usageDrafts.every((d) => d.role === "assistant" && d.step === 0)).toBe(true);
+ expect(usageDrafts[0]?.data).toEqual({
+ inputTokens: 1000,
+ outputTokens: 40,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 900,
+ });
+ expect(usageDrafts[1]?.data).toEqual({
+ inputTokens: 1200,
+ outputTokens: 60,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 100,
+ });
+ });
+
+ it("attaches the DB usage aggregate to the turn-sealed event for live reconciliation", async () => {
+ const manager = new AgentManager();
+ const aggregate = {
+ inputTokens: 222,
+ outputTokens: 22,
+ cacheReadTokens: 100,
+ cacheWriteTokens: 5,
+ requests: 1,
+ last: { inputTokens: 222, outputTokens: 22, cacheReadTokens: 100, cacheWriteTokens: 5 },
+ };
+ fakeUsageStatsByTab.set("tab-sealed-usage", aggregate);
+
+ const events: AgentEvent[] = [];
+ manager.onEvent((event) => {
+ events.push(event);
+ });
+
+ await manager.processMessage("tab-sealed-usage", "go");
+
+ const sealed = events.find((e) => e.type === "turn-sealed") as
+ | Extract<AgentEvent, { type: "turn-sealed" }>
+ | undefined;
+ expect(sealed).toBeDefined();
+ // The aggregate read AFTER the write is carried on the event so the
+ // frontend can REPLACE its live cacheStats with the DB truth.
+ expect(sealed?.usageStats).toEqual(aggregate);
+ });
+
+ it("emits usage rows in the SAME appendChunks call as the turn's content (one atomic write)", async () => {
+ const manager = new AgentManager();
+ setRunImpl(async function* () {
+ yield { type: "status", status: "running" } as const;
+ yield { type: "text-delta", delta: "hi" } as const;
+ yield {
+ type: "usage",
+ usage: { inputTokens: 5, outputTokens: 1, cacheReadTokens: 2, cacheWriteTokens: 3 },
+ } as const;
+ yield {
+ type: "done",
+ message: { role: "assistant", chunks: [{ type: "text", text: "hi" }] },
+ } as const;
+ yield { type: "status", status: "idle" } as const;
+ });
+
+ await manager.processMessage("tab-usage-atomic", "go");
+
+ // Exactly one appendChunks call carries the usage draft (the flush). The
+ // user-message append and any system-row appends carry no usage rows.
+ const callsWithUsage = appendChunksCalls.filter((c) =>
+ c.drafts.some((d) => d.type === "usage"),
+ );
+ expect(callsWithUsage).toHaveLength(1);
+ expect(callsWithUsage[0]?.tabId).toBe("tab-usage-atomic");
+ });
+
+ it("discards a superseded (rate-limited) attempt's usage on fallback", async () => {
+ const manager = new AgentManager();
+ // Inject a minimal model registry so the rate-limit fallback path is
+ // taken (real `processMessage` requires modelRegistry + a resolved
+ // keyId + a next fallback entry to retry).
+ const markKeyExhausted = vi.fn();
+ (
+ manager as unknown as {
+ modelRegistry: {
+ getKeys(): Array<{ definition: Record<string, unknown> }>;
+ markKeyExhausted(): void;
+ };
+ }
+ ).modelRegistry = {
+ getKeys: () => [
+ {
+ definition: {
+ id: "k1",
+ provider: "openai-compatible",
+ env: "ENV1",
+ base_url: "http://x",
+ },
+ },
+ {
+ definition: {
+ id: "k2",
+ provider: "openai-compatible",
+ env: "ENV2",
+ base_url: "http://y",
+ },
+ },
+ ],
+ markKeyExhausted,
+ };
+
+ let attempt = 0;
+ setRunImpl(async function* () {
+ attempt++;
+ yield { type: "status", status: "running" } as const;
+ if (attempt === 1) {
+ // Attempt 1 emits usage then rate-limits — its usage must be dropped.
+ yield {
+ type: "usage",
+ usage: { inputTokens: 999, outputTokens: 9, cacheReadTokens: 0, cacheWriteTokens: 0 },
+ } as const;
+ yield { type: "error", error: "rate limit exceeded (status=429)" } as const;
+ return;
+ }
+ // Attempt 2 succeeds — only its usage should persist.
+ yield {
+ type: "usage",
+ usage: { inputTokens: 222, outputTokens: 22, cacheReadTokens: 100, cacheWriteTokens: 5 },
+ } as const;
+ yield {
+ type: "done",
+ message: { role: "assistant", chunks: [{ type: "text", text: "recovered" }] },
+ } as const;
+ yield { type: "status", status: "idle" } as const;
+ });
+
+ const agentModels = [
+ { key_id: "k1", model_id: "m1" },
+ { key_id: "k2", model_id: "m2" },
+ ];
+ await manager.processMessage(
+ "tab-usage-fallback",
+ "go",
+ undefined,
+ undefined,
+ undefined,
+ undefined,
+ agentModels,
+ );
+
+ expect(attempt).toBe(2); // confirm the fallback retry actually happened
+ expect(markKeyExhausted).toHaveBeenCalled();
+
+ const usageDrafts = appendChunksCalls
+ .flatMap((c) => c.drafts)
+ .filter((d) => d.type === "usage");
+ // Only attempt 2's usage survives.
+ expect(usageDrafts).toHaveLength(1);
+ expect(usageDrafts[0]?.data).toEqual({
+ inputTokens: 222,
+ outputTokens: 22,
+ cacheReadTokens: 100,
+ cacheWriteTokens: 5,
+ });
+ });
+ });
});
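Reviewer note: the tests above describe the frontend replacing its live `cacheStats` with the aggregate carried on `turn-sealed`, which corrects the overshoot from a discarded rate-limited attempt. The sketch below illustrates that reconciliation under stated assumptions. The reducer name `reduceCacheStats` and the state shape are hypothetical; the actual store logic is in frontend files not shown in this part of the diff.

```typescript
interface UsageData {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface UsageStats extends UsageData {
  requests: number;
  last: UsageData | null;
}

type UsageEvent = { type: "usage"; usage: UsageData };
type TurnSealedEvent = { type: "turn-sealed"; turnId: string; usageStats: UsageStats | null };

const emptyStats = (): UsageStats => ({
  inputTokens: 0,
  outputTokens: 0,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
  requests: 0,
  last: null,
});

// Hypothetical reducer: accumulate live usage, then snap to the DB aggregate.
function reduceCacheStats(stats: UsageStats, event: UsageEvent | TurnSealedEvent): UsageStats {
  if (event.type === "usage") {
    return {
      inputTokens: stats.inputTokens + event.usage.inputTokens,
      outputTokens: stats.outputTokens + event.usage.outputTokens,
      cacheReadTokens: stats.cacheReadTokens + event.usage.cacheReadTokens,
      cacheWriteTokens: stats.cacheWriteTokens + event.usage.cacheWriteTokens,
      requests: stats.requests + 1,
      last: { ...event.usage },
    };
  }
  // turn-sealed: replace with the persisted aggregate when one is attached;
  // keep the live value if the backend omitted it (usageStats is null).
  return event.usageStats ?? stats;
}

// Replays the fallback scenario: attempt 1 (discarded), attempt 2, then seal.
const attempt2: UsageData = { inputTokens: 222, outputTokens: 22, cacheReadTokens: 100, cacheWriteTokens: 5 };
let cacheStats = emptyStats();
cacheStats = reduceCacheStats(cacheStats, {
  type: "usage",
  usage: { inputTokens: 999, outputTokens: 9, cacheReadTokens: 0, cacheWriteTokens: 0 },
});
cacheStats = reduceCacheStats(cacheStats, { type: "usage", usage: attempt2 });
const liveInputBeforeSeal = cacheStats.inputTokens; // includes the discarded attempt
cacheStats = reduceCacheStats(cacheStats, {
  type: "turn-sealed",
  turnId: "turn-1",
  usageStats: { ...attempt2, requests: 1, last: attempt2 },
});
```

Replacing rather than merging on `turn-sealed` is what makes the correction work: the live total briefly counts both attempts, and the sealed aggregate contains only the rows that were actually persisted.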
diff --git a/packages/api/tests/routes.test.ts b/packages/api/tests/routes.test.ts
index 3bf446d..c1971b0 100644
--- a/packages/api/tests/routes.test.ts
+++ b/packages/api/tests/routes.test.ts
@@ -1,6 +1,24 @@
import type { ToolDefinition } from "@dispatch/core";
import { describe, expect, it, vi } from "vitest";
+// Seedable backing stores for the tabs route (GET /tabs enrichment). Declared
+// before vi.mock so the hoisted factory closure can reference them; populated
+// per-test.
+interface FakeOpenTab {
+ id: string;
+ title: string;
+ keyId: string | null;
+ modelId: string | null;
+ parentTabId: string | null;
+ status: string;
+ isOpen: boolean;
+ position: number;
+ createdAt: number;
+ updatedAt: number;
+}
+const fakeOpenTabs: FakeOpenTab[] = [];
+const fakeUsageStats = new Map<string, unknown>();
+
// Mock @dispatch/core's Agent to avoid real LLM calls
vi.mock("@dispatch/core", () => ({
Agent: class MockAgent {
@@ -175,7 +193,7 @@ vi.mock("@dispatch/core", () => ({
);
},
listOpenTabs() {
- return [];
+ return [...fakeOpenTabs];
},
resolveTabPrefix() {
return { status: "none" };
@@ -235,6 +253,9 @@ vi.mock("@dispatch/core", () => ({
getTotalChunkCount() {
return 0;
},
+ getUsageStatsForTab(tabId: string) {
+ return fakeUsageStats.get(tabId) ?? null;
+ },
appendEventToChunks(_chunks: unknown[], _event: unknown) {
// no-op stub
},
@@ -273,6 +294,13 @@ vi.mock("@dispatch/core", () => ({
execute: async () => "mock",
};
},
+ // ── models.dev context-limit stub ─────────────────────────────
+ resolveContextLimit(provider: string, modelId: string) {
+ if (provider === "anthropic" && modelId === "claude-sonnet-4-5") {
+ return Promise.resolve(200000);
+ }
+ return Promise.resolve(null);
+ },
// ── ntfy notifications stubs ──────────────────────────────────
NotificationDispatcher: class MockNotificationDispatcher {
attachToAgentManager() {
@@ -446,6 +474,65 @@ describe("POST /chat", () => {
});
});
+describe("GET /tabs", () => {
+ it("enriches each open tab with its persisted usageStats aggregate", async () => {
+ fakeOpenTabs.length = 0;
+ fakeUsageStats.clear();
+ fakeOpenTabs.push({
+ id: "tab-u",
+ title: "Has usage",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ status: "idle",
+ isOpen: true,
+ position: 0,
+ createdAt: 0,
+ updatedAt: 0,
+ });
+ fakeOpenTabs.push({
+ id: "tab-none",
+ title: "No usage",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ status: "idle",
+ isOpen: true,
+ position: 1,
+ createdAt: 0,
+ updatedAt: 0,
+ });
+ fakeUsageStats.set("tab-u", {
+ inputTokens: 2200,
+ outputTokens: 100,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 1000,
+ requests: 2,
+ last: { inputTokens: 1200, outputTokens: 60, cacheReadTokens: 1000, cacheWriteTokens: 100 },
+ });
+
+ const res = await app.request("/tabs");
+ expect(res.status).toBe(200);
+ const body = await res.json();
+ expect(Array.isArray(body.tabs)).toBe(true);
+ const tabU = body.tabs.find((t: { id: string }) => t.id === "tab-u");
+ const tabNone = body.tabs.find((t: { id: string }) => t.id === "tab-none");
+ expect(tabU.usageStats).toEqual({
+ inputTokens: 2200,
+ outputTokens: 100,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 1000,
+ requests: 2,
+ last: { inputTokens: 1200, outputTokens: 60, cacheReadTokens: 1000, cacheWriteTokens: 100 },
+ });
+ // A tab with no usage rows surfaces null (not undefined/missing).
+ expect(tabNone.usageStats).toBeNull();
+
+ fakeOpenTabs.length = 0;
+ fakeUsageStats.clear();
+ });
+});
+
describe("GET /tabs/:id/chunks", () => {
it("returns the raw chunk window shape { chunks, total, oldestSeq }", async () => {
const res = await app.request("/tabs/tab-x/chunks?limit=50");
@@ -784,3 +871,28 @@ describe("Wake schedule routes", () => {
expect(body.schedule["13"]).toBeUndefined();
});
});
+
+describe("GET /models/context-limit", () => {
+ it("returns the resolved context limit for a known model", async () => {
+ const res = await app.request(
+ "/models/context-limit?provider=anthropic&modelId=claude-sonnet-4-5",
+ );
+ expect(res.status).toBe(200);
+ const body = (await res.json()) as { contextLimit: number | null };
+ expect(body.contextLimit).toBe(200000);
+ });
+
+ it("returns null contextLimit for an unknown model", async () => {
+ const res = await app.request("/models/context-limit?provider=anthropic&modelId=mystery");
+ expect(res.status).toBe(200);
+ const body = (await res.json()) as { contextLimit: number | null };
+ expect(body.contextLimit).toBeNull();
+ });
+
+ it("400s when provider or modelId is missing", async () => {
+ const res1 = await app.request("/models/context-limit?provider=anthropic");
+ expect(res1.status).toBe(400);
+ const res2 = await app.request("/models/context-limit?modelId=claude-sonnet-4-5");
+ expect(res2.status).toBe(400);
+ });
+});
diff --git a/packages/core/src/chunks/transform.ts b/packages/core/src/chunks/transform.ts
index a4c6fc8..e8f4a18 100644
--- a/packages/core/src/chunks/transform.ts
+++ b/packages/core/src/chunks/transform.ts
@@ -209,6 +209,11 @@ export function groupRowsToMessages(rows: ChunkRow[]): MessageRow[] {
continue;
}
+ // Usage rows are an invisible side channel (persisted for the backend
+ // aggregate only). They're already query-excluded from getChunksForTab,
+ // so this is defensive insurance: never let one leak into render grouping.
+ if (row.type === "usage") continue;
+
// assistant / tool rows → part of the current assistant message
const c = ensureAssistant(row);
switch (row.type) {
diff --git a/packages/core/src/db/chunks.ts b/packages/core/src/db/chunks.ts
index 077259d..e0aadf3 100644
--- a/packages/core/src/db/chunks.ts
+++ b/packages/core/src/db/chunks.ts
@@ -5,7 +5,14 @@ import {
groupRowsToMessages,
type MessageRow,
} from "../chunks/transform.js";
-import type { ChunkData, ChunkRow, ChunkRowDraft, TextData } from "../types/index.js";
+import type {
+ ChunkData,
+ ChunkRow,
+ ChunkRowDraft,
+ TextData,
+ UsageData,
+ UsageStats,
+} from "../types/index.js";
import { getDatabase } from "./index.js";
// Re-export the DB-free transforms so existing barrel consumers
@@ -101,7 +108,7 @@ export function getChunksForTab(
const db = getDatabase();
if (!options) {
const rows = db
- .query("SELECT * FROM chunks WHERE tab_id = $tabId ORDER BY seq ASC")
+ .query("SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' ORDER BY seq ASC")
.all({ $tabId: tabId }) as Array<Record<string, unknown>>;
return rows.map(mapRow);
}
@@ -110,24 +117,28 @@ export function getChunksForTab(
if (limit !== undefined) {
const rows = db
.query(
- "SELECT * FROM chunks WHERE tab_id = $tabId AND seq < $before ORDER BY seq DESC LIMIT $limit",
+ "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' AND seq < $before ORDER BY seq DESC LIMIT $limit",
)
.all({ $tabId: tabId, $before: before, $limit: limit }) as Array<Record<string, unknown>>;
return rows.map(mapRow).reverse();
}
const rows = db
- .query("SELECT * FROM chunks WHERE tab_id = $tabId AND seq < $before ORDER BY seq DESC")
+ .query(
+ "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' AND seq < $before ORDER BY seq DESC",
+ )
.all({ $tabId: tabId, $before: before }) as Array<Record<string, unknown>>;
return rows.map(mapRow).reverse();
}
if (limit !== undefined) {
const rows = db
- .query("SELECT * FROM chunks WHERE tab_id = $tabId ORDER BY seq DESC LIMIT $limit")
+ .query(
+ "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' ORDER BY seq DESC LIMIT $limit",
+ )
.all({ $tabId: tabId, $limit: limit }) as Array<Record<string, unknown>>;
return rows.map(mapRow).reverse();
}
const rows = db
- .query("SELECT * FROM chunks WHERE tab_id = $tabId ORDER BY seq ASC")
+ .query("SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' ORDER BY seq ASC")
.all({ $tabId: tabId }) as Array<Record<string, unknown>>;
return rows.map(mapRow);
}
@@ -145,11 +156,71 @@ export function getMessagesForTab(tabId: string): MessageRow[] {
export function getTotalChunkCount(tabId: string): number {
const db = getDatabase();
const row = db
- .query("SELECT COUNT(*) as count FROM chunks WHERE tab_id = $tabId")
+ .query("SELECT COUNT(*) as count FROM chunks WHERE tab_id = $tabId AND type != 'usage'")
.get({ $tabId: tabId }) as { count: number } | null;
return row?.count ?? 0;
}
+/**
+ * Aggregate per-tab token/cache usage across ALL persisted `usage` chunk rows.
+ *
+ * Usage rows are written as an invisible side channel (one row per `usage`
+ * AgentEvent) and are query-excluded from `getChunksForTab`/`getTotalChunkCount`,
+ * so this aggregate is the read path. Because it sums server-side over every
+ * row, it stays complete even after the frontend evicts/pages out old turns
+ * (eviction is in-memory only). The return shape is structurally identical to
+ * the frontend `CacheStats`, so reload can seed it directly.
+ *
+ * - cumulative `inputTokens`/`outputTokens`/`cacheReadTokens`/`cacheWriteTokens`
+ * = SUM over all usage rows;
+ * - `requests` = COUNT of usage rows;
+ * - `last` = the highest-seq usage row's split (most recent request);
+ * - `null` when the tab has no usage rows.
+ *
+ * Sums in JS after selecting the rows (mirroring `mapRow`) to avoid relying on
+ * `json_extract` over the freeform `data_json`.
+ */
+export function getUsageStatsForTab(tabId: string): UsageStats | null {
+ const db = getDatabase();
+ const rows = db
+ .query("SELECT data_json FROM chunks WHERE tab_id = $tabId AND type = 'usage' ORDER BY seq ASC")
+ .all({ $tabId: tabId }) as Array<{ data_json: string }>;
+ if (rows.length === 0) return null;
+
+ let inputTokens = 0;
+ let outputTokens = 0;
+ let cacheReadTokens = 0;
+ let cacheWriteTokens = 0;
+ let last: UsageData | null = null;
+ for (const row of rows) {
+ let u: UsageData;
+ try {
+ u = JSON.parse(row.data_json) as UsageData;
+ } catch {
+ continue;
+ }
+ inputTokens += u.inputTokens ?? 0;
+ outputTokens += u.outputTokens ?? 0;
+ cacheReadTokens += u.cacheReadTokens ?? 0;
+ cacheWriteTokens += u.cacheWriteTokens ?? 0;
+ last = {
+ inputTokens: u.inputTokens ?? 0,
+ outputTokens: u.outputTokens ?? 0,
+ cacheReadTokens: u.cacheReadTokens ?? 0,
+ cacheWriteTokens: u.cacheWriteTokens ?? 0,
+ };
+ }
+
+ return {
+ inputTokens,
+ outputTokens,
+ cacheReadTokens,
+ cacheWriteTokens,
+ requests: rows.length,
+ last,
+ };
+}
+
export function clearChunksForTab(tabId: string): void {
const db = getDatabase();
db.query("DELETE FROM chunks WHERE tab_id = $tabId").run({ $tabId: tabId });
diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts
index 327b0a5..7818024 100644
--- a/packages/core/src/index.ts
+++ b/packages/core/src/index.ts
@@ -37,6 +37,7 @@ export {
getChunksForTab,
getMessagesForTab,
getTotalChunkCount,
+ getUsageStatsForTab,
groupRowsToMessages,
type MessageRow,
} from "./db/chunks.js";
@@ -67,7 +68,11 @@ export {
} from "./llm/debug-logger.js";
export { createProvider } from "./llm/provider.js";
// Models
-export { ModelRegistry } from "./models/index.js";
+export {
+ getModelsCatalog,
+ ModelRegistry,
+ resolveContextLimit,
+} from "./models/index.js";
// Notifications (ntfy.sh)
export * from "./notifications/index.js";
export * from "./permission/index.js";
diff --git a/packages/core/src/models/catalog.ts b/packages/core/src/models/catalog.ts
new file mode 100644
index 0000000..dea4647
--- /dev/null
+++ b/packages/core/src/models/catalog.ts
@@ -0,0 +1,179 @@
+import { mkdirSync, readFileSync, renameSync, statSync, writeFileSync } from "node:fs";
+import { dirname } from "node:path";
+
+/**
+ * models.dev-backed model catalog. Resolves a model's MAXIMUM context window
+ * (`limit.context`) dynamically from the public models.dev API, mirroring how
+ * opencode determines per-model context limits — no hardcoded table.
+ *
+ * The catalog is fetched once, cached on disk with a short TTL, and reused. On
+ * fetch failure we fall back to a stale-but-present cache so the lookup keeps
+ * working offline. Lookups never throw: an unknown/unreachable model resolves
+ * to `null`, which the UI renders as "max unknown".
+ */
+
+/** Shape of the slice of models.dev's `/api.json` we consume. */
+interface ModelsDevModel {
+ limit?: {
+ context?: number;
+ output?: number;
+ };
+}
+
+interface ModelsDevProvider {
+ id: string;
+ models: Record<string, ModelsDevModel | undefined>;
+}
+
+type ModelsDevCatalog = Record<string, ModelsDevProvider | undefined>;
+
+/** Where models.dev's API lives. Overridable for tests / private mirrors. */
+const MODELS_URL = process.env.DISPATCH_MODELS_URL || "https://models.dev";
+
+/** Disk cache path (reuses the repo's `/tmp/dispatch` convention). */
+const CACHE_PATH = "/tmp/dispatch/models-dev.json";
+
+/** How long a cached catalog stays fresh before we re-fetch. */
+const CACHE_TTL_MS = 5 * 60 * 1000;
+
+/** Network timeout for the catalog fetch. */
+const FETCH_TIMEOUT_MS = 10_000;
+
+/**
+ * After a failed fetch we memoize the fallback for this long before retrying,
+ * so a sustained outage doesn't make every lookup hang on a fresh timeout.
+ */
+const FETCH_PENALTY_MS = 60_000;
+
+/**
+ * Dispatch provider id → models.dev provider ids to search, in priority order.
+ * We only support Claude-backed providers (per product scope). `anthropic` and
+ * `opencode-anthropic` are both Claude; we try the first-party `anthropic`
+ * catalog first, then the `opencode` gateway catalog as a fallback.
+ */
+const PROVIDER_MAP: Record<string, string[]> = {
+ anthropic: ["anthropic", "opencode"],
+ "opencode-anthropic": ["anthropic", "opencode"],
+};
+
+/** In-process memo: the last loaded catalog plus the in-flight fetch, if any (one fetch/parse per TTL window). */
+let cached: { catalog: ModelsDevCatalog; fetchedAt: number } | null = null;
+let inflight: Promise<ModelsDevCatalog> | null = null;
+
+function readDiskCache(): { catalog: ModelsDevCatalog; mtimeMs: number } | null {
+ try {
+ const stat = statSync(CACHE_PATH);
+ const text = readFileSync(CACHE_PATH, "utf-8");
+ return { catalog: JSON.parse(text) as ModelsDevCatalog, mtimeMs: stat.mtimeMs };
+ } catch {
+ return null;
+ }
+}
+
+function writeDiskCache(text: string): void {
+ try {
+ mkdirSync(dirname(CACHE_PATH), { recursive: true });
+ // Write-then-rename so a concurrent reader never sees a half-written
+ // file (rename is atomic on the same filesystem). The temp name is
+ // process-scoped to avoid two writers clobbering each other's temp.
+ const tmp = `${CACHE_PATH}.${process.pid}.tmp`;
+ writeFileSync(tmp, text, "utf-8");
+ renameSync(tmp, CACHE_PATH);
+ } catch {
+ // Best-effort: a read-only /tmp shouldn't break lookups.
+ }
+}
+
+async function fetchCatalog(): Promise<ModelsDevCatalog> {
+ const controller = new AbortController();
+ const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
+ try {
+ const res = await fetch(`${MODELS_URL}/api.json`, { signal: controller.signal });
+ if (!res.ok) throw new Error(`models.dev returned HTTP ${res.status}`);
+ const text = await res.text();
+ const catalog = JSON.parse(text) as ModelsDevCatalog;
+ writeDiskCache(text);
+ return catalog;
+ } finally {
+ clearTimeout(timer);
+ }
+}
+
+/**
+ * Load the models.dev catalog, preferring in-process memo, then a fresh disk
+ * cache, then a network fetch. On network failure, falls back to any stale
+ * disk cache; if nothing is available, returns an empty catalog.
+ */
+export async function getModelsCatalog(): Promise<ModelsDevCatalog> {
+ if (process.env.DISPATCH_DISABLE_MODELS_FETCH) {
+ const disk = readDiskCache();
+ return disk?.catalog ?? {};
+ }
+
+ const now = Date.now();
+ if (cached && now - cached.fetchedAt < CACHE_TTL_MS) return cached.catalog;
+
+ // Fresh disk cache satisfies the request without a network round-trip.
+ const disk = readDiskCache();
+ if (disk && now - disk.mtimeMs < CACHE_TTL_MS) {
+ // Inherit the file's mtime as `fetchedAt` so loading a disk cache into
+ // a fresh process doesn't reset its TTL (which would otherwise double
+ // the worst-case staleness across process boundaries).
+ cached = { catalog: disk.catalog, fetchedAt: disk.mtimeMs };
+ return disk.catalog;
+ }
+
+ if (!inflight) {
+ inflight = fetchCatalog()
+ .then((catalog) => {
+ cached = { catalog, fetchedAt: Date.now() };
+ return catalog;
+ })
+ .catch((err) => {
+ // Network failed — serve a stale cache if we have one.
+ console.warn(
+ `dispatch: failed to fetch models.dev catalog: ${err instanceof Error ? err.message : String(err)}`,
+ );
+ const fallback = disk?.catalog ?? ({} as ModelsDevCatalog);
+ // Memoize the fallback with a short "penalty" TTL so a sustained
+ // outage doesn't make every lookup hang on a fresh 10s timeout.
+ // `fetchedAt` is backdated so the memo expires after FETCH_PENALTY_MS.
+ cached = {
+ catalog: fallback,
+ fetchedAt: Date.now() - CACHE_TTL_MS + FETCH_PENALTY_MS,
+ };
+ return fallback;
+ })
+ .finally(() => {
+ inflight = null;
+ });
+ }
+ return inflight;
+}
+
+/**
+ * Resolve a model's maximum context window (in tokens) for the given Dispatch
+ * provider + model id. Returns `null` when the provider is unsupported, the
+ * model is unknown, or the catalog is unavailable — callers should render that
+ * as "max unknown" (no denominator / percentage).
+ */
+export async function resolveContextLimit(
+ provider: string,
+ modelId: string,
+): Promise<number | null> {
+ const candidates = PROVIDER_MAP[provider];
+ if (!candidates || !modelId) return null;
+
+ const catalog = await getModelsCatalog();
+ for (const providerId of candidates) {
+ const ctx = catalog[providerId]?.models?.[modelId]?.limit?.context;
+ if (typeof ctx === "number" && ctx > 0) return ctx;
+ }
+ return null;
+}
+
+/** Test-only: reset the in-process memo so a test can re-exercise loading. */
+export function __resetCatalogCacheForTests(): void {
+ cached = null;
+ inflight = null;
+}
diff --git a/packages/core/src/models/index.ts b/packages/core/src/models/index.ts
index cf59749..2fcd657 100644
--- a/packages/core/src/models/index.ts
+++ b/packages/core/src/models/index.ts
@@ -1 +1,5 @@
+export {
+ getModelsCatalog,
+ resolveContextLimit,
+} from "./catalog.js";
export { ModelRegistry } from "./registry.js";
diff --git a/packages/core/src/types/index.ts b/packages/core/src/types/index.ts
index bab17f1..30afbd9 100644
--- a/packages/core/src/types/index.ts
+++ b/packages/core/src/types/index.ts
@@ -90,7 +90,14 @@ export interface ChatMessage {
export type ChunkRole = "user" | "assistant" | "tool" | "system";
/** Discriminator for a persisted chunk row's payload. */
-export type ChunkType = "text" | "thinking" | "tool_call" | "tool_result" | "error" | "system";
+export type ChunkType =
+ | "text"
+ | "thinking"
+ | "tool_call"
+ | "tool_result"
+ | "error"
+ | "system"
+ | "usage";
export interface TextData {
text: string;
@@ -119,6 +126,45 @@ export interface SystemData {
kind: SystemChunkKind;
text: string;
}
+/**
+ * Per-request token usage persisted as a SIDE-CHANNEL chunk row (one row per
+ * `usage` AgentEvent, i.e. one per LLM round-trip). These rows are deliberately
+ * EXCLUDED from `getChunksForTab`/`getTotalChunkCount` so they never enter the
+ * render, pagination, eviction, or agent-history-rebuild paths — they exist
+ * only to feed the backend aggregate `getUsageStatsForTab`, which seeds the
+ * frontend's `cacheStats` on reload. `inputTokens` is the TOTAL prompt
+ * (cached + fresh); `cacheReadTokens`/`cacheWriteTokens` are Anthropic's
+ * prompt-cache split. Mirrors the `usage` AgentEvent payload.
+ */
+export interface UsageData {
+ inputTokens: number;
+ outputTokens: number;
+ cacheReadTokens: number;
+ cacheWriteTokens: number;
+}
+
+/**
+ * Aggregate per-tab usage telemetry: the cumulative sum across ALL persisted
+ * `usage` rows, the request count, and the most recent request's split. This is
+ * the server-side source of truth (complete regardless of frontend
+ * eviction/pagination) returned by `getUsageStatsForTab`. Structurally
+ * identical to the frontend `CacheStats` so it can seed it directly. `null` when
+ * the tab has no usage rows.
+ */
+export interface UsageStats {
+ inputTokens: number;
+ outputTokens: number;
+ cacheReadTokens: number;
+ cacheWriteTokens: number;
+ /** Number of LLM requests (usage rows) counted. */
+ requests: number;
+ last: {
+ inputTokens: number;
+ outputTokens: number;
+ cacheReadTokens: number;
+ cacheWriteTokens: number;
+ } | null;
+}
export type ChunkData =
| TextData
@@ -126,7 +172,8 @@ export type ChunkData =
| ToolCallData
| ToolResultData
| ErrorData
- | SystemData;
+ | SystemData
+ | UsageData;
/**
* A persisted chunk row — the append-only unit of conversation storage and
@@ -225,8 +272,16 @@ export type AgentEvent =
* fold its transient live representation into the sealed chunk log. Emitted
* after `status: idle`/`error` (which fire before the DB write). Display/sync
* only — not conversation content.
+ *
+ * Carries `usageStats`: the tab's authoritative usage aggregate read from the
+ * DB AFTER the turn's usage rows were written. The frontend REPLACES (not adds)
+ * its live `cacheStats` with this, reconciling the live accumulator to the
+ * persisted truth every turn. This self-heals the live overshoot that occurs
+ * when a rate-limited fallback attempt's usage is streamed live but then
+ * discarded server-side (never persisted). `null` ⇒ tab has no usage rows;
+ * absent ⇒ leave `cacheStats` untouched (back-compat).
*/
- | { type: "turn-sealed"; turnId: string }
+ | { type: "turn-sealed"; turnId: string; usageStats?: UsageStats | null }
| { type: "text-delta"; delta: string }
| { type: "reasoning-delta"; delta: string }
/**
diff --git a/packages/core/tests/db/chunks.db.test.ts b/packages/core/tests/db/chunks.db.test.ts
new file mode 100644
index 0000000..4f7d517
--- /dev/null
+++ b/packages/core/tests/db/chunks.db.test.ts
@@ -0,0 +1,336 @@
+import { beforeAll, beforeEach, describe, expect, it, vi } from "vitest";
+import type { ChunkRowDraft, UsageData } from "../../src/types/index.js";
+
+/**
+ * Internal row shape — matches the production `chunks` table columns.
+ * Kept loose at the `query()` boundary to mirror bun:sqlite's dynamic
+ * return type.
+ */
+interface ChunkRecord {
+ id: string;
+ tab_id: string;
+ seq: number;
+ turn_id: string;
+ step: number;
+ role: string;
+ type: string;
+ data_json: string;
+ created_at: number;
+}
+
+/**
+ * In-memory fake of `bun:sqlite`'s Database implementing only the queries
+ * `chunks.ts` actually issues. Same approach as `tabs.test.ts`: match exact
+ * normalized query strings as fixed branches (no SQL parser), so a query-string
+ * change fails loudly as "unsupported" instead of silently returning wrong data.
+ *
+ * This lets the DB-backed `getChunksForTab` / `getTotalChunkCount` /
+ * `getUsageStatsForTab` logic run under vitest, where `bun:sqlite` can't load.
+ */
+class FakeDatabase {
+ rows: ChunkRecord[] = [];
+ private idCounter = 0;
+
+ query(sql: string): {
+ all: (params?: Record<string, unknown>) => unknown[];
+ get: (params?: Record<string, unknown>) => unknown;
+ run: (params?: Record<string, unknown>) => void;
+ } {
+ return {
+ all: (params) => this.execSelect(sql, params),
+ get: (params) => this.execSelect(sql, params)[0] ?? null,
+ run: (params) => {
+ this.execMutation(sql, params);
+ },
+ };
+ }
+
+ /** bun:sqlite's `db.transaction(fn)` returns a callable that runs `fn`. */
+ transaction(fn: () => void): () => void {
+ return () => {
+ fn();
+ };
+ }
+
+ private execSelect(sql: string, params?: Record<string, unknown>): unknown[] {
+ const norm = sql.replace(/\s+/g, " ").trim();
+ const tabId = params?.$tabId as string | undefined;
+ const forTab = this.rows.filter((r) => r.tab_id === tabId);
+ const visible = forTab.filter((r) => r.type !== "usage");
+
+ // appendChunks: next-seq lookup (counts ALL rows, incl. usage)
+ if (norm === "SELECT COALESCE(MAX(seq), -1) as max_seq FROM chunks WHERE tab_id = $tabId") {
+ const seqs = forTab.map((r) => r.seq);
+ return [{ max_seq: seqs.length > 0 ? Math.max(...seqs) : -1 }];
+ }
+
+ // getChunksForTab — no options (usage excluded)
+ if (
+ norm === "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' ORDER BY seq ASC"
+ ) {
+ return [...visible].sort((a, b) => a.seq - b.seq);
+ }
+
+ // getChunksForTab — before + limit (usage excluded)
+ if (
+ norm ===
+ "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' AND seq < $before ORDER BY seq DESC LIMIT $limit"
+ ) {
+ const before = params?.$before as number;
+ const limit = params?.$limit as number;
+ return visible
+ .filter((r) => r.seq < before)
+ .sort((a, b) => b.seq - a.seq)
+ .slice(0, limit);
+ }
+
+ // getChunksForTab — before only (usage excluded)
+ if (
+ norm ===
+ "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' AND seq < $before ORDER BY seq DESC"
+ ) {
+ const before = params?.$before as number;
+ return visible.filter((r) => r.seq < before).sort((a, b) => b.seq - a.seq);
+ }
+
+ // getChunksForTab — limit only (usage excluded)
+ if (
+ norm ===
+ "SELECT * FROM chunks WHERE tab_id = $tabId AND type != 'usage' ORDER BY seq DESC LIMIT $limit"
+ ) {
+ const limit = params?.$limit as number;
+ return [...visible].sort((a, b) => b.seq - a.seq).slice(0, limit);
+ }
+
+ // getTotalChunkCount (usage excluded)
+ if (norm === "SELECT COUNT(*) as count FROM chunks WHERE tab_id = $tabId AND type != 'usage'") {
+ return [{ count: visible.length }];
+ }
+
+ // getUsageStatsForTab: usage rows only, in seq order
+ if (
+ norm ===
+ "SELECT data_json FROM chunks WHERE tab_id = $tabId AND type = 'usage' ORDER BY seq ASC"
+ ) {
+ return forTab
+ .filter((r) => r.type === "usage")
+ .sort((a, b) => a.seq - b.seq)
+ .map((r) => ({ data_json: r.data_json }));
+ }
+
+ throw new Error(`FakeDatabase: unsupported SELECT: ${norm}`);
+ }
+
+ private execMutation(sql: string, params?: Record<string, unknown>): void {
+ const norm = sql.replace(/\s+/g, " ").trim();
+
+ // appendChunks: single-row insert
+ if (
+ norm ===
+ "INSERT INTO chunks (id, tab_id, seq, turn_id, step, role, type, data_json, created_at) VALUES ($id, $tabId, $seq, $turnId, $step, $role, $type, $dataJson, $now)"
+ ) {
+ this.rows.push({
+ id: (params?.$id as string) ?? `c${this.idCounter++}`,
+ tab_id: params?.$tabId as string,
+ seq: params?.$seq as number,
+ turn_id: params?.$turnId as string,
+ step: (params?.$step as number) ?? 0,
+ role: params?.$role as string,
+ type: params?.$type as string,
+ data_json: params?.$dataJson as string,
+ created_at: (params?.$now as number) ?? 0,
+ });
+ return;
+ }
+
+ throw new Error(`FakeDatabase: unsupported mutation: ${norm}`);
+ }
+}
+
+let fakeDb: FakeDatabase;
+
+vi.mock("../../src/db/index.js", () => ({
+ getDatabase: vi.fn(() => fakeDb),
+}));
+
+const { appendChunks, getChunksForTab, getTotalChunkCount, getUsageStatsForTab } = await import(
+ "../../src/db/chunks.js"
+);
+
+function usageDraft(turnId: string, u: UsageData): ChunkRowDraft {
+ return { turnId, step: 0, role: "assistant", type: "usage", data: u };
+}
+
+beforeAll(() => {
+ fakeDb = new FakeDatabase();
+});
+
+beforeEach(() => {
+ fakeDb.rows = [];
+});
+
+// ---------------------------------------------------------------------------
+// usage chunk persistence + side-channel invariants
+// ---------------------------------------------------------------------------
+describe("usage chunk rows (DB-backed)", () => {
+ const TAB = "tab-usage";
+
+ it("persists usage rows alongside content rows with contiguous seqs", () => {
+ appendChunks(TAB, [
+ { turnId: "t1", step: 0, role: "user", type: "text", data: { text: "hi" } },
+ { turnId: "t1", step: 0, role: "assistant", type: "text", data: { text: "yo" } },
+ usageDraft("t1", {
+ inputTokens: 100,
+ outputTokens: 10,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 90,
+ }),
+ ]);
+ // All three rows landed with contiguous seqs.
+ expect(fakeDb.rows.map((r) => r.seq)).toEqual([0, 1, 2]);
+ expect(fakeDb.rows.map((r) => r.type)).toEqual(["text", "text", "usage"]);
+ });
+
+ it("excludes usage rows from getChunksForTab (all variants)", () => {
+ appendChunks(TAB, [
+ { turnId: "t1", step: 0, role: "user", type: "text", data: { text: "q" } },
+ usageDraft("t1", {
+ inputTokens: 100,
+ outputTokens: 10,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 90,
+ }),
+ { turnId: "t1", step: 0, role: "assistant", type: "text", data: { text: "a" } },
+ usageDraft("t1", {
+ inputTokens: 200,
+ outputTokens: 20,
+ cacheReadTokens: 150,
+ cacheWriteTokens: 0,
+ }),
+ ]);
+
+ // no options
+ const all = getChunksForTab(TAB);
+ expect(all.every((r) => r.type !== "usage")).toBe(true);
+ expect(all.map((r) => r.type)).toEqual(["text", "text"]);
+
+ // limit only
+ const limited = getChunksForTab(TAB, { limit: 10 });
+ expect(limited.every((r) => r.type !== "usage")).toBe(true);
+ expect(limited).toHaveLength(2);
+
+ // before only — `before` is a seq cursor; usage seqs must never surface
+ const before = getChunksForTab(TAB, { before: 100 });
+ expect(before.every((r) => r.type !== "usage")).toBe(true);
+ expect(before).toHaveLength(2);
+
+ // before + limit
+ const bl = getChunksForTab(TAB, { before: 100, limit: 10 });
+ expect(bl.every((r) => r.type !== "usage")).toBe(true);
+ expect(bl).toHaveLength(2);
+ });
+
+ it("excludes usage rows from getTotalChunkCount", () => {
+ appendChunks(TAB, [
+ { turnId: "t1", step: 0, role: "user", type: "text", data: { text: "q" } },
+ { turnId: "t1", step: 0, role: "assistant", type: "text", data: { text: "a" } },
+ usageDraft("t1", {
+ inputTokens: 100,
+ outputTokens: 10,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 90,
+ }),
+ ]);
+ // 3 rows total, but only 2 visible.
+ expect(getTotalChunkCount(TAB)).toBe(2);
+ });
+});
+
+// ---------------------------------------------------------------------------
+// getUsageStatsForTab — backend aggregate
+// ---------------------------------------------------------------------------
+describe("getUsageStatsForTab", () => {
+ const TAB = "tab-agg";
+
+ it("returns null when the tab has no usage rows", () => {
+ appendChunks(TAB, [
+ { turnId: "t1", step: 0, role: "assistant", type: "text", data: { text: "a" } },
+ ]);
+ expect(getUsageStatsForTab(TAB)).toBeNull();
+ });
+
+ it("sums cumulative tokens, counts requests, and reports the last request's split", () => {
+ appendChunks(TAB, [
+ usageDraft("t1", {
+ inputTokens: 1000,
+ outputTokens: 40,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 900,
+ }),
+ usageDraft("t1", {
+ inputTokens: 1200,
+ outputTokens: 60,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 100,
+ }),
+ ]);
+
+ const stats = getUsageStatsForTab(TAB);
+ expect(stats).not.toBeNull();
+ expect(stats?.requests).toBe(2);
+ expect(stats?.inputTokens).toBe(2200);
+ expect(stats?.outputTokens).toBe(100);
+ expect(stats?.cacheReadTokens).toBe(1000);
+ expect(stats?.cacheWriteTokens).toBe(1000);
+ // `last` = the most recent (highest-seq) usage row.
+ expect(stats?.last).toEqual({
+ inputTokens: 1200,
+ outputTokens: 60,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 100,
+ });
+ });
+
+ it("is structurally identical to the frontend CacheStats shape (seeds directly)", () => {
+ appendChunks(TAB, [
+ usageDraft("t1", {
+ inputTokens: 5,
+ outputTokens: 1,
+ cacheReadTokens: 2,
+ cacheWriteTokens: 3,
+ }),
+ ]);
+ const stats = getUsageStatsForTab(TAB);
+ expect(Object.keys(stats ?? {}).sort()).toEqual(
+ [
+ "cacheReadTokens",
+ "cacheWriteTokens",
+ "inputTokens",
+ "last",
+ "outputTokens",
+ "requests",
+ ].sort(),
+ );
+ });
+
+ it("is scoped per tab", () => {
+ appendChunks("tab-a", [
+ usageDraft("t1", {
+ inputTokens: 10,
+ outputTokens: 1,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 0,
+ }),
+ ]);
+ appendChunks("tab-b", [
+ usageDraft("t2", {
+ inputTokens: 20,
+ outputTokens: 2,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 0,
+ }),
+ ]);
+ expect(getUsageStatsForTab("tab-a")?.inputTokens).toBe(10);
+ expect(getUsageStatsForTab("tab-b")?.inputTokens).toBe(20);
+ });
+});
diff --git a/packages/core/tests/models/catalog.test.ts b/packages/core/tests/models/catalog.test.ts
new file mode 100644
index 0000000..51043e6
--- /dev/null
+++ b/packages/core/tests/models/catalog.test.ts
@@ -0,0 +1,158 @@
+import { existsSync, rmSync, utimesSync, writeFileSync } from "node:fs";
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import {
+ __resetCatalogCacheForTests,
+ getModelsCatalog,
+ resolveContextLimit,
+} from "../../src/models/catalog.js";
+
+const CACHE_PATH = "/tmp/dispatch/models-dev.json";
+
+// A trimmed models.dev-shaped catalog covering the providers we support.
+const CATALOG = {
+ anthropic: {
+ id: "anthropic",
+ models: {
+ "claude-sonnet-4-5": { limit: { context: 200000, output: 64000 } },
+ "claude-sonnet-4-6": { limit: { context: 1000000, output: 64000 } },
+ },
+ },
+ opencode: {
+ id: "opencode",
+ models: {
+ "glm-4-6": { limit: { context: 131072, output: 8192 } },
+ },
+ },
+};
+
+function mockFetchOnce(catalog: unknown, ok = true, status = 200) {
+ const fn = vi.fn(() =>
+ Promise.resolve({
+ ok,
+ status,
+ text: () => Promise.resolve(JSON.stringify(catalog)),
+ } as Response),
+ );
+ vi.stubGlobal("fetch", fn);
+ return fn;
+}
+
+beforeEach(() => {
+ __resetCatalogCacheForTests();
+ if (existsSync(CACHE_PATH)) rmSync(CACHE_PATH);
+ delete process.env.DISPATCH_DISABLE_MODELS_FETCH;
+});
+
+afterEach(() => {
+ vi.unstubAllGlobals();
+ if (existsSync(CACHE_PATH)) rmSync(CACHE_PATH);
+});
+
+describe("resolveContextLimit", () => {
+ it("resolves a known anthropic model to its context window", async () => {
+ mockFetchOnce(CATALOG);
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBe(200000);
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-6")).toBe(1000000);
+ });
+
+ it("maps opencode-anthropic to the anthropic catalog, then falls back to the opencode catalog", async () => {
+ mockFetchOnce(CATALOG);
+ // Present in the anthropic catalog.
+ expect(await resolveContextLimit("opencode-anthropic", "claude-sonnet-4-5")).toBe(200000);
+ // Absent in anthropic, found in the opencode gateway catalog.
+ expect(await resolveContextLimit("opencode-anthropic", "glm-4-6")).toBe(131072);
+ });
+
+ it("returns null for an unknown model id", async () => {
+ mockFetchOnce(CATALOG);
+ expect(await resolveContextLimit("anthropic", "no-such-model")).toBeNull();
+ });
+
+ it("returns null for an unsupported provider or an empty model id (no network needed)", async () => {
+ const fetchFn = mockFetchOnce(CATALOG);
+ expect(await resolveContextLimit("google", "gemini-2.5-pro")).toBeNull();
+ expect(await resolveContextLimit("anthropic", "")).toBeNull();
+ expect(fetchFn).not.toHaveBeenCalled();
+ });
+
+ it("returns null when the model has no positive context limit", async () => {
+ mockFetchOnce({
+ anthropic: { id: "anthropic", models: { broken: { limit: { context: 0 } } } },
+ });
+ expect(await resolveContextLimit("anthropic", "broken")).toBeNull();
+ });
+
+ it("does not throw on a malformed provider entry missing `models`", async () => {
+ // A provider object without a `models` map must degrade to null, not crash.
+ mockFetchOnce({ anthropic: { id: "anthropic" } });
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBeNull();
+ });
+
+ it("does not throw when limit/context fields are absent", async () => {
+ mockFetchOnce({ anthropic: { id: "anthropic", models: { m: {} } } });
+ expect(await resolveContextLimit("anthropic", "m")).toBeNull();
+ });
+});
+
+describe("getModelsCatalog caching", () => {
+ it("fetches once and serves the in-process memo on subsequent calls", async () => {
+ const fetchFn = mockFetchOnce(CATALOG);
+ await resolveContextLimit("anthropic", "claude-sonnet-4-5");
+ await resolveContextLimit("anthropic", "claude-sonnet-4-6");
+ await getModelsCatalog();
+ expect(fetchFn).toHaveBeenCalledTimes(1);
+ });
+
+ it("reuses a fresh disk cache without re-fetching across processes", async () => {
+ // Simulate another process having written a fresh cache.
+ writeFileSync(CACHE_PATH, JSON.stringify(CATALOG), "utf-8");
+ const fetchFn = vi.fn(() => Promise.reject(new Error("network should not be hit")));
+ vi.stubGlobal("fetch", fetchFn);
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBe(200000);
+ expect(fetchFn).not.toHaveBeenCalled();
+ });
+
+ it("falls back to a STALE disk cache when the network fails", async () => {
+ writeFileSync(CACHE_PATH, JSON.stringify(CATALOG), "utf-8");
+ // Age the cache well past the TTL so the fetch path is taken.
+ const old = Date.now() / 1000 - 3600;
+ utimesSync(CACHE_PATH, old, old);
+ const fetchFn = vi.fn(() => Promise.reject(new Error("offline")));
+ vi.stubGlobal("fetch", fetchFn);
+ const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
+
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBe(200000);
+ expect(fetchFn).toHaveBeenCalledTimes(1);
+ warn.mockRestore();
+ });
+
+ it("returns null when fetch fails and no cache exists", async () => {
+ const fetchFn = vi.fn(() => Promise.reject(new Error("offline")));
+ vi.stubGlobal("fetch", fetchFn);
+ const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBeNull();
+ warn.mockRestore();
+ });
+
+ it("does not hit the network when DISPATCH_DISABLE_MODELS_FETCH is set", async () => {
+ process.env.DISPATCH_DISABLE_MODELS_FETCH = "1";
+ const fetchFn = vi.fn(() => Promise.reject(new Error("should not fetch")));
+ vi.stubGlobal("fetch", fetchFn);
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBeNull();
+ expect(fetchFn).not.toHaveBeenCalled();
+ });
+
+ it("memoizes the fallback after a failed fetch so it does not re-hit the network", async () => {
+ const fetchFn = vi.fn(() => Promise.reject(new Error("offline")));
+ vi.stubGlobal("fetch", fetchFn);
+ const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
+
+ // First lookup triggers the (failing) fetch.
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-5")).toBeNull();
+ // Subsequent lookups within the penalty window must NOT re-fetch.
+ expect(await resolveContextLimit("anthropic", "claude-sonnet-4-6")).toBeNull();
+ await getModelsCatalog();
+ expect(fetchFn).toHaveBeenCalledTimes(1);
+ warn.mockRestore();
+ });
+});
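Reviewer's sketch (not part of the patch): the lookup order that the `resolveContextLimit` tests pin down can be summarized as a small pure function. The names `Catalog`, `CATALOG_ORDER`, and `lookupContextLimit` are hypothetical, chosen for illustration. The real logic lives in `packages/core/src/models/catalog.ts` and may differ, and this sketch leaves out fetching and caching entirely.

```typescript
// Hypothetical, trimmed catalog shape inferred from the test fixture above.
type Catalog = Record<
  string,
  { models?: Record<string, { limit?: { context?: number } }> } | undefined
>;

// Assumed provider -> catalog lookup order (inferred from the tests, not
// copied from catalog.ts). A Map avoids accidental prototype-key hits.
const CATALOG_ORDER = new Map<string, string[]>([
  ["anthropic", ["anthropic"]],
  ["opencode-anthropic", ["anthropic", "opencode"]],
]);

function lookupContextLimit(
  catalog: Catalog,
  provider: string,
  modelId: string,
): number | null {
  const order = CATALOG_ORDER.get(provider);
  // Unsupported provider or empty model id: nothing to resolve.
  if (!order || !modelId) return null;
  for (const name of order) {
    // Optional chaining lets malformed entries degrade to null instead of throwing.
    const context = catalog[name]?.models?.[modelId]?.limit?.context;
    if (typeof context === "number" && context > 0) return context;
  }
  return null;
}
```

Trying each catalog in order and accepting only a positive number covers the three degradation cases the tests exercise: a provider entry without `models`, a missing `limit`, and a zero `context`.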
diff --git a/packages/frontend/src/App.svelte b/packages/frontend/src/App.svelte
index 3f1c500..ecfdc9f 100644
--- a/packages/frontend/src/App.svelte
+++ b/packages/frontend/src/App.svelte
@@ -75,6 +75,62 @@ $effect(() => {
}
});
+// ─── Context-window max lookup ─────────────────────────────────
+// Resolve the active model's MAXIMUM context window from models.dev (via the
+// API), so the Context Window sidebar view can show `current / max`. Cached
+// per provider+model; `null` when unknown (the view then hides the
+// denominator/percentage). Only Claude-backed providers are resolvable.
+let contextLimit = $state<number | null>(null);
+const contextLimitCache = new Map<string, number | null>();
+
+$effect(() => {
+ const tab = tabStore.activeTab;
+ const keyId = tab?.keyId ?? null;
+ const modelId = tab?.modelId ?? null;
+ const provider = keyId ? (modelsData.keys.find((k) => k.id === keyId)?.provider ?? null) : null;
+
+ if (!provider || !modelId) {
+ contextLimit = null;
+ return;
+ }
+
+ const cacheKey = `${provider}/${modelId}`;
+ if (contextLimitCache.has(cacheKey)) {
+ contextLimit = contextLimitCache.get(cacheKey) ?? null;
+ return;
+ }
+
+ // Clear immediately so a slow/failed fetch can't leave the PREVIOUS
+ // model's max on screen (which would render this model's tokens against
+ // the wrong denominator). The view degrades to a bare token count until
+ // the fetch resolves.
+ contextLimit = null;
+
+ // Fetch is async; guard against a stale response overwriting a newer
+ // selection by re-checking the active tab's key/model on resolve.
+ void (async () => {
+ try {
+ const res = await fetch(
+ `${config.apiBase}/models/context-limit?provider=${encodeURIComponent(provider)}&modelId=${encodeURIComponent(modelId)}`,
+ );
+ if (!res.ok) return;
+ const data = (await res.json()) as { contextLimit?: number | null };
+ const limit = data.contextLimit ?? null;
+ contextLimitCache.set(cacheKey, limit);
+ const current = tabStore.activeTab;
+ const currentProvider = current?.keyId
+ ? (modelsData.keys.find((k) => k.id === current.keyId)?.provider ?? null)
+ : null;
+ if (currentProvider === provider && current?.modelId === modelId) {
+ contextLimit = limit;
+ }
+ } catch {
+ // On network error contextLimit stays null (it was cleared above), so
+ // the view falls back to showing the bare token count.
+ }
+ })();
+});
+
onMount(() => {
// Apply persisted theme (or the shared DEFAULT_THEME if nothing is
// stored) so the first paint matches what the Settings panel will
@@ -138,6 +194,7 @@ onMount(() => {
tasks={tabStore.activeTab?.tasks ?? []}
cacheStats={tabStore.activeTab?.cacheStats ?? null}
cacheTabTitle={tabStore.activeTab?.title ?? null}
+ {contextLimit}
permissionLog={tabStore.permissionLog}
apiBase={config.apiBase}
activeKeyId={tabStore.activeTab?.keyId ?? null}
diff --git a/packages/frontend/src/lib/components/ContextWindowPanel.svelte b/packages/frontend/src/lib/components/ContextWindowPanel.svelte
new file mode 100644
index 0000000..6c7de05
--- /dev/null
+++ b/packages/frontend/src/lib/components/ContextWindowPanel.svelte
@@ -0,0 +1,85 @@
+<script lang="ts">
+import { computeContextUsage } from "../context-window.js";
+import type { CacheStats } from "../types.js";
+
+const {
+ cacheStats = null,
+ contextLimit = null,
+ tabTitle = null,
+ modelId = null,
+}: {
+ cacheStats?: CacheStats | null;
+ contextLimit?: number | null;
+ tabTitle?: string | null;
+ modelId?: string | null;
+} = $props();
+
+const usage = $derived(computeContextUsage(cacheStats, contextLimit));
+
+// As the window fills, escalate color: calm → warning → danger.
+function fillClass(pct: number): string {
+ if (pct >= 90) return "progress-error";
+ if (pct >= 70) return "progress-warning";
+ return "progress-success";
+}
+
+function fmt(n: number): string {
+ return n.toLocaleString();
+}
+
+const hasUsage = $derived((cacheStats?.last ?? null) !== null);
+</script>
+
+<div class="flex flex-col gap-3 flex-1 min-h-0 overflow-y-auto">
+ {#if !hasUsage}
+ <p class="text-xs text-base-content/50">
+ No context data yet. Send a message — the current context size appears
+ here after the first response.
+ </p>
+ {:else}
+ <div class="bg-base-200 rounded-lg p-2">
+ <div class="flex items-center gap-1.5 mb-2">
+ <span class="text-xs font-semibold">Context Window</span>
+ {#if tabTitle}
+ <span class="badge badge-xs badge-ghost">{tabTitle}</span>
+ {/if}
+ {#if usage.percent !== null}
+ <span class="badge badge-xs ml-auto">{usage.percent.toFixed(2)}%</span>
+ {/if}
+ </div>
+
+ <!-- Headline: current / max (or just current when max is unknown) -->
+ <div class="flex items-baseline gap-1.5">
+ <span class="text-lg font-mono font-semibold">{fmt(usage.current)}</span>
+ {#if usage.max !== null}
+ <span class="text-xs text-base-content/50 font-mono">/ {fmt(usage.max)}</span>
+ {/if}
+ <span class="text-xs text-base-content/40 ml-1">tokens</span>
+ </div>
+
+ {#if usage.percent !== null}
+ <progress
+ class="progress w-full h-2 mt-1.5 {fillClass(usage.percent)}"
+ value={usage.percent}
+ max="100"
+ ></progress>
+ {:else}
+ <p class="text-xs text-base-content/40 mt-1.5">
+ Max context size unknown for this model.
+ </p>
+ {/if}
+
+ {#if modelId}
+ <div class="text-xs text-base-content/40 mt-1.5 truncate" title={modelId}>
+ {modelId}
+ </div>
+ {/if}
+ </div>
+
+ <p class="text-xs text-base-content/40">
+ Current context = the most recent request's prompt + output (what the
+ model actually held in its window that turn). Grows as the conversation
+ gets longer. Restored from persisted usage after a reload.
+ </p>
+ {/if}
+</div>
diff --git a/packages/frontend/src/lib/components/SidebarPanel.svelte b/packages/frontend/src/lib/components/SidebarPanel.svelte
index 3372396..519f411 100644
--- a/packages/frontend/src/lib/components/SidebarPanel.svelte
+++ b/packages/frontend/src/lib/components/SidebarPanel.svelte
@@ -4,6 +4,7 @@ import type { CacheStats, KeyInfo, LogEntry, TaskItem } from "../types.js";
import CacheRatePanel from "./CacheRatePanel.svelte";
import ClaudeReset from "./ClaudeReset.svelte";
import ConfigPanel from "./ConfigPanel.svelte";
+import ContextWindowPanel from "./ContextWindowPanel.svelte";
import DebugPanel from "./DebugPanel.svelte";
import KeyUsage from "./KeyUsage.svelte";
import ModelSelector from "./ModelSelector.svelte";
@@ -27,6 +28,7 @@ const {
tasks = [],
cacheStats = null,
cacheTabTitle = null,
+ contextLimit = null,
permissionLog = [],
apiBase = "",
activeKeyId = null,
@@ -47,6 +49,7 @@ const {
tasks?: TaskItem[];
cacheStats?: CacheStats | null;
cacheTabTitle?: string | null;
+ contextLimit?: number | null;
permissionLog?: LogEntry[];
apiBase?: string;
activeKeyId?: string | null;
@@ -89,6 +92,7 @@ const viewOptions = [
"Chat Settings",
"Key Usage",
"Cache Rate",
+ "Context Window",
"Claude Reset",
"Model Status",
"Tasks",
@@ -170,6 +174,13 @@ function contentClass(_selected: string): string {
<KeyUsage {keys} {apiBase} />
{:else if panel.selected === "Cache Rate"}
<CacheRatePanel {cacheStats} tabTitle={cacheTabTitle} />
+ {:else if panel.selected === "Context Window"}
+ <ContextWindowPanel
+ {cacheStats}
+ {contextLimit}
+ tabTitle={cacheTabTitle}
+ modelId={activeModelId}
+ />
{:else if panel.selected === "Claude Reset"}
<ClaudeReset {apiBase} />
{:else if panel.selected === "Model Status"}
diff --git a/packages/frontend/src/lib/context-window.ts b/packages/frontend/src/lib/context-window.ts
new file mode 100644
index 0000000..c4321f8
--- /dev/null
+++ b/packages/frontend/src/lib/context-window.ts
@@ -0,0 +1,37 @@
+import type { CacheStats } from "./types.js";
+
+/**
+ * Context-window occupancy for the current tab/model.
+ *
+ * `current` is the size of the model's context on the MOST RECENT request —
+ * the last turn's full prompt (`inputTokens`, which already includes cached
+ * tokens for Anthropic) plus what the model generated that turn
+ * (`outputTokens`). This mirrors how opencode derives context fullness from
+ * the last assistant message, and reflects what actually occupies the model's
+ * window — NOT the session-cumulative totals shown by the Cache Rate view.
+ *
+ * `max` is the model's maximum context window from models.dev (or `null` when
+ * unknown). `percent` is `current / max * 100` clamped to [0, 100] (unrounded;
+ * the UI decides the displayed precision), or `null` when `max` is unknown, in
+ * which case the UI shows the bare token count with no denominator or progress
+ * bar.
+ */
+export interface ContextUsage {
+ current: number;
+ max: number | null;
+ percent: number | null;
+}
+
+export function computeContextUsage(
+ cacheStats: CacheStats | null | undefined,
+ contextLimit: number | null | undefined,
+): ContextUsage {
+ const last = cacheStats?.last ?? null;
+ const current = last ? last.inputTokens + last.outputTokens : 0;
+ const max = typeof contextLimit === "number" && contextLimit > 0 ? contextLimit : null;
+ // Precise (unrounded) percentage clamped to [0, 100]; the UI formats the
+ // decimal places. Kept unrounded so small contexts against huge windows
+ // (e.g. a few thousand tokens vs. 1,000,000) still read non-zero.
+ const percent = max ? Math.max(0, Math.min(100, (current / max) * 100)) : null;
+ return { current, max, percent };
+}
diff --git a/packages/frontend/src/lib/tabs.svelte.ts b/packages/frontend/src/lib/tabs.svelte.ts
index d3061c3..ec718bd 100644
--- a/packages/frontend/src/lib/tabs.svelte.ts
+++ b/packages/frontend/src/lib/tabs.svelte.ts
@@ -761,6 +761,13 @@ export function createTabStore() {
keyId?: string | null;
modelId?: string | null;
parentTabId?: string | null;
+ // Backend usage aggregate (GET /tabs). Structurally identical to
+ // CacheStats, so it seeds `cacheStats` directly on reload. This is the
+ // initial seed (hydrate runs only when tabs.length === 0, i.e. a true
+ // reload); thereafter `turn-sealed` REPLACES cacheStats with the same
+ // aggregate each turn, keeping the live accumulator reconciled to the DB
+ // truth. Neither path ADDS to live events, so there is no double-count.
+ usageStats?: CacheStats | null;
}> = [];
try {
const res = await fetch(`${config.apiBase}/tabs`);
@@ -849,6 +856,7 @@ export function createTabStore() {
chunkLimit: appSettings.chunkLimit,
oldestLoadedSeq: win.oldestSeq,
totalChunks: win.total,
+ cacheStats: row.usageStats ?? undefined,
};
});
@@ -933,6 +941,15 @@ export function createTabStore() {
// tail into the sealed chunk log (refetch real seqs), preserving any
// newer in-flight turn. Deferred while scrolled up.
reconcileSealedTurn(tabId, event.turnId);
+ // Reconcile cacheStats to the DB source-of-truth carried on the event.
+ // REPLACE (not add): the aggregate already includes every persisted
+ // usage row for this tab, so this both lands the just-sealed turn's
+ // usage AND self-heals any live overshoot (e.g. a rate-limited
+ // fallback attempt streamed usage live but was discarded server-side).
+ // `usageStats === undefined` (older backend) leaves cacheStats as-is.
+ if (event.usageStats !== undefined) {
+ updateTab(tabId, { cacheStats: event.usageStats ?? undefined });
+ }
break;
}
case "statuses": {
diff --git a/packages/frontend/src/lib/types.ts b/packages/frontend/src/lib/types.ts
index 285b4d2..173f68c 100644
--- a/packages/frontend/src/lib/types.ts
+++ b/packages/frontend/src/lib/types.ts
@@ -140,7 +140,12 @@ export type AgentEvent =
| { type: "turn-start"; turnId: string }
// Fires after the turn settled AND its chunks were persisted (after the DB
// write, post status:idle). Triggers the frontend's reconcile-from-DB.
- | { type: "turn-sealed"; turnId: string }
+ // `usageStats` carries the tab's authoritative usage aggregate (read after the
+ // usage rows were persisted); the store REPLACES `cacheStats` with it,
+ // reconciling the live accumulator to the DB truth (self-heals the live
+ // overshoot from a discarded rate-limited fallback attempt). null ⇒ no usage
+ // rows; absent ⇒ leave cacheStats untouched.
+ | { type: "turn-sealed"; turnId: string; usageStats?: CacheStats | null }
// Sent on every WS (re)connect: a snapshot of every tab the backend is
// currently tracking and its live status. The frontend uses this to
// detect desync after a reconnect (e.g. bun --watch restart killed the
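Reviewer's sketch (not part of the patch): the three-way meaning of `usageStats` on `turn-sealed` (absent, `null`, or an aggregate) can be captured in one small function. `reconcileOnSealed` and the local `UsageSnapshot`/`CacheStats` interfaces are hypothetical stand-ins whose fields mirror the keys asserted in the tests. The store itself applies this rule through `updateTab`, so treat this as a reading aid rather than the actual implementation.

```typescript
// Local stand-ins for illustration; the real types live in lib/types.ts.
interface UsageSnapshot {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface CacheStats extends UsageSnapshot {
  requests: number;
  last: UsageSnapshot | null;
}

// Returns the cacheStats a tab should hold after a turn-sealed event.
function reconcileOnSealed(
  live: CacheStats | undefined,
  usageStats: CacheStats | null | undefined,
): CacheStats | undefined {
  // Field absent (older backend): keep the live accumulator untouched.
  if (usageStats === undefined) return live;
  // null (no persisted usage rows) clears; an aggregate REPLACES, never adds.
  return usageStats ?? undefined;
}
```

Replacing rather than adding is what lets the aggregate heal a live overshoot: whatever the live events summed to, the persisted rows win.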
diff --git a/packages/frontend/tests/chat-store.test.ts b/packages/frontend/tests/chat-store.test.ts
index 33a9f69..c0763cd 100644
--- a/packages/frontend/tests/chat-store.test.ts
+++ b/packages/frontend/tests/chat-store.test.ts
@@ -71,6 +71,7 @@ beforeEach(() => {
);
});
+import { computeContextUsage } from "../src/lib/context-window.js";
import { appSettings } from "../src/lib/settings.svelte.js";
import { createTabStore } from "../src/lib/tabs.svelte.js";
import type { Chunk, PermissionPrompt } from "../src/lib/types.js";
@@ -1005,6 +1006,257 @@ describe("hydrateFromBackend", () => {
expect(tC?.renderGroups.length).toBe(0);
expect(tC?.agentStatus).toBe("idle");
});
+
+ // ─── usage persistence: seed cacheStats from the backend aggregate ──
+ it("seeds cacheStats from a tab's usageStats on hydrate (reload persistence)", async () => {
+ const usageStats = {
+ inputTokens: 2200,
+ outputTokens: 100,
+ cacheReadTokens: 1000,
+ cacheWriteTokens: 1000,
+ requests: 2,
+ last: { inputTokens: 1200, outputTokens: 60, cacheReadTokens: 1000, cacheWriteTokens: 100 },
+ };
+ vi.stubGlobal(
+ "fetch",
+ vi.fn((url: string) => {
+ if (url.endsWith("/tabs")) {
+ return Promise.resolve({
+ ok: true,
+ json: () =>
+ Promise.resolve({
+ tabs: [
+ {
+ id: "tu",
+ title: "Has usage",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ usageStats,
+ },
+ {
+ id: "tn",
+ title: "No usage",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ usageStats: null,
+ },
+ ],
+ }),
+ });
+ }
+ if (url.endsWith("/status")) {
+ return Promise.resolve({ ok: true, json: () => Promise.resolve({ statuses: {} }) });
+ }
+ if (url.split("?")[0]?.endsWith("/tabs/tu/chunks")) {
+ return Promise.resolve({
+ ok: true,
+ json: () => Promise.resolve({ chunks: [], total: 0, oldestSeq: null }),
+ });
+ }
+ if (url.split("?")[0]?.endsWith("/tabs/tn/chunks")) {
+ return Promise.resolve({
+ ok: true,
+ json: () => Promise.resolve({ chunks: [], total: 0, oldestSeq: null }),
+ });
+ }
+ return Promise.reject(new Error(`unexpected fetch ${url}`));
+ }),
+ );
+
+ const store = createTabStore();
+ const n = await store.hydrateFromBackend();
+ expect(n).toBe(2);
+ // Tab with persisted usage → cacheStats seeded directly from the aggregate.
+ expect(store.tabs.find((t) => t.id === "tu")?.cacheStats).toEqual(usageStats);
+ // Tab with null usageStats → cacheStats stays undefined (no usage yet).
+ expect(store.tabs.find((t) => t.id === "tn")?.cacheStats).toBeUndefined();
+ });
+
+ it("does not re-seed or double-count cacheStats on a statuses reconnect after hydrate", async () => {
+ const usageStats = {
+ inputTokens: 1000,
+ outputTokens: 40,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 900,
+ requests: 1,
+ last: { inputTokens: 1000, outputTokens: 40, cacheReadTokens: 0, cacheWriteTokens: 900 },
+ };
+ vi.stubGlobal(
+ "fetch",
+ vi.fn((url: string) => {
+ if (url.endsWith("/tabs")) {
+ return Promise.resolve({
+ ok: true,
+ json: () =>
+ Promise.resolve({
+ tabs: [
+ {
+ id: "tr",
+ title: "Reconnect",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ usageStats,
+ },
+ ],
+ }),
+ });
+ }
+ if (url.endsWith("/status")) {
+ return Promise.resolve({ ok: true, json: () => Promise.resolve({ statuses: {} }) });
+ }
+ if (url.split("?")[0]?.endsWith("/tabs/tr/chunks")) {
+ return Promise.resolve({
+ ok: true,
+ json: () => Promise.resolve({ chunks: [], total: 0, oldestSeq: null }),
+ });
+ }
+ return Promise.reject(new Error(`unexpected fetch ${url}`));
+ }),
+ );
+
+ const store = createTabStore();
+ await store.hydrateFromBackend();
+ expect(store.tabs.find((t) => t.id === "tr")?.cacheStats).toEqual(usageStats);
+
+ // A WS reconnect snapshot must NOT touch cacheStats (in-session live
+ // `usage` events own the running totals; the aggregate seed is reload-only).
+ store.handleEvent({ type: "statuses", statuses: { tr: { status: "idle" } } });
+ expect(store.tabs.find((t) => t.id === "tr")?.cacheStats).toEqual(usageStats);
+ });
+
+ it("keeps accumulating live usage events after a hydrate seed", async () => {
+ const usageStats = {
+ inputTokens: 1000,
+ outputTokens: 40,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 900,
+ requests: 1,
+ last: { inputTokens: 1000, outputTokens: 40, cacheReadTokens: 0, cacheWriteTokens: 900 },
+ };
+ vi.stubGlobal(
+ "fetch",
+ vi.fn((url: string) => {
+ if (url.endsWith("/tabs")) {
+ return Promise.resolve({
+ ok: true,
+ json: () =>
+ Promise.resolve({
+ tabs: [
+ {
+ id: "tl",
+ title: "Live after hydrate",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ usageStats,
+ },
+ ],
+ }),
+ });
+ }
+ if (url.endsWith("/status")) {
+ return Promise.resolve({ ok: true, json: () => Promise.resolve({ statuses: {} }) });
+ }
+ if (url.split("?")[0]?.endsWith("/tabs/tl/chunks")) {
+ return Promise.resolve({
+ ok: true,
+ json: () => Promise.resolve({ chunks: [], total: 0, oldestSeq: null }),
+ });
+ }
+ return Promise.reject(new Error(`unexpected fetch ${url}`));
+ }),
+ );
+
+ const store = createTabStore();
+ await store.hydrateFromBackend();
+
+ // A new in-session usage event folds ON TOP of the seeded aggregate.
+ store.handleEvent({
+ type: "usage",
+ tabId: "tl",
+ usage: { inputTokens: 200, outputTokens: 10, cacheReadTokens: 150, cacheWriteTokens: 0 },
+ });
+ const stats = store.tabs.find((t) => t.id === "tl")?.cacheStats;
+ expect(stats?.requests).toBe(2);
+ expect(stats?.inputTokens).toBe(1200);
+ expect(stats?.outputTokens).toBe(50);
+ expect(stats?.cacheReadTokens).toBe(150);
+ expect(stats?.cacheWriteTokens).toBe(900);
+ expect(stats?.last).toEqual({
+ inputTokens: 200,
+ outputTokens: 10,
+ cacheReadTokens: 150,
+ cacheWriteTokens: 0,
+ });
+ });
+
+ // Cross-feature contract (Context Window view, branch u2): the panel derives
+ // current context size from `cacheStats.last` via computeContextUsage. This
+ // test proves persistence restores that field on hydrate, so the view shows a
+ // real "x / max" immediately after a reload on a NEW DEVICE — not "No context
+ // data yet". Guards the contract so neither side can silently break it.
+ it("restores cacheStats.last on hydrate so the Context Window view has data after reload", async () => {
+ const usageStats = {
+ inputTokens: 90000,
+ outputTokens: 3000,
+ cacheReadTokens: 40000,
+ cacheWriteTokens: 5000,
+ requests: 3,
+ // Most recent request's snapshot — the numerator the view reads.
+ last: { inputTokens: 47000, outputTokens: 1200, cacheReadTokens: 30000, cacheWriteTokens: 0 },
+ };
+ vi.stubGlobal(
+ "fetch",
+ vi.fn((url: string) => {
+ if (url.endsWith("/tabs")) {
+ return Promise.resolve({
+ ok: true,
+ json: () =>
+ Promise.resolve({
+ tabs: [
+ {
+ id: "tc",
+ title: "Context after reload",
+ keyId: null,
+ modelId: null,
+ parentTabId: null,
+ usageStats,
+ },
+ ],
+ }),
+ });
+ }
+ if (url.endsWith("/status")) {
+ return Promise.resolve({ ok: true, json: () => Promise.resolve({ statuses: {} }) });
+ }
+ if (url.split("?")[0]?.endsWith("/tabs/tc/chunks")) {
+ return Promise.resolve({
+ ok: true,
+ json: () => Promise.resolve({ chunks: [], total: 0, oldestSeq: null }),
+ });
+ }
+ return Promise.reject(new Error(`unexpected fetch ${url}`));
+ }),
+ );
+
+ const store = createTabStore();
+ await store.hydrateFromBackend();
+
+ // What App.svelte passes into ContextWindowPanel: the active tab's cacheStats
+ // plus the model's max (re-resolved from models.dev on load — here 200k).
+ const cacheStats = store.tabs.find((t) => t.id === "tc")?.cacheStats ?? null;
+ expect(cacheStats?.last).not.toBeNull();
+
+ const usage = computeContextUsage(cacheStats, 200000);
+ // current = last.inputTokens + last.outputTokens (47000 + 1200), NOT the
+ // cumulative session totals (which would double-count history).
+ expect(usage.current).toBe(48200);
+ expect(usage.max).toBe(200000);
+ expect(usage.percent).toBeCloseTo(24.1, 5);
+ });
});
// ─── statuses WS event with the wider TabStatusSnapshot shape ───
@@ -1120,6 +1372,87 @@ describe("tabStore — cache rate (usage events)", () => {
});
expect(store.tabs[0]?.cacheStats).toBeUndefined();
});
+
+ it("turn-sealed REPLACES cacheStats with the carried DB aggregate (reconcile to truth)", async () => {
+ const { store, tabId } = await setupStoreWithTab();
+ // Live events accumulate during the turn.
+ store.handleEvent({
+ type: "usage",
+ tabId,
+ usage: { inputTokens: 1000, outputTokens: 40, cacheReadTokens: 0, cacheWriteTokens: 900 },
+ });
+ expect(store.tabs.find((t) => t.id === tabId)?.cacheStats?.inputTokens).toBe(1000);
+
+ // turn-sealed carries the authoritative aggregate → cacheStats is REPLACED.
+ const aggregate = {
+ inputTokens: 1000,
+ outputTokens: 40,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 900,
+ requests: 1,
+ last: { inputTokens: 1000, outputTokens: 40, cacheReadTokens: 0, cacheWriteTokens: 900 },
+ };
+ store.handleEvent({ type: "turn-sealed", turnId: "t1", tabId, usageStats: aggregate });
+ expect(store.tabs.find((t) => t.id === tabId)?.cacheStats).toEqual(aggregate);
+ });
+
+ it("turn-sealed self-heals a live overshoot from a discarded fallback attempt", async () => {
+ const { store, tabId } = await setupStoreWithTab();
+ // Attempt 1 streamed usage live (overshoot), then rate-limited & discarded
+ // server-side; attempt 2's usage also streamed live. Live = sum of BOTH.
+ store.handleEvent({
+ type: "usage",
+ tabId,
+ usage: { inputTokens: 999, outputTokens: 9, cacheReadTokens: 0, cacheWriteTokens: 0 },
+ });
+ store.handleEvent({
+ type: "usage",
+ tabId,
+ usage: { inputTokens: 222, outputTokens: 22, cacheReadTokens: 100, cacheWriteTokens: 5 },
+ });
+ const overshoot = store.tabs.find((t) => t.id === tabId)?.cacheStats;
+ expect(overshoot?.requests).toBe(2);
+ expect(overshoot?.inputTokens).toBe(1221); // inflated: includes discarded attempt
+
+ // The DB only persisted attempt 2 (the survivor). turn-sealed reconciles.
+ const persisted = {
+ inputTokens: 222,
+ outputTokens: 22,
+ cacheReadTokens: 100,
+ cacheWriteTokens: 5,
+ requests: 1,
+ last: { inputTokens: 222, outputTokens: 22, cacheReadTokens: 100, cacheWriteTokens: 5 },
+ };
+ store.handleEvent({ type: "turn-sealed", turnId: "t1", tabId, usageStats: persisted });
+ // Overshoot healed: cacheStats now matches the DB truth exactly.
+ expect(store.tabs.find((t) => t.id === tabId)?.cacheStats).toEqual(persisted);
+ });
+
+ it("turn-sealed without usageStats leaves cacheStats untouched (back-compat)", async () => {
+ const { store, tabId } = await setupStoreWithTab();
+ store.handleEvent({
+ type: "usage",
+ tabId,
+ usage: { inputTokens: 500, outputTokens: 5, cacheReadTokens: 0, cacheWriteTokens: 0 },
+ });
+ const before = store.tabs.find((t) => t.id === tabId)?.cacheStats;
+ // Older backend: turn-sealed carries no usageStats field.
+ store.handleEvent({ type: "turn-sealed", turnId: "t1", tabId });
+ expect(store.tabs.find((t) => t.id === tabId)?.cacheStats).toEqual(before);
+ });
+
+ it("turn-sealed with usageStats: null clears cacheStats", async () => {
+ const { store, tabId } = await setupStoreWithTab();
+ store.handleEvent({
+ type: "usage",
+ tabId,
+ usage: { inputTokens: 500, outputTokens: 5, cacheReadTokens: 0, cacheWriteTokens: 0 },
+ });
+ expect(store.tabs.find((t) => t.id === tabId)?.cacheStats).toBeDefined();
+ // A null aggregate (no persisted usage rows) explicitly clears live stats.
+ store.handleEvent({ type: "turn-sealed", turnId: "t1", tabId, usageStats: null });
+ expect(store.tabs.find((t) => t.id === tabId)?.cacheStats).toBeUndefined();
+ });
});
// ─── chunk-native store: eviction, pagination, reconcile ────────
diff --git a/packages/frontend/tests/context-window.test.ts b/packages/frontend/tests/context-window.test.ts
new file mode 100644
index 0000000..bb64ed5
--- /dev/null
+++ b/packages/frontend/tests/context-window.test.ts
@@ -0,0 +1,84 @@
+import { describe, expect, it } from "vitest";
+import { computeContextUsage } from "../src/lib/context-window.js";
+import type { CacheStats } from "../src/lib/types.js";
+
+function stats(last: CacheStats["last"]): CacheStats {
+ return {
+ inputTokens: 0,
+ outputTokens: 0,
+ cacheReadTokens: 0,
+ cacheWriteTokens: 0,
+ requests: last ? 1 : 0,
+ last,
+ };
+}
+
+describe("computeContextUsage", () => {
+ it("derives current context from the LAST request's input + output", () => {
+ const usage = computeContextUsage(
+ stats({
+ inputTokens: 47000,
+ outputTokens: 1200,
+ cacheReadTokens: 40000,
+ cacheWriteTokens: 0,
+ }),
+ 200000,
+ );
+ // 47000 + 1200 — NOT the cumulative totals, and cache tokens are already
+ // inside inputTokens (not re-added).
+ expect(usage.current).toBe(48200);
+ expect(usage.max).toBe(200000);
+ expect(usage.percent).toBeCloseTo(24.1, 5); // 48200 / 200000 * 100, unrounded
+ });
+
+ it("returns max=null and percent=null when the limit is unknown", () => {
+ const usage = computeContextUsage(
+ stats({ inputTokens: 100, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 }),
+ null,
+ );
+ expect(usage.current).toBe(100);
+ expect(usage.max).toBeNull();
+ expect(usage.percent).toBeNull();
+ });
+
+ it("treats a non-positive limit as unknown", () => {
+ const usage = computeContextUsage(
+ stats({ inputTokens: 100, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 }),
+ 0,
+ );
+ expect(usage.max).toBeNull();
+ expect(usage.percent).toBeNull();
+ });
+
+ it("reports zero usage when no request has completed yet", () => {
+ expect(computeContextUsage(null, 200000)).toEqual({
+ current: 0,
+ max: 200000,
+ percent: 0,
+ });
+ expect(computeContextUsage(stats(null), 200000)).toEqual({
+ current: 0,
+ max: 200000,
+ percent: 0,
+ });
+ });
+
+ it("clamps percent to 100 when context overflows the window", () => {
+ const usage = computeContextUsage(
+ stats({ inputTokens: 250000, outputTokens: 5000, cacheReadTokens: 0, cacheWriteTokens: 0 }),
+ 200000,
+ );
+ expect(usage.current).toBe(255000);
+ expect(usage.percent).toBe(100);
+ });
+
+ it("keeps an unrounded percent so the UI can show 2 decimals", () => {
+ const usage = computeContextUsage(
+ stats({ inputTokens: 3690, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 }),
+ 1000000,
+ );
+ // 3690 / 1,000,000 * 100 = 0.369 → displayed as "0.37%" (toFixed(2)).
+ expect(usage.percent).toBeCloseTo(0.369, 6);
+ expect((usage.percent as number).toFixed(2)).toBe("0.37");
+ });
+});