summaryrefslogtreecommitdiffhomepage
path: root/packages/api
diff options
context:
space:
mode:
authorAdam Malczewski <[email protected]>2026-05-28 07:15:32 +0900
committerAdam Malczewski <[email protected]>2026-05-28 07:15:32 +0900
commit2f14260bb0f1a51d51e516feda285b68f793ae1b (patch)
treef366e3576c65b0e11d81fca0e84714771ebf83ce /packages/api
parent8b17d929e70a43749fd962554214bf8ba3e9380f (diff)
downloaddispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.tar.gz
dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.zip
fix(api): pre-populate Agent.messages from DB on construction so model switches preserve chat history
Before this change, swapping the model mid-conversation via the sidebar slider lost all prior turns: the new model saw only the current user message and treated the conversation as brand-new. Root cause: `getOrCreateAgentForTab` invalidates the cached Agent (`tabAgent.agent = null`) whenever the effective keyId, modelId, permission key, or working directory differs from the cached values. The replacement Agent was then constructed with `messages: []` and the post-construction step that loads prior turns from the SQLite `messages` table simply did not exist. `processMessage` had already appended the current turn's user message to the DB (line 960) before calling `getOrCreateAgentForTab` (line 1015), so the DB held the full context — it was just never read. Fix: after every `new Agent(...)` in `getOrCreateAgentForTab`, call `getMessagesForTab(tabId)`, walk backwards to the most recent user-role row, and assign all strictly-prior rows to `tabAgent.agent.messages`. The walk-backwards strategy correctly handles two boundary cases: 1. Simple model switch — last DB row is the current user message; drop it (`Agent.run()` will push it again at line 546). 2. Agent-mode auto-fallback retry — last DB row may be a partial assistant response flushed by the previous failed attempt; we drop both that and the current user message in one pass. System-role rows (config-reload notices, etc.) are preserved verbatim; `toModelMessages` already strips them before the wire payload, so this is safe. The fix covers every Agent-reconstruction trigger, not just the model slider: - Sidebar model/key change (the reported case) - Permission setting change - Working-directory change (`processMessage` line 951) - dispatch.toml config-watcher reload (lines 236–237) - Skills directory watcher reload (lines 249–250) - `stopTab` after user cancellation (line 775) If `getMessagesForTab` throws (e.g. DB locked, schema mismatch), we swallow the error and leave `messages: []` — matching pre-fix behaviour for that case so this commit never regresses. Tests (+6 in `packages/api/tests/agent-manager.test.ts`, total 26): - pre-populates Agent.messages from DB history - leaves messages empty when DB has only the current turn (first msg) - excludes a partial assistant trail from a prior fallback attempt - preserves system-role rows in pre-populated history - survives a getMessagesForTab failure without crashing - reloads history on every Agent reconstruction (simulated slider switch from Opus to DeepSeek across two processMessage calls) The test rig was extended with: - `fakeMessagesByTab` map + `setFakeMessages` helper letting tests inject arbitrary DB rows for the mocked `getMessagesForTab`. - `constructedAgents` array captured at `run()` entry (not at construction) so each test sees the post-pre-populate snapshot; the production code reassigns `agent.messages` after `new Agent()` returns, so capturing at construction yielded a stale empty array. - Pluggable `runImpl` hook for tests that want a custom event stream (not yet exercised; staged for the next round of agent-mode fallback tests). Totals: 229 tests across 3 packages all green; typecheck clean on core + api + frontend; biome clean across 124 files.
Diffstat (limited to 'packages/api')
-rw-r--r--packages/api/src/agent-manager.ts49
-rw-r--r--packages/api/tests/agent-manager.test.ts300
2 files changed, 320 insertions, 29 deletions
diff --git a/packages/api/src/agent-manager.ts b/packages/api/src/agent-manager.ts
index 6e03adb..efdd732 100644
--- a/packages/api/src/agent-manager.ts
+++ b/packages/api/src/agent-manager.ts
@@ -654,6 +654,55 @@ export class AgentManager {
waitForQueuedMessage: () => this.waitForQueuedMessage(tabId),
},
);
+
+ // Pre-populate the Agent's in-memory message history from the DB
+ // so prior turns survive Agent recreation. The Agent is
+ // constructed fresh here in three scenarios that ALL discard
+ // the previous in-memory `messages` array:
+ // 1. First call for this tab (no prior Agent existed)
+ // 2. Model/key/permission/working-directory change — the
+ // invalidation gate above set `tabAgent.agent = null`.
+ // This is the model-switcher-slider case: without this
+ // pre-population, DeepSeek would see zero context after
+ // switching from Opus mid-conversation.
+ // 3. Config or skills reload (configWatcher / skillsWatcher
+ // also null out `tabAgent.agent`).
+ //
+ // Boundary semantics: `processMessage` calls `appendMessage`
+ // for the current turn's user message BEFORE calling this
+ // function, so the DB ends in `[..., u_current]`. In the
+ // fallback retry path (agent-mode automatic model fallback),
+ // the previous attempt may also have flushed a partial
+ // assistant response, so the DB ends in
+ // `[..., u_current, partial_a]`. Either way, we walk
+ // backwards to the most recent user-role row and load only
+ // strictly-prior rows: `agent.run()` will push the current
+ // user message itself at agent.ts:546, so including it here
+ // would duplicate it.
+ //
+ // `toModelMessages` already filters out `role === "system"`
+ // rows and strips `error` / `system` chunks, so it's safe to
+ // load system messages verbatim.
+ try {
+ const rows = getMessagesForTab(tabId);
+ let cutIdx = rows.length;
+ for (let i = rows.length - 1; i >= 0; i--) {
+ const row = rows[i];
+ if (row && row.role === "user") {
+ cutIdx = i;
+ break;
+ }
+ }
+ if (cutIdx > 0) {
+ tabAgent.agent.messages = rows
+ .slice(0, cutIdx)
+ .map((r) => ({ role: r.role, chunks: r.chunks }));
+ }
+ } catch {
+ // DB read failed — leave `messages: []`. The agent still
+ // works, just without prior history (matches pre-fix
+ // behaviour, so this is no worse than what we had before).
+ }
}
return tabAgent.agent;
}
diff --git a/packages/api/tests/agent-manager.test.ts b/packages/api/tests/agent-manager.test.ts
index ea8dc46..71d43d8 100644
--- a/packages/api/tests/agent-manager.test.ts
+++ b/packages/api/tests/agent-manager.test.ts
@@ -1,42 +1,103 @@
import type { AgentEvent, ToolDefinition } from "@dispatch/core";
-import { describe, expect, it, vi } from "vitest";
+import { beforeEach, describe, expect, it, vi } from "vitest";
// Spy on appendEventToChunks so we can assert persistence calls
const appendEventToChunksSpy = vi.fn((_chunks: unknown[], _event: unknown) => {
// no-op; we inspect calls in tests
});
+// Configurable stub for `getMessagesForTab`. Tests can push rows
+// before invoking `processMessage` to simulate prior conversation
+// history persisted in the DB (model-switch / history-replay path).
+interface FakeMessageRow {
+ id: string;
+ tabId: string;
+ seq: number;
+ role: "user" | "assistant" | "system";
+ chunks: unknown[];
+ createdAt: number;
+}
+const fakeMessagesByTab = new Map<string, FakeMessageRow[]>();
+function resetFakeMessages(): void {
+ fakeMessagesByTab.clear();
+}
+function setFakeMessages(tabId: string, rows: FakeMessageRow[]): void {
+ fakeMessagesByTab.set(tabId, rows);
+}
+function makeRow(
+ tabId: string,
+ seq: number,
+ role: "user" | "assistant" | "system",
+ chunks: unknown[],
+): FakeMessageRow {
+ return { id: `msg-${tabId}-${seq}`, tabId, seq, role, chunks, createdAt: seq };
+}
+
+// Hook into Agent construction so tests can assert what
+// `messages` was pre-populated with at the moment `run()` was
+// called (after the post-construction pre-populate step in
+// `getOrCreateAgentForTab` has had a chance to assign).
+//
+// We snapshot at `run()` invocation rather than at construction
+// because the production code reassigns `agent.messages =
+// rows.slice(...)` AFTER `new Agent()` returns — capturing a
+// reference at construction would yield a stale empty array.
+const constructedAgents: Array<{ initialMessages: unknown[] }> = [];
+function resetConstructedAgents(): void {
+ constructedAgents.length = 0;
+}
+
+// Allow tests to swap in a custom `run` generator (e.g. to simulate
+// a fallback failure mid-stream). Returning to undefined restores
+// the default.
+type RunGen = (msg: string) => AsyncGenerator<unknown>;
+let runImpl: RunGen | null = null;
+function setRunImpl(impl: RunGen | null): void {
+ runImpl = impl;
+}
+async function* defaultRun(_message: string): AsyncGenerator<unknown> {
+ yield { type: "status", status: "running" } as const;
+ await new Promise<void>((r) => setTimeout(r, 10));
+ yield { type: "reasoning-delta", delta: "thinking about it" } as const;
+ yield {
+ type: "reasoning-end",
+ metadata: { anthropic: { signature: "mock-sig" } },
+ } as const;
+ yield { type: "text-delta", delta: "Hello " } as const;
+ yield { type: "text-delta", delta: "world" } as const;
+ yield {
+ type: "done",
+ message: {
+ role: "assistant",
+ chunks: [
+ {
+ type: "thinking",
+ text: "thinking about it",
+ metadata: { anthropic: { signature: "mock-sig" } },
+ },
+ { type: "text", text: "Hello world" },
+ ],
+ },
+ } as const;
+ yield { type: "status", status: "idle" } as const;
+}
+
// Mock @dispatch/core's Agent to avoid real LLM calls
vi.mock("@dispatch/core", () => ({
Agent: class MockAgent {
status = "idle";
messages: unknown[] = [];
- async *run(_message: string) {
- yield { type: "status", status: "running" } as const;
- await new Promise<void>((r) => setTimeout(r, 10));
- // v6-style reasoning turn: delta(s) then end with providerMetadata
- yield { type: "reasoning-delta", delta: "thinking about it" } as const;
- yield {
- type: "reasoning-end",
- metadata: { anthropic: { signature: "mock-sig" } },
- } as const;
- yield { type: "text-delta", delta: "Hello " } as const;
- yield { type: "text-delta", delta: "world" } as const;
- yield {
- type: "done",
- message: {
- role: "assistant",
- chunks: [
- {
- type: "thinking",
- text: "thinking about it",
- metadata: { anthropic: { signature: "mock-sig" } },
- },
- { type: "text", text: "Hello world" },
- ],
- },
- } as const;
- yield { type: "status", status: "idle" } as const;
+ async *run(message: string): AsyncGenerator<unknown> {
+ // Snapshot the post-construction pre-populated message list
+ // the first thing `run()` does, before the real `Agent.run`
+ // would push the current user message at line 546. Tests
+ // inspect this to verify history was loaded correctly.
+ constructedAgents.push({ initialMessages: [...this.messages] });
+ if (runImpl) {
+ for await (const ev of runImpl(message)) yield ev;
+ return;
+ }
+ for await (const ev of defaultRun(message)) yield ev;
}
},
PermissionService: class MockPermissionService {
@@ -200,8 +261,8 @@ vi.mock("@dispatch/core", () => ({
},
appendMessage() {},
updateMessage() {},
- getMessagesForTab() {
- return [];
+ getMessagesForTab(tabId: string) {
+ return fakeMessagesByTab.get(tabId) ?? [];
},
appendEventToChunks: appendEventToChunksSpy,
applySystemEvent(_messages: unknown[], _event: unknown) {
@@ -245,6 +306,13 @@ vi.mock("@dispatch/core", () => ({
const { AgentManager } = await import("../src/agent-manager.js");
describe("AgentManager", () => {
+ beforeEach(() => {
+ resetFakeMessages();
+ resetConstructedAgents();
+ setRunImpl(null);
+ appendEventToChunksSpy.mockClear();
+ });
+
it("initial status is idle", () => {
const manager = new AgentManager();
expect(manager.getStatus()).toBe("idle");
@@ -416,4 +484,178 @@ describe("AgentManager", () => {
metadata: { anthropic: { signature: "mock-sig" } },
});
});
+
+ // ─── History pre-population on Agent (re)construction ────────────
+ //
+ // These tests guard the fix that prior conversation turns survive
+ // switching models mid-conversation via the sidebar slider. Without
+ // it, a fresh `Agent` is constructed with `messages: []` and the
+ // next LLM call sees zero prior context.
+
+ it("pre-populates Agent.messages from DB history when constructing a fresh Agent", async () => {
+ const manager = new AgentManager();
+ const tabId = "tab-history";
+
+ // Simulate prior conversation in the DB:
+ // u1, a1, u_current
+ // (the current turn's user message has already been appended
+ // by `processMessage` before `getOrCreateAgentForTab` runs)
+ setFakeMessages(tabId, [
+ makeRow(tabId, 0, "user", [{ type: "text", text: "first question" }]),
+ makeRow(tabId, 1, "assistant", [{ type: "text", text: "first answer" }]),
+ makeRow(tabId, 2, "user", [{ type: "text", text: "follow-up" }]),
+ ]);
+
+ await manager.processMessage(tabId, "follow-up");
+
+ // Exactly one Agent should have been constructed for this tab,
+ // and its messages must be the prior two rows (excluding the
+ // current user message — `Agent.run()` pushes that itself).
+ expect(constructedAgents.length).toBe(1);
+ const inst = constructedAgents[0];
+ expect(inst).toBeDefined();
+ if (!inst) return;
+ const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+ expect(init.length).toBe(2);
+ expect(init[0]).toMatchObject({
+ role: "user",
+ chunks: [{ type: "text", text: "first question" }],
+ });
+ expect(init[1]).toMatchObject({
+ role: "assistant",
+ chunks: [{ type: "text", text: "first answer" }],
+ });
+ });
+
+ it("leaves messages empty when the DB has only the current turn's user message (first turn)", async () => {
+ const manager = new AgentManager();
+ const tabId = "tab-first-turn";
+
+ // First-ever turn: DB has only the just-appended user message.
+ setFakeMessages(tabId, [makeRow(tabId, 0, "user", [{ type: "text", text: "hello" }])]);
+
+ await manager.processMessage(tabId, "hello");
+
+ expect(constructedAgents.length).toBe(1);
+ const inst = constructedAgents[0];
+ expect(inst).toBeDefined();
+ if (!inst) return;
+ // The user message at idx 0 is the current turn — must be excluded.
+ expect((inst.initialMessages as unknown[]).length).toBe(0);
+ });
+
+ it("excludes a partial assistant trail from a prior fallback attempt", async () => {
+ const manager = new AgentManager();
+ const tabId = "tab-fallback-partial";
+
+ // Scenario: the agent-mode fallback path. Attempt 1 (Opus) errored
+ // mid-stream after flushing some chunks; attempt 2 (DeepSeek) is
+ // about to start. DB looks like:
+ // u1, a1, u_current, partial_a_attempt1
+ // The fresh Agent for attempt 2 must see [u1, a1] — not the
+ // current user message and not the failed attempt's partial.
+ setFakeMessages(tabId, [
+ makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]),
+ makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]),
+ makeRow(tabId, 2, "user", [{ type: "text", text: "q2" }]),
+ makeRow(tabId, 3, "assistant", [{ type: "text", text: "half-baked..." }]),
+ ]);
+
+ await manager.processMessage(tabId, "q2");
+
+ expect(constructedAgents.length).toBe(1);
+ const inst = constructedAgents[0];
+ expect(inst).toBeDefined();
+ if (!inst) return;
+ const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+ expect(init.length).toBe(2);
+ expect(init[0]).toMatchObject({ role: "user", chunks: [{ type: "text", text: "q1" }] });
+ expect(init[1]).toMatchObject({ role: "assistant", chunks: [{ type: "text", text: "a1" }] });
+ });
+
+ it("preserves system-role rows in pre-populated history (toModelMessages filters them later)", async () => {
+ const manager = new AgentManager();
+ const tabId = "tab-with-system-rows";
+
+ setFakeMessages(tabId, [
+ makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]),
+ makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]),
+ makeRow(tabId, 2, "system", [
+ { type: "system", kind: "config-reload", text: "Configuration reloaded" },
+ ]),
+ makeRow(tabId, 3, "user", [{ type: "text", text: "q2" }]),
+ ]);
+
+ await manager.processMessage(tabId, "q2");
+
+ expect(constructedAgents.length).toBe(1);
+ const inst = constructedAgents[0];
+ expect(inst).toBeDefined();
+ if (!inst) return;
+ const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+ // All three prior rows (user/assistant/system) preserved; the
+ // LLM-facing `toModelMessages` strips the system row later.
+ expect(init.length).toBe(3);
+ expect(init[2]).toMatchObject({ role: "system" });
+ });
+
+ it("survives a getMessagesForTab failure without crashing (messages stays empty)", async () => {
+ const manager = new AgentManager();
+ const tabId = "tab-db-error";
+
+ // Simulate DB error by stubbing the fake-store getter to throw
+ // for this specific tab. We use a Proxy on the Map's get method
+ // for the duration of one call.
+ const realGet = fakeMessagesByTab.get.bind(fakeMessagesByTab);
+ fakeMessagesByTab.get = ((key: string) => {
+ if (key === tabId) throw new Error("simulated DB error");
+ return realGet(key);
+ }) as typeof fakeMessagesByTab.get;
+
+ try {
+ await expect(manager.processMessage(tabId, "anything")).resolves.toBeUndefined();
+ } finally {
+ fakeMessagesByTab.get = realGet;
+ }
+
+ // Agent still constructed, just with empty messages.
+ expect(constructedAgents.length).toBe(1);
+ const inst = constructedAgents[0];
+ expect(inst).toBeDefined();
+ if (!inst) return;
+ expect((inst.initialMessages as unknown[]).length).toBe(0);
+ });
+
+ it("reloads history on every Agent reconstruction (simulated model switch)", async () => {
+ const manager = new AgentManager();
+ const tabId = "tab-model-switch";
+
+ // Turn 1: empty DB → just the first user message.
+ setFakeMessages(tabId, [makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }])]);
+ await manager.processMessage(tabId, "q1", "key-opus", "claude-opus-4-7");
+
+ // Turn 2: DB now has the full prior turn + new user message.
+ // User has switched models via the sidebar slider — different
+ // (keyId, modelId) triggers Agent invalidation and reconstruction.
+ setFakeMessages(tabId, [
+ makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]),
+ makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]),
+ makeRow(tabId, 2, "user", [{ type: "text", text: "q2" }]),
+ ]);
+ await manager.processMessage(tabId, "q2", "key-deepseek", "deepseek-v3");
+
+ // Exactly two Agents constructed across the two turns (the
+ // invalidation gate fires when keyId/modelId change).
+ expect(constructedAgents.length).toBe(2);
+
+ // Second Agent (the DeepSeek one) was pre-populated with the
+ // completed first turn — not empty, not duplicating q2.
+ const second = constructedAgents[1];
+ expect(second).toBeDefined();
+ if (!second) return;
+ const init = second.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+ expect(init.length).toBe(2);
+ expect(init[0]).toMatchObject({ role: "user", chunks: [{ type: "text", text: "q1" }] });
+ expect(init[1]).toMatchObject({ role: "assistant", chunks: [{ type: "text", text: "a1" }] });
+ });
});