fix(api): pre-populate Agent.messages from DB on construction so model switches preserve chat history

Before this change, swapping the model mid-conversation via the sidebar slider lost all prior turns: the new model saw only the current user message and treated the conversation as brand-new. Root cause: `getOrCreateAgentForTab` invalidates the cached Agent (`tabAgent.agent = null`) whenever the effective keyId, modelId, permission key, or working directory differs from the cached values. The replacement Agent was then constructed with `messages: []` and the post-construction step that loads prior turns from the SQLite `messages` table simply did not exist. `processMessage` had already appended the current turn's user message to the DB (line 960) before calling `getOrCreateAgentForTab` (line 1015), so the DB held the full context — it was just never read. Fix: after every `new Agent(...)` in `getOrCreateAgentForTab`, call `getMessagesForTab(tabId)`, walk backwards to the most recent user-role row, and assign all strictly-prior rows to `tabAgent.agent.messages`. The walk-backwards strategy correctly handles two boundary cases: 1. Simple model switch — last DB row is the current user message; drop it (`Agent.run()` will push it again at line 546). 2. Agent-mode auto-fallback retry — last DB row may be a partial assistant response flushed by the previous failed attempt; we drop both that and the current user message in one pass. System-role rows (config-reload notices, etc.) are preserved verbatim; `toModelMessages` already strips them before the wire payload, so this is safe. The fix covers every Agent-reconstruction trigger, not just the model slider: - Sidebar model/key change (the reported case) - Permission setting change - Working-directory change (`processMessage` line 951) - dispatch.toml config-watcher reload (lines 236–237) - Skills directory watcher reload (lines 249–250) - `stopTab` after user cancellation (line 775) If `getMessagesForTab` throws (e.g. DB locked, schema mismatch), we swallow the error and leave `messages: []` — matching pre-fix behaviour for that case so this commit never regresses. Tests (+6 in `packages/api/tests/agent-manager.test.ts`, total 26): - pre-populates Agent.messages from DB history - leaves messages empty when DB has only the current turn (first msg) - excludes a partial assistant trail from a prior fallback attempt - preserves system-role rows in pre-populated history - survives a getMessagesForTab failure without crashing - reloads history on every Agent reconstruction (simulated slider switch from Opus to DeepSeek across two processMessage calls) The test rig was extended with: - `fakeMessagesByTab` map + `setFakeMessages` helper letting tests inject arbitrary DB rows for the mocked `getMessagesForTab`. - `constructedAgents` array captured at `run()` entry (not at construction) so each test sees the post-pre-populate snapshot; the production code reassigns `agent.messages` after `new Agent()` returns, so capturing at construction yielded a stale empty array. - Pluggable `runImpl` hook for tests that want a custom event stream (not yet exercised; staged for the next round of agent-mode fallback tests). Totals: 229 tests across 3 packages all green; typecheck clean on core + api + frontend; biome clean across 124 files.
author: Adam Malczewski <[email protected]> 2026-05-28 07:15:32 +0900
committer: Adam Malczewski <[email protected]> 2026-05-28 07:15:32 +0900
commit: 2f14260bb0f1a51d51e516feda285b68f793ae1b (patch)
tree: f366e3576c65b0e11d81fca0e84714771ebf83ce
parent: 8b17d929e70a43749fd962554214bf8ba3e9380f (diff)
download: dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.tar.gz
dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.zip
2 files changed, 320 insertions, 29 deletions
diff --git a/packages/api/src/agent-manager.ts b/packages/api/src/agent-manager.ts
index 6e03adb..efdd732 100644
--- a/packages/api/src/agent-manager.ts
+++ b/packages/api/src/agent-manager.ts
@@ -654,6 +654,55 @@ export class AgentManager {
 					waitForQueuedMessage: () => this.waitForQueuedMessage(tabId),
 				},
 			);
+
+			// Pre-populate the Agent's in-memory message history from the DB
+			// so prior turns survive Agent recreation. The Agent is
+			// constructed fresh here in three scenarios that ALL discard
+			// the previous in-memory `messages` array:
+			//   1. First call for this tab (no prior Agent existed)
+			//   2. Model/key/permission/working-directory change — the
+			//      invalidation gate above set `tabAgent.agent = null`.
+			//      This is the model-switcher-slider case: without this
+			//      pre-population, DeepSeek would see zero context after
+			//      switching from Opus mid-conversation.
+			//   3. Config or skills reload (configWatcher / skillsWatcher
+			//      also null out `tabAgent.agent`).
+			//
+			// Boundary semantics: `processMessage` calls `appendMessage`
+			// for the current turn's user message BEFORE calling this
+			// function, so the DB ends in `[..., u_current]`. In the
+			// fallback retry path (agent-mode automatic model fallback),
+			// the previous attempt may also have flushed a partial
+			// assistant response, so the DB ends in
+			// `[..., u_current, partial_a]`. Either way, we walk
+			// backwards to the most recent user-role row and load only
+			// strictly-prior rows: `agent.run()` will push the current
+			// user message itself at agent.ts:546, so including it here
+			// would duplicate it.
+			//
+			// `toModelMessages` already filters out `role === "system"`
+			// rows and strips `error` / `system` chunks, so it's safe to
+			// load system messages verbatim.
+			try {
+				const rows = getMessagesForTab(tabId);
+				let cutIdx = rows.length;
+				for (let i = rows.length - 1; i >= 0; i--) {
+					const row = rows[i];
+					if (row && row.role === "user") {
+						cutIdx = i;
+						break;
+					}
+				}
+				if (cutIdx > 0) {
+					tabAgent.agent.messages = rows
+						.slice(0, cutIdx)
+						.map((r) => ({ role: r.role, chunks: r.chunks }));
+				}
+			} catch {
+				// DB read failed — leave `messages: []`. The agent still
+				// works, just without prior history (matches pre-fix
+				// behaviour, so this is no worse than what we had before).
+			}
 		}
 		return tabAgent.agent;
 	}
diff --git a/packages/api/tests/agent-manager.test.ts b/packages/api/tests/agent-manager.test.ts
index ea8dc46..71d43d8 100644
--- a/packages/api/tests/agent-manager.test.ts
+++ b/packages/api/tests/agent-manager.test.ts
@@ -1,42 +1,103 @@
 import type { AgentEvent, ToolDefinition } from "@dispatch/core";
-import { describe, expect, it, vi } from "vitest";
+import { beforeEach, describe, expect, it, vi } from "vitest";
 
 // Spy on appendEventToChunks so we can assert persistence calls
 const appendEventToChunksSpy = vi.fn((_chunks: unknown[], _event: unknown) => {
 	// no-op; we inspect calls in tests
 });
 
+// Configurable stub for `getMessagesForTab`. Tests can push rows
+// before invoking `processMessage` to simulate prior conversation
+// history persisted in the DB (model-switch / history-replay path).
+interface FakeMessageRow {
+	id: string;
+	tabId: string;
+	seq: number;
+	role: "user" | "assistant" | "system";
+	chunks: unknown[];
+	createdAt: number;
+}
+const fakeMessagesByTab = new Map<string, FakeMessageRow[]>();
+function resetFakeMessages(): void {
+	fakeMessagesByTab.clear();
+}
+function setFakeMessages(tabId: string, rows: FakeMessageRow[]): void {
+	fakeMessagesByTab.set(tabId, rows);
+}
+function makeRow(
+	tabId: string,
+	seq: number,
+	role: "user" | "assistant" | "system",
+	chunks: unknown[],
+): FakeMessageRow {
+	return { id: `msg-${tabId}-${seq}`, tabId, seq, role, chunks, createdAt: seq };
+}
+
+// Hook into Agent construction so tests can assert what
+// `messages` was pre-populated with at the moment `run()` was
+// called (after the post-construction pre-populate step in
+// `getOrCreateAgentForTab` has had a chance to assign).
+//
+// We snapshot at `run()` invocation rather than at construction
+// because the production code reassigns `agent.messages =
+// rows.slice(...)` AFTER `new Agent()` returns — capturing a
+// reference at construction would yield a stale empty array.
+const constructedAgents: Array<{ initialMessages: unknown[] }> = [];
+function resetConstructedAgents(): void {
+	constructedAgents.length = 0;
+}
+
+// Allow tests to swap in a custom `run` generator (e.g. to simulate
+// a fallback failure mid-stream). Returning to undefined restores
+// the default.
+type RunGen = (msg: string) => AsyncGenerator<unknown>;
+let runImpl: RunGen | null = null;
+function setRunImpl(impl: RunGen | null): void {
+	runImpl = impl;
+}
+async function* defaultRun(_message: string): AsyncGenerator<unknown> {
+	yield { type: "status", status: "running" } as const;
+	await new Promise<void>((r) => setTimeout(r, 10));
+	yield { type: "reasoning-delta", delta: "thinking about it" } as const;
+	yield {
+		type: "reasoning-end",
+		metadata: { anthropic: { signature: "mock-sig" } },
+	} as const;
+	yield { type: "text-delta", delta: "Hello " } as const;
+	yield { type: "text-delta", delta: "world" } as const;
+	yield {
+		type: "done",
+		message: {
+			role: "assistant",
+			chunks: [
+				{
+					type: "thinking",
+					text: "thinking about it",
+					metadata: { anthropic: { signature: "mock-sig" } },
+				},
+				{ type: "text", text: "Hello world" },
+			],
+		},
+	} as const;
+	yield { type: "status", status: "idle" } as const;
+}
+
 // Mock @dispatch/core's Agent to avoid real LLM calls
 vi.mock("@dispatch/core", () => ({
 	Agent: class MockAgent {
 		status = "idle";
 		messages: unknown[] = [];
-		async *run(_message: string) {
-			yield { type: "status", status: "running" } as const;
-			await new Promise<void>((r) => setTimeout(r, 10));
-			// v6-style reasoning turn: delta(s) then end with providerMetadata
-			yield { type: "reasoning-delta", delta: "thinking about it" } as const;
-			yield {
-				type: "reasoning-end",
-				metadata: { anthropic: { signature: "mock-sig" } },
-			} as const;
-			yield { type: "text-delta", delta: "Hello " } as const;
-			yield { type: "text-delta", delta: "world" } as const;
-			yield {
-				type: "done",
-				message: {
-					role: "assistant",
-					chunks: [
-						{
-							type: "thinking",
-							text: "thinking about it",
-							metadata: { anthropic: { signature: "mock-sig" } },
-						},
-						{ type: "text", text: "Hello world" },
-					],
-				},
-			} as const;
-			yield { type: "status", status: "idle" } as const;
+		async *run(message: string): AsyncGenerator<unknown> {
+			// Snapshot the post-construction pre-populated message list
+			// the first thing `run()` does, before the real `Agent.run`
+			// would push the current user message at line 546. Tests
+			// inspect this to verify history was loaded correctly.
+			constructedAgents.push({ initialMessages: [...this.messages] });
+			if (runImpl) {
+				for await (const ev of runImpl(message)) yield ev;
+				return;
+			}
+			for await (const ev of defaultRun(message)) yield ev;
 		}
 	},
 	PermissionService: class MockPermissionService {
@@ -200,8 +261,8 @@ vi.mock("@dispatch/core", () => ({
 	},
 	appendMessage() {},
 	updateMessage() {},
-	getMessagesForTab() {
-		return [];
+	getMessagesForTab(tabId: string) {
+		return fakeMessagesByTab.get(tabId) ?? [];
 	},
 	appendEventToChunks: appendEventToChunksSpy,
 	applySystemEvent(_messages: unknown[], _event: unknown) {
@@ -245,6 +306,13 @@ vi.mock("@dispatch/core", () => ({
 const { AgentManager } = await import("../src/agent-manager.js");
 
 describe("AgentManager", () => {
+	beforeEach(() => {
+		resetFakeMessages();
+		resetConstructedAgents();
+		setRunImpl(null);
+		appendEventToChunksSpy.mockClear();
+	});
+
 	it("initial status is idle", () => {
 		const manager = new AgentManager();
 		expect(manager.getStatus()).toBe("idle");
@@ -416,4 +484,178 @@ describe("AgentManager", () => {
 			metadata: { anthropic: { signature: "mock-sig" } },
 		});
 	});
+
+	// ─── History pre-population on Agent (re)construction ────────────
+	//
+	// These tests guard the fix that prior conversation turns survive
+	// switching models mid-conversation via the sidebar slider. Without
+	// it, a fresh `Agent` is constructed with `messages: []` and the
+	// next LLM call sees zero prior context.
+
+	it("pre-populates Agent.messages from DB history when constructing a fresh Agent", async () => {
+		const manager = new AgentManager();
+		const tabId = "tab-history";
+
+		// Simulate prior conversation in the DB:
+		//   u1, a1, u_current
+		// (the current turn's user message has already been appended
+		//  by `processMessage` before `getOrCreateAgentForTab` runs)
+		setFakeMessages(tabId, [
+			makeRow(tabId, 0, "user", [{ type: "text", text: "first question" }]),
+			makeRow(tabId, 1, "assistant", [{ type: "text", text: "first answer" }]),
+			makeRow(tabId, 2, "user", [{ type: "text", text: "follow-up" }]),
+		]);
+
+		await manager.processMessage(tabId, "follow-up");
+
+		// Exactly one Agent should have been constructed for this tab,
+		// and its messages must be the prior two rows (excluding the
+		// current user message — `Agent.run()` pushes that itself).
+		expect(constructedAgents.length).toBe(1);
+		const inst = constructedAgents[0];
+		expect(inst).toBeDefined();
+		if (!inst) return;
+		const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+		expect(init.length).toBe(2);
+		expect(init[0]).toMatchObject({
+			role: "user",
+			chunks: [{ type: "text", text: "first question" }],
+		});
+		expect(init[1]).toMatchObject({
+			role: "assistant",
+			chunks: [{ type: "text", text: "first answer" }],
+		});
+	});
+
+	it("leaves messages empty when the DB has only the current turn's user message (first turn)", async () => {
+		const manager = new AgentManager();
+		const tabId = "tab-first-turn";
+
+		// First-ever turn: DB has only the just-appended user message.
+		setFakeMessages(tabId, [makeRow(tabId, 0, "user", [{ type: "text", text: "hello" }])]);
+
+		await manager.processMessage(tabId, "hello");
+
+		expect(constructedAgents.length).toBe(1);
+		const inst = constructedAgents[0];
+		expect(inst).toBeDefined();
+		if (!inst) return;
+		// The user message at idx 0 is the current turn — must be excluded.
+		expect((inst.initialMessages as unknown[]).length).toBe(0);
+	});
+
+	it("excludes a partial assistant trail from a prior fallback attempt", async () => {
+		const manager = new AgentManager();
+		const tabId = "tab-fallback-partial";
+
+		// Scenario: the agent-mode fallback path. Attempt 1 (Opus) errored
+		// mid-stream after flushing some chunks; attempt 2 (DeepSeek) is
+		// about to start. DB looks like:
+		//   u1, a1, u_current, partial_a_attempt1
+		// The fresh Agent for attempt 2 must see [u1, a1] — not the
+		// current user message and not the failed attempt's partial.
+		setFakeMessages(tabId, [
+			makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]),
+			makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]),
+			makeRow(tabId, 2, "user", [{ type: "text", text: "q2" }]),
+			makeRow(tabId, 3, "assistant", [{ type: "text", text: "half-baked..." }]),
+		]);
+
+		await manager.processMessage(tabId, "q2");
+
+		expect(constructedAgents.length).toBe(1);
+		const inst = constructedAgents[0];
+		expect(inst).toBeDefined();
+		if (!inst) return;
+		const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+		expect(init.length).toBe(2);
+		expect(init[0]).toMatchObject({ role: "user", chunks: [{ type: "text", text: "q1" }] });
+		expect(init[1]).toMatchObject({ role: "assistant", chunks: [{ type: "text", text: "a1" }] });
+	});
+
+	it("preserves system-role rows in pre-populated history (toModelMessages filters them later)", async () => {
+		const manager = new AgentManager();
+		const tabId = "tab-with-system-rows";
+
+		setFakeMessages(tabId, [
+			makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]),
+			makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]),
+			makeRow(tabId, 2, "system", [
+				{ type: "system", kind: "config-reload", text: "Configuration reloaded" },
+			]),
+			makeRow(tabId, 3, "user", [{ type: "text", text: "q2" }]),
+		]);
+
+		await manager.processMessage(tabId, "q2");
+
+		expect(constructedAgents.length).toBe(1);
+		const inst = constructedAgents[0];
+		expect(inst).toBeDefined();
+		if (!inst) return;
+		const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+		// All three prior rows (user/assistant/system) preserved; the
+		// LLM-facing `toModelMessages` strips the system row later.
+		expect(init.length).toBe(3);
+		expect(init[2]).toMatchObject({ role: "system" });
+	});
+
+	it("survives a getMessagesForTab failure without crashing (messages stays empty)", async () => {
+		const manager = new AgentManager();
+		const tabId = "tab-db-error";
+
+		// Simulate DB error by stubbing the fake-store getter to throw
+		// for this specific tab. We use a Proxy on the Map's get method
+		// for the duration of one call.
+		const realGet = fakeMessagesByTab.get.bind(fakeMessagesByTab);
+		fakeMessagesByTab.get = ((key: string) => {
+			if (key === tabId) throw new Error("simulated DB error");
+			return realGet(key);
+		}) as typeof fakeMessagesByTab.get;
+
+		try {
+			await expect(manager.processMessage(tabId, "anything")).resolves.toBeUndefined();
+		} finally {
+			fakeMessagesByTab.get = realGet;
+		}
+
+		// Agent still constructed, just with empty messages.
+		expect(constructedAgents.length).toBe(1);
+		const inst = constructedAgents[0];
+		expect(inst).toBeDefined();
+		if (!inst) return;
+		expect((inst.initialMessages as unknown[]).length).toBe(0);
+	});
+
+	it("reloads history on every Agent reconstruction (simulated model switch)", async () => {
+		const manager = new AgentManager();
+		const tabId = "tab-model-switch";
+
+		// Turn 1: empty DB → just the first user message.
+		setFakeMessages(tabId, [makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }])]);
+		await manager.processMessage(tabId, "q1", "key-opus", "claude-opus-4-7");
+
+		// Turn 2: DB now has the full prior turn + new user message.
+		// User has switched models via the sidebar slider — different
+		// (keyId, modelId) triggers Agent invalidation and reconstruction.
+		setFakeMessages(tabId, [
+			makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]),
+			makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]),
+			makeRow(tabId, 2, "user", [{ type: "text", text: "q2" }]),
+		]);
+		await manager.processMessage(tabId, "q2", "key-deepseek", "deepseek-v3");
+
+		// Exactly two Agents constructed across the two turns (the
+		// invalidation gate fires when keyId/modelId change).
+		expect(constructedAgents.length).toBe(2);
+
+		// Second Agent (the DeepSeek one) was pre-populated with the
+		// completed first turn — not empty, not duplicating q2.
+		const second = constructedAgents[1];
+		expect(second).toBeDefined();
+		if (!second) return;
+		const init = second.initialMessages as Array<{ role: string; chunks: unknown[] }>;
+		expect(init.length).toBe(2);
+		expect(init[0]).toMatchObject({ role: "user", chunks: [{ type: "text", text: "q1" }] });
+		expect(init[1]).toMatchObject({ role: "assistant", chunks: [{ type: "text", text: "a1" }] });
+	});
 });
author	Adam Malczewski <[email protected]>	2026-05-28 07:15:32 +0900
committer	Adam Malczewski <[email protected]>	2026-05-28 07:15:32 +0900
commit	2f14260bb0f1a51d51e516feda285b68f793ae1b (patch)
tree	f366e3576c65b0e11d81fca0e84714771ebf83ce
parent	8b17d929e70a43749fd962554214bf8ba3e9380f (diff)
download	dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.tar.gz dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.zip