diff options
| author | Adam Malczewski <[email protected]> | 2026-05-28 07:15:32 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-05-28 07:15:32 +0900 |
| commit | 2f14260bb0f1a51d51e516feda285b68f793ae1b (patch) | |
| tree | f366e3576c65b0e11d81fca0e84714771ebf83ce | |
| parent | 8b17d929e70a43749fd962554214bf8ba3e9380f (diff) | |
| download | dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.tar.gz dispatch-2f14260bb0f1a51d51e516feda285b68f793ae1b.zip | |
fix(api): pre-populate Agent.messages from DB on construction so model switches preserve chat history
Before this change, swapping the model mid-conversation via the sidebar
slider lost all prior turns: the new model saw only the current user
message and treated the conversation as brand-new.
Root cause: `getOrCreateAgentForTab` invalidates the cached Agent
(`tabAgent.agent = null`) whenever the effective keyId, modelId,
permission key, or working directory differs from the cached values.
The replacement Agent was then constructed with `messages: []` and
the post-construction step that loads prior turns from the SQLite
`messages` table simply did not exist. `processMessage` had already
appended the current turn's user message to the DB (line 960) before
calling `getOrCreateAgentForTab` (line 1015), so the DB held the full
context — it was just never read.
Fix: after every `new Agent(...)` in `getOrCreateAgentForTab`, call
`getMessagesForTab(tabId)`, walk backwards to the most recent user-role
row, and assign all strictly-prior rows to `tabAgent.agent.messages`.
The walk-backwards strategy correctly handles two boundary cases:
1. Simple model switch — last DB row is the current user message;
drop it (`Agent.run()` will push it again at line 546).
2. Agent-mode auto-fallback retry — last DB row may be a partial
assistant response flushed by the previous failed attempt; we
drop both that and the current user message in one pass.
System-role rows (config-reload notices, etc.) are preserved verbatim;
`toModelMessages` already strips them before the wire payload, so this
is safe.
The fix covers every Agent-reconstruction trigger, not just the model
slider:
- Sidebar model/key change (the reported case)
- Permission setting change
- Working-directory change (`processMessage` line 951)
- dispatch.toml config-watcher reload (lines 236–237)
- Skills directory watcher reload (lines 249–250)
- `stopTab` after user cancellation (line 775)
If `getMessagesForTab` throws (e.g. DB locked, schema mismatch), we
swallow the error and leave `messages: []` — matching pre-fix
behaviour for that case so this commit never regresses.
Tests (+6 in `packages/api/tests/agent-manager.test.ts`, total 26):
- pre-populates Agent.messages from DB history
- leaves messages empty when DB has only the current turn (first msg)
- excludes a partial assistant trail from a prior fallback attempt
- preserves system-role rows in pre-populated history
- survives a getMessagesForTab failure without crashing
- reloads history on every Agent reconstruction (simulated slider
switch from Opus to DeepSeek across two processMessage calls)
The test rig was extended with:
- `fakeMessagesByTab` map + `setFakeMessages` helper letting tests
inject arbitrary DB rows for the mocked `getMessagesForTab`.
- `constructedAgents` array captured at `run()` entry (not at
construction) so each test sees the post-pre-populate snapshot;
the production code reassigns `agent.messages` after `new Agent()`
returns, so capturing at construction yielded a stale empty array.
- Pluggable `runImpl` hook for tests that want a custom event stream
(not yet exercised; staged for the next round of agent-mode
fallback tests).
Totals: 229 tests across 3 packages all green; typecheck clean on
core + api + frontend; biome clean across 124 files.
| -rw-r--r-- | packages/api/src/agent-manager.ts | 49 | ||||
| -rw-r--r-- | packages/api/tests/agent-manager.test.ts | 300 |
2 files changed, 320 insertions, 29 deletions
diff --git a/packages/api/src/agent-manager.ts b/packages/api/src/agent-manager.ts index 6e03adb..efdd732 100644 --- a/packages/api/src/agent-manager.ts +++ b/packages/api/src/agent-manager.ts @@ -654,6 +654,55 @@ export class AgentManager { waitForQueuedMessage: () => this.waitForQueuedMessage(tabId), }, ); + + // Pre-populate the Agent's in-memory message history from the DB + // so prior turns survive Agent recreation. The Agent is + // constructed fresh here in three scenarios that ALL discard + // the previous in-memory `messages` array: + // 1. First call for this tab (no prior Agent existed) + // 2. Model/key/permission/working-directory change — the + // invalidation gate above set `tabAgent.agent = null`. + // This is the model-switcher-slider case: without this + // pre-population, DeepSeek would see zero context after + // switching from Opus mid-conversation. + // 3. Config or skills reload (configWatcher / skillsWatcher + // also null out `tabAgent.agent`). + // + // Boundary semantics: `processMessage` calls `appendMessage` + // for the current turn's user message BEFORE calling this + // function, so the DB ends in `[..., u_current]`. In the + // fallback retry path (agent-mode automatic model fallback), + // the previous attempt may also have flushed a partial + // assistant response, so the DB ends in + // `[..., u_current, partial_a]`. Either way, we walk + // backwards to the most recent user-role row and load only + // strictly-prior rows: `agent.run()` will push the current + // user message itself at agent.ts:546, so including it here + // would duplicate it. + // + // `toModelMessages` already filters out `role === "system"` + // rows and strips `error` / `system` chunks, so it's safe to + // load system messages verbatim. + try { + const rows = getMessagesForTab(tabId); + let cutIdx = rows.length; + for (let i = rows.length - 1; i >= 0; i--) { + const row = rows[i]; + if (row && row.role === "user") { + cutIdx = i; + break; + } + } + if (cutIdx > 0) { + tabAgent.agent.messages = rows + .slice(0, cutIdx) + .map((r) => ({ role: r.role, chunks: r.chunks })); + } + } catch { + // DB read failed — leave `messages: []`. The agent still + // works, just without prior history (matches pre-fix + // behaviour, so this is no worse than what we had before). + } } return tabAgent.agent; } diff --git a/packages/api/tests/agent-manager.test.ts b/packages/api/tests/agent-manager.test.ts index ea8dc46..71d43d8 100644 --- a/packages/api/tests/agent-manager.test.ts +++ b/packages/api/tests/agent-manager.test.ts @@ -1,42 +1,103 @@ import type { AgentEvent, ToolDefinition } from "@dispatch/core"; -import { describe, expect, it, vi } from "vitest"; +import { beforeEach, describe, expect, it, vi } from "vitest"; // Spy on appendEventToChunks so we can assert persistence calls const appendEventToChunksSpy = vi.fn((_chunks: unknown[], _event: unknown) => { // no-op; we inspect calls in tests }); +// Configurable stub for `getMessagesForTab`. Tests can push rows +// before invoking `processMessage` to simulate prior conversation +// history persisted in the DB (model-switch / history-replay path). +interface FakeMessageRow { + id: string; + tabId: string; + seq: number; + role: "user" | "assistant" | "system"; + chunks: unknown[]; + createdAt: number; +} +const fakeMessagesByTab = new Map<string, FakeMessageRow[]>(); +function resetFakeMessages(): void { + fakeMessagesByTab.clear(); +} +function setFakeMessages(tabId: string, rows: FakeMessageRow[]): void { + fakeMessagesByTab.set(tabId, rows); +} +function makeRow( + tabId: string, + seq: number, + role: "user" | "assistant" | "system", + chunks: unknown[], +): FakeMessageRow { + return { id: `msg-${tabId}-${seq}`, tabId, seq, role, chunks, createdAt: seq }; +} + +// Hook into Agent construction so tests can assert what +// `messages` was pre-populated with at the moment `run()` was +// called (after the post-construction pre-populate step in +// `getOrCreateAgentForTab` has had a chance to assign). +// +// We snapshot at `run()` invocation rather than at construction +// because the production code reassigns `agent.messages = +// rows.slice(...)` AFTER `new Agent()` returns — capturing a +// reference at construction would yield a stale empty array. +const constructedAgents: Array<{ initialMessages: unknown[] }> = []; +function resetConstructedAgents(): void { + constructedAgents.length = 0; +} + +// Allow tests to swap in a custom `run` generator (e.g. to simulate +// a fallback failure mid-stream). Returning to undefined restores +// the default. +type RunGen = (msg: string) => AsyncGenerator<unknown>; +let runImpl: RunGen | null = null; +function setRunImpl(impl: RunGen | null): void { + runImpl = impl; +} +async function* defaultRun(_message: string): AsyncGenerator<unknown> { + yield { type: "status", status: "running" } as const; + await new Promise<void>((r) => setTimeout(r, 10)); + yield { type: "reasoning-delta", delta: "thinking about it" } as const; + yield { + type: "reasoning-end", + metadata: { anthropic: { signature: "mock-sig" } }, + } as const; + yield { type: "text-delta", delta: "Hello " } as const; + yield { type: "text-delta", delta: "world" } as const; + yield { + type: "done", + message: { + role: "assistant", + chunks: [ + { + type: "thinking", + text: "thinking about it", + metadata: { anthropic: { signature: "mock-sig" } }, + }, + { type: "text", text: "Hello world" }, + ], + }, + } as const; + yield { type: "status", status: "idle" } as const; +} + // Mock @dispatch/core's Agent to avoid real LLM calls vi.mock("@dispatch/core", () => ({ Agent: class MockAgent { status = "idle"; messages: unknown[] = []; - async *run(_message: string) { - yield { type: "status", status: "running" } as const; - await new Promise<void>((r) => setTimeout(r, 10)); - // v6-style reasoning turn: delta(s) then end with providerMetadata - yield { type: "reasoning-delta", delta: "thinking about it" } as const; - yield { - type: "reasoning-end", - metadata: { anthropic: { signature: "mock-sig" } }, - } as const; - yield { type: "text-delta", delta: "Hello " } as const; - yield { type: "text-delta", delta: "world" } as const; - yield { - type: "done", - message: { - role: "assistant", - chunks: [ - { - type: "thinking", - text: "thinking about it", - metadata: { anthropic: { signature: "mock-sig" } }, - }, - { type: "text", text: "Hello world" }, - ], - }, - } as const; - yield { type: "status", status: "idle" } as const; + async *run(message: string): AsyncGenerator<unknown> { + // Snapshot the post-construction pre-populated message list + // the first thing `run()` does, before the real `Agent.run` + // would push the current user message at line 546. Tests + // inspect this to verify history was loaded correctly. + constructedAgents.push({ initialMessages: [...this.messages] }); + if (runImpl) { + for await (const ev of runImpl(message)) yield ev; + return; + } + for await (const ev of defaultRun(message)) yield ev; } }, PermissionService: class MockPermissionService { @@ -200,8 +261,8 @@ vi.mock("@dispatch/core", () => ({ }, appendMessage() {}, updateMessage() {}, - getMessagesForTab() { - return []; + getMessagesForTab(tabId: string) { + return fakeMessagesByTab.get(tabId) ?? []; }, appendEventToChunks: appendEventToChunksSpy, applySystemEvent(_messages: unknown[], _event: unknown) { @@ -245,6 +306,13 @@ vi.mock("@dispatch/core", () => ({ const { AgentManager } = await import("../src/agent-manager.js"); describe("AgentManager", () => { + beforeEach(() => { + resetFakeMessages(); + resetConstructedAgents(); + setRunImpl(null); + appendEventToChunksSpy.mockClear(); + }); + it("initial status is idle", () => { const manager = new AgentManager(); expect(manager.getStatus()).toBe("idle"); @@ -416,4 +484,178 @@ describe("AgentManager", () => { metadata: { anthropic: { signature: "mock-sig" } }, }); }); + + // ─── History pre-population on Agent (re)construction ──────────── + // + // These tests guard the fix that prior conversation turns survive + // switching models mid-conversation via the sidebar slider. Without + // it, a fresh `Agent` is constructed with `messages: []` and the + // next LLM call sees zero prior context. + + it("pre-populates Agent.messages from DB history when constructing a fresh Agent", async () => { + const manager = new AgentManager(); + const tabId = "tab-history"; + + // Simulate prior conversation in the DB: + // u1, a1, u_current + // (the current turn's user message has already been appended + // by `processMessage` before `getOrCreateAgentForTab` runs) + setFakeMessages(tabId, [ + makeRow(tabId, 0, "user", [{ type: "text", text: "first question" }]), + makeRow(tabId, 1, "assistant", [{ type: "text", text: "first answer" }]), + makeRow(tabId, 2, "user", [{ type: "text", text: "follow-up" }]), + ]); + + await manager.processMessage(tabId, "follow-up"); + + // Exactly one Agent should have been constructed for this tab, + // and its messages must be the prior two rows (excluding the + // current user message — `Agent.run()` pushes that itself). + expect(constructedAgents.length).toBe(1); + const inst = constructedAgents[0]; + expect(inst).toBeDefined(); + if (!inst) return; + const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>; + expect(init.length).toBe(2); + expect(init[0]).toMatchObject({ + role: "user", + chunks: [{ type: "text", text: "first question" }], + }); + expect(init[1]).toMatchObject({ + role: "assistant", + chunks: [{ type: "text", text: "first answer" }], + }); + }); + + it("leaves messages empty when the DB has only the current turn's user message (first turn)", async () => { + const manager = new AgentManager(); + const tabId = "tab-first-turn"; + + // First-ever turn: DB has only the just-appended user message. + setFakeMessages(tabId, [makeRow(tabId, 0, "user", [{ type: "text", text: "hello" }])]); + + await manager.processMessage(tabId, "hello"); + + expect(constructedAgents.length).toBe(1); + const inst = constructedAgents[0]; + expect(inst).toBeDefined(); + if (!inst) return; + // The user message at idx 0 is the current turn — must be excluded. + expect((inst.initialMessages as unknown[]).length).toBe(0); + }); + + it("excludes a partial assistant trail from a prior fallback attempt", async () => { + const manager = new AgentManager(); + const tabId = "tab-fallback-partial"; + + // Scenario: the agent-mode fallback path. Attempt 1 (Opus) errored + // mid-stream after flushing some chunks; attempt 2 (DeepSeek) is + // about to start. DB looks like: + // u1, a1, u_current, partial_a_attempt1 + // The fresh Agent for attempt 2 must see [u1, a1] — not the + // current user message and not the failed attempt's partial. + setFakeMessages(tabId, [ + makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]), + makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]), + makeRow(tabId, 2, "user", [{ type: "text", text: "q2" }]), + makeRow(tabId, 3, "assistant", [{ type: "text", text: "half-baked..." }]), + ]); + + await manager.processMessage(tabId, "q2"); + + expect(constructedAgents.length).toBe(1); + const inst = constructedAgents[0]; + expect(inst).toBeDefined(); + if (!inst) return; + const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>; + expect(init.length).toBe(2); + expect(init[0]).toMatchObject({ role: "user", chunks: [{ type: "text", text: "q1" }] }); + expect(init[1]).toMatchObject({ role: "assistant", chunks: [{ type: "text", text: "a1" }] }); + }); + + it("preserves system-role rows in pre-populated history (toModelMessages filters them later)", async () => { + const manager = new AgentManager(); + const tabId = "tab-with-system-rows"; + + setFakeMessages(tabId, [ + makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]), + makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]), + makeRow(tabId, 2, "system", [ + { type: "system", kind: "config-reload", text: "Configuration reloaded" }, + ]), + makeRow(tabId, 3, "user", [{ type: "text", text: "q2" }]), + ]); + + await manager.processMessage(tabId, "q2"); + + expect(constructedAgents.length).toBe(1); + const inst = constructedAgents[0]; + expect(inst).toBeDefined(); + if (!inst) return; + const init = inst.initialMessages as Array<{ role: string; chunks: unknown[] }>; + // All three prior rows (user/assistant/system) preserved; the + // LLM-facing `toModelMessages` strips the system row later. + expect(init.length).toBe(3); + expect(init[2]).toMatchObject({ role: "system" }); + }); + + it("survives a getMessagesForTab failure without crashing (messages stays empty)", async () => { + const manager = new AgentManager(); + const tabId = "tab-db-error"; + + // Simulate DB error by stubbing the fake-store getter to throw + // for this specific tab. We use a Proxy on the Map's get method + // for the duration of one call. + const realGet = fakeMessagesByTab.get.bind(fakeMessagesByTab); + fakeMessagesByTab.get = ((key: string) => { + if (key === tabId) throw new Error("simulated DB error"); + return realGet(key); + }) as typeof fakeMessagesByTab.get; + + try { + await expect(manager.processMessage(tabId, "anything")).resolves.toBeUndefined(); + } finally { + fakeMessagesByTab.get = realGet; + } + + // Agent still constructed, just with empty messages. + expect(constructedAgents.length).toBe(1); + const inst = constructedAgents[0]; + expect(inst).toBeDefined(); + if (!inst) return; + expect((inst.initialMessages as unknown[]).length).toBe(0); + }); + + it("reloads history on every Agent reconstruction (simulated model switch)", async () => { + const manager = new AgentManager(); + const tabId = "tab-model-switch"; + + // Turn 1: empty DB → just the first user message. + setFakeMessages(tabId, [makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }])]); + await manager.processMessage(tabId, "q1", "key-opus", "claude-opus-4-7"); + + // Turn 2: DB now has the full prior turn + new user message. + // User has switched models via the sidebar slider — different + // (keyId, modelId) triggers Agent invalidation and reconstruction. + setFakeMessages(tabId, [ + makeRow(tabId, 0, "user", [{ type: "text", text: "q1" }]), + makeRow(tabId, 1, "assistant", [{ type: "text", text: "a1" }]), + makeRow(tabId, 2, "user", [{ type: "text", text: "q2" }]), + ]); + await manager.processMessage(tabId, "q2", "key-deepseek", "deepseek-v3"); + + // Exactly two Agents constructed across the two turns (the + // invalidation gate fires when keyId/modelId change). + expect(constructedAgents.length).toBe(2); + + // Second Agent (the DeepSeek one) was pre-populated with the + // completed first turn — not empty, not duplicating q2. + const second = constructedAgents[1]; + expect(second).toBeDefined(); + if (!second) return; + const init = second.initialMessages as Array<{ role: string; chunks: unknown[] }>; + expect(init.length).toBe(2); + expect(init[0]).toMatchObject({ role: "user", chunks: [{ type: "text", text: "q1" }] }); + expect(init[1]).toMatchObject({ role: "assistant", chunks: [{ type: "text", text: "a1" }] }); + }); }); |
