summaryrefslogtreecommitdiffhomepage
path: root/packages/api/src/agent-manager.ts
AgeCommit message (Collapse)Author
2026-06-04feat(config): add subdirectory LSP config watchers and fix permission orderingv2-deprecatedAdam Malczewski
- Add watchDirConfig() for per-directory config watching - Register watchers for subdirectories with their own dispatch.toml - Fix permission ordering: move "*" wildcard to front so findLast reaches specific rules first (was silently breaking all specific bash permission rules) - Add comprehensive tests for watcher functionality - Update mocks in test files
2026-06-03feat(config): merge home-directory global dispatch.toml under project configAdam Malczewski
Load an optional global config at ~/.config/dispatch/dispatch.toml (override via DISPATCH_GLOBAL_CONFIG) and deep-merge it underneath every project/working-directory dispatch.toml, so machine-wide settings — most notably globally available LSP servers — work in any repo without per-repo config. Local always wins on conflicts. - loader: add getGlobalConfigPath(), loadGlobalConfig(), mergeConfigs(); loadConfig(dir) now loads+merges global. [lsp] and [[keys]] merge by id; [permissions] merge per-group with global patterns emitted first so local rules win at evaluation time (findLast). A malformed global config is downgraded to empty rather than breaking every repo. - watcher: watch BOTH global and local dispatch.toml so hot-reload re-merges on either change (dedupes when paths coincide). - export new loader fns from config/index and core index. - types/agent-manager: doc updates reflecting merged LSP resolution. - dispatch.toml: document global-default merge behavior; activate biome and typescript-language-server LSP entries. - tests: merge precedence, lsp/keys merge-by-id, permissions merge, filesystem integration, malformed-global resilience; isolate global path in existing loader tests.
2026-06-03Merge branch 'dev' into warm/prompt-cache-warmingAdam Malczewski
# Conflicts: # packages/api/src/agent-manager.ts # packages/api/tests/agent-manager.test.ts # packages/frontend/src/lib/tabs.svelte.ts
2026-06-03fix: warm the SAME Anthropic message-cache bucket as real turnsAdam Malczewski
Root cause of the 'first warmup misses' + 'switch to chat misses' bugs: Anthropic keys the MESSAGE-level prompt cache on `tool_choice` AND the extended-thinking parameters (both rows in their cache-invalidation table mark the messages cache as invalidated on change). The original warmCache() sent toolChoice:'none' and NO thinking providerOptions, while real turns send toolChoice:'auto' + thinking config for the effort. So warming and chat wrote TWO different message-cache buckets: - warmup #1 missed (no warm-only bucket existed yet), every later warmup hit its own bucket; - the next real chat message read the OTHER bucket → miss. Fix: extract a shared buildStreamOptions() that produces the cache-affecting params (toolChoice + thinking providerOptions + maxOutputTokens). Both run() and warmCache() now call it with the SAME resolved reasoning effort, so the warming replay refreshes the exact cache the next real message reads. The trivial probe turn is still appended AFTER the last cache breakpoint, so it never disturbs the cached prefix. Threaded the per-tab reasoning effort (per-model -> per-tab selector -> default, mirroring processMessage) from the frontend resolver through POST /chat/warm to warmCacheForTab to warmCache. Tests: updated the warmCache toolChoice test to assert it MATCHES a real turn, added an invariant test driving run() and warmCache() and asserting identical cache-affecting params, and assert effort forwarding in the frontend store. check / test (780) / frontend build / typecheck all green.
2026-06-03feat: prompt cache warming for idle tabsAdam Malczewski
Keep a tab's provider prompt-cache warm while idle by periodically replaying the exact cached conversation prefix plus a single trivial throwaway turn, resetting the provider's ~5-min cache TTL so the user's next real message hits a warm cache. Backend: - Agent.warmCache(history): extracts buildLlmContext() shared with run(), then re-sends the identical system+tools+history prefix (same Anthropic cache_control breakpoints) plus a 'reply with just a .' probe turn via toolChoice:none. Returns the request usage; mutates no history, emits/persists nothing. - AgentManager.warmCacheForTab(): resolves the same agent the next real turn would use, replays the FULL genuine history, refuses while a turn is running. - POST /chat/warm: returns ONLY the warming request's usage (never persisted, never folded into the real usage aggregate). Frontend: - cache-warming.svelte.ts store: per-tab 4-min repeating idle timer with countdown, warming-specific last-request cache %, and error capture. Arms on turn end, pauses during a turn, disables+resets on a real user message. - cache-warm-storage.ts: per-tab localStorage persistence of the toggle. - Lifecycle hooks wired into tabs.svelte.ts (status/statuses/sendMessage/ hydrate/create/open/close). - ModelSelector: bottom-of-panel checkbox + debug strip (last-% / countdown / error), shown only when enabled. Warming cache data never touches the real Cache Rate view. Tests: core warmCache (5), api warm route (3) + warmCacheForTab (3), frontend store (12) + storage (10). check / test (779) / frontend build / typecheck all green.
2026-06-03Merge branch 'dev' into cmp7/compaction-toolAdam Malczewski
# Conflicts: # packages/frontend/src/lib/components/ChatInput.svelte
2026-06-03feat(compaction): add UI-driven conversation compactionAdam Malczewski
Summarize a conversation's older "head" into a structured anchored Markdown summary while preserving the most recent turns verbatim, shrinking context size while keeping the information needed to continue coherently. Triggered by a "Compact conversation" button in Chat Settings (not an agent tool). Approach informed by OpenCode's session/compaction.ts: - Ported SUMMARY_TEMPLATE (Goal / Constraints / Progress / Key Decisions / Next Steps / Critical Context / Relevant Files) and the anchored-summary buildPrompt (re-summarizes a prior summary when present). - Ported the TOOL_OUTPUT_MAX_CHARS (2000) cap on tool results in the summary request. - Simplified tail selection to a fixed recent-turn count (DEFAULT_TAIL_TURNS=2) instead of OpenCode's token-budget splitTurn. core: - New src/compaction/ module (pure, DB-free): template, prompt builder, head/tail selection, transcript renderer with tool-output capping, prior summary extraction. Generic over ChatMessage so callers keep turnId/seq. - db/chunks.ts: rekeyChunks(from,to) relocates a tab's full history to a backup tab (reversible — nothing is deleted). - AgentEvent: compaction-started / -complete / -error variants. api: - AgentManager.compactTab(tempTabId, sourceTabId): side-effect-free resolveConnection() for the compactor model (configured compaction_model_*, else the source tab's own key+model), one-shot tool-less summary generation via a transient Agent, then relocate full history to a fresh backup tab and re-seed the canonical source id with [summary turn + preserved tail]. Source tab is locked (messages queue) during the run; queue drains afterward. - Routes: POST /tabs/:id/compact, GET/PUT /tabs/settings/compaction-model. frontend: - "Compact conversation" button in ModelSelector (Chat Settings), between Working Directory and the agent toggle; idle-gated. - Compaction-model key+model selector in Settings, beside the title model. - Transient placeholder tab shows a large, non-faded "Please wait, compacting conversation…" screen; closing it cancels. Source input locked while running. - Handle compaction-* events: reload compacted source, insert backup tab, refocus source, discard placeholder. tests: core compaction unit tests, rekeyChunks DB test, AgentManager.compactTab orchestration tests, and compaction route tests. All green (713 tests), biome clean, all typechecks pass, frontend builds.
2026-06-03Merge branch 'dev' into img8/image-attachmentsAdam Malczewski
2026-06-02feat(chat): paste-to-attach images/PDFs with model capability checkAdam Malczewski
Add multimodal image/PDF input to the chat box via clipboard paste, gated by a graceful per-model capability check. UX: a pasted image/PDF inserts an inline token (【image:…】 / 【pdf:…】) into the draft, so attachments have ORDER relative to typed text and can be referenced positionally. The token is the only handle — deleting it (atomic Backspace/ Delete, or selection overlap) detaches the file; an input-reconciliation safety net detaches any attachment whose token is no longer intact. No preview strip. Capability check: resolveModelCapabilities reads models.dev modalities.input (new GET /models/capabilities, mirrors /context-limit). The input blocks Send (no tokens spent) only on a definitive 'no'; unknown capability (catalog offline / unmapped provider) stays permissive. Attachments require a fresh turn — Send is blocked while generating and /chat rejects content mid-turn (409). Attachments are EPHEMERAL: forwarded to the model for the turn via ordered AI SDK ImagePart/FilePart content, but never persisted (history keeps the text with [image]/[pdf] markers). Text-only turns serialize byte-identically to before. Limits (Anthropic-aligned, enforced at paste + re-validated server-side): PNG/JPEG/WebP/GIF/PDF; image ≤5MB, PDF ≤32MB, ≤20 attachments, ≤32MB total. core: UserContentPart types, models/attachments validator, capability resolver, agent.run+toModelMessages thread ordered content. api: /chat content validation + passthrough. frontend: attachment-tokens helper, ChatInput paste/token/gating, per-tab staged attachments, App.svelte capability fetch. +44 tests.
2026-06-02feat(tools): add key_usage tool reporting API-key usage levelsAdam Malczewski
Adds an agent-callable `key_usage` tool that reports current usage for configured API keys so the agent can pick a key with headroom, warn before hitting a rate limit, and diagnose exhausted-key failures. Per key it reports: provider, active/exhausted status (with last error + when it was exhausted), remaining rate-limit headroom and reset timestamp per window (5-hour, weekly, and monthly where the provider exposes it), and whether the figures are live or served from cache (with the cache's last-fetched-from-source time). Supports anthropic and opencode-go keys (live with cache fallback for anthropic; live scrape for opencode-go). Optional `key_id` reports one key; omitted reports all. Hard permission gate `perm_key_usage` (default off): when disabled the tool is completely removed from the toolset/context. Registered in both the parent permission-gated path and the child whitelist path, advertised in the system prompt (TOOL_DESCRIPTIONS), grantable to subagents via the summon enum, and exposed as a frontend tool-permission checkbox. To report data freshness, claude.ts gains `getAccountUsageWithSource` + `ClaudeUsageResult` (live vs cache + cachedAt from usage_cache.cached_at); the existing `getAccountUsage` now delegates to it, preserving behavior. Tests: core key-usage tool suite (windows, %-conversion, freshness, exhausted status, unsupported/unavailable, filtering) + agent-manager perm-gate test.
2026-06-02Merge branch 'dev' into feat/cs-code-search-toolAdam Malczewski
# Conflicts: # packages/api/src/agent-manager.ts # packages/api/tests/agent-manager.test.ts # packages/frontend/src/lib/components/ToolPermissions.svelte # packages/frontend/src/lib/settings.svelte.ts
2026-06-02feat(lsp): add config-driven LSP support (Roblox Luau via luau-lsp)Adam Malczewski
Add Language Server Protocol integration modeled on opencode's, wired for this codebase's plain-TypeScript tool/agent architecture. Core (@dispatch/core): - lsp/client.ts: LSP/JSON-RPC client over stdio (vscode-jsonrpc) with the initialize handshake, didOpen/didChange sync, push + pull diagnostics (textDocument/diagnostic, workspace/diagnostic), and a generic request() passthrough for hover/definition/references/documentSymbol. - lsp/server.ts: resolves dispatch.toml [lsp] entries into spawn specs. Config-driven only — no builtin registry, no auto-download. - lsp/manager.ts: process-wide LspManager owning client lifecycles, keyed by root+serverID, lazy spawn + reuse + graceful shutdown. - lsp/language.ts: extension->languageId map incl. .luau -> "luau". - lsp/diagnostic.ts: error-only <diagnostics> block formatting (1-based). - tools/lsp.ts: on-demand 'lsp' tool (1-based coords -> 0-based wire). - write-file.ts: optional onAfterWrite hook for diagnostics-on-write. - config schema: validate [lsp] block; DispatchConfig.lsp + LspServerConfig. API (@dispatch/api): - AgentManager owns one LspManager; per-working-directory server cache cleared on config reload; diagnostics appended to write_file results; 'lsp' tool gated by new perm_lsp setting; shutdownAll on destroy(). Config: - dispatch.toml: documented, commented [lsp.luau-lsp] Roblox example. Tests: fake-lsp-server fixture + client/manager/server/diagnostic/schema/ tool/write-hook suites, plus an opt-in real-binary luau-lsp smoke test (auto-skipped when luau-lsp is absent). 652 pass; biome + 3 typechecks green.
2026-06-02feat: add search_code tool wrapping the cs code-search engineAdam Malczewski
Add a dedicated, permission-gated search_code tool that wraps boyter/cs (code spelunker) — a fast, relevance-ranked, structure-aware code search engine — giving agents a better default than grep/find for exploratory 'where is X / how does Y work' searches (ranked results, snippets, ~5x smaller payloads). - packages/core/src/tools/search-code.ts: createSearchCodeTool factory; -f json invocation, workdir path containment, graceful missing-binary handling (DISPATCH_CS_BIN override), readable per-file formatted output. - Wire-up: export from core; register in agent-manager (both child-whitelist and parent perm paths) behind new perm_search_code; add to summon catalog + tools enum; frontend ToolPermissions + settings. - Docker: build a patched, statically-linked cs (pinned v3.1.0 commit) in a golang builder stage and bundle at /usr/local/bin/cs. - docker/cs/luau-declarations.patch: additive Luau declaration table so --only-declarations / definition ranking works for Roblox .luau files (upstream has Lua but not Luau). Applied during the Docker build. - Tests: new search-code.test.ts (stubbed JSON formatting + live-cs integration, skipped when cs absent); agent-manager/routes mocks + perm-gating assertions; loader pass-through. All tests (596), biome, and tsc (core/api/frontend) pass. cs-builder Docker stage verified to build and produce a working patched binary.
2026-06-02Merge branch 'dev' into perm/fix-user-agent-summon-permissionAdam Malczewski
# Conflicts: # packages/api/tests/agent-manager.test.ts
2026-06-02fix(tabs): say a reply will WAKE you with a new message (clearer than ↵Adam Malczewski
'arrives on its own') Matches actual behavior: a peer's reply wakes this tab with a new message in a later turn. Updated the send_to_tab description (both canReadTab branches), the delivery-result text (both branches), and the system-prompt one-liner; updated the test assertion accordingly.
2026-06-02fix(perm): decouple perm_user_agent from perm_summon for spawning user agentsAdam Malczewski
Granting only the user-agent (top-level) permission without the subagent-summon permission left the agent unable to summon user agents: the whole summon tool was gated behind perm_summon, so perm_user_agent alone produced no summon tool. Register summon when EITHER perm_summon OR perm_user_agent is granted. createSummonTool now takes an independent subagentEnabled flag (mirrors perm_summon) alongside userAgentEnabled (mirrors perm_user_agent): - subagent-only -> ordinary subagents, no top_level - user-agent-only -> spawns ONLY top-level user agents (top_level forced, background/top_level params dropped, user-agent catalog only) - both -> unchanged full behavior retrieve stays bundled with perm_summon (user agents are fire-and-forget). Adds core summon tests (user-agent-only mode + legacy-default regression) and an agent-manager summon/user_agent permission-split suite.
2026-06-02fix(tabs): only mention read_tab when the sender actually has it; CAPS on ONLYAdam Malczewski
The send_to_tab guidance previously told the agent it could call read_tab to check for a reply, but the tab-messaging permissions are split — a tab can hold send_to_tab WITHOUT read_tab (the exact case in testing). Advertising a tool the agent wasn't granted is wrong. Thread a canReadTab flag from AgentManager.buildTabCommToolEntries into createSendToTabTool (true iff this tab is also granted read_tab). The tool description and the delivery-result text now only reference read_tab when canReadTab is true; otherwise they say a reply arrives on its own and to end the turn. Drop the read_tab phrasing from the static TOOL_DESCRIPTIONS one-liner (can't be conditional per-tab there). Also uppercase ONLY in the recipient reply-contract footer for emphasis. Tests: cover both canReadTab branches for description + result text; assert ONLY is uppercased.
2026-06-02fix(tabs): clearer send_to_tab context to stop busy-wait + wrong-recipient ↵Adam Malczewski
replies Two behavioral problems observed once the tools were usable: 1. The SENDER busy-waited for a reply (ran 'sleep 20' / polled) instead of ending its turn. Tool description, the delivery result text, and the system-prompt one-liner now say plainly: do not sleep/poll/run commands to wait; a reply arrives on its own in a later turn (or via read_tab in a future turn); keep working if there's other work, else end your turn. 2. The RECIPIENT replied to its OWN user in plain text instead of routing the answer back through send_to_tab. The provenance wrapper now states the message is from another AGENT (not your user), and that to reply you must use send_to_tab addressed to the sender's handle — and only if asked, since it may just be context. A plain text answer reaches only your own user. Tests updated for the new wording.
2026-06-02fix(tabs): advertise send_to_tab/read_tab in the agent system promptAdam Malczewski
Granted tab-messaging tools were registered in the API tool payload but buildSystemPrompt built its 'You have access to the following tools' list by filtering toolNames through TOOL_DESCRIPTIONS, which had no entries for send_to_tab/read_tab. The model was therefore told it lacked those tools and refused to use them even when explicitly granted. Add the two missing TOOL_DESCRIPTIONS entries so the capability list matches the granted toolset. Add regression tests that capture the constructed Agent's systemPrompt and assert the tab-messaging tools are listed when granted (and omitted when not), locking the prompt list to the schema list so they can't drift again.
2026-06-02feat(todo): port opencode's declarative whole-list todo toolAdam Malczewski
Replace the imperative id-based CRUD todo tool (add/update/list/get/remove) with opencode's declarative whole-list design: a single `todos` param that replaces the entire list each call. No model-visible ids, no delta reasoning, no "task not found" spirals. - core: TaskItem { id, content, status }; statuses pending|in_progress| completed|cancelled. TaskList.setTasks/getTasks/onChange. New rich TODO_DESCRIPTION adapted from opencode's todowrite.txt. - api: TASK_MANAGEMENT_GUIDANCE system-prompt section (from anthropic.txt); updated TOOL_DESCRIPTIONS.todo. Reload fix: TabStatusSnapshot now carries per-tab tasks so getAllStatuses rehydrates the panel on reconnect. - frontend: mirror types; hydrate tasks from snapshot in both restore paths; upgrade sidebar Tasks panel to render content + all four statuses + progress. - tests: new core task-list.test.ts (15); updated api TaskList mocks + getAllStatuses task-snapshot coverage. bun run check clean; 569 tests pass; all packages typecheck.
2026-06-02Merge branch 'dev' into u3/agent-effort-levelAdam Malczewski
# Conflicts: # packages/api/tests/agent-manager.test.ts
2026-06-02fix: reconcile live cacheStats to DB truth on turn-sealedAdam Malczewski
Addresses the live-accumulator overshoot a Gemini review surfaced: the frontend adds every streamed usage event to cacheStats, but a rate-limited fallback attempt's usage is discarded server-side (never persisted). Live numbers overshot until a reload re-seeded from the DB aggregate. Fix: turn-sealed (emitted AFTER the atomic usage-row write) now carries the authoritative getUsageStatsForTab aggregate. The store REPLACES (not adds) cacheStats with it every turn — landing the just-sealed turn's usage AND self-healing any live drift, including the discarded-fallback overshoot. No extra round-trip (piggybacks turn-sealed); idempotent in the happy path. - core: add UsageStats type; getUsageStatsForTab returns it; turn-sealed gains optional usageStats field. - api: agent-manager reads getUsageStatsForTab post-flush and attaches it to the turn-sealed emit (try/catch: omit on DB error). - frontend: turn-sealed handler replaces cacheStats (undefined ⇒ untouched back-compat; null ⇒ clear). Tests: frontend reconcile/self-heal/back-compat/null-clear; api turn-sealed carries aggregate. 509 -> 514 passing; typecheck + biome green.
2026-06-02feat: persist per-tab token/cache usage across reloadAdam Malczewski
Persist usage as invisible type:"usage" chunk rows (side channel): - core: add "usage" ChunkType + UsageData; exclude usage rows from getChunksForTab/getTotalChunkCount; add getUsageStatsForTab aggregate (exported from barrel); defensive skip in groupRowsToMessages. - api: agent-manager accumulates per-attempt usageRows and flushes them in the same atomic appendChunks call as the turn's content (discarded on a superseded fallback attempt). GET /tabs enriches rows with usageStats. - frontend: hydrateFromBackend seeds cacheStats from usageStats (reload only; no re-seed on statuses reconnect, so no double-count with live events). Tests: core DB-backed usage persistence/aggregate; api usage-row-per-event + fallback discard; routes GET /tabs usageStats; frontend hydrate seed + no-double-count + live-accumulation-after-seed. 495 -> 509 passing.
2026-06-02feat(agents): per-model reasoning effort levelAdam Malczewski
Add a per-model/key reasoning effort setting to agent definitions, surfaced and editable in the Agent Settings page and displayed at a glance in the model selector views. - core: single source of truth for effort levels (REASONING_EFFORTS, DEFAULT_REASONING_EFFORT='high', labels, isReasoningEffort guard); add 'xhigh' level; AgentModelEntry.effort; xhigh budget=24000 for classic-thinking Claude; default floor 'high'. Persist/parse effort in the agent TOML loader. - api: thread effort through the fallback chain with per-model -> per-tab -> default precedence; validate /chat + agentModels effort from the canonical list. - frontend: effort <select> per model row in AgentBuilder; effort badges in ModelSelector (agent + subagent chains); Thinking dropdown sourced from canonical list; per-tab default raised to 'high'. - tests: +15 (loader round-trip, agent xhigh budget, canonical list + guard, api precedence, route validation).
2026-06-01fix(queue): consume queued messages after a turn ends (start a new turn)Adam Malczewski
A message queued while the agent was mid-turn was only handled if it arrived DURING a tool batch (injected as a [USER INTERRUPT]). If it landed after the last tool call — or the turn had no tools — the agent silently appended it to history and ended the turn with no response, so it sat there unanswered. This affected both user-queued messages and agent-queued ones (send_to_tab). - agent.ts: stop the end-of-turn drain that swallowed trailing queued messages into history. They now stay on the queue. - agent-manager: after a CLEAN turn settles, continueFromQueue() drains the queue and starts a fresh turn to answer it. Skipped on a user-stopped or errored turn (queue preserved for the next send). - Loop safety: continuation draws from the existing autoWakeBudget, so a runaway agent<->agent chain is bounded; human sends refill it, so human conversations are never throttled. - dequeueMessages now tags message-consumed with reason "interrupt" | "continuation"; the frontend collapses continuation- consumed queued bubbles into the next turn's initiator row (avoids the linger/dup traps documented in queue-interrupt-reconcile-edge-cases.md). - Tests: agent (no-swallow + interrupt regression), agent-manager (continuation, no-op when empty, user-stop preserves queue, bounded loop), frontend (continuation bubble becomes next initiator). - wishlist: remove the now-fixed item.
2026-06-01feat(tabs): tab-to-tab agent communication via short handlesAdam Malczewski
Add send_to_tab / read_tab tools so an agent can message or read another tab by a git-style short handle (shortest unique prefix of the tab UUID, min 4 chars), shown in the tab bar. - core/db/tabs: resolveTabPrefix + shortestUniquePrefix (open tabs only, LIKE-sanitized prefix matching) - new tools read-tab.ts / send-to-tab.ts (+ tests) decoupled from the DB TabRow via a minimal ResolvedTabRef projection - agent-manager: unified deliverMessage routing (busy -> queue, idle -> new turn) shared by POST /chat and send_to_tab; agent->agent auto-wake budget (MAX_AGENT_AUTO_WAKES) to bound ping-pong loops - summon/loader: send_to_tab + read_tab as grantable tools - frontend: shortHandleFor + handle badge in TabBar; perm toggles - notes: tab-comm / user-agents / todo-redesign plans - chore: biome format fixes (debug-logger, summon.test) Refs notes/plan-tab-comm.md
2026-05-31feat: implement user agents (top-level tabs via summon)Adam Malczewski
- agent parameter is now required on summon tool - new top_level param spawns independent fire-and-forget user agent tabs - gated by perm_user_agent permission (UI checkbox added) - agent definition type validation (subagent vs user-agent slug mismatch) - context-aware error messages when agent slug not found - read_file_slice added to summon tool's allowed tools enum - updated and expanded summon tests
2026-05-30feat(chunks): chunk-native frontend store with turn-sealed reconcile + ↵Adam Malczewski
per-chunk eviction Replace the stored ChatMessage[] with a chunk-native model: tab.chunks (sealed ChunkRow[]) + tab.live (transient in-flight turn buffer) + derived tab.renderGroups. This enables per-chunk eviction (trimming WITHIN a large turn) and raw-chunk pagination (loadOlderChunks), removing the whole-message eviction limitation. Backend: - Emit turn-start/turn-sealed around each turn; expose currentTurnId in the status snapshot. turn-sealed fires after the durable write (status:idle fires before it). - New GET /tabs/:id/chunks raw paginated endpoint (limit/before). - Wrap appendChunks in a single SQLite transaction. Frontend: - turn-sealed drives a turn-aware reconcile that folds the sealed turn into chunks while preserving a concurrent newer in-flight turn and pending queued messages; deferred while the user is scrolled up. - Stable turn-scoped render keys (${turnId}:${role}:${n}) avoid remount/flash. Reconcile correctness (three review passes): - preserve a concurrent newer turn when an earlier deferred reconcile flushes; - keep optimistic queued user messages (no loss); - turn-start backfill skips pending queued rows and tags only the turn initiator; - bind consumed interrupt messages to the in-flight turn so they collapse on seal (no lingering/duplicated bubble). Tests: chat-store reconcile/eviction/pagination suite; api chunks endpoint + events.
2026-05-30refactor(chunks): append-only chunk log with per-step cache-stable wireAdam Malczewski
Replace the message-as-container model with a flat, append-only chunk log. - chunks table (id, tab_id, seq, turn_id, step, role, type, data_json): one row per chunk; tool_call (assistant) and tool_result (tool) are SEPARATE rows linked by callId. Message/turn are derived groupings, not stored. - chunks/transform.ts: DB-free explode (Chunk[] -> rows) / group (rows -> messages), shared by backend and the browser frontend. - Cache fix: toModelMessages segments each turn at tool-batch boundaries into stable [assistant, tool] pairs per step, so earlier steps serialize byte-identically across requests (kills the prompt-cache churn). - agent-manager persists a turn's chunks on seal (once), discarding a failed fallback attempt's partial chunks; rebuilds agent history from the log. - GET /messages windows the log by chunk seq then groups; loadMoreMessages merges a turn split across the window boundary by turnId. - One-shot migration drops the legacy messages table and clears tabs; settings/credentials/keys/usage preserved. Full suite green (317 tests); biome, tsc, and svelte-check clean.
2026-05-29fix(claude): eliminate /home mount race that blanks Claude credentials at bootAdam Malczewski
On hosts where /home is a separate filesystem, the dispatch-api service could start before /home was mounted. The API's first DB access then failed (EACCES: mkdir '/home/tradam'), Claude account discovery silently caught the error and left claudeAccounts empty, and -- because discovery only ran in the constructor -- it stayed empty for the whole process lifetime. Every Claude message then fell back to the deepseek-v4-flash / empty-key defaults, producing a 401 'Missing API key' from OpenCode Zen. Fixes: - s6 run script waits (capped ~30s) for /home/tradam before exec'ing bun; passes instantly where /home is on the root filesystem. - systemd unit gains RequiresMountsFor=/home and After=...home.mount. - agent-manager re-runs _refreshClaudeAccounts() on config hot-reload and lazily on an empty cache in the Anthropic path, so a process that lost the boot race self-heals on the next request instead of staying broken.
2026-05-29feat: stop generation button with abort signal plumbingAdam Malczewski
- Add POST /chat/stop endpoint on API - Thread abortSignal from agent-manager through Agent.run() to streamText - Thread abortSignal option through the Agent.run() signature - Emit status:idle on stopTab() so frontend WS gets the update - Add stopGeneration() store method on frontend tabStore - Add stop button in ChatInput (btn-sm lg:btn-xs for mobile tap target) - Add tests for /chat/stop endpoint - Refactor processMessage to pass abortSignal to agent.run
2026-05-29feat: subagent summon — catalog filter, error hints, system prompt, ↵Adam Malczewski
AgentBuilder default, SubAgent mode display - Filter summon tool catalog to is_subagent-flagged agents only - Return fresh subagent list in error when slug not found - Add subagent hint to system prompt when summon tool available - Default is_subagent checkbox to true in AgentBuilder - Fix tab-created event to include agentSlug and agentModels - Add SubAgent read-only mode to ModelSelector with model slider
2026-05-29fix: refresh agent config on send; widen fallback retry detectionAdam Malczewski
- Refresh agent config from API before sending a message so edits in AgentBuilder (changed keyId/modelId/agentModels) take effect immediately on existing tabs instead of using stale snapshots - Broaden isRetryable check to also match 'usage limit' and 'exhausted' so fallback keys are actually tried on quota errors
2026-05-28fix(core): normalize tool schemas for Anthropic, add toolChoice=auto; ↵Adam Malczewski
feat(summon): agent definition support; docs: cc/ research findings - registry.ts: add normalizeForAnthropic() to strip , additionalProperties, default, nullable from zodToJsonSchema output so Anthropic doesn't silently reject tool definitions - agent.ts: add toolChoice=auto for Claude OAuth to prevent Opus thinking forever without calling tools - summon.ts: add agentSlug parameter, build agents catalog in description, add toAvailableAgents helper - agent-manager.ts: wire agent definition loading into spawnChildAgent, agent model fallback - loader.ts: export loadAgent, expandAgentToolNames, getAgentDirPaths; add getAgentDirPaths for permission gate - agent.ts: auto-allow read-only tools in agent definition directories - packaging/PKGBUILD: exclude ARM64 prebuilds from x86_64 package - cc/: research findings on Claude Opus tool calling issues - tests: loader tests, summon tool tests
2026-05-28feat: restore tab layout + in-flight chunks on browser reopen; agents keep ↵Adam Malczewski
running in background Implements the 'background-running agents + restore-layout-on-reopen' feature. Full design and parallel-implementation plan in `plan-bg-restore.md`; Gemini code review (SHIP verdict, no findings) in `report.md`. User-visible behaviors: 1. Browser-close keeps agents alive. If an agent is mid-stream when the browser closes / reloads / loses the network, it continues processing on the backend. (This was already the case in code — agents run fire-and-forget in app.ts:77-79 — but it was previously pointless because the UI never restored the tab to receive the output.) 2. Layout restore on browser reopen. Every tab that existed at the time the window was closed is restored, in original `position` order, with full persisted message history. Tabs whose agents finished while disconnected appear with the completed message. Tabs whose agents are still running appear streaming live — the in-flight assistant message is reconstructed from the backend's in-memory `currentChunks` (sent over the wire on connect) and accumulates new deltas as they arrive. 3. Explicit tab-close cancels + forgets. Clicking the X still cancels the agent (existing `stopTab` in DELETE /tabs/:id) and archives the row (`is_open = 0`), so it is not restored. No change to that path. The gap that the implementation closes: previously, App.svelte:onMount unconditionally called `createNewTab()` with a fresh UUID, ignoring every existing row in the `tabs` table. Every browser open was a clean slate. The DB had the conversation history but no way for the UI to discover it. Implementation: • New `TabStatusSnapshot` interface in packages/core/src/types/index.ts (auto-exported via existing `export * from "./types"`): interface TabStatusSnapshot { status: AgentStatus; currentChunks?: Chunk[]; // present iff running currentAssistantId?: string; // present iff running } • `agent-manager.ts:getAllStatuses()` rewritten to return `Record<string, TabStatusSnapshot>` (was `Record<string, AgentStatus>`). For running tabs only, attaches a defensive shallow copy of `tabAgent.currentChunks` (the live streaming array the per-message loop appends to) plus the DB id of the in-flight assistant message. The defensive copy is the consumer's to mutate. Idle / error tabs get `{ status }` only. `GET /status` and the WS `onOpen` snapshot both pick up the new shape automatically — neither call site changed. • Frontend mirror of `TabStatusSnapshot` in packages/frontend/src/lib/types.ts; `AgentEvent.statuses` variant updated to use `Record<string, TabStatusSnapshot>`. • New `hydrateFromBackend()` on the tab store (packages/frontend/src/lib/tabs.svelte.ts). Sequence on app mount: 1. Bail with 0 if `tabs.length > 0` (hot-reload idempotency). 2. GET /tabs → list of `is_open=1` rows in `position` order. 3. GET /status → in-flight TabStatusSnapshot map. 4. GET /tabs/:id/messages for each tab in parallel via Promise.all → persisted ChatMessage[]. 5. Build the Tab objects, splicing the snapshot's live chunks into the in-flight assistant message for every running tab (two paths: merge into the existing DB row with matching id, or append a fresh in-flight message if no row matches). 6. `tabs = restored; activeTabId = restored[0]?.id ?? null;` Every fetch is wrapped in try/catch so one tab's failure can't destroy the whole restore pass. • WS `statuses` handler in `tabs.svelte.ts:handleEvent` rewritten for the new shape. Still fires `reloadTabMessagesFromApi` on the desync case (frontend thinks running, backend says idle — the pre-existing recovery path is preserved). When backend says running, seeds in-flight chunks into the assistant message matching `snap.currentAssistantId` (creating it if needed). When backend says non-running, clears `isStreaming` on the previous in-flight message and nulls `currentAssistantId`. • `App.svelte:onMount` now awaits `tabStore.hydrateFromBackend()` before deciding whether to fall back to `createNewTab()`. Fallback condition is the doubly-defensive `restored === 0 && tabStore.tabs.length === 0`. `wsClient.connect()` fires in parallel with hydration — the resulting WS `statuses` event is per-tab idempotent against the hydrated state, so there is no race even if it arrives mid-hydration. What was NOT done (deliberately, deferred to wishlist): • Pre-existing inconsistency: core `AgentStatus` includes "waiting_for_key" but frontend `TabStatusSnapshot.status` uses only the existing 3-state pattern ("idle" | "running" | "error"). Not introduced here; mirrored the existing precedent. • Restored tabs use defaults for `reasoningEffort`, `agentSlug`, `agentScope`, `agentModels`, `workingDirectory` — these are not in the DB `tabs` schema. Future schema expansion. • Per-delta DB flushing — not needed; the in-memory snapshot covers the gap between flushAssistant calls. • LocalStorage cache of tab ids — backend DB is the source of truth. Process notes: • Implemented via parallel programmer subagents (flash agents were requested but unavailable in this environment — substituted with "programmer" agents, which share the "reads a plan, implements a single step" charter). Backend (Segment A: getAllStatuses + 5 tests) and frontend (Segment B: types + hydrateFromBackend + statuses handler + onMount + 8 tests) ran disjoint-file-ownership in parallel. • Gemini code review (yolo mode for tool access, explicit prompt-level write restriction to `report.md` only) returned a SHIP verdict with no findings against the plan. • Self-review surfaced one followup gap that Gemini's earlier plan-mode pass also caught: no explicit test for `/tabs/:id/messages` failure isolation. Added a test covering both HTTP-500 and network-error variants alongside a healthy tab, asserting per-tab failures don't destroy the whole restore. Tests: • api/tests/agent-manager.test.ts: +5 (snapshot empty record, idle-tab field omission, running-tab field inclusion, defensive copy invariant, omits chunks for running tab with null currentChunks). 31 total (was 26). • frontend/tests/chat-store.test.ts: +9 (restore-with-messages, in-flight seeding, /tabs failure → 0 returned, empty /tabs array, idempotency when tabs already exist, idle-status when /status omits, running-snapshot statuses handler seeding, idle-snapshot statuses handler clearing, per-tab failure isolation across HTTP-500 and network-error). 44 total (was 35). Totals: 243 tests across 3 packages all green; typecheck clean on core + api + frontend; biome clean across 124 files.
2026-05-28fix(api): pre-populate Agent.messages from DB on construction so model ↵Adam Malczewski
switches preserve chat history Before this change, swapping the model mid-conversation via the sidebar slider lost all prior turns: the new model saw only the current user message and treated the conversation as brand-new. Root cause: `getOrCreateAgentForTab` invalidates the cached Agent (`tabAgent.agent = null`) whenever the effective keyId, modelId, permission key, or working directory differs from the cached values. The replacement Agent was then constructed with `messages: []` and the post-construction step that loads prior turns from the SQLite `messages` table simply did not exist. `processMessage` had already appended the current turn's user message to the DB (line 960) before calling `getOrCreateAgentForTab` (line 1015), so the DB held the full context — it was just never read. Fix: after every `new Agent(...)` in `getOrCreateAgentForTab`, call `getMessagesForTab(tabId)`, walk backwards to the most recent user-role row, and assign all strictly-prior rows to `tabAgent.agent.messages`. The walk-backwards strategy correctly handles two boundary cases: 1. Simple model switch — last DB row is the current user message; drop it (`Agent.run()` will push it again at line 546). 2. Agent-mode auto-fallback retry — last DB row may be a partial assistant response flushed by the previous failed attempt; we drop both that and the current user message in one pass. System-role rows (config-reload notices, etc.) are preserved verbatim; `toModelMessages` already strips them before the wire payload, so this is safe. The fix covers every Agent-reconstruction trigger, not just the model slider: - Sidebar model/key change (the reported case) - Permission setting change - Working-directory change (`processMessage` line 951) - dispatch.toml config-watcher reload (lines 236–237) - Skills directory watcher reload (lines 249–250) - `stopTab` after user cancellation (line 775) If `getMessagesForTab` throws (e.g. DB locked, schema mismatch), we swallow the error and leave `messages: []` — matching pre-fix behaviour for that case so this commit never regresses. Tests (+6 in `packages/api/tests/agent-manager.test.ts`, total 26): - pre-populates Agent.messages from DB history - leaves messages empty when DB has only the current turn (first msg) - excludes a partial assistant trail from a prior fallback attempt - preserves system-role rows in pre-populated history - survives a getMessagesForTab failure without crashing - reloads history on every Agent reconstruction (simulated slider switch from Opus to DeepSeek across two processMessage calls) The test rig was extended with: - `fakeMessagesByTab` map + `setFakeMessages` helper letting tests inject arbitrary DB rows for the mocked `getMessagesForTab`. - `constructedAgents` array captured at `run()` entry (not at construction) so each test sees the post-pre-populate snapshot; the production code reassigns `agent.messages` after `new Agent()` returns, so capturing at construction yielded a stale empty array. - Pluggable `runImpl` hook for tests that want a custom event stream (not yet exercised; staged for the next round of agent-mode fallback tests). Totals: 229 tests across 3 packages all green; typecheck clean on core + api + frontend; biome clean across 124 files.
2026-05-27fix(api): apply correct model+baseURL when API key env var is missing, ↵Adam Malczewski
preventing silent fallback to OpenCode Go endpoint
2026-05-27refactor: ChatMessage.chunks[] union — interleaved thinking, tool ↵Adam Malczewski
batching, error/system chunks
2026-05-27feat: tool-output truncation+spill, read_file pagination, read_file_slice, ↵Adam Malczewski
symlink-safe path resolution
2026-05-24fix: prompt caching, OpenCode Go MiniMax/Qwen support, Opus 4.7 thinking, ↵Adam Malczewski
SDK compat - Implement Anthropic prompt caching: first system message + last 2 non-system messages get cache_control: ephemeral, mirroring OpenCode's applyCaching strategy. Move system prompt inline into messages array so providerOptions can attach. - Add opencode-anthropic provider variant routing MiniMax/Qwen models through the /messages endpoint with x-api-key auth, distinct from the Claude OAuth flow's Bearer auth and Claude Code mimicry. - Split isAnthropic into isClaudeOAuth (billing header, mcp_ tool prefix, thinking config) and usesAnthropicSDK (cache markers) so non-OAuth Anthropic-format gateways get the right treatment. - Pin @ai-sdk/anthropic to ^1.2.12: v3 returns LanguageModelV3-spec models that ai v4's streamText rejects at runtime ('AI SDK 4 only supports models that implement specification version v1'). Drop unnecessary V1 casts. - Restore Opus 4.7 extended thinking by rewriting the outgoing /messages body in the Claude OAuth fetch interceptor: inject thinking: { type: 'adaptive' } (v1 SDK can't emit it), strip temperature/top_p/top_k (Anthropic rejects them with thinking enabled). Gated on max_tokens > 4096 so effort=none still works. - Bump MAX_STEPS from 10 to 50 to align with AI SDK's stepCountIs(20) default and reduce mid-task halts. - Fix pre-existing typecheck errors in agent-manager.ts (entry/nextEntry narrowing), app.ts (agentModels body field), KeyUsage.svelte (m guards), and a TS2742 in provider.ts via explicit ModelFactory return type. - buildFallbackSequence now always returns at least one entry so processMessage runs the agent loop even without keyId/modelId (fixes 4 broken agent-manager tests).
2026-05-23feat: google gemini provider, adaptive thinking for opus 4.7, model search ↵Adam Malczewski
filter - Added Google (Gemini) as a provider: add-key UI, env var resolution via resolveApiKey, usage tracking via native models endpoint + gemini.google.com cookie scraping - @ai-sdk/anthropic upgraded to v3 (adaptive thinking support) with LanguageModelV1 cast for ai v4 compat - Claude Opus 4.7 uses adaptive thinking (type: adaptive); all other models keep explicit budget tokens - Model selector modal: search filter with space matching dash/underscore - Copy button: all tool results truncated at 300 chars - Sidebar layout fix: Claude Reset panel removed from flex-1 fill to prevent overlap
2026-05-23feat: fallback model range slider with live label, model-changed eventAdam Malczewski
- Added model-changed event: backend emits it on fallback, frontend updates tab keyId/modelId - Range slider embedded inside active agent card when >1 model configured - Live label updates on drag (oninput), backend call only on release (onchange) - Slider auto-positions when fallback occurs via model-changed WS event
2026-05-23feat: key fallback using agent models[] hierarchy, background tool modes, ↵Adam Malczewski
copy truncation - Agent rate-limit fallback now iterates through agent's configured models[] in strict order - Frontend sends agentModels with each /chat request; backend uses buildFallbackSequence() - Emits notice event on fallback so chat shows which key failed and what's being tried next - Child agents inherit parent's agentModels for fallback - Added statusCode propagation from AI SDK errors for programmatic 429 detection - Copy button truncates all tool results at 300 chars (was 200 for 4 specific tools) - run_shell, summon, youtube_transcribe: background mode support - summon: blocking mode by default with getResult callback
2026-05-23feat: relative working directory support and subagent tab cwd propagationAdam Malczewski
- Resolve relative cwd paths (e.g. ./subtask) against parent's working directory at runtime - check-dir endpoint resolves relative paths and returns the resolved absolute path - AgentBuilder shows resolved path below input for relative paths, updated helper text - tab-created event now includes workingDirectory so subagent tabs display their cwd in sidebar - Add workingDirectory to tab-created AgentEvent type definition - spawnChildAgent stores resolved absolute path instead of raw relative path
2026-05-23feat: add is_subagent flag to agents, fix all lint/type/test issuesAdam Malczewski
- Add is_subagent checkbox to agent editor; subagents are hidden from Chat Settings - Add is_subagent field to AgentDefinition type, TOML serialization, and API route - Filter subagents from ModelSelector agent list - Fix all biome lint/format errors across codebase (useLiteralKeys, noNonNullAssertion, noExplicitAny, formatting, import sorting) - Fix svelte-check errors (type narrowing in SkillsBrowser, ToolPermissions, SidebarPanel) - Fix a11y warnings in App.svelte (label-control associations) - Fix test mocks missing BackgroundShellStore, BackgroundTranscriptStore, createWebSearchTool, createYoutubeTranscribeTool - Update stale 409 test to match current message-queuing behavior - Exclude packaging/ and release/ dirs from biome to avoid linting stale build artifacts
2026-05-23feat: youtube_transcribe blocks with polling, interruptible with background ↵Adam Malczewski
retrieve - youtube_transcribe now polls until transcript is ready (waits estimated_seconds - 2s, min 2s) - Times out after 10 minutes of polling - When user interrupts, polling continues in background with youtube_transcribe_<uuid> job ID - BackgroundTranscriptStore holds polling jobs, retrieve tool resolves them - ToolCallDisplay shows 'interrupted' badge (blue) when result contains [USER INTERRUPT] - Applies to all interruptible tools: run_shell, youtube_transcribe, retrieve
2026-05-23feat: web_search + youtube_transcribe tools, shell interrupt backgrounding, ↵Adam Malczewski
fixes - Add web_search tool (Firecrawl POST to /v1/search with query, limit, lang, country, scrapeOptions) - Add youtube_transcribe tool (GET to transcriber service, handles completed/queued/failed statuses) - Both tools registered for parent agents (always) and child agents (permission-gated) - Added to summon enum, TOOL_DESCRIPTIONS, and core exports - Shell interrupt: run_shell now races against user queue interrupt - When interrupted, command continues in background with run_shell_<uuid> job ID - BackgroundShellStore holds running processes, auto-cleans 10min after completion - retrieve tool extended to handle both agent IDs and shell job IDs - Tool error detection: results starting with 'Error:' now marked isError in UI - Fix TS error: cast unavailMatch[1] regex capture group to string - Docker: network_mode host for Tailscale/LAN access to external services - Bun.serve idleTimeout set to 60s (was default 10s) - KeyUsage: clearer message when OpenCode usage data unavailable - Firecrawl: only send scrapeOptions when scrape=true (avoid 400 on instances without scrape support)
2026-05-22feat: message queue/interrupt system, CORS fix, mobile fixes, chat splittingAdam Malczewski
- Add message queue allowing users to send messages while agent is running - Queue messages are injected into tool results as [USER INTERRUPT] - Retrieve tool interrupted via Promise.race when user message arrives - Queued messages show with 'queued' badge and cancel button - Consumed messages repositioned and chat splits at interrupt point - New assistant message block created after interrupt for clean flow - Add POST /chat/cancel endpoint for cancelling queued messages - Fix CORS to allow any origin (Tailscale/LAN access) - Fix crypto.randomUUID fallback for non-secure contexts (HTTP) - Fix frontend API URL derivation from page hostname - Auto-create DB tab if missing on processMessage (foreign key fix) - Add error logging to processMessage catch block - Fix working directory input sync on agent switch - Fix agent mode button to re-apply agent settings
2026-05-22feat: agent builder, CWD support, auto-save, UI polish, unavailable tool ↵Adam Malczewski
handling - Agent Builder: full CRUD with card grid, drag-and-drop model reorder, edit/delete - Auto-save on edit with 600ms debounce, AbortController for concurrency, fieldset disabled until name entered - Agent definitions stored as TOML with cwd field, loaded from global/project dirs - Working directory: per-tab CWD override in Chat Settings, agent default CWD, auto-create on first message - CWD validation: check-dir endpoint with ~ expansion, real-time validity indicator - Subagent CWD validated against parent's effective CWD using path.relative - Unavailable tool calls: caught gracefully, shown as tool call with error badge, model retries - UI: tab bar border radius, sidebar border removed, chat input ghost style, scroll-to-bottom rectangle - Skills dir collapse uses CSS rotation, Model Choice renamed to Chat Settings, System Prompt view removed - Reusable SkillsBrowser/ToolPermissions with external mode for Agent Builder - ModelSelector: Agent/Manual toggle, agent list, Agent Settings link - Page router, skills recursive scanning, bin/up gopass removed, docker volume mounts
2026-05-22feat: two-row tab bar with temp/persistent subagent tabs, tab persistenceAdam Malczewski
- Split tab bar into user tabs (top) and subagent tabs (bottom row) - Bottom row only visible when active user tab has child agents - Parent user tab stays highlighted when viewing a subagent tab - Temp tabs auto-disappear when agent completes, click to promote to persistent - 'Open Tab' button on summon tool calls to view/reopen subagent history - Reopening archived tabs fetches full chat history from backend - Add parent_tab_id column to DB with safe ALTER TABLE migration - Persist keyId, modelId, parentTabId on child tab creation - Flush partial assistant messages on abort/error (finally block) - Defensive JSON.parse in message restoration - Allow all Vite hosts for local development - Fix nested button in ToolCallDisplay (button inside button)