diff options
| author | Adam Malczewski <[email protected]> | 2026-06-01 01:46:13 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-01 01:46:13 +0900 |
| commit | 8b9533c22a47bbf6f916667e2c25d8e8e419da37 (patch) | |
| tree | 715a6a3d6f43781395e7dc7c8cdb519cef46a870 | |
| parent | 1853dd1d40308deb829bc621beb79c5d39b9c57f (diff) | |
| download | dispatch-8b9533c22a47bbf6f916667e2c25d8e8e419da37.tar.gz dispatch-8b9533c22a47bbf6f916667e2c25d8e8e419da37.zip | |
feat(tabs): tab-to-tab agent communication via short handles
Add send_to_tab / read_tab tools so an agent can message or read another
tab by a git-style short handle (shortest unique prefix of the tab UUID,
min 4 chars), shown in the tab bar.
- core/db/tabs: resolveTabPrefix + shortestUniquePrefix (open tabs only,
LIKE-sanitized prefix matching)
- new tools read-tab.ts / send-to-tab.ts (+ tests) decoupled from the DB
TabRow via a minimal ResolvedTabRef projection
- agent-manager: unified deliverMessage routing (busy -> queue, idle ->
new turn) shared by POST /chat and send_to_tab; agent->agent auto-wake
budget (MAX_AGENT_AUTO_WAKES) to bound ping-pong loops
- summon/loader: send_to_tab + read_tab as grantable tools
- frontend: shortHandleFor + handle badge in TabBar; perm toggles
- notes: tab-comm / user-agents / todo-redesign plans
- chore: biome format fixes (debug-logger, summon.test)
Refs notes/plan-tab-comm.md
24 files changed, 2092 insertions, 42 deletions
diff --git a/notes/plan-tab-comm.md b/notes/plan-tab-comm.md new file mode 100644 index 0000000..7acd804 --- /dev/null +++ b/notes/plan-tab-comm.md @@ -0,0 +1,358 @@ +# Implementation Plan: Tab-to-Tab Agent Communication + +## Summary + +Give every tab a **short, human-readable handle** visible in the UI, and give +agents **two new tools** to talk to each other by that handle: + +1. **`send_to_tab`** — deliver a user message to another tab by its short ID. + - Target **mid-turn** → message is **queued** (identical path to a user message). + - Target **idle** → message **wakes** the tab and starts a new turn. + - Fire-and-forget: returns immediately, does not block on a response. +2. **`read_tab`** — return the target tab's **last completed assistant turn** plus + its current status (`idle` / `running` / `error`). Non-blocking snapshot. + +The tools are gated behind two independent permissions (both default off): +**`perm_send_to_tab`** (message other tabs) and **`perm_read_tab`** (read other +tabs), mirroring how `perm_user_agent` gates user-agent spawning. Splitting them +lets an operator grant read-only visibility without also granting the ability to +wake/steer other tabs. + +This enables: handing an agent a tab handle and asking it to steer another AI, +chaining agents, and one agent delegating subtasks to a peer tab. + +### The short ID is git-style: derived, not stored + +We do **not** add a `short_id` column. The handle is the **shortest unique +prefix of the tab's existing UUID**, exactly like git short commit hashes: + +- A prefix is valid as long as it uniquely identifies one open tab. +- Minimum display length is 4 hex chars; if two open tabs collide on those 4, + the displayed handle grows one char at a time until unambiguous. +- Resolution accepts **any length** ≥ a small floor and matches by prefix. + +This means the short form is a **pure projection** of data we already store — +no new column, no migration, no backfill, no unique-ID generation/retry, no +collision bookkeeping at insert time. It also can't drift: a prefix is always +computed against the current set of open tabs. + +### The good news: the wake/queue machinery already exists + +`POST /chat` (`packages/api/src/app.ts`) **already** implements exactly the +routing the wishlist describes: + +```ts +if (agentManager.getTabStatus(tabId) === "running") { + agentManager.queueMessage(tabId, message, queueId); // mid-turn → queue +} else { + agentManager.processMessage(tabId, message, ...); // idle → new turn +} +``` + +The new `send_to_tab` tool reuses this same decision via a small shared +`AgentManager` method. We are **not** inventing a new delivery mechanism — we're +exposing the existing one to agents and adding addressing by a derived handle. + +--- + +## Current architecture (for context) + +- **Tab identity**: tabs are keyed by `crypto.randomUUID()` — a canonical + **lowercase** hex UUID (verified: both `crypto.randomUUID()` and the frontend's + non-secure fallback emit lowercase). Created via `createTab()` in + `packages/core/src/db/tabs.ts`, reached from: + - `POST /tabs` (frontend "+" button → `createNewTab`) + - `AgentManager.spawnChildAgent` (summon, sub/user agents) +- **Message delivery**: `AgentManager` owns per-tab state in `tabAgents` + (`Map<tabId, TabAgent>`). Key methods: + - `processMessage(tabId, msg, keyId?, modelId?, ...)` — runs a full turn, + persists chunks, resolves completion promises. + - `queueMessage(tabId, msg, clientId?)` — pushes to `tabAgent.messageQueue` + and wakes blocking tools via `queueListeners`. + - `getTabStatus(tabId)` — `"idle" | "running" | "error"`. +- **Tools**: built per-tab in `getOrCreateAgentForTab`. Two paths — a + permission-gated path (top-level tabs) and a `toolsOverride` whitelist path + (child agents). Each tool is a `ToolDefinition` whose `execute` closes over + **callbacks** supplied by `AgentManager` (see `createSummonTool`, + `createRetrieveTool`). This is the seam the new tools plug into. +- **History**: a tab's conversation is a flat append-only `chunks` log. The last + assistant turn is recoverable via `getChunksForTab(tabId)` → + `groupRowsToMessages(...)` → last `role === "assistant"` message. +- **Permissions**: read in `getOrCreateAgentForTab` via `getSetting("perm_*")`, + folded into `permKey` (cache-invalidation), surfaced as checkboxes in + `ToolPermissions.svelte` with defaults in `settings.svelte.ts`. +- **SQLite `LIKE`**: case-insensitive for ASCII by default (no + `case_sensitive_like` PRAGMA set), so `6FE8` resolves `6fe8…` for free. `%` + and `_` are wildcards → incoming prefixes MUST be sanitized (see below). + +--- + +## Part A — Git-style short tab handles + +### The two halves + +A derived-prefix scheme has a **display** side (compute the shortest unique +prefix to show) and a **resolution** side (match an arbitrary-length prefix back +to one tab). They live in different layers: + +| Concern | Where | Why | +|---|---|---| +| **Display** — each open tab's shortest unique prefix | Frontend (`tabs.svelte.ts`) | The frontend already holds every open tab's full UUID; computing prefixes client-side needs zero new wire data and updates reactively as tabs open/close | +| **Resolution** — prefix string → real `tabId` | Backend DB (`db/tabs.ts`) | The tools run server-side and must resolve against the authoritative open-tab set | + +Because the handle is derived, **nothing new is stored or sent**. No change to +the `tabs` schema, no `tab-created` payload change. + +### Display: shortest-unique-prefix (frontend) + +`packages/frontend/src/lib/tabs.svelte.ts` +- Add a derived helper, e.g. `shortHandleFor(tabId)`, computed against the + current open `tabs`: + - Start at length 4. If any **other** open tab shares that prefix, increment + until unique (cap at full UUID as the degenerate fallback). + - Expose as a `$derived` map `{ tabId → handle }` so the tab bar and any + "Agents" view stay in sync as tabs open/close. +- This naturally yields 4-char handles in the common case and only grows on a + genuine leading-hex collision among open tabs. + +`packages/frontend/src/lib/components/TabBar.svelte` +- Render the handle as a small mono badge next to the title (both the user-tab + row and the subagent row). This is the human/LLM-visible addressing token. + +> Note: the existing debug `shortId` helper in `tabs.svelte.ts` (first-8-char +> slice) is unrelated and stays as-is; the new handle is collision-aware. + +### Resolution: prefix → tab (backend) + +`packages/core/src/db/tabs.ts` +- New `resolveTabPrefix(prefix: string): { status: "ok"; tab: TabRow } | + { status: "none" } | { status: "ambiguous"; matches: TabRow[] }`: + 1. **Sanitize**: lowercase; reject/strip anything outside `[0-9a-f-]` so the + SQLite `LIKE` wildcards `%`/`_` can't be injected. Enforce a min length + (e.g. 4) to avoid absurdly broad matches. + 2. Query open tabs: `SELECT * FROM tabs WHERE is_open = 1 AND id LIKE $p` + with `$p = prefix + '%'`. + 3. 0 rows → `none`; 1 row → `ok`; >1 → `ambiguous` (return the matches so the + caller can list them). +- Scope to `is_open = 1`: closed tabs are not addressable and shouldn't cause + phantom ambiguity. +- (Exact full-UUID still works — it's just a maximal prefix.) + +`packages/core/src/index.ts` +- Re-export `resolveTabPrefix`. + +### Why derive instead of store + +- **No migration / backfill / column.** Zero schema churn. +- **No insert-time unique-ID generation** (the race-prone part) — UUIDs are + already unique by construction. +- **Self-correcting.** If two tabs share a 4-char prefix, both just display 5 + chars; when one closes, the other can shrink back to 4. A stored handle would + go stale here. +- **Resolution is one indexed-ish `LIKE` scan** over open tabs (a handful of + rows); negligible cost. + +--- + +## Part B — The two tools + +Both live in core as `ToolDefinition` factories taking a callbacks object, and +are wired in `AgentManager` exactly like `summon`/`retrieve`. + +### Tool shapes + +``` +send_to_tab({ + tab_id: string, // required — the short handle (any unique-length prefix) of the target + message: string, // required — the user message to deliver +}) +-> "Delivered to tab 6fe8 (status: running → queued)" // or "(idle → started new turn)" + +read_tab({ + tab_id: string, // required — the short handle of the target tab +}) +-> "<tab_response tab=6fe8 status=idle>...last assistant turn text...</tab_response>" +``` + +The `tab_id` parameter accepts any length prefix; the tool resolves it via +`resolveTabPrefix`. On `ambiguous`, the tool returns the competing handles and +asks the agent to add a character — same UX as `git checkout <ambiguous sha>`. + +### `send_to_tab` semantics (mirrors `POST /chat`) + +1. `resolveTabPrefix(tab_id)`: + - `none` → error listing currently-open handles (mirrors how `summon` lists + valid agent slugs on a bad slug). + - `ambiguous` → error listing the matching handles; ask for one more char. + - `ok` → proceed with `tab.id`. +2. Reject sending to **self** (no-op footgun) with a clear message. +3. Prefix the delivered text with provenance so the target (and the user) know + who sent it and can reply back: + `[message from tab <senderHandle>]\n\n<message>` +4. Route via a new shared `AgentManager.deliverMessage(tabId, message)`: + - target `running` → `queueMessage(...)` → report `queued`. + - target `idle`/`error` → hydrate key/model from the live `TabAgent` (warm) + or the DB tab row (cold), then `processMessage(...)` → report `started`. +5. Return immediately (fire-and-forget). The sender uses `read_tab` later. + +### `read_tab` semantics (non-blocking snapshot) + +1. `resolveTabPrefix(tab_id)` (same `none`/`ambiguous` handling). +2. Read `getChunksForTab(tab.id)` → `groupRowsToMessages` → last + `role === "assistant"` message → flatten its text chunks. +3. Return that text plus `status` from `getTabStatus(tab.id)`: + - `running` → note the turn is still in progress; returns the **previous** + completed turn (or "no completed turn yet"). + - `idle` → the just-finished turn. + - empty history → "Tab has no assistant responses yet." + +**Why non-blocking:** two agents that block on each other's results would +deadlock. `retrieve` can block because child agents can't summon their parent; +peer tabs have no such guarantee. A snapshot read + explicit re-read is safe. +(An optional `wait: boolean` flag is possible later but deliberately omitted v1.) + +### Why a shared `deliverMessage` method + +`POST /chat` and `send_to_tab` make the **same** running/idle decision. Factor it +into one `AgentManager.deliverMessage(tabId, message, opts?)` returning +`{ status: "queued" | "started" }`, and call it from both. Avoids drift between +the HTTP path and the tool path. (Refactor `POST /chat` to use it.) + +The tools take a **resolver callback** (`resolveShortId`) wired to +`resolveTabPrefix`, plus `deliver`, `getLastResponse`, `getStatus`, +`listOpenHandles`, and `selfTabId` — all closed over by `AgentManager`, same +pattern as `createSummonTool`. + +--- + +## Changes by file + +### Core + +| File | Change | +|---|---| +| `packages/core/src/db/tabs.ts` | New `resolveTabPrefix(prefix)` (sanitize → `LIKE` over open tabs → ok/none/ambiguous). **No schema/column change.** | +| `packages/core/src/tools/send-to-tab.ts` | **new** — `createSendToTabTool(callbacks)`; `SendToTabCallbacks { resolveShortId, deliver, listOpenHandles, selfTabId }` | +| `packages/core/src/tools/read-tab.ts` | **new** — `createReadTabTool(callbacks)`; `ReadTabCallbacks { resolveShortId, getLastResponse, getStatus, listOpenHandles }` | +| `packages/core/src/tools/summon.ts` | Add `send_to_tab`, `read_tab` to the `tools` enum + description list so agents can grant them to children | +| `packages/core/src/index.ts` | Re-export `resolveTabPrefix` + the two new tool factories | + +> Removed from the earlier draft: `short_id` column, migration, backfill, +> `generateShortId()`, `getTabByShortId()`, and the `shortId` field on +> `TabRow`/`tab-created`. All obsolete under the derived-prefix design. + +### API + +| File | Change | +|---|---| +| `packages/api/src/agent-manager.ts` | Read `perm_send_to_tab` + `perm_read_tab` + add both to `permKey`; gate each tool independently; build `send_to_tab`/`read_tab` in **both** tool paths (permission path always; child path when `toolsOverride` includes them); add `deliverMessage()`; cold-tab key/model hydrate from DB on wake. Wire tool callbacks to `resolveTabPrefix`/`deliverMessage`/`getChunksForTab`/`getTabStatus`, passing `selfTabId = tabId` | +| `packages/api/src/app.ts` | Refactor `POST /chat` to call `deliverMessage()` | + +> `tab-created` payload is **unchanged** (no `shortId`), and `routes/tabs.ts` +> needs no change — the frontend derives handles from UUIDs it already receives. + +### Frontend + +| File | Change | +|---|---| +| `packages/frontend/src/lib/tabs.svelte.ts` | Add derived `shortHandleFor` / `{tabId→handle}` map (shortest-unique-prefix over open tabs). **No `Tab` field added** — it's derived, not stored | +| `packages/frontend/src/lib/components/TabBar.svelte` | Render the derived handle badge on each tab | +| `packages/frontend/src/lib/components/ToolPermissions.svelte` | Add two entries: `{ id: "send_to_tab", label: "Message other tabs" }` and `{ id: "read_tab", label: "Read other tabs" }` | +| `packages/frontend/src/lib/settings.svelte.ts` | Add `send_to_tab: false` + `read_tab: false` to `toolPerms` + `savedToolPerms` defaults | +| `packages/frontend/src/lib/components/AgentBuilder.svelte` *(if it lists tools)* | Include the two new tool names so agent definitions can grant them | + +--- + +## Files NOT changing + +| File | Why | +|---|---| +| `packages/core/src/db/index.ts` | **No schema change** — handle is derived, not stored | +| `packages/core/src/types/index.ts` | `tab-created` unchanged; no `shortId` on the wire | +| `packages/core/src/tools/retrieve.ts` | Unchanged — peer reads use the new `read_tab`, not `retrieve` | +| `packages/core/src/agent/agent.ts` | The agent loop already passes `queueCallbacks`/context to tools; no change needed | +| DB `settings` table | Key-value; `perm_send_to_tab` / `perm_read_tab` need no migration | +| Queue internals (`queueMessage`/`dequeueMessages`/`waitForQueuedMessage`) | Reused as-is | + +--- + +## Testing + +- **`packages/core/tests/db/`** — `resolveTabPrefix`: exact UUID → ok; 4-char + unique → ok; colliding prefix → ambiguous with both matches; unknown → none; + case-insensitive (`6FE8` ↔ `6fe8`); wildcard injection (`%`, `_`) sanitized; + closed tabs excluded from matches. +- **`packages/core/tests/tools/`** — `send_to_tab`: none → lists open handles; + ambiguous → asks for more chars; self-send rejected; provenance prefix applied; + calls `deliver`. `read_tab`: returns last assistant turn; empty history + message; status surfaced. +- **Frontend** — shortest-unique-prefix: two tabs sharing 4 hex chars both render + 5; closing one lets the other shrink back to 4; single tab renders 4. +- **`packages/api/tests/agent-manager.test.ts`** — `deliverMessage` routes + running→queue / idle→processMessage; cold-tab wake hydrates key/model from DB; + `perm_send_to_tab` / `perm_read_tab` each gate their tool independently + invalidate cache. +- **`packages/api/tests/routes.test.ts`** — `POST /chat` still behaves after the + `deliverMessage` refactor. +- Manual: tab A messages idle tab B (B wakes); A messages running tab B (queued, + consumed on B's next turn — see dependency below); A `read_tab` B; ambiguous + prefix prompts for one more char. + +--- + +## Risks / edge cases / dependencies + +- **DEPENDENCY — "queue not consumed after turn" bug.** When the target is + **running**, `send_to_tab` queues the message. Per the separate wishlist item, + a queued message currently attaches at end-of-turn but does **not** kick off a + new turn (`agent.ts` end-of-loop pushes it to history then yields `done`). So + peer messages to a *busy* tab won't get a reply until that fix lands. The + **idle-wake** path is fully functional today. → Recommend fixing the queue- + consumption bug alongside, or shipping idle-wake first and calling out the + busy-tab limitation. +- **Ambiguous prefix is a first-class outcome**, not an error to hide. Surface + the competing handles and ask for one more char (git's exact UX). Tests must + cover it. +- **`LIKE` wildcard injection.** `%`/`_` in a raw prefix would broaden the match. + Sanitize to `[0-9a-f-]` + min length before querying. Covered by a test. +- **Display vs resolution drift window.** The frontend computes a handle from its + known open tabs; the backend resolves against the DB. If they momentarily + disagree (a tab opened elsewhere a beat ago), the worst case is a `none`/ + `ambiguous` the agent retries — self-correcting, no corruption. +- **Cold-tab wake loses fallback chain.** An idle tab not in `tabAgents` (e.g. + after server restart) only has `key_id`/`model_id` in the DB — the agent + definition's multi-model `agentModels` fallback chain isn't persisted. Waking + it uses the single stored model (no fallback). Acceptable degradation; note it. + (Overlaps with the "key switching not migrating context" wishlist item.) +- **Deadlock avoidance.** `read_tab` is intentionally non-blocking so two agents + can't wait on each other forever. +- **Runaway agent ping-pong (livelock) — MITIGATED.** Two agents that each reply + to incoming messages (A wakes B wakes A ...) would spend tokens unbounded with + no human in the loop. Mitigation: an **origin-aware auto-wake budget** in + `AgentManager.deliverMessage`. Each tab carries `autoWakeBudget` (max + `MAX_AGENT_AUTO_WAKES = 6`). A `send_to_tab` call delivers with `origin: + "agent"`; waking an idle tab consumes one unit. At 0, further agent messages + are **queued but do NOT wake** the tab (`status: "suppressed"`) and a `notice` + system chunk is emitted; the `send_to_tab` tool returns a "HELD — do not keep + resending" message so the sender stops. Any human-originated delivery + (`POST /chat`, `origin: "human"`, the default) **refills the budget to full**, + so human-driven and bounded multi-hop delegation are unrestricted; only + unattended machine-to-machine cascades are capped. Messages are never dropped. +- **Footguns behind a permission.** An agent could spam or wake the user's + personal tabs. Mitigations: `perm_send_to_tab` / `perm_read_tab` default **off**; self-send + blocked; provenance prefix makes the source visible to the user. +- **Stale handle in tool description.** Unlike `summon`'s agent catalog, we do + NOT bake the live tab list into the tool description (it changes constantly). + Discovery is via the UI badge + the open-handle list returned on none/ambiguous. + +--- + +## Suggested phasing + +1. **Phase 1 — Derived handles (Part A).** `resolveTabPrefix` (backend) + + shortest-unique-prefix display + TabBar badge. No schema change; shippable on + its own (useful even before the tools). +2. **Phase 2 — Tools (Part B).** `send_to_tab` + `read_tab`, `deliverMessage` + refactor, `perm_send_to_tab` + `perm_read_tab`, permission UI + defaults, summon whitelist. +3. **Phase 3 — Polish.** Optional "Agents" sidebar view; fix the + queue-consumption bug so busy-tab delivery replies without a nudge; optional + `wait` flag on `read_tab`. diff --git a/notes/plan-user-agents.md b/notes/plan-user-agents.md new file mode 100644 index 0000000..012dbfb --- /dev/null +++ b/notes/plan-user-agents.md @@ -0,0 +1,206 @@ +# Implementation Plan: User Agents + +## Summary + +Two changes rolled into one: + +1. **`agent` becomes required** on `summon` — all spawned agents must use a definition +2. **New `top_level` mode** — spawns independent "user agent" tabs, gated by a new `perm_user_agent` permission + +A **user agent** is a top-level, independent tab spawned by an AI agent via the `summon` tool. Unlike **subagents** (child tabs owned by a parent), user agents appear as first-class tabs — persistent, independent lifecycle, no parent. They are fire-and-forget: the spawning agent gets an `agent_id` back but cannot `retrieve` the result. + +User agents **must** be spawned from a non-subagent agent definition (e.g. `default`). The definition controls their tools, models, working directory, and system prompt. Tools are still intersected with the spawning agent's own tools (can't escalate). A new `perm_user_agent` permission gates access to this capability, surfaced as a separate checkbox in the Tool Permissions UI. + +--- + +## Current Architecture (for context) + +Today, the `summon` tool creates **subagent tabs**: + +- They have a `parentTabId` linking them to the spawning tab +- They show in a **bottom row** under the parent tab in the tab bar +- They're **non-persistent** by default (italic, faded) until promoted +- Their **tools are restricted** — intersected with the parent's tool set (can't escalate) +- Their **working directory** must be within the parent's working directory +- They have **completion tracking** (`completionPromise`) — the parent blocks on `retrieve` until the child finishes +- The `agent` parameter is currently optional — agents can be spawned ad-hoc with just a `tools` list, inheriting the parent's model + +### Comparison table + +| Property | Subagent (current & updated) | User Agent (new) | +|---|---|---| +| `parentTabId` | set to parent | `null` | +| `persistent` | `false` (promoted on click) | `true` | +| Tab bar position | Bottom row under parent | Top row with user tabs | +| Tab lifecycle | Closed when parent closes | Independent | +| Retrievable | Yes, via `retrieve` tool | No — fire-and-forget | +| Working directory | Must be within parent's dir | Any dir (from definition or default) | +| Completion tracking | Yes (`completionPromise`) | No | +| Agent definition | Required | Required (`is_subagent !== true`) | +| Models | From agent definition | From agent definition | + +--- + +## Summon tool — new parameter shape + +``` +summon({ + task: string, // required — what to do + agent: string, // required — agent definition slug (was optional) + top_level?: boolean, // optional — spawn as user agent (only in schema if perm_user_agent enabled) + tools?: string[], // optional — override tools (intersected with spawning agent's tools) + background?: boolean, // optional — for subagents only (user agents are always fire-and-forget) + working_directory?: string // optional — override the definition's cwd +}) +``` + +### Tools resolution + +- **`tools` omitted**: agent definition's tools ∩ spawning agent's tools +- **`tools` provided**: provided tools ∩ spawning agent's tools + +Models always come from the agent definition. + +--- + +## Changes by file + +### 1. `packages/core/src/tools/summon.ts` + +**Schema changes:** + +- `agent` — change from `.optional()` to required +- `top_level` — new `z.boolean().optional()`, **only included in the schema when `userAgentEnabled` is `true`** +- `tools` — stays optional. Description updated: "Defaults to the agent definition's tools. Intersected with the spawning agent's tools." +- `background` — stays optional, ignored when `top_level: true` +- `working_directory` — stays optional + +**Factory signature change:** + +```ts +createSummonTool( + defaultWorkingDirectory: string, + callbacks: SummonCallbacks, + availableSubagents: AvailableAgent[], // is_subagent === true + availableUserAgents: AvailableAgent[], // is_subagent !== true + agentDirs: string[], + userAgentEnabled: boolean, // new — controls whether top_level param + user agent catalog exist +) +``` + +**`SummonCallbacks.spawn` interface:** + +- Add `topLevel?: boolean` to the spawn options object + +**`buildAgentsCatalog` update:** + +The catalog in the tool description is built conditionally: + +- **When `userAgentEnabled` is `false`**: only show the subagents group: + ``` + Available agents: + - programmer: Programmer — Implements code from a given plan + - flash: Flash — A cheap subagent + ``` + +- **When `userAgentEnabled` is `true`**: show two labeled groups: + ``` + Subagents (spawned as child tabs): + - programmer: Programmer — Implements code from a given plan + - flash: Flash — A cheap subagent + + User agents (spawned as independent top-level tabs, requires top_level=true): + - default: Default — Default agent with all tools enabled + ``` + +**`toAvailableAgents` → split into two functions:** + +- Rename existing to `toAvailableSubagents()` — keeps `is_subagent === true` filter +- New `toAvailableUserAgents()` — filters `is_subagent !== true` + +**Execute logic when `top_level: true`:** + +- Always return immediately with `agent_id` (fire-and-forget, ignore `background`) +- Call `callbacks.spawn(...)` with `topLevel: true` + +--- + +### 2. `packages/core/src/index.ts` + +- Re-export renamed `toAvailableSubagents` and new `toAvailableUserAgents` + +--- + +### 3. `packages/api/src/agent-manager.ts` + +**`getOrCreateAgentForTab` — permission reading & tool construction:** + +- Read new setting: `const permUserAgent = getSetting("perm_user_agent") === "allow"` +- Include `permUserAgent` in the `permKey` cache-invalidation string +- Load user agent definitions via `toAvailableUserAgents(...)` +- Pass `userAgentEnabled: permUserAgent` and `availableUserAgents` to `createSummonTool` + +**`spawnChildAgent` — when `topLevel: true`:** + +- **`parentTabId`**: pass `null` to `createTab()` and the `tab-created` event +- **Working directory**: use the agent definition's `cwd` if set, otherwise the global default (`DISPATCH_WORKING_DIR` / `process.cwd()`). **No containment check** against parent's directory. +- **Tools**: from definition (or `tools` param if provided), intersected with spawning agent's tools. Same intersection logic as subagents — can't escalate. +- **Models**: from the agent definition's `models` array +- **No completion tracking**: skip `completionPromise` / `completionResolve` setup. Leave them `undefined`. + +**`getChildResult` guard:** + +- If the tab has no `completionPromise` and status is `running`, return error: `"This is a user agent (top-level tab) and cannot be retrieved. User agents are fire-and-forget."` + +--- + +### 4. `packages/frontend/src/lib/components/ToolPermissions.svelte` + +Add a new entry to the `toolPermissions` array: + +```ts +{ + id: "user_agent", + label: "Spawn user agents", + description: "Allow the AI to open new independent top-level tabs" +} +``` + +--- + +### 5. `packages/frontend/src/lib/settings.svelte.ts` + +Add `user_agent: false` to the default `toolPerms` and `savedToolPerms` objects. + +--- + +## Files NOT changing + +| File | Why | +|---|---| +| `packages/frontend/src/lib/components/TabBar.svelte` | Already renders `parentTabId === null` tabs in top row as persistent | +| `packages/frontend/src/lib/tabs.svelte.ts` (`tab-created` handler) | Already sets `persistent: parentTabId == null` | +| `packages/core/src/tools/retrieve.ts` | Unchanged — the retrieve guard lives in AgentManager | +| `packages/core/src/agents/loader.ts` | `is_subagent` already exists and distinguishes the two types | +| DB schema | The `settings` table is key-value, no migration needed | +| Agent definition TOML format | `is_subagent` already exists | + +--- + +## Complete file change list + +| File | Change | +|---|---| +| `packages/core/src/tools/summon.ts` | `agent` required, conditional `top_level` param, two-group catalog (subagents-only when no perm), `toAvailableSubagents()` + `toAvailableUserAgents()`, spawn interface update | +| `packages/core/src/index.ts` | Re-export new/renamed functions | +| `packages/api/src/agent-manager.ts` | Read `perm_user_agent`, pass to factory, handle `topLevel` in spawn, retrieve guard | +| `packages/frontend/src/lib/components/ToolPermissions.svelte` | Add "Spawn user agents" checkbox | +| `packages/frontend/src/lib/settings.svelte.ts` | Add `user_agent` default | + +--- + +## Risks / edge cases + +- **Nested user agents**: A user agent could itself have `perm_user_agent` and spawn more user agents. This is allowed and works naturally since user agents are independent tabs with no parent chain. +- **Agent definition with no models**: Should not happen in practice — the Agent Builder UI requires at least one model entry. But if it does, the spawn will fail at the model-resolution step with a clear error. +- **Retrieve on user agent**: Guarded in `getChildResult` — returns an error message explaining user agents are fire-and-forget. diff --git a/notes/todo-tool-redesign-plan.md b/notes/todo-tool-redesign-plan.md new file mode 100644 index 0000000..7e3af48 --- /dev/null +++ b/notes/todo-tool-redesign-plan.md @@ -0,0 +1,86 @@ +# Todo/Task tool redesign — port opencode's effective design into Dispatch + +## 1. What each implementation does today + +### Dispatch (current — disabled because it confused agents) +- Tool name: `todo`. **Imperative CRUD** with 5 actions: `add`, `update`, `list`, `get`, `remove`. +- Each task gets a **server-assigned opaque id** (`task-1`, `task-2`, …) returned by `add`. +- To mutate state the model must call `update` / `remove` **with the right `task_id`**. +- `TaskItem = { id, title, description, status }`, status `pending | in_progress | done | blocked`. +- Wiring: `packages/core/src/tools/task-list.ts` (class `TaskList` + `createTaskListTool`), + instantiated per-tab in `packages/api/src/agent-manager.ts`, broadcast via the + `task-list-update` agent event, rendered by `packages/frontend/.../TaskListPanel.svelte`. + +### opencode (effective) +- Tool name: `todowrite`. **One declarative action**: a single param `todos` containing the + **entire list**. Every call **replaces** the whole list. +- Todo shape: `{ content, status, priority }`. **No ids exposed to the model.** +- Statuses: `pending | in_progress | completed | cancelled`. Priority: `high | medium | low`. +- Rich tool description (`todowrite.txt`) + heavy **system-prompt reinforcement** + (`prompt/anthropic.txt` "Task Management" section) with worked examples of the *flow*. +- Persisted per session, broadcast via a `todo.updated` bus event to the UI. + +## 2. Why opencode's is effective and Dispatch's spirals + +The single biggest problem with Dispatch's version is that it is an **imperative, id-based, +multi-action API**. That creates several failure modes that make weaker models spiral: + +1. **Id bookkeeping.** The model must remember each `task-N` id returned by `add` and reference + it later. LLMs lose track, **guess an id**, and hit `Error: Task with ID 'task-3' not found`, + then thrash trying to re-sync with `list`/`get`. +2. **Many round-trips.** Setting up a 5-item plan and marking one in progress is **6+ tool calls** + (5×`add` + 1×`update`). Re-planning means `remove`+`add` churn. +3. **Delta reasoning.** The model has to reason about *current server state* vs *desired state* and + emit the diff. LLMs are far better at emitting a **full desired state**. +4. **Inconsistent surface.** `blocked` exists in the type but is never explained in the prompt and + isn't rendered — dead state that invites confusion. + +opencode's `todowrite` removes the entire class of problems: it is **declarative / idempotent**. +The model emits the full desired list each time; no ids, no deltas, no "not found" errors, and a +5-item plan + one in-progress is **one call**. The strong description + system-prompt examples teach +the *cadence* (write list → mark in_progress → work → mark completed → next). + +## 3. Plan for Dispatch + +Goal: keep all existing plumbing (tool **name `todo`**, the `task-list-update` event, the +`TaskList` per-tab store, the sidebar panel) but swap the **imperative CRUD interface for a +declarative whole-list write**, matching opencode's model. Keeping the name `todo` means zero churn +in the allowlist/summon/loader/permission wiring and existing agent TOMLs. + +### Core (`packages/core`) +1. `types/index.ts` + - `TaskStatus = "pending" | "in_progress" | "completed" | "cancelled"`. + - `TaskPriority = "high" | "medium" | "low"`. + - `TaskItem = { id; content; status; priority }` (`id` kept = positional, for UI keying + event + contract; **never shown to the model**). +2. `tools/task-list.ts` + - Replace the CRUD `TaskList` (add/update/get/remove) with a declarative store: + `setTasks(items)` rebuilds the list (positional ids), `getTasks()`, `onChange()`. + - `createTaskListTool` exposes ONE param `todos: Array<{content, status, priority}>`; `execute` + calls `setTasks` and echoes the canonical stored list back (without ids). Robust defaults: + missing `priority`→`medium`, missing/invalid `status`→`pending`, empty array clears the list. + - New rich `TODO_DESCRIPTION` ported/adapted from opencode's `todowrite.txt` (When to use / When + NOT / States / Rules / Examples), emphasising "send the whole list every time, no ids". + +### API (`packages/api/src/agent-manager.ts`) +3. Replace `TODO_GUIDANCE` with an opencode-style **"Task Management"** system-prompt section: + declarative whole-list semantics, "use it VERY frequently", one `in_progress` at a time, mark + completed immediately, plus the two worked examples. Update `TOOL_DESCRIPTIONS.todo`. + (Wiring via `createTaskListTool(tabAgent.taskList)` and `onChange → task-list-update` is unchanged.) + +### Frontend (`packages/frontend`) +4. `lib/types.ts`: mirror the new `TaskItem` (`{ id, content, status, priority }` + new status union). +5. `lib/components/TaskListPanel.svelte`: render `content`; map statuses (completed→checked/struck, + in_progress→indeterminate/bold, cancelled→dim/struck, pending→empty); subtle `high` priority hint; + update the counter label (`completed`/`in progress`). + +### Tests + verification +6. `packages/core/tests/tools/task-list.test.ts`: empty list, whole-list replace, status/priority + preservation, default priority, `onChange` fires, tool `execute` updates store + echoes no ids. +7. `bun run typecheck` (core), `bun run test` (vitest), `bun run check` (biome). + +### Out of scope / deliberately unchanged +- Tool name stays `todo`; `expandAgentToolNames`, summon defaults, permission keys, agent TOMLs + untouched. +- Persistence to DB (opencode stores todos in SQLite) is **not** added — Dispatch keeps the existing + in-memory per-tab `TaskList`; the visible/UX behaviour is what was failing, and that's what we fix. diff --git a/notes/wishlist.md b/notes/wishlist.md index aa806b5..8c3f5ef 100644 --- a/notes/wishlist.md +++ b/notes/wishlist.md @@ -7,8 +7,6 @@ - Start a chat on one device (e.g. desktop) and seamlessly pick it up later on another (e.g. phone). - Sidebar remembers which views were open and in what order, restoring them exactly as they were. -- **Edit chat history.** Click on any existing message in the chat history and choose to edit it — this applies to user messages, AI responses, and tool results. - - **Update the way tools appear in the chat UI.** Improve the visual presentation of tool calls and their results — make them more readable, compact, and scannable. - **Show git diffs for edited files.** When the AI edits a file (write_file tool call), display a git diff in the UI rather than just the raw file content. diff --git a/packages/api/src/agent-manager.ts b/packages/api/src/agent-manager.ts index 111237c..517c661 100644 --- a/packages/api/src/agent-manager.ts +++ b/packages/api/src/agent-manager.ts @@ -15,8 +15,10 @@ import { createListFilesTool, createReadFileSliceTool, createReadFileTool, + createReadTabTool, createRetrieveTool, createRunShellTool, + createSendToTabTool, createSkillsWatcher, createSummonTool, createTaskListTool, @@ -32,6 +34,8 @@ import { getClaudeAccountsFromDB, getMessagesForTab, getSetting, + getTab, + listOpenTabs, loadAgent, loadAgents, loadConfig, @@ -41,8 +45,11 @@ import { refreshAccountCredentials, refreshAccountCredentialsAsync, resolveApiKey, + resolveTabPrefix, type SkillDefinition, type SystemChunkKind, + shortestUniquePrefix, + type TabResolution, type TabStatusSnapshot, TaskList, toAvailableSubagents, @@ -73,6 +80,16 @@ const TOOL_DESCRIPTIONS: Record<string, string> = { "Fetch the transcript/subtitles for a YouTube video. Set background=true to start in the background and get a job_id for later retrieval.", }; +/** + * Maximum number of CONSECUTIVE agent-to-agent auto-wakes a tab will accept + * before it stops auto-responding and waits for a human. Each `send_to_tab` + * that would wake an idle tab consumes one unit; any human-originated message + * (e.g. via `POST /chat`) refills the budget to full. This bounds runaway + * agent ping-pong loops (A wakes B wakes A ...) that would otherwise spend + * tokens unbounded with no human in the loop. See notes/plan-tab-comm.md. + */ +const MAX_AGENT_AUTO_WAKES = 6; + const DEFAULT_SYSTEM_PROMPT = "You are Dispatch, an agent designed to help with any task that the user asks for. Be helpful and concise."; @@ -197,6 +214,14 @@ interface TabAgent { * rows. Set at the start of `processMessage`, cleared when the turn ends. */ currentTurnId: string | null; + /** + * Remaining consecutive agent-to-agent auto-wakes this tab will accept + * before requiring human intervention (see `MAX_AGENT_AUTO_WAKES`). + * Refilled to the max by any human-originated `deliverMessage`; decremented + * each time an agent-originated `send_to_tab` wakes this tab from idle. When + * it hits 0, further agent messages are queued but do NOT start a turn. + */ + autoWakeBudget: number; } export class AgentManager { @@ -343,6 +368,7 @@ export class AgentManager { currentChunks: null, currentAssistantId: null, currentTurnId: null, + autoWakeBudget: MAX_AGENT_AUTO_WAKES, }; this.tabAgents.set(tabId, tabAgent); } @@ -366,10 +392,12 @@ export class AgentManager { const permBash = getSetting("perm_bash") === "allow"; const permSummon = getSetting("perm_summon") === "allow"; const permUserAgent = getSetting("perm_user_agent") === "allow"; + const permSendToTab = getSetting("perm_send_to_tab") === "allow"; + const permReadTab = getSetting("perm_read_tab") === "allow"; const permWebSearch = getSetting("perm_web_search") === "allow"; const permYoutubeTranscribe = getSetting("perm_youtube_transcribe") === "allow"; const sysPrompt = getSetting("system_prompt") ?? ""; - const permKey = `${permRead}:${permEdit}:${permBash}:${permSummon}:${permUserAgent}:${permWebSearch}:${permYoutubeTranscribe}:${sysPrompt}`; + const permKey = `${permRead}:${permEdit}:${permBash}:${permSummon}:${permUserAgent}:${permSendToTab}:${permReadTab}:${permWebSearch}:${permYoutubeTranscribe}:${sysPrompt}`; // If the override differs or permissions changed, invalidate the cached agent if ( @@ -504,6 +532,12 @@ export class AgentManager { }), }); } + // Tab-to-tab communication — gated on the child whitelist. + if (allowed.has("send_to_tab") || allowed.has("read_tab")) { + for (const entry of this.buildTabCommToolEntries(tabId)) { + if (allowed.has(entry.name)) toolEntries.push(entry); + } + } } else { // Parent agent: use permission settings from DB if (permRead) { @@ -581,6 +615,14 @@ export class AgentManager { }), }); } + if (permSendToTab || permReadTab) { + const tabCommAllowed = new Set<string>(); + if (permSendToTab) tabCommAllowed.add("send_to_tab"); + if (permReadTab) tabCommAllowed.add("read_tab"); + for (const entry of this.buildTabCommToolEntries(tabId)) { + if (tabCommAllowed.has(entry.name)) toolEntries.push(entry); + } + } } const tools = toolEntries.map((e) => e.tool); @@ -971,14 +1013,18 @@ export class AgentManager { if (!agentDef) { const allDefs = loadAgents(parentEffectiveDir); if (options.topLevel) { - const userAgents = allDefs.filter((d) => !d.is_subagent).map((d) => `${d.slug} (${d.name})`); + const userAgents = allDefs + .filter((d) => !d.is_subagent) + .map((d) => `${d.slug} (${d.name})`); const hint = userAgents.length > 0 ? ` Available user agents: ${userAgents.join(", ")}.` : " No user agent definitions exist yet."; throw new Error(`Agent definition not found: "${options.agentSlug}".${hint}`); } else { - const subagents = allDefs.filter((d) => d.is_subagent).map((d) => `${d.slug} (${d.name})`); + const subagents = allDefs + .filter((d) => d.is_subagent) + .map((d) => `${d.slug} (${d.name})`); const hint = subagents.length > 0 ? ` Available subagents: ${subagents.join(", ")}.` @@ -1147,7 +1193,8 @@ export class AgentManager { if (tabAgent.status === "running") { return { status: "error", - error: "This is a user agent (top-level tab) and cannot be retrieved. User agents are fire-and-forget.", + error: + "This is a user agent (top-level tab) and cannot be retrieved. User agents are fire-and-forget.", }; } return { @@ -1159,6 +1206,214 @@ export class AgentManager { return tabAgent.completionPromise; } + // ─── Tab-to-tab communication ─────────────────────────────────── + // + // `send_to_tab` / `read_tab` let an agent message a peer tab by its short + // handle (a git-style prefix of the tab UUID). Delivery reuses the exact + // running→queue / idle→new-turn routing that `POST /chat` uses (see + // `deliverMessage`), so an agent message behaves identically to a user one. + + /** + * Build the `send_to_tab` + `read_tab` tool entries for `tabId`. Shared by + * both tool-construction paths (child whitelist + permission-gated parent). + * `selfHandle` is computed once so the calling tab can stamp provenance and + * reject self-sends. + */ + private buildTabCommToolEntries( + tabId: string, + ): Array<{ name: string; tool: ReturnType<typeof createSendToTabTool> }> { + const selfHandle = shortestUniquePrefix(tabId); + return [ + { + name: "send_to_tab", + tool: createSendToTabTool({ + resolveShortId: (prefix) => this.resolveTabHandle(prefix), + // origin: "agent" subjects this to the receiver's auto-wake + // budget so agent↔agent loops are bounded (see deliverMessage). + deliver: (targetId, message) => + this.deliverMessage(targetId, message, { origin: "agent" }), + listOpenHandles: () => this.listOpenHandles(tabId), + self: { id: tabId, handle: selfHandle }, + }), + }, + { + name: "read_tab", + tool: createReadTabTool({ + resolveShortId: (prefix) => this.resolveTabHandle(prefix), + getLastResponse: (targetId) => this.getLastTabResponse(targetId), + listOpenHandles: () => this.listOpenHandles(tabId), + }), + }, + ]; + } + + /** + * Project a core `ResolveTabPrefixResult` down to the tool-facing + * `TabResolution` (minimal `{ id, title, handle }` refs). Each match's + * `handle` is recomputed via `shortestUniquePrefix` so the value the tool + * echoes back always matches what the UI currently shows. + */ + private resolveTabHandle(prefix: string): TabResolution { + const res = resolveTabPrefix(prefix); + if (res.status === "none") return { status: "none" }; + if (res.status === "ok") { + return { + status: "ok", + tab: { + id: res.tab.id, + title: res.tab.title, + handle: shortestUniquePrefix(res.tab.id), + }, + }; + } + return { + status: "ambiguous", + matches: res.matches.map((t) => ({ + id: t.id, + title: t.title, + handle: shortestUniquePrefix(t.id), + })), + }; + } + + /** Snapshot of open tabs as `{ handle, title }`, excluding `exceptId` + * (typically the caller's own tab). Drives the "available tabs" hints. */ + private listOpenHandles(exceptId?: string): Array<{ handle: string; title: string }> { + return listOpenTabs() + .filter((t) => t.id !== exceptId) + .map((t) => ({ handle: shortestUniquePrefix(t.id), title: t.title })); + } + + /** + * Return a tab's most recent COMPLETED assistant turn as flat text, plus + * its current status. Reads the persisted chunk log (source of truth) and + * grabs the last `role === "assistant"` group's text chunks. `text` is null + * when no completed assistant turn exists yet. + */ + getLastTabResponse(tabId: string): { text: string | null; status: AgentStatus } { + const status = this.getTabStatus(tabId); + try { + const messages = getMessagesForTab(tabId); + for (let i = messages.length - 1; i >= 0; i--) { + const msg = messages[i]; + if (!msg || msg.role !== "assistant") continue; + const text = msg.chunks + .filter((c): c is { type: "text"; text: string } => c.type === "text") + .map((c) => c.text) + .join("") + .trim(); + if (text.length > 0) return { text, status }; + } + } catch { + // DB unavailable / tab unknown — fall through to null. + } + return { text: null, status }; + } + + /** + * Deliver `message` to `tabId`, choosing the SAME routing as `POST /chat`: + * - target running → queue it (consumed like a user interrupt). + * - target idle/errored → wake it and start a new turn. + * + * Returns quickly; does NOT block on the turn. Both the HTTP `/chat` path + * and the `send_to_tab` tool call through here so the running/idle decision + * lives in exactly one place. + * + * `opts` carries the per-request knobs `/chat` forwards (key/model, agent + * fallback chain, reasoning effort, working dir, an explicit queue id). The + * `send_to_tab` tool passes none of these — for a cold wake (a tab not in + * `tabAgents`, e.g. after a server restart) the key/model are hydrated from + * the live `TabAgent` if present, else from the persisted tab row. (A cold + * tab keeps its stored key/model but not its full agent-definition fallback + * chain — see plan notes.) + */ + deliverMessage( + tabId: string, + message: string, + opts: { + keyId?: string; + modelId?: string; + agentModels?: Array<{ key_id: string; model_id: string }>; + reasoningEffort?: "none" | "low" | "medium" | "high" | "max"; + workingDirectory?: string; + queueId?: string; + /** + * Who is sending this message. `"human"` (default) is unrestricted + * and REFILLS the target's agent-to-agent auto-wake budget. `"agent"` + * (from the `send_to_tab` tool) is governed by that budget: an + * agent-originated wake of an idle tab consumes one unit, and once the + * budget is exhausted the message is queued WITHOUT starting a turn + * (returned as `suppressed`) so a runaway A↔B loop can't spend tokens + * forever with no human in the loop. + */ + origin?: "human" | "agent"; + } = {}, + ): { status: "queued"; messageId: string } | { status: "started" } | { status: "suppressed" } { + const origin = opts.origin ?? "human"; + + // A human touching the tab clears any accumulated agent-wake throttle: + // the conversation is back under human supervision, so peers get a fresh + // budget of auto-wakes again. + if (origin === "human") { + this._getOrCreateTabAgent(tabId).autoWakeBudget = MAX_AGENT_AUTO_WAKES; + } + + if (this.getTabStatus(tabId) === "running") { + // Busy target → always queue (consumed like a user interrupt), + // regardless of origin. Queuing does not itself start a turn, so it + // can't drive a runaway loop; we don't spend budget here. + const { messageId } = this.queueMessage(tabId, message, opts.queueId); + return { status: "queued", messageId }; + } + + // Idle/errored target → this delivery would WAKE the tab (start a turn). + // For agent-originated wakes, enforce the auto-wake budget first. + if (origin === "agent") { + const target = this._getOrCreateTabAgent(tabId); + if (target.autoWakeBudget <= 0) { + // Budget exhausted: preserve the message (queue it, never drop) + // but do NOT wake the tab. A human message will refill the budget + // and the queued message will be seen on the next human turn. + this.queueMessage(tabId, message, opts.queueId); + const notice = + `Automatic agent-to-agent message limit reached for this tab ` + + `(${MAX_AGENT_AUTO_WAKES} consecutive). Further messages from other tabs ` + + `are held until you send a message here.`; + this.emit({ type: "notice", message: notice }, tabId); + this.routeSystemEventToTab(tabId, "notice", notice); + return { status: "suppressed" }; + } + target.autoWakeBudget -= 1; + } + + // Resolve key/model: explicit opts win, then the live tab agent's, then + // the persisted row's. + const tabAgent = this.tabAgents.get(tabId); + let keyId = opts.keyId ?? tabAgent?.keyId ?? undefined; + let modelId = opts.modelId ?? tabAgent?.modelId ?? undefined; + const agentModels = opts.agentModels ?? tabAgent?.agentModels; + if (!keyId || !modelId) { + const row = getTab(tabId); + if (row) { + keyId = keyId ?? row.keyId ?? undefined; + modelId = modelId ?? row.modelId ?? undefined; + } + } + + this.processMessage( + tabId, + message, + keyId, + modelId, + opts.reasoningEffort, + opts.workingDirectory, + agentModels, + ).catch((err) => { + console.error(`[dispatch] deliverMessage processMessage error for tab ${tabId}:`, err); + }); + return { status: "started" }; + } + async processMessage( tabId: string, message: string, diff --git a/packages/api/src/app.ts b/packages/api/src/app.ts index 73d3de5..19cc193 100644 --- a/packages/api/src/app.ts +++ b/packages/api/src/app.ts @@ -56,28 +56,33 @@ app.post("/chat", async (c) => { return c.json({ error: "message must be a non-empty string" }, 400); } - if (agentManager.getTabStatus(tabId) === "running") { - const queueId = typeof body.queueId === "string" ? body.queueId : undefined; - const { messageId } = agentManager.queueMessage(tabId, message, queueId); - return c.json({ status: "queued", messageId }); - } - const keyId = typeof body.keyId === "string" ? body.keyId : undefined; const modelId = typeof body.modelId === "string" ? body.modelId : undefined; const agentModels = Array.isArray(body.agentModels) ? body.agentModels : undefined; const workingDirectory = typeof body.workingDirectory === "string" ? body.workingDirectory : undefined; + const queueId = typeof body.queueId === "string" ? body.queueId : undefined; const validEfforts = ["none", "low", "medium", "high", "max"]; const reasoningEffort = typeof body.reasoningEffort === "string" && validEfforts.includes(body.reasoningEffort) ? (body.reasoningEffort as "none" | "low" | "medium" | "high" | "max") : undefined; - // Non-blocking — let the agent run in the background - agentManager - .processMessage(tabId, message, keyId, modelId, reasoningEffort, workingDirectory, agentModels) - .catch(console.error); + // Single routing decision (queue if busy, new turn if idle) shared with the + // `send_to_tab` tool via `AgentManager.deliverMessage`. Non-blocking — a + // started turn runs in the background. + const outcome = agentManager.deliverMessage(tabId, message, { + ...(keyId ? { keyId } : {}), + ...(modelId ? { modelId } : {}), + ...(agentModels ? { agentModels } : {}), + ...(reasoningEffort ? { reasoningEffort } : {}), + ...(workingDirectory !== undefined ? { workingDirectory } : {}), + ...(queueId ? { queueId } : {}), + }); + if (outcome.status === "queued") { + return c.json({ status: "queued", messageId: outcome.messageId }); + } return c.json({ status: "ok" }); }); diff --git a/packages/api/tests/agent-manager.test.ts b/packages/api/tests/agent-manager.test.ts index 4415bbb..1358eb1 100644 --- a/packages/api/tests/agent-manager.test.ts +++ b/packages/api/tests/agent-manager.test.ts @@ -24,6 +24,39 @@ function resetFakeMessages(): void { function setFakeMessages(tabId: string, rows: FakeMessageRow[]): void { fakeMessagesByTab.set(tabId, rows); } + +// Configurable stub for the tabs DB (getTab / listOpenTabs). Tests can seed +// rows to exercise deliverMessage cold-hydration and handle resolution. +interface FakeTabRow { + id: string; + title: string; + keyId: string | null; + modelId: string | null; + parentTabId: string | null; + status: string; + isOpen: boolean; + position: number; + createdAt: number; + updatedAt: number; +} +const fakeTabs = new Map<string, FakeTabRow>(); +function resetFakeTabs(): void { + fakeTabs.clear(); +} +function setFakeTab(row: Partial<FakeTabRow> & { id: string }): void { + fakeTabs.set(row.id, { + title: "Tab", + keyId: null, + modelId: null, + parentTabId: null, + status: "idle", + isOpen: true, + position: 0, + createdAt: 0, + updatedAt: 0, + ...row, + }); +} function makeRow( tabId: string, seq: number, @@ -42,11 +75,22 @@ function makeRow( // because the production code reassigns `agent.messages = // rows.slice(...)` AFTER `new Agent()` returns — capturing a // reference at construction would yield a stale empty array. -const constructedAgents: Array<{ initialMessages: unknown[] }> = []; +const constructedAgents: Array<{ initialMessages: unknown[]; toolNames: string[] }> = []; function resetConstructedAgents(): void { constructedAgents.length = 0; } +// Configurable settings store so tests can toggle tool permissions +// (perm_send_to_tab / perm_read_tab / ...) and assert which tools the +// constructed Agent receives. Defaults to empty (getSetting → null). +const fakeSettings = new Map<string, string>(); +function resetFakeSettings(): void { + fakeSettings.clear(); +} +function setFakeSetting(key: string, value: string): void { + fakeSettings.set(key, value); +} + // Allow tests to swap in a custom `run` generator (e.g. to simulate // a fallback failure mid-stream). Returning to undefined restores // the default. @@ -87,12 +131,19 @@ vi.mock("@dispatch/core", () => ({ Agent: class MockAgent { status = "idle"; messages: unknown[] = []; + toolNames: string[] = []; + constructor(config: { tools?: Array<{ name: string }> }) { + this.toolNames = (config?.tools ?? []).map((t) => t.name); + } async *run(message: string): AsyncGenerator<unknown> { // Snapshot the post-construction pre-populated message list // the first thing `run()` does, before the real `Agent.run` // would push the current user message at line 546. Tests // inspect this to verify history was loaded correctly. - constructedAgents.push({ initialMessages: [...this.messages] }); + constructedAgents.push({ + initialMessages: [...this.messages], + toolNames: [...this.toolNames], + }); if (runImpl) { for await (const ev of runImpl(message)) yield ev; return; @@ -244,6 +295,41 @@ vi.mock("@dispatch/core", () => ({ }; }, createTab() {}, + getTab(id: string) { + return fakeTabs.get(id) ?? null; + }, + listOpenTabs() { + return [...fakeTabs.values()].filter((t) => t.isOpen); + }, + resolveTabPrefix(prefix: string) { + const sanitized = (prefix ?? "").toLowerCase().replace(/[^0-9a-f-]/g, ""); + if (sanitized.length < 4) return { status: "none" }; + const matches = [...fakeTabs.values()].filter( + (t) => t.isOpen && t.id.toLowerCase().startsWith(sanitized), + ); + if (matches.length === 0) return { status: "none" }; + if (matches.length === 1) return { status: "ok", tab: matches[0] }; + return { status: "ambiguous", matches }; + }, + shortestUniquePrefix(id: string) { + return (id ?? "").slice(0, 4); + }, + createSendToTabTool(_callbacks: unknown): ToolDefinition { + return { + name: "send_to_tab", + description: "send to tab", + parameters: { _type: "z.ZodObject", shape: {} } as unknown as ToolDefinition["parameters"], + execute: async () => "mock", + }; + }, + createReadTabTool(_callbacks: unknown): ToolDefinition { + return { + name: "read_tab", + description: "read tab", + parameters: { _type: "z.ZodObject", shape: {} } as unknown as ToolDefinition["parameters"], + execute: async () => "mock", + }; + }, getClaudeAccountsFromDB() { return []; }, @@ -256,8 +342,8 @@ vi.mock("@dispatch/core", () => ({ resolveApiKey() { return null; }, - getSetting(_key: string) { - return null; + getSetting(key: string) { + return fakeSettings.get(key) ?? null; }, appendChunks() { return []; @@ -316,6 +402,8 @@ describe("AgentManager", () => { beforeEach(() => { resetFakeMessages(); resetConstructedAgents(); + resetFakeTabs(); + resetFakeSettings(); setRunImpl(null); appendEventToChunksSpy.mockClear(); }); @@ -849,4 +937,273 @@ describe("AgentManager", () => { expect(snap["tab-early"]).not.toHaveProperty("currentChunks"); expect(snap["tab-early"]).not.toHaveProperty("currentAssistantId"); }); + + // ─── Tab-to-tab communication ───────────────────────────────── + + describe("deliverMessage", () => { + it("starts a new turn when the target tab is idle", async () => { + const manager = new AgentManager(); + const events: AgentEvent[] = []; + manager.onEvent((e) => events.push(e)); + + const outcome = manager.deliverMessage("tab-idle", "wake up"); + expect(outcome.status).toBe("started"); + + // Let the background turn run to completion. + await new Promise<void>((r) => setTimeout(r, 60)); + expect(events.some((e) => e.type === "text-delta")).toBe(true); + expect(manager.getTabStatus("tab-idle")).toBe("idle"); + }); + + it("queues the message when the target tab is running", () => { + const manager = new AgentManager(); + const inner = manager as unknown as { + tabAgents: Map<string, Record<string, unknown>>; + }; + // Seed a running tab agent directly. + inner.tabAgents.set("tab-busy", { + agent: null, + status: "running", + keyId: null, + modelId: null, + taskList: { onChange: () => {} }, + messageQueue: [], + queueListeners: [], + shellStore: {}, + transcriptStore: {}, + currentChunks: null, + currentAssistantId: null, + currentTurnId: null, + }); + + const outcome = manager.deliverMessage("tab-busy", "queued msg"); + expect(outcome.status).toBe("queued"); + if (outcome.status === "queued") { + expect(typeof outcome.messageId).toBe("string"); + } + // The message landed on the running tab's queue. + const agent = inner.tabAgents.get("tab-busy") as { messageQueue: unknown[] }; + expect(agent.messageQueue).toHaveLength(1); + }); + + it("hydrates key/model from the persisted tab row for a cold wake", () => { + const manager = new AgentManager(); + setFakeTab({ id: "tab-cold", keyId: "persisted-key", modelId: "persisted-model" }); + + // Spy on processMessage to capture the key/model deliverMessage + // forwarded — asserting the hydration decision directly rather than + // downstream tabAgent state (which the mocked ModelRegistry rewrites). + const spy = vi.spyOn(manager, "processMessage").mockResolvedValue(undefined); + + const outcome = manager.deliverMessage("tab-cold", "hello"); + expect(outcome.status).toBe("started"); + + expect(spy).toHaveBeenCalledTimes(1); + const args = spy.mock.calls[0] ?? []; + expect(args[0]).toBe("tab-cold"); // tabId + expect(args[1]).toBe("hello"); // message + expect(args[2]).toBe("persisted-key"); // keyId hydrated from row + expect(args[3]).toBe("persisted-model"); // modelId hydrated from row + }); + + it("prefers explicit opts over the persisted row on a cold wake", () => { + const manager = new AgentManager(); + setFakeTab({ id: "tab-cold2", keyId: "row-key", modelId: "row-model" }); + const spy = vi.spyOn(manager, "processMessage").mockResolvedValue(undefined); + + manager.deliverMessage("tab-cold2", "hello", { + keyId: "explicit-key", + modelId: "explicit-model", + }); + + const args = spy.mock.calls[0] ?? []; + expect(args[2]).toBe("explicit-key"); + expect(args[3]).toBe("explicit-model"); + }); + }); + + describe("deliverMessage — agent auto-wake budget", () => { + it("allows up to 6 consecutive agent wakes, then suppresses further ones", () => { + const manager = new AgentManager(); + const spy = vi.spyOn(manager, "processMessage").mockResolvedValue(undefined); + + // 6 agent-originated wakes of an idle tab should all start turns. + for (let i = 0; i < 6; i++) { + const outcome = manager.deliverMessage("tab-pp", `msg ${i}`, { origin: "agent" }); + expect(outcome.status).toBe("started"); + } + expect(spy).toHaveBeenCalledTimes(6); + + // The 7th is suppressed: no new turn, message preserved on the queue. + const seventh = manager.deliverMessage("tab-pp", "msg 7", { origin: "agent" }); + expect(seventh.status).toBe("suppressed"); + expect(spy).toHaveBeenCalledTimes(6); // unchanged — no wake + + const inner = manager as unknown as { + tabAgents: Map<string, { messageQueue: unknown[]; autoWakeBudget: number }>; + }; + const agent = inner.tabAgents.get("tab-pp"); + expect(agent?.autoWakeBudget).toBe(0); + // Suppressed message is queued, not dropped. + expect(agent?.messageQueue).toHaveLength(1); + }); + + it("a human message refills the budget and re-enables agent wakes", () => { + const manager = new AgentManager(); + vi.spyOn(manager, "processMessage").mockResolvedValue(undefined); + + // Exhaust the budget with agent wakes. + for (let i = 0; i < 6; i++) { + manager.deliverMessage("tab-refill", `a${i}`, { origin: "agent" }); + } + expect(manager.deliverMessage("tab-refill", "blocked", { origin: "agent" }).status).toBe( + "suppressed", + ); + + // A human message refills the budget... + const humanOutcome = manager.deliverMessage("tab-refill", "human here", { + origin: "human", + }); + expect(humanOutcome.status).toBe("started"); + + const inner = manager as unknown as { + tabAgents: Map<string, { autoWakeBudget: number }>; + }; + expect(inner.tabAgents.get("tab-refill")?.autoWakeBudget).toBe(6); + + // ...so an agent can wake it again. + expect(manager.deliverMessage("tab-refill", "again", { origin: "agent" }).status).toBe( + "started", + ); + }); + + it("does not consume budget when the message is merely queued (busy target)", () => { + const manager = new AgentManager(); + const inner = manager as unknown as { + tabAgents: Map<string, Record<string, unknown>>; + }; + inner.tabAgents.set("tab-busy-budget", { + agent: null, + status: "running", + keyId: null, + modelId: null, + taskList: { onChange: () => {} }, + messageQueue: [], + queueListeners: [], + shellStore: {}, + transcriptStore: {}, + currentChunks: null, + currentAssistantId: null, + currentTurnId: null, + autoWakeBudget: 6, + }); + + const outcome = manager.deliverMessage("tab-busy-budget", "queued one", { + origin: "agent", + }); + expect(outcome.status).toBe("queued"); + // Budget untouched — queuing can't drive a runaway loop. + const agent = inner.tabAgents.get("tab-busy-budget") as { autoWakeBudget: number }; + expect(agent.autoWakeBudget).toBe(6); + }); + + it("human-originated wakes are never throttled", () => { + const manager = new AgentManager(); + const spy = vi.spyOn(manager, "processMessage").mockResolvedValue(undefined); + + // Far more than the budget, all human-originated → all start turns. + for (let i = 0; i < 10; i++) { + const outcome = manager.deliverMessage("tab-human", `h${i}`, { origin: "human" }); + expect(outcome.status).toBe("started"); + } + expect(spy).toHaveBeenCalledTimes(10); + }); + + it("defaults origin to human when unspecified (POST /chat path)", () => { + const manager = new AgentManager(); + const spy = vi.spyOn(manager, "processMessage").mockResolvedValue(undefined); + for (let i = 0; i < 8; i++) { + expect(manager.deliverMessage("tab-default", `d${i}`).status).toBe("started"); + } + expect(spy).toHaveBeenCalledTimes(8); + }); + }); + + describe("getLastTabResponse", () => { + it("returns the most recent assistant turn's text and current status", () => { + const manager = new AgentManager(); + setFakeMessages("tab-hist", [ + makeRow("tab-hist", 1, "user", [{ type: "text", text: "hi" }]), + makeRow("tab-hist", 2, "assistant", [{ type: "text", text: "first answer" }]), + makeRow("tab-hist", 3, "user", [{ type: "text", text: "again" }]), + makeRow("tab-hist", 4, "assistant", [ + { type: "text", text: "second " }, + { type: "text", text: "answer" }, + ]), + ]); + + const res = manager.getLastTabResponse("tab-hist"); + expect(res.text).toBe("second answer"); + expect(res.status).toBe("idle"); + }); + + it("returns null text when the tab has no assistant turn yet", () => { + const manager = new AgentManager(); + setFakeMessages("tab-empty", [ + makeRow("tab-empty", 1, "user", [{ type: "text", text: "hi" }]), + ]); + const res = manager.getLastTabResponse("tab-empty"); + expect(res.text).toBeNull(); + }); + + it("skips assistant turns that contain no text chunks", () => { + const manager = new AgentManager(); + setFakeMessages("tab-toolonly", [ + makeRow("tab-toolonly", 1, "assistant", [{ type: "text", text: "real answer" }]), + // A later assistant turn with only non-text chunks should be skipped. + makeRow("tab-toolonly", 2, "assistant", [{ type: "thinking", text: "hmm" }]), + ]); + const res = manager.getLastTabResponse("tab-toolonly"); + expect(res.text).toBe("real answer"); + }); + }); + + describe("send_to_tab / read_tab permission split", () => { + // Drives the real parent-path tool construction in getOrCreateAgentForTab + // by toggling the new split permissions and inspecting which tools the + // constructed Agent received. + async function toolsForPerms(tabId: string, perms: Record<string, string>): Promise<string[]> { + for (const [k, v] of Object.entries(perms)) setFakeSetting(k, v); + const manager = new AgentManager(); + await manager.processMessage(tabId, "go"); + return constructedAgents.at(-1)?.toolNames ?? []; + } + + it("grants only send_to_tab when only perm_send_to_tab is allowed", async () => { + const tools = await toolsForPerms("tab-send-only", { perm_send_to_tab: "allow" }); + expect(tools).toContain("send_to_tab"); + expect(tools).not.toContain("read_tab"); + }); + + it("grants only read_tab when only perm_read_tab is allowed", async () => { + const tools = await toolsForPerms("tab-read-only", { perm_read_tab: "allow" }); + expect(tools).toContain("read_tab"); + expect(tools).not.toContain("send_to_tab"); + }); + + it("grants both when both permissions are allowed", async () => { + const tools = await toolsForPerms("tab-both", { + perm_send_to_tab: "allow", + perm_read_tab: "allow", + }); + expect(tools).toContain("send_to_tab"); + expect(tools).toContain("read_tab"); + }); + + it("grants neither when both permissions are off", async () => { + const tools = await toolsForPerms("tab-neither", {}); + expect(tools).not.toContain("send_to_tab"); + expect(tools).not.toContain("read_tab"); + }); + }); }); diff --git a/packages/api/tests/routes.test.ts b/packages/api/tests/routes.test.ts index 4b8dd40..9ab2afe 100644 --- a/packages/api/tests/routes.test.ts +++ b/packages/api/tests/routes.test.ts @@ -166,6 +166,34 @@ vi.mock("@dispatch/core", () => ({ }; }, createTab() {}, + getTab() { + return null; + }, + listOpenTabs() { + return []; + }, + resolveTabPrefix() { + return { status: "none" }; + }, + shortestUniquePrefix(id: string) { + return (id ?? "").slice(0, 4); + }, + createSendToTabTool(_callbacks: unknown) { + return { + name: "send_to_tab", + description: "send to tab", + parameters: { _type: "z.ZodObject", shape: {} }, + execute: async () => "mock", + }; + }, + createReadTabTool(_callbacks: unknown) { + return { + name: "read_tab", + description: "read tab", + parameters: { _type: "z.ZodObject", shape: {} }, + execute: async () => "mock", + }; + }, getClaudeAccountsFromDB() { return []; }, diff --git a/packages/core/src/agents/loader.ts b/packages/core/src/agents/loader.ts index 333716e..f4a6c5a 100644 --- a/packages/core/src/agents/loader.ts +++ b/packages/core/src/agents/loader.ts @@ -108,8 +108,8 @@ export function expandAgentToolNames(tools: string[]): string[] { default: // Pass through tool names that aren't permission-group // aliases (summon, retrieve, web_search, youtube_transcribe, - // todo, and the granular file tools themselves if a user - // hand-wrote them in a TOML). + // send_to_tab, read_tab, todo, and the granular file tools + // themselves if a user hand-wrote them in a TOML). expanded.add(t); } } diff --git a/packages/core/src/db/tabs.ts b/packages/core/src/db/tabs.ts index 928f434..8b290d2 100644 --- a/packages/core/src/db/tabs.ts +++ b/packages/core/src/db/tabs.ts @@ -159,3 +159,78 @@ export function getDescendantIds(rootId: string): string[] { } return order.reverse(); } + +/** + * Minimum length of a tab-handle prefix accepted by `resolveTabPrefix`. + * Mirrors the frontend's minimum DISPLAY length (4 hex chars). Anything + * shorter is rejected as too broad — an agent must echo at least the 4-char + * handle shown in the UI. + */ +export const MIN_TAB_PREFIX_LENGTH = 4; + +/** + * Outcome of resolving a short tab handle (a git-style prefix of a tab's + * UUID) back to a concrete open tab. + * + * - `ok` — exactly one open tab matched; `tab` is it. + * - `none` — no open tab matched (bad/stale handle, or too-short prefix). + * - `ambiguous` — more than one open tab shares the prefix; `matches` lists + * them so the caller can ask for one more character (the same + * UX as `git checkout <ambiguous-sha>`). + */ +export type ResolveTabPrefixResult = + | { status: "ok"; tab: TabRow } + | { status: "none" } + | { status: "ambiguous"; matches: TabRow[] }; + +/** + * Resolve a short tab handle to a single OPEN tab by prefix match — the + * git-short-hash model. The handle is NEVER stored: it is always derived from + * (and matched against) the canonical lowercase UUID in `tabs.id`. + * + * Sanitization is mandatory because the SQLite `LIKE` operator treats `%` and + * `_` as wildcards: an unsanitized prefix like `a%` would match broadly. We + * lowercase the input (UUIDs are canonical lowercase; SQLite `LIKE` is also + * ASCII-case-insensitive by default) and strip everything outside the UUID + * alphabet `[0-9a-f-]` so no wildcard can survive into the query. + * + * A prefix shorter than `MIN_TAB_PREFIX_LENGTH` after sanitization returns + * `none` rather than matching a large swath of tabs. + * + * Only OPEN tabs (`is_open = 1`) are addressable — a closed tab's UUID prefix + * must not cause phantom ambiguity or resolve to a dead conversation. + */ +export function resolveTabPrefix(prefix: string): ResolveTabPrefixResult { + const sanitized = (prefix ?? "").toLowerCase().replace(/[^0-9a-f-]/g, ""); + if (sanitized.length < MIN_TAB_PREFIX_LENGTH) { + return { status: "none" }; + } + const db = getDatabase(); + const rows = db + .query("SELECT * FROM tabs WHERE is_open = 1 AND id LIKE $prefix ORDER BY position ASC") + .all({ $prefix: `${sanitized}%` }) as Array<Record<string, unknown>>; + if (rows.length === 0) return { status: "none" }; + if (rows.length === 1) return { status: "ok", tab: rowToTab(rows[0] as Record<string, unknown>) }; + return { status: "ambiguous", matches: rows.map(rowToTab) }; +} + +/** + * Compute the shortest unique prefix (minimum `MIN_TAB_PREFIX_LENGTH` chars) + * that identifies `tabId` among the currently OPEN tabs — the backend twin of + * the frontend's display helper. Used when a tool needs to echo a tab's own + * handle (e.g. provenance prefixes, "available tabs" hints) without trusting a + * value from the wire. + * + * Returns the full id if no shorter unique prefix exists (degenerate — only if + * two open tabs share an entire id, which UUID uniqueness precludes). + */ +export function shortestUniquePrefix(tabId: string): string { + const db = getDatabase(); + const rows = db.query("SELECT id FROM tabs WHERE is_open = 1").all() as Array<{ id: string }>; + const others = rows.map((r) => r.id).filter((id) => id !== tabId); + for (let len = MIN_TAB_PREFIX_LENGTH; len < tabId.length; len++) { + const candidate = tabId.slice(0, len); + if (!others.some((id) => id.startsWith(candidate))) return candidate; + } + return tabId; +} diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts index a3816ea..b1b17cc 100644 --- a/packages/core/src/index.ts +++ b/packages/core/src/index.ts @@ -49,6 +49,10 @@ export { createTab, getTab, listOpenTabs, + MIN_TAB_PREFIX_LENGTH, + type ResolveTabPrefixResult, + resolveTabPrefix, + shortestUniquePrefix, type TabRow, updateTabModel, updateTabStatus, @@ -78,9 +82,16 @@ export { prefix as bashArityPrefix } from "./tools/bash-arity.js"; export { createListFilesTool } from "./tools/list-files.js"; export { createReadFileTool } from "./tools/read-file.js"; export { createReadFileSliceTool } from "./tools/read-file-slice.js"; +export { createReadTabTool, type ReadTabCallbacks } from "./tools/read-tab.js"; export { createToolRegistry } from "./tools/registry.js"; export { createRetrieveTool, type RetrieveCallbacks } from "./tools/retrieve.js"; export { BackgroundShellStore, createRunShellTool } from "./tools/run-shell.js"; +export { + createSendToTabTool, + type ResolvedTabRef, + type SendToTabCallbacks, + type TabResolution, +} from "./tools/send-to-tab.js"; export { analyzeCommand } from "./tools/shell-analyze.js"; export { type AvailableAgent, diff --git a/packages/core/src/llm/debug-logger.ts b/packages/core/src/llm/debug-logger.ts index 2b7420c..072a7a1 100644 --- a/packages/core/src/llm/debug-logger.ts +++ b/packages/core/src/llm/debug-logger.ts @@ -281,11 +281,7 @@ export function logStepLifecycle(data: { * Log agent loop-level events (loop start, break conditions, etc.). * Only logs at verbosity >= 3. */ -export function logAgentLoop(data: { - tabId?: string; - event: string; - detail?: unknown; -}): void { +export function logAgentLoop(data: { tabId?: string; event: string; detail?: unknown }): void { if (!ENABLED || VERBOSITY < 3) return; const detail = data.detail !== undefined ? ` ${JSON.stringify(data.detail)}` : ""; console.error(`[dispatch-debug] AGENT tab=${data.tabId ?? "?"} ${data.event}${detail}`); @@ -308,16 +304,14 @@ export function logAgentLoop(data: { * we just clone and read once via `.text()` — simpler and safe because * non-streaming bodies are bounded. */ -export function wrapFetchWithLogging< - F extends (...args: never[]) => Promise<Response> | Response, ->(baseFetch: F, opts: { tabId?: string; modelHint?: string }): F { +export function wrapFetchWithLogging<F extends (...args: never[]) => Promise<Response> | Response>( + baseFetch: F, + opts: { tabId?: string; modelHint?: string }, +): F { if (!ENABLED) return baseFetch; const wrapped = async (...args: Parameters<F>) => { const requestId = ++seq; - const [input, init] = args as unknown as [ - RequestInfo | URL, - RequestInit | undefined, - ]; + const [input, init] = args as unknown as [RequestInfo | URL, RequestInit | undefined]; const url = typeof input === "string" ? input @@ -325,7 +319,8 @@ export function wrapFetchWithLogging< ? input.toString() : (input as Request).url; const method = - init?.method ?? (typeof input === "object" && "method" in input ? (input as Request).method : "POST"); + init?.method ?? + (typeof input === "object" && "method" in input ? (input as Request).method : "POST"); // Snapshot headers as a plain object for logging. const headerObj: Record<string, string> = {}; diff --git a/packages/core/src/tools/read-tab.ts b/packages/core/src/tools/read-tab.ts new file mode 100644 index 0000000..e80dbd0 --- /dev/null +++ b/packages/core/src/tools/read-tab.ts @@ -0,0 +1,95 @@ +import { z } from "zod"; +import type { AgentStatus, ToolDefinition } from "../types/index.js"; +import type { TabResolution } from "./send-to-tab.js"; + +export interface ReadTabCallbacks { + /** Resolve a (possibly short) handle to one open tab. */ + resolveShortId(prefix: string): TabResolution; + /** + * Return the target tab's most recent COMPLETED assistant turn as plain + * text, plus its current status. `text` is null when the tab has no + * completed assistant turn yet. + */ + getLastResponse(tabId: string): { text: string | null; status: AgentStatus }; + /** Snapshot of currently-open tabs, for "available tabs" error hints. */ + listOpenHandles(): Array<{ handle: string; title: string }>; +} + +/** Render the "available tabs" hint shared by the none/ambiguous branches. */ +function renderOpenHandles(handles: Array<{ handle: string; title: string }>): string { + if (handles.length === 0) return "No other tabs are currently open."; + const lines = handles.map((h) => ` - ${h.handle}: ${h.title}`); + return ["Currently open tabs:", ...lines].join("\n"); +} + +export function createReadTabTool(callbacks: ReadTabCallbacks): ToolDefinition { + return { + name: "read_tab", + description: [ + "Read the most recent completed response from another tab (agent) by its short ID.", + "", + "Returns a SNAPSHOT — it does NOT block or wait for the target to finish.", + " - If the target is idle, you get its just-finished turn.", + " - If the target is still running, you get its PREVIOUS completed turn (if any);", + " call read_tab again later to get the newest one.", + "", + "Use this after send_to_tab to collect another agent's reply. IDs are git-style", + "prefixes: pass any length that uniquely identifies the target (min 4 chars).", + ].join("\n"), + parameters: z.object({ + tab_id: z + .string() + .describe( + "The short ID (handle) of the tab to read, as shown in the tab bar. Any unique-length prefix works (min 4 chars).", + ), + }), + execute: async (args: Record<string, unknown>): Promise<string> => { + const rawId = (args.tab_id as string | undefined)?.trim() ?? ""; + + if (!rawId) { + return `Error: tab_id is required.\n\n${renderOpenHandles(callbacks.listOpenHandles())}`; + } + + const resolution = callbacks.resolveShortId(rawId); + + if (resolution.status === "none") { + return [ + `Error: no open tab matches the ID "${rawId}".`, + "", + renderOpenHandles(callbacks.listOpenHandles()), + ].join("\n"); + } + if (resolution.status === "ambiguous") { + const matches = resolution.matches.map((m) => ` - ${m.handle}: ${m.title}`).join("\n"); + return [ + `Error: the ID "${rawId}" is ambiguous — it matches multiple open tabs:`, + matches, + "", + "Add one or more characters to disambiguate.", + ].join("\n"); + } + + const target = resolution.tab; + const { text, status } = callbacks.getLastResponse(target.id); + + const runningNote = + status === "running" + ? " (this tab is still running; the response below is its previous completed turn — read again later for the newest)" + : ""; + + if (text === null) { + const reason = + status === "running" + ? "it is still working on its first turn" + : "it has no assistant responses yet"; + return `Tab ${target.handle} (${target.title}) has no completed response — ${reason}.`; + } + + return [ + `<tab_response tab="${target.handle}" status="${status}"${runningNote}>`, + text, + "</tab_response>", + ].join("\n"); + }, + }; +} diff --git a/packages/core/src/tools/send-to-tab.ts b/packages/core/src/tools/send-to-tab.ts new file mode 100644 index 0000000..eb86b7e --- /dev/null +++ b/packages/core/src/tools/send-to-tab.ts @@ -0,0 +1,147 @@ +import { z } from "zod"; +import type { ToolDefinition } from "../types/index.js"; + +/** + * A tab reference surfaced to the `send_to_tab` / `read_tab` tools. The tools + * are intentionally decoupled from the DB `TabRow` shape — the AgentManager + * maps `resolveTabPrefix(...)` results down to this minimal projection so the + * tools (and their unit tests) never depend on the persistence layer. + */ +export interface ResolvedTabRef { + /** The tab's canonical full UUID. */ + id: string; + /** The tab's display title (for disambiguation hints). */ + title: string; + /** The tab's current short handle (shortest unique prefix). */ + handle: string; +} + +/** + * Outcome of resolving a short tab handle. Mirrors core's + * `ResolveTabPrefixResult` but over the minimal `ResolvedTabRef` projection. + */ +export type TabResolution = + | { status: "ok"; tab: ResolvedTabRef } + | { status: "none" } + | { status: "ambiguous"; matches: ResolvedTabRef[] }; + +export interface SendToTabCallbacks { + /** Resolve a (possibly short) handle to one open tab. */ + resolveShortId(prefix: string): TabResolution; + /** + * Deliver `message` to `tabId`. If the target is mid-turn the message is + * queued (same path as a user message); if idle/errored it wakes the tab + * and starts a new turn. Returns quickly — does NOT block on the turn. + */ + deliver( + tabId: string, + message: string, + ): + | Promise<{ status: "queued" | "started" | "suppressed" }> + | { status: "queued" | "started" | "suppressed" }; + /** Snapshot of currently-open tabs, for "available tabs" error hints. */ + listOpenHandles(): Array<{ handle: string; title: string }>; + /** The calling tab's own id + handle — used to block self-sends and to + * stamp provenance onto the delivered message. */ + self: { id: string; handle: string }; +} + +/** Render the "available tabs" hint shared by the none/ambiguous branches. */ +function renderOpenHandles(handles: Array<{ handle: string; title: string }>): string { + if (handles.length === 0) return "No other tabs are currently open."; + const lines = handles.map((h) => ` - ${h.handle}: ${h.title}`); + return ["Currently open tabs:", ...lines].join("\n"); +} + +export function createSendToTabTool(callbacks: SendToTabCallbacks): ToolDefinition { + return { + name: "send_to_tab", + description: [ + "Send a message to another tab (agent) by its short ID — the handle shown in the tab bar.", + "", + "Behaviour mirrors a user sending a message:", + " - If the target tab is mid-turn (busy), your message is QUEUED and picked up next.", + " - If the target tab is idle, your message WAKES it and starts a new turn.", + "", + "This is fire-and-forget: it returns immediately and does NOT wait for a reply.", + "Use the 'read_tab' tool with the same ID later to read the target's latest response.", + "", + "Your tab ID is auto-added to the top of the message so the recipient can reply to you.", + "IDs are git-style prefixes: pass any length that uniquely identifies the target (min 4 chars).", + "If the ID is ambiguous you'll be asked to add a character.", + ].join("\n"), + parameters: z.object({ + tab_id: z + .string() + .describe( + "The short ID (handle) of the target tab, as shown in the tab bar. Any unique-length prefix of the tab's id works (min 4 chars).", + ), + message: z + .string() + .describe("The message to deliver to the target tab, exactly as a user would type it."), + }), + execute: async (args: Record<string, unknown>): Promise<string> => { + const rawId = (args.tab_id as string | undefined)?.trim() ?? ""; + const message = (args.message as string | undefined) ?? ""; + + if (!rawId) { + return `Error: tab_id is required.\n\n${renderOpenHandles(callbacks.listOpenHandles())}`; + } + if (!message.trim()) { + return "Error: message must not be empty."; + } + + const resolution = callbacks.resolveShortId(rawId); + + if (resolution.status === "none") { + return [ + `Error: no open tab matches the ID "${rawId}".`, + "", + renderOpenHandles(callbacks.listOpenHandles()), + ].join("\n"); + } + if (resolution.status === "ambiguous") { + const matches = resolution.matches.map((m) => ` - ${m.handle}: ${m.title}`).join("\n"); + return [ + `Error: the ID "${rawId}" is ambiguous — it matches multiple open tabs:`, + matches, + "", + "Add one or more characters to disambiguate.", + ].join("\n"); + } + + const target = resolution.tab; + + if (target.id === callbacks.self.id) { + return "Error: cannot send a message to your own tab."; + } + + // Stamp provenance so the recipient (and the watching user) can see + // which tab the message came from and reply back via its handle. + const delivered = `[message from tab ${callbacks.self.handle}]\n\n${message}`; + + try { + const result = await callbacks.deliver(target.id, delivered); + if (result.status === "suppressed") { + // The target hit its automatic agent-to-agent wake limit. The + // message was preserved (queued) but did NOT start a turn — a + // human must step in. Tell the sender plainly so it stops + // hammering the target and creating a runaway loop. + return [ + `Message HELD for tab ${target.handle} (${target.title}) — it was NOT delivered as a wake.`, + `That tab has reached its automatic agent-to-agent message limit, so it will not`, + `auto-respond again until a human sends it a message. Do not keep resending:`, + `your message is already queued and will be seen when a human resumes that tab.`, + ].join("\n"); + } + const verb = + result.status === "queued" + ? "queued (target is busy; it will be picked up next turn)" + : "delivered (target was idle; a new turn has started)"; + return `Message ${verb}. Target tab: ${target.handle} (${target.title}). Use read_tab with "${target.handle}" to read its reply later.`; + } catch (err) { + return `Error delivering message: ${err instanceof Error ? err.message : String(err)}`; + } + }, + }; +} diff --git a/packages/core/src/tools/summon.ts b/packages/core/src/tools/summon.ts index d40f0ca..4820e89 100644 --- a/packages/core/src/tools/summon.ts +++ b/packages/core/src/tools/summon.ts @@ -177,6 +177,8 @@ export function createSummonTool( " - retrieve: Collect results from its children (required if summon is given)", " - web_search: Search the web", " - youtube_transcribe: Fetch YouTube video transcripts", + " - send_to_tab: Send a message to another tab/agent by its ID", + " - read_tab: Read another tab/agent's latest response by its ID", "", "The 'agent' parameter is required — every spawned agent must use a definition.", "Tools default to the agent definition's tools, intersected with your own tools (you can't grant capabilities you don't have).", @@ -232,6 +234,8 @@ export function createSummonTool( "retrieve", "web_search", "youtube_transcribe", + "send_to_tab", + "read_tab", ]), ) .optional() @@ -262,7 +266,9 @@ export function createSummonTool( const tools = args.tools as string[] | undefined; const workingDirectory = args.working_directory as string | undefined; const background = (args.background as boolean | undefined) ?? false; - const topLevel = userAgentEnabled ? ((args.top_level as boolean | undefined) ?? false) : false; + const topLevel = userAgentEnabled + ? ((args.top_level as boolean | undefined) ?? false) + : false; try { const agentId = await callbacks.spawn({ diff --git a/packages/core/tests/agents/loader.test.ts b/packages/core/tests/agents/loader.test.ts index 88173ea..35ab0cf 100644 --- a/packages/core/tests/agents/loader.test.ts +++ b/packages/core/tests/agents/loader.test.ts @@ -22,10 +22,34 @@ describe("expandAgentToolNames", () => { expect(out).toContain("run_shell"); }); + it("passes through the tab tools as independent names (no tab_comm group)", () => { + const out = expandAgentToolNames(["send_to_tab", "read_tab"]); + expect(out).toContain("send_to_tab"); + expect(out).toContain("read_tab"); + // Granting only one must not pull in the other. + const onlySend = expandAgentToolNames(["send_to_tab"]); + expect(onlySend).toContain("send_to_tab"); + expect(onlySend).not.toContain("read_tab"); + }); + it("passes through non-group tool names unchanged", () => { - const out = expandAgentToolNames(["summon", "retrieve", "web_search", "youtube_transcribe"]); + const out = expandAgentToolNames([ + "summon", + "retrieve", + "web_search", + "youtube_transcribe", + "send_to_tab", + "read_tab", + ]); expect(out).toEqual( - expect.arrayContaining(["summon", "retrieve", "web_search", "youtube_transcribe"]), + expect.arrayContaining([ + "summon", + "retrieve", + "web_search", + "youtube_transcribe", + "send_to_tab", + "read_tab", + ]), ); }); diff --git a/packages/core/tests/db/tabs.test.ts b/packages/core/tests/db/tabs.test.ts index e1c9bf8..67533dc 100644 --- a/packages/core/tests/db/tabs.test.ts +++ b/packages/core/tests/db/tabs.test.ts @@ -73,6 +73,22 @@ class FakeDatabase { return [{ max_pos: maxPos }]; } + // resolveTabPrefix: open tabs whose id starts with a sanitized prefix. + // The production query binds `$prefix` as `<sanitized>%`; emulate SQLite + // LIKE prefix semantics here (case-insensitive, `%` = "rest of string"). + if (norm === "SELECT * FROM tabs WHERE is_open = 1 AND id LIKE $prefix ORDER BY position ASC") { + const raw = String(params?.$prefix ?? ""); + const needle = raw.endsWith("%") ? raw.slice(0, -1) : raw; + return this.rows + .filter((r) => r.is_open === 1 && r.id.toLowerCase().startsWith(needle.toLowerCase())) + .sort((a, b) => a.position - b.position); + } + + // shortestUniquePrefix: all open tab ids. + if (norm === "SELECT id FROM tabs WHERE is_open = 1") { + return this.rows.filter((r) => r.is_open === 1).map((r) => ({ id: r.id })); + } + throw new Error(`FakeDatabase: unsupported SELECT: ${norm}`); } @@ -134,7 +150,8 @@ vi.mock("../../src/db/index.js", () => ({ // Dynamic import AFTER `vi.mock` registers (vitest hoists `vi.mock` to // the very top of the file, so by the time this line runs the mock is // active for `./index.js` resolution inside `tabs.ts`). -const { archiveTab, createTab, getDescendantIds, getTab } = await import("../../src/db/tabs.js"); +const { archiveTab, createTab, getDescendantIds, getTab, resolveTabPrefix, shortestUniquePrefix } = + await import("../../src/db/tabs.js"); beforeAll(() => { fakeDb = new FakeDatabase(); @@ -234,3 +251,103 @@ describe("getDescendantIds", () => { expect(ids2).toEqual(["b1", "a1"]); }); }); + +// --------------------------------------------------------------------------- +// resolveTabPrefix — git-style short-handle resolution +// --------------------------------------------------------------------------- +describe("resolveTabPrefix", () => { + it("returns none when the prefix is shorter than the minimum length", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "A"); + // 3 chars < MIN_TAB_PREFIX_LENGTH (4) + expect(resolveTabPrefix("abc").status).toBe("none"); + }); + + it("returns none when no open tab matches", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "A"); + expect(resolveTabPrefix("ffff").status).toBe("none"); + }); + + it("resolves a unique 4-char prefix to the single matching tab", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "Alpha"); + createTab("9999aaaa-0000-4000-8000-000000000000", "Beta"); + const res = resolveTabPrefix("abcd"); + expect(res.status).toBe("ok"); + if (res.status === "ok") { + expect(res.tab.id).toBe("abcd1234-0000-4000-8000-000000000000"); + expect(res.tab.title).toBe("Alpha"); + } + }); + + it("resolves the full UUID (a maximal prefix)", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "Alpha"); + const res = resolveTabPrefix("abcd1234-0000-4000-8000-000000000000"); + expect(res.status).toBe("ok"); + }); + + it("reports ambiguity when multiple open tabs share the prefix", () => { + createTab("abcd1111-0000-4000-8000-000000000000", "One"); + createTab("abcd2222-0000-4000-8000-000000000000", "Two"); + const res = resolveTabPrefix("abcd"); + expect(res.status).toBe("ambiguous"); + if (res.status === "ambiguous") { + expect(res.matches).toHaveLength(2); + expect(res.matches.map((m) => m.title).sort()).toEqual(["One", "Two"]); + } + }); + + it("disambiguates when one more character is supplied", () => { + createTab("abcd1111-0000-4000-8000-000000000000", "One"); + createTab("abcd2222-0000-4000-8000-000000000000", "Two"); + const res = resolveTabPrefix("abcd1"); + expect(res.status).toBe("ok"); + if (res.status === "ok") expect(res.tab.title).toBe("One"); + }); + + it("matches case-insensitively (UUIDs are lowercase; LIKE is ASCII-CI)", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "Alpha"); + const res = resolveTabPrefix("ABCD"); + expect(res.status).toBe("ok"); + }); + + it("sanitizes LIKE wildcards so they cannot broaden the match", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "Alpha"); + createTab("9999aaaa-0000-4000-8000-000000000000", "Beta"); + // `%` would match everything if not stripped; after sanitization the + // query is effectively `abcd%` which matches only Alpha. + const res = resolveTabPrefix("ab%d"); + // "ab%d" -> sanitized "abd" (3 chars) -> below min length -> none. + expect(res.status).toBe("none"); + }); + + it("excludes archived (closed) tabs from matches", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "Alpha"); + archiveTab("abcd1234-0000-4000-8000-000000000000"); + expect(resolveTabPrefix("abcd").status).toBe("none"); + }); +}); + +// --------------------------------------------------------------------------- +// shortestUniquePrefix — display-handle derivation +// --------------------------------------------------------------------------- +describe("shortestUniquePrefix", () => { + it("returns a 4-char prefix when no other open tab collides", () => { + createTab("abcd1234-0000-4000-8000-000000000000", "Alpha"); + expect(shortestUniquePrefix("abcd1234-0000-4000-8000-000000000000")).toBe("abcd"); + }); + + it("grows the prefix one char at a time on a collision", () => { + createTab("abcd1111-0000-4000-8000-000000000000", "One"); + createTab("abcd2222-0000-4000-8000-000000000000", "Two"); + // First differing char is at index 4, so a 5-char prefix is unique. + expect(shortestUniquePrefix("abcd1111-0000-4000-8000-000000000000")).toBe("abcd1"); + expect(shortestUniquePrefix("abcd2222-0000-4000-8000-000000000000")).toBe("abcd2"); + }); + + it("ignores closed tabs when computing uniqueness", () => { + createTab("abcd1111-0000-4000-8000-000000000000", "One"); + createTab("abcd2222-0000-4000-8000-000000000000", "Two"); + archiveTab("abcd2222-0000-4000-8000-000000000000"); + // With Two closed, One no longer collides → back to 4 chars. + expect(shortestUniquePrefix("abcd1111-0000-4000-8000-000000000000")).toBe("abcd"); + }); +}); diff --git a/packages/core/tests/tools/read-tab.test.ts b/packages/core/tests/tools/read-tab.test.ts new file mode 100644 index 0000000..71e419c --- /dev/null +++ b/packages/core/tests/tools/read-tab.test.ts @@ -0,0 +1,101 @@ +import { describe, expect, it } from "vitest"; +import { createReadTabTool, type ReadTabCallbacks } from "../../src/tools/read-tab.js"; +import type { TabResolution } from "../../src/tools/send-to-tab.js"; + +function makeCallbacks(overrides: Partial<ReadTabCallbacks> = {}): ReadTabCallbacks { + return { + resolveShortId: (): TabResolution => ({ + status: "ok", + tab: { id: "target-id", title: "Target", handle: "targ" }, + }), + getLastResponse: () => ({ text: "the answer is 42", status: "idle" }), + listOpenHandles: () => [{ handle: "targ", title: "Target" }], + ...overrides, + }; +} + +describe("createReadTabTool — schema & description", () => { + it("is a non-blocking snapshot read", () => { + const tool = createReadTabTool(makeCallbacks()); + expect(tool.name).toBe("read_tab"); + expect(tool.description).toContain("SNAPSHOT"); + expect(tool.description.toLowerCase()).toContain("does not block"); + }); +}); + +describe("createReadTabTool — execute()", () => { + it("returns the last assistant response wrapped in a tab_response tag", async () => { + const tool = createReadTabTool(makeCallbacks()); + const out = await tool.execute({ tab_id: "targ" }); + expect(out).toContain("<tab_response"); + expect(out).toContain('tab="targ"'); + expect(out).toContain('status="idle"'); + expect(out).toContain("the answer is 42"); + expect(out).toContain("</tab_response>"); + }); + + it("notes that a running tab's response is its previous completed turn", async () => { + const tool = createReadTabTool( + makeCallbacks({ + getLastResponse: () => ({ text: "older turn", status: "running" }), + }), + ); + const out = await tool.execute({ tab_id: "targ" }); + expect(out).toContain("still running"); + expect(out).toContain("older turn"); + }); + + it("explains when a tab has no completed response yet (idle)", async () => { + const tool = createReadTabTool( + makeCallbacks({ + getLastResponse: () => ({ text: null, status: "idle" }), + }), + ); + const out = await tool.execute({ tab_id: "targ" }); + expect(out).toContain("no completed response"); + expect(out).toContain("no assistant responses yet"); + }); + + it("explains when a tab is still on its first turn (running, no prior text)", async () => { + const tool = createReadTabTool( + makeCallbacks({ + getLastResponse: () => ({ text: null, status: "running" }), + }), + ); + const out = await tool.execute({ tab_id: "targ" }); + expect(out).toContain("no completed response"); + expect(out).toContain("still working on its first turn"); + }); + + it("rejects an empty tab_id and lists open handles", async () => { + const tool = createReadTabTool(makeCallbacks()); + const out = await tool.execute({ tab_id: "" }); + expect(out).toContain("Error"); + expect(out).toContain("targ"); + }); + + it("returns a helpful error when the id is unknown", async () => { + const tool = createReadTabTool(makeCallbacks({ resolveShortId: () => ({ status: "none" }) })); + const out = await tool.execute({ tab_id: "zzzz" }); + expect(out).toContain("no open tab matches"); + expect(out).toContain("Currently open tabs:"); + }); + + it("asks for more characters when the id is ambiguous", async () => { + const tool = createReadTabTool( + makeCallbacks({ + resolveShortId: () => ({ + status: "ambiguous", + matches: [ + { id: "a1", title: "One", handle: "abcd1" }, + { id: "a2", title: "Two", handle: "abcd2" }, + ], + }), + }), + ); + const out = await tool.execute({ tab_id: "abcd" }); + expect(out).toContain("ambiguous"); + expect(out).toContain("abcd1"); + expect(out).toContain("abcd2"); + }); +}); diff --git a/packages/core/tests/tools/send-to-tab.test.ts b/packages/core/tests/tools/send-to-tab.test.ts new file mode 100644 index 0000000..4450fc5 --- /dev/null +++ b/packages/core/tests/tools/send-to-tab.test.ts @@ -0,0 +1,142 @@ +import { describe, expect, it, vi } from "vitest"; +import { + createSendToTabTool, + type SendToTabCallbacks, + type TabResolution, +} from "../../src/tools/send-to-tab.js"; + +function makeCallbacks(overrides: Partial<SendToTabCallbacks> = {}): SendToTabCallbacks { + return { + resolveShortId: (): TabResolution => ({ + status: "ok", + tab: { id: "target-id", title: "Target", handle: "targ" }, + }), + deliver: () => ({ status: "started" }), + listOpenHandles: () => [{ handle: "targ", title: "Target" }], + self: { id: "self-id", handle: "self" }, + ...overrides, + }; +} + +describe("createSendToTabTool — schema & description", () => { + it("exposes tab_id and message params and a fire-and-forget description", () => { + const tool = createSendToTabTool(makeCallbacks()); + expect(tool.name).toBe("send_to_tab"); + expect(tool.description).toContain("fire-and-forget"); + expect(tool.description.toLowerCase()).toContain("queued"); + }); +}); + +describe("createSendToTabTool — execute()", () => { + it("delivers to a resolved target and reports the started status", async () => { + const deliver = vi.fn(() => ({ status: "started" as const })); + const tool = createSendToTabTool(makeCallbacks({ deliver })); + const out = await tool.execute({ tab_id: "targ", message: "hello there" }); + expect(deliver).toHaveBeenCalledTimes(1); + const [targetId, delivered] = deliver.mock.calls[0] ?? []; + expect(targetId).toBe("target-id"); + // Provenance prefix names the sending tab's handle. + expect(delivered).toContain("[message from tab self]"); + expect(delivered).toContain("hello there"); + expect(out).toContain("idle"); + expect(out).toContain("targ"); + }); + + it("reports the queued status when the target is busy", async () => { + const deliver = vi.fn(() => ({ status: "queued" as const })); + const tool = createSendToTabTool(makeCallbacks({ deliver })); + const out = await tool.execute({ tab_id: "targ", message: "ping" }); + expect(out.toLowerCase()).toContain("queued"); + expect(out.toLowerCase()).toContain("busy"); + }); + + it("reports a HELD message when delivery is suppressed (auto-wake limit hit)", async () => { + const deliver = vi.fn(() => ({ status: "suppressed" as const })); + const tool = createSendToTabTool(makeCallbacks({ deliver })); + const out = await tool.execute({ tab_id: "targ", message: "ping again" }); + expect(out).toContain("HELD"); + expect(out.toLowerCase()).toContain("limit"); + // It must steer the sender away from retrying in a loop. + expect(out.toLowerCase()).toContain("do not keep resending"); + expect(out.toLowerCase()).toContain("human"); + }); + + it("rejects an empty tab_id and lists open handles", async () => { + const tool = createSendToTabTool(makeCallbacks()); + const out = await tool.execute({ tab_id: " ", message: "hi" }); + expect(out).toContain("Error"); + expect(out).toContain("targ"); + }); + + it("rejects an empty message", async () => { + const deliver = vi.fn(() => ({ status: "started" as const })); + const tool = createSendToTabTool(makeCallbacks({ deliver })); + const out = await tool.execute({ tab_id: "targ", message: " " }); + expect(out).toContain("Error"); + expect(deliver).not.toHaveBeenCalled(); + }); + + it("returns a helpful error and open-tab list when the id is unknown", async () => { + const deliver = vi.fn(() => ({ status: "started" as const })); + const tool = createSendToTabTool( + makeCallbacks({ + resolveShortId: () => ({ status: "none" }), + deliver, + }), + ); + const out = await tool.execute({ tab_id: "zzzz", message: "hi" }); + expect(out).toContain("no open tab matches"); + expect(out).toContain("Currently open tabs:"); + expect(deliver).not.toHaveBeenCalled(); + }); + + it("asks for more characters when the id is ambiguous", async () => { + const deliver = vi.fn(() => ({ status: "started" as const })); + const tool = createSendToTabTool( + makeCallbacks({ + resolveShortId: () => ({ + status: "ambiguous", + matches: [ + { id: "a1", title: "One", handle: "abcd1" }, + { id: "a2", title: "Two", handle: "abcd2" }, + ], + }), + deliver, + }), + ); + const out = await tool.execute({ tab_id: "abcd", message: "hi" }); + expect(out).toContain("ambiguous"); + expect(out).toContain("abcd1"); + expect(out).toContain("abcd2"); + expect(deliver).not.toHaveBeenCalled(); + }); + + it("refuses to send to its own tab", async () => { + const deliver = vi.fn(() => ({ status: "started" as const })); + const tool = createSendToTabTool( + makeCallbacks({ + resolveShortId: () => ({ + status: "ok", + tab: { id: "self-id", title: "Me", handle: "self" }, + }), + deliver, + }), + ); + const out = await tool.execute({ tab_id: "self", message: "hi" }); + expect(out).toContain("cannot send a message to your own tab"); + expect(deliver).not.toHaveBeenCalled(); + }); + + it("surfaces a thrown delivery error instead of crashing", async () => { + const tool = createSendToTabTool( + makeCallbacks({ + deliver: () => { + throw new Error("boom"); + }, + }), + ); + const out = await tool.execute({ tab_id: "targ", message: "hi" }); + expect(out).toContain("Error delivering message"); + expect(out).toContain("boom"); + }); +}); diff --git a/packages/core/tests/tools/summon.test.ts b/packages/core/tests/tools/summon.test.ts index 6597c21..f59f345 100644 --- a/packages/core/tests/tools/summon.test.ts +++ b/packages/core/tests/tools/summon.test.ts @@ -39,9 +39,13 @@ describe("createSummonTool — description content", () => { path: "/home/u/.config/dispatch/agents/researcher.toml", }, ]; - const tool = createSummonTool("/tmp/work", noopCallbacks, agents, [], [ - "/home/u/.config/dispatch/agents", - ]); + const tool = createSummonTool( + "/tmp/work", + noopCallbacks, + agents, + [], + ["/home/u/.config/dispatch/agents"], + ); expect(tool.description).toContain("programmer"); expect(tool.description).toContain("Programmer"); expect(tool.description).toContain("Implements code from a plan"); diff --git a/packages/frontend/src/lib/components/TabBar.svelte b/packages/frontend/src/lib/components/TabBar.svelte index feb1e5b..3cbd849 100644 --- a/packages/frontend/src/lib/components/TabBar.svelte +++ b/packages/frontend/src/lib/components/TabBar.svelte @@ -56,6 +56,7 @@ const activeUserTabId = $derived( > <span class="flex items-center gap-1.5"> <span class="w-1.5 h-1.5 rounded-full shrink-0 {statusColor(tab.agentStatus)}"></span> + <span class="font-mono text-[10px] px-1 py-0.5 rounded bg-base-300 text-base-content/60 shrink-0" title="Tab ID — agents address this tab by this handle">{tabStore.shortHandleFor(tab.id)}</span> <span class="max-w-32 truncate text-xs">{tab.title}</span> </span> <button @@ -89,6 +90,7 @@ const activeUserTabId = $derived( > <span class="flex items-center gap-1"> <span class="w-1 h-1 rounded-full shrink-0 {statusColor(tab.agentStatus)}"></span> + <span class="font-mono text-[10px] px-1 rounded bg-base-300 text-base-content/60 shrink-0" title="Tab ID — agents address this tab by this handle">{tabStore.shortHandleFor(tab.id)}</span> <span class="max-w-28 truncate text-xs">{tab.title}</span> </span> {#if tab.persistent} diff --git a/packages/frontend/src/lib/components/ToolPermissions.svelte b/packages/frontend/src/lib/components/ToolPermissions.svelte index d64eaf9..0caf0fa 100644 --- a/packages/frontend/src/lib/components/ToolPermissions.svelte +++ b/packages/frontend/src/lib/components/ToolPermissions.svelte @@ -28,6 +28,16 @@ const toolPermissions: ToolPermission[] = [ description: "Allow the AI to open new independent top-level tabs", }, { + id: "send_to_tab", + label: "Message other tabs", + description: "Allow the AI to send messages to other tabs by their ID", + }, + { + id: "read_tab", + label: "Read other tabs", + description: "Allow the AI to read other tabs' latest responses by their ID", + }, + { id: "web_search", label: "Web search", description: "Allow the AI to search the web via Firecrawl", diff --git a/packages/frontend/src/lib/settings.svelte.ts b/packages/frontend/src/lib/settings.svelte.ts index 11eb79c..123008d 100644 --- a/packages/frontend/src/lib/settings.svelte.ts +++ b/packages/frontend/src/lib/settings.svelte.ts @@ -9,6 +9,8 @@ let toolPerms = $state<Record<string, boolean>>({ bash: false, summon: false, user_agent: false, + send_to_tab: false, + read_tab: false, external_directory: false, web_search: false, youtube_transcribe: false, @@ -19,6 +21,8 @@ let savedToolPerms = $state<Record<string, boolean>>({ bash: false, summon: false, user_agent: false, + send_to_tab: false, + read_tab: false, external_directory: false, web_search: false, youtube_transcribe: false, diff --git a/packages/frontend/src/lib/tabs.svelte.ts b/packages/frontend/src/lib/tabs.svelte.ts index e303c80..317de8d 100644 --- a/packages/frontend/src/lib/tabs.svelte.ts +++ b/packages/frontend/src/lib/tabs.svelte.ts @@ -232,6 +232,29 @@ export function createTabStore() { return tabs.find((t) => t.id === id); } + /** + * Minimum display length of a tab handle (git-style short id). Mirrors + * `MIN_TAB_PREFIX_LENGTH` in core's `db/tabs.ts` so the handle the user sees + * is always resolvable by the backend's `resolveTabPrefix`. + */ + const MIN_HANDLE_LENGTH = 4; + + /** + * Compute the shortest unique prefix (≥ MIN_HANDLE_LENGTH chars) of `tabId` + * among all currently-open tabs — the displayed "handle" agents use to + * address each other. Purely DERIVED from the UUIDs already in `tabs`; never + * stored. Grows by one char only when another open tab shares the prefix, and + * shrinks back when that sibling closes. + */ + function shortHandleFor(tabId: string): string { + const others = tabs.map((t) => t.id).filter((id) => id !== tabId); + for (let len = MIN_HANDLE_LENGTH; len < tabId.length; len++) { + const candidate = tabId.slice(0, len); + if (!others.some((id) => id.startsWith(candidate))) return candidate; + } + return tabId; + } + async function createNewTab(): Promise<Tab> { const id = generateId(); const title = "New Tab"; @@ -1926,6 +1949,7 @@ export function createTabStore() { get configReloaded() { return configReloaded; }, + shortHandleFor, createNewTab, switchTab, closeTab, |
