diff options
| author | Adam Malczewski <[email protected]> | 2026-06-24 14:10:03 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-24 14:10:03 +0900 |
| commit | dabcbc79831052effc6ce990021feee07d661f7e (patch) | |
| tree | 3e74e16f36d6a675abe676f0d04ca169f65f0a71 | |
| parent | b58fb8373a1f7311cead23aa9a4d1fcd6927634f (diff) | |
| download | dispatch-dabcbc79831052effc6ce990021feee07d661f7e.tar.gz dispatch-dabcbc79831052effc6ce990021feee07d661f7e.zip | |
fix(kernel+tool-shell): abort hanging tool calls without bricking the conversation
kernel: executeToolCall now races tool.execute against the abort signal
via Promise.race; on abort resolves (not rejects) with an "Aborted" result
so the step completes normally → finishReason "aborted" → turn seals
cleanly (done event) → finally clears activeTurns → conversation freed,
next message accepted. run-turn strips tool-call chunks from the assistant
message on abort (keeps text/thinking) and omits tool-result messages to
avoid persisting dangling tool calls that would 400 the provider next turn.
tool-shell: realSpawn spawns detached (own process group); on abort AND
timeout kills the entire group (process.kill(-pgid, SIGKILL)) and resolves
immediately — no child.on("close") dependency, so a grandchild holding the
pipes can't stall the spawn promise or leak.
Also: ORCHESTRATOR.md migrated to dispatch CLI summon mechanism; .skills
summary; bin/sync-env PATH injection; frontend handoff docs.
1453 vitest pass · tsc -b EXIT 0 · biome clean.
| -rw-r--r-- | .skills/ORCHESTRATOR.md | 2 | ||||
| -rw-r--r-- | ORCHESTRATOR.md | 155 | ||||
| -rwxr-xr-x | bin/sync-env | 9 | ||||
| -rw-r--r-- | broken-chat-repair-handoff.md | 180 | ||||
| -rw-r--r-- | frontend-cwd-resolution-handoff.md | 95 | ||||
| -rw-r--r-- | frontend-model-persistence-handoff.md | 91 | ||||
| -rw-r--r-- | frontend-system-prompt-handoff.md | 92 | ||||
| -rw-r--r-- | packages/kernel/src/runtime/dispatch.test.ts | 535 | ||||
| -rw-r--r-- | packages/kernel/src/runtime/dispatch.ts | 18 | ||||
| -rw-r--r-- | packages/kernel/src/runtime/run-turn.ts | 53 | ||||
| -rw-r--r-- | packages/tool-shell/src/shell.test.ts | 156 | ||||
| -rw-r--r-- | packages/tool-shell/src/shell.ts | 5 | ||||
| -rw-r--r-- | packages/tool-shell/src/spawn.ts | 71 |
13 files changed, 1280 insertions, 182 deletions
diff --git a/.skills/ORCHESTRATOR.md b/.skills/ORCHESTRATOR.md index 620b983..27e3c95 100644 --- a/.skills/ORCHESTRATOR.md +++ b/.skills/ORCHESTRATOR.md @@ -1,3 +1,5 @@ +Operating manual for the dispatch arch-rewrite orchestrator: plan topological waves of single-owner agents, summon via opencode run, verify from contracts + tests (never read implementation), resolve contract gaps. Project-specific to this repo. +--- # ORCHESTRATOR.md — how to drive this project > **You are the orchestrator.** You do NOT write feature code yourself. You plan, diff --git a/ORCHESTRATOR.md b/ORCHESTRATOR.md index 15393c7..8f6c156 100644 --- a/ORCHESTRATOR.md +++ b/ORCHESTRATOR.md @@ -52,10 +52,10 @@ they justify every rule below. existing one?" — surface it to the user; never decide granularity silently. 4. **Write the prompt** to `prompts/<unit>.md` (gitignored). See §3 for the prompt recipe. -5. **Summon the wave** via the Task tool (`subagent_type: "Opus 4.8"`, see §2); disjoint units - run in PARALLEL (§2a). RE-READ the §3 scoping map before each wave so you point each agent at - the right rule files. You MAY read the rules/briefs themselves (they are tiny) — what you must - NEVER do is INLINE their contents into a Task prompt (§2 TOKEN RULE). +5. **Summon the wave** via the `dispatch` CLI (`umans/umans-glm-5.2`, see §2); disjoint units + run in PARALLEL (§2a). RE-READ the §3 scoping map before each wave so you attach the right + rule files via `--file` for each agent. You MAY read the rules/briefs themselves (they are + tiny) — what you must NEVER do is INLINE their contents into the `--text` (§2 TOKEN RULE). 6. **Verify** the reports + independently re-run checks (see §4). Trust nothing until you've re-run `typecheck`/`test`/`check` yourself. 7. **Resolve** any contract gaps / errors (see §5). @@ -63,77 +63,85 @@ they justify every rule below. --- -## 2. Summoning agents via the Task tool (Opus 4.8) +## 2. Summoning agents via the `dispatch` CLI -The **Task tool** is the summon mechanism; **`subagent_type: "Opus 4.8"`** is the BUILDING -agent (Claude Opus 4.8 — capable coder). `deepseek-v4-flash` remains the *app's own runtime -testbench*, never a builder. (The legacy `opencode run` CLI path is retired; +The **`dispatch` CLI** is the summon mechanism; **`umans/umans-glm-5.2`** is the BUILDING +agent (capable coder; browse alternatives with `dispatch models`). `$DISPATCH_MODEL` (`opencode/deepseek-v4-flash`) is the *app's own +runtime testbench*, never a builder. (The legacy `opencode run` and Task-tool paths are retired; `notes/opencode-agents.md` is historical.) -**Same session, full tools.** A Task subagent runs in THIS session's working directory (the repo -root, `/home/tradam/projects/dispatch/arch-rewrite`) with the normal coding toolset -(Read/Edit/Write/Bash/`lsp`) and the same permissions — so its `lsp`/typecheck work and there is -NO headless cross-dir permission hang. The visibility/ownership rules (§6) are NOT enforced by a -sandbox here — they hold only because the prompt states them (the briefs do). - -**THE TOKEN RULE — never INLINE the briefs/rules/TASK into the prompt.** This is the whole point -of the file-based harness: pasting `.dispatch/*` or `prompts/<unit>.md` contents into the Task -`prompt` burns YOUR context and duplicates what the agent can read itself — exactly what we avoid. -(Reading the small rule/brief files for your OWN understanding is allowed — the rule is about -prompt assembly, not about you being ignorant of the rules.) Instead the Task `prompt` -is a SHORT pointer that tells the subagent to READ those files ITSELF. The guardrail bytes then -land in the SUBAGENT's context, never the orchestrator's — the same property the old -`"$(cat …)"` shell-concat gave us (the concat just moves from the shell to the agent's own Read). - -**Canonical summon** — ONE Task call per unit: -- `description`: `"build <unit>"` -- `subagent_type`: `"Opus 4.8"` -- `prompt`: the SHORT pointer below — fill in `<unit>` + its scoped-rule files (§3 map). Do NOT - inline any file contents. +**Prerequisites.** The Dispatch server must be up — probe with `dispatch models` ("Unable to +connect" → boot it, §8); the CLI defaults to it, no `--server` needed. The summoned agent is a +SEPARATE conversation on that server, NOT this session — so set `--cwd` to the repo root so its +file tools operate on the repo. It has the Dispatch runtime's coding toolset (read_file / +write_file / bash — the tool extensions `host-bin` wires); it runs `tsc -b`/vitest/biome via bash +and writes its report to `reports/<unit>.md`. The visibility/ownership rules (§6) are NOT enforced +by a sandbox — they hold only because the briefs state them. + +**THE TOKEN RULE — use `--file` to attach guardrails; never inline into `--text`.** The `dispatch` +CLI's `--file <path>` flag (repeatable) reads a file from disk and attaches its contents to the +agent's message — the guardrail bytes land in the SUBAGENT's context, never the orchestrator's. +This is the whole point of the file-based harness: the briefs/rules/task are delivered to the agent +by the CLI, not pasted by you. NEVER put `.dispatch/*` or `prompts/<unit>.md` contents into the +`--text` message — that duplicates what `--file` already delivers AND burns YOUR context. (Reading +the small rule/brief files for your OWN understanding is allowed — the rule is about prompt +assembly, not about you being ignorant of the rules.) The `--text` is a SHORT instruction that +tells the agent what to DO (implement, verify, report) — not what the rules SAY. + +**Canonical summon** — ONE `dispatch` call per unit (fill in `<unit>` + its scoped-rule `--file`s +per the §3 map; the `--file` ORDER is the assembly order: constitution → brief → rules → task; +omit `--file .dispatch/extension-agent.md` for non-extension units): +```bash +cd /home/tradam/projects/dispatch/arch-rewrite +dispatch umans/umans-glm-5.2 \ + --cwd /home/tradam/projects/dispatch/arch-rewrite \ + --text "You are the single owner-agent for packages/<unit>/. The attached files are your constitution, brief, rules, and task — follow them exactly in the order given. Then IMPLEMENT the task now: edit ONLY files under packages/<unit>/, run tsc -b / vitest / biome for your package, and write your report to reports/<unit>.md. Reply with ONLY a one-line status + the path reports/<unit>.md — no diffs, no logs." \ + --file AGENTS.md \ + --file .dispatch/package-agent.md \ + --file .dispatch/extension-agent.md \ + --file .dispatch/rules/one-owner.md \ + --file .dispatch/rules/isolation-over-dry.md \ + --file .dispatch/rules/biome-clean.md \ + --file prompts/<unit>.md \ + > reports/<unit>.run.log 2>&1 ``` -You are the single owner-agent for packages/<unit>/. Read these files IN FULL with your own tools -and follow them exactly, in this order (do NOT skip — they are your constitution, brief, rules, -and task; do NOT paste them back to me): - 1. AGENTS.md (project constitution; may already be in context) - 2. .dispatch/package-agent.md (base owner brief) - 3. .dispatch/extension-agent.md (ONLY if your unit is an extension — else skip) - 4. .dispatch/rules/one-owner.md .dispatch/rules/isolation-over-dry.md .dispatch/rules/biome-clean.md - (+ the unit's other scoped rules per the §3 map — list them here) - 5. prompts/<unit>.md (YOUR task) -Then IMPLEMENT it now: edit ONLY files under packages/<unit>/, run tsc -b / vitest / biome for -your package, and write your report to reports/<unit>.md. -Reply with ONLY a one-line status + the path reports/<unit>.md — no diffs, no logs. -``` - -The read list IS the old fixed assembly order (constitution → package brief → extension supplement -→ scoped rules → TASK); the ONLY change is the AGENT reads it instead of the shell concatenating -it. Name in line 4 ONLY the scoped-rule files matching the unit's layer (§3 map) — don't dump -every rule. `prompts/<unit>.md` stays JUST the TASK block (§3). - -**Output discipline.** The Task tool returns ONE short final message (not the agent's stream), so -the firehose can't flood you. Keep it that way: the agent replies tiny and writes the real report -to `reports/<unit>.md`, which you then `Read` from disk. NEVER ask an agent to paste diffs/logs -back; never `Read` a giant log into context — `Grep` it for a specific error if needed. -**Parallel waves.** Launch a wave by emitting MULTIPLE Task calls IN ONE message (concurrent) — -one per unit — but ONLY when their file sets are disjoint (single-writer, §6/§2a). Sequence -dependent waves across separate messages (later waves compile against earlier ones). Log parallel -runs in `tasks.md`. No timeout/backgrounding knobs to manage — the Task tool handles long runs and -notifies on completion. +The `--file` list IS the fixed assembly order (constitution → package brief → extension +supplement → scoped rules → TASK); the CLI delivers their contents to the agent directly — no +read_file round-trips needed. Attach ONLY the scoped-rule files matching the unit's layer +(§3 map) — don't attach every rule. `prompts/<unit>.md` stays JUST the TASK block (§3). + +**Output discipline — capture the stream, never display it.** The `dispatch` summon STREAMS the +agent's full response (reasoning + tool calls + results) to stdout — enormous for a building +task. ALWAYS redirect to a log file (`> reports/<unit>.run.log 2>&1`) and do NOT `cat` it back +wholesale. You don't need the raw stream: read the agent's `reports/<unit>.md` report, and, if you +must, `grep`/`tail` the log for a specific error. The conversation ID prints at the end of the +stream as `[conversation] <uuid>` — capture it so you can `dispatch read <short-id>` later (fetch +the last message without re-streaming) or `dispatch send <short-id> --text "…" --queue` to steer +mid-turn. Treat dumping a full run log into context as a hard failure. + +**Run discipline:** +- **Do NOT background it. Use a large timeout** (e.g. 1800000 ms = 30 min) — these are long tasks. + The `dispatch` call blocks until the agent's turn completes; if a `run_shell` times out, the + conversation KEEPS RUNNING on the server — recover with `dispatch read <short-id>`. +- One `run_shell` per summon (foreground, large timeout). For PARALLEL agents on disjoint files, + launch multiple summons as CONCURRENT `run_shell` tool calls — but ONLY when their file sets + do not overlap (single-writer rule, §6/§2a). Log parallel runs in `tasks.md`. **GOTCHAS:** -- **Don't burn your own tokens.** Re-read THE TOKEN RULE above: point the agent at the files; - never inline them. The biggest failure mode here is the orchestrator pasting briefs/rules/ - prompts into a fat Task prompt — that defeats the harness. (Reading them yourself is fine.) -- **No sandbox = state the rules.** Because the subagent shares your tools/cwd, nothing stops it - editing out of its lane except the prompt. Always include the ownership/visibility briefs (they - tell the agent: never edit outside `packages/<unit>/`; read only OTHER units' contracts; if you - think you must read another unit's impl, REPORT and STOP). -- **Make agents IMPLEMENT, not deliberate** (§3): the pointer says "IMPLEMENT it now … write the - report". A plan-only return → re-summon (§5a). -- **Smoke check:** `Task(subagent_type:"Opus 4.8", prompt:"Reply with exactly SMOKE_OK")` should - return `SMOKE_OK`. +- **Don't burn your own tokens.** Re-read THE TOKEN RULE above: attach the files via `--file`; + never paste their contents into `--text`. The biggest failure mode here is the orchestrator + inlining briefs/rules/prompts — that defeats the harness AND pollutes the orchestrator's + context. (Reading them yourself is fine.) +- **No sandbox = state the rules.** Because the subagent shares the repo cwd (via `--cwd`), + nothing stops it editing out of its lane except the prompt. Always include the + ownership/visibility briefs (they tell the agent: never edit outside `packages/<unit>/`; read + only OTHER units' contracts; if you think you must read another unit's impl, REPORT and STOP). +- **Make agents IMPLEMENT, not deliberate** (§3): the `--text` says "IMPLEMENT the task now … + write the report". A plan-only return → re-summon (§5a). +- **Smoke check:** `dispatch umans/umans-glm-5.2 --text "Reply with exactly SMOKE_OK"` should + print `SMOKE_OK`. --- @@ -142,7 +150,7 @@ notifies on completion. Throughput comes from running disjoint units at once. Organise it as waves: - **A wave = units that (a) touch DISJOINT files and (b) have no compile-time dependency on each other** (each imports only already-built packages + existing contracts). Launch a - wave by emitting one summon per unit as CONCURRENT tool calls (§2). Later waves depend on + wave by emitting one `dispatch` summon per unit as CONCURRENT `run_shell` calls (§2). Later waves depend on earlier ones; the composition root (`packages/host-bin/`) is almost always the LAST wave. - **Pre-author the seam to widen the wave.** Because the orchestrator OWNS contracts (§6), author the shared contract / typed handle in `packages/kernel/src/contracts/*` FIRST, then @@ -164,7 +172,7 @@ Throughput comes from running disjoint units at once. Organise it as waves: The invariant guardrails — single-writer directory ownership, visibility, coupling, the engineering standard, isolated verification, and the report format — live ONCE in the -standardized briefs the summon points the agent at (§2; the agent reads them itself): +standardized briefs the summon attaches via `--file` (§2; the CLI delivers them to the agent): - **`.dispatch/package-agent.md`** — the base for EVERY package owner. - **`.dispatch/extension-agent.md`** — the extension-only supplement (added for extension summons). @@ -262,8 +270,9 @@ live runs (§8 bracket trick), since a leak silently poisons the next run's coun barrel `index.ts`, a sibling conforming to a contract change), the orchestrator **summons the owning agent** — it does NOT edit implementation itself. - **Live API errors:** an HTTP 429 `GoUsageLimitError` is an UPSTREAM rate limit, - not a bug. The `opencode-2` key has a monthly cap; `opencode-1` is the backup. - Swap `DISPATCH_API_KEY` in `.env` (both keys are there). + not a bug — and now that building agents are dispatched THROUGH the server's provider (not a + host model), a mid-build 429 is just as likely as a runtime one. The provider key has a monthly + cap; swap `DISPATCH_API_KEY` in `.env` (backup keys noted there) and retry. --- @@ -340,7 +349,7 @@ live runs (§8 bracket trick), since a leak silently poisons the next run's coun ``` /home/tradam/projects/dispatch/arch-rewrite # THE worktree (branch arch/rewrite) - AGENTS.md the subagent constitution (auto-loaded by opencode; you enforce it) + AGENTS.md the subagent constitution (the summon points each agent at it; you enforce it) ORCHESTRATOR.md the orchestrator's operating manual (this file) GLOSSARY.md canonical vocabulary + aliases-to-avoid (human-gated) tasks.md live progress checklist / milestone log @@ -358,7 +367,7 @@ live runs (§8 bracket trick), since a leak silently poisons the next run's coun observability-design.md logging/spans/collector/trace-store design (Phase A–B) cli-design.md CLI design decisions + unit plan (built; §3 = settled decisions) frontend-design.md future web frontend design (IDEATION; separate repo) - opencode-agents.md LEGACY — notes on the retired `opencode run` CLI summon path (§2 now uses the Task tool / Opus 4.8) + opencode-agents.md LEGACY — notes on the retired `opencode run` CLI summon path (§2 now uses the `dispatch` CLI) prompts/ (gitignored — orchestrator→agent TASK blocks) reports/ (gitignored — agent→orchestrator reports) diff --git a/bin/sync-env b/bin/sync-env index 9a69076..4b7c64c 100755 --- a/bin/sync-env +++ b/bin/sync-env @@ -4,12 +4,21 @@ set -euo pipefail ENV_FILE="/etc/dispatch/env" OPENCODE_KEY="sk-P2DW7UFzQZgI8TnYzp6wWKxv6suzFvSGRrwewhUK0OjA4C7bLzsKHki7r7zZOAH1" UMANS_KEY="sk-bvI-mgUrp5qJzMlK8ndoDqJ3dK0U-ds_8KYS8pzpRLw" +# PATH that includes user-installed tools (mise, bun, npm global, etc.) +DISPATCH_PATH="/home/tradam/.local/bin:/home/tradam/.local/share/mise/shims:/home/tradam/.bun/bin:/usr/local/sbin:/usr/local/bin:/usr/bin" if [ ! -f "$ENV_FILE" ]; then echo "Error: $ENV_FILE not found" >&2 exit 1 fi +# Update PATH (so tools like typescript-language-server, npm, bun are found) +if grep -q "^PATH=" "$ENV_FILE"; then + sed -i "s|^PATH=.*|PATH=$DISPATCH_PATH|" "$ENV_FILE" +else + echo "PATH=$DISPATCH_PATH" >> "$ENV_FILE" +fi + # Update DISPATCH_API_KEY if grep -q "^DISPATCH_API_KEY=" "$ENV_FILE"; then sed -i "s|^DISPATCH_API_KEY=.*|DISPATCH_API_KEY=$OPENCODE_KEY|" "$ENV_FILE" diff --git a/broken-chat-repair-handoff.md b/broken-chat-repair-handoff.md new file mode 100644 index 0000000..12deec0 --- /dev/null +++ b/broken-chat-repair-handoff.md @@ -0,0 +1,180 @@ +# Handoff → orchestrator (bcb5): broken-chat self-repair + +> From: diagnostic session. Agent/conversation `77574596` +> (`77574596-3e7b-46f8-8d67-c9e17a529dee`) "broke unrecoverably." User goal: +> **chats must self-heal when broken so they can continue.** Implement the fixes +> below. Full diagnosis + plan also in `reports/broken-chat-repair-diagnosis.md`. + +## 0. Your job (TL;DR) + +`reconcile()` only repairs orphaned tool-calls. The production DB has **two other +broken states** it doesn't handle, and they make a chat uncontinuable. Implement a +read-time repair so broken chats auto-heal on next open — **no DB surgery** +(append-only durability preserved; repair is a turn-path transform that runs on +every `load()`). Three units, two repos: + +- **Wave 1 (arch-rewrite, PARALLEL — disjoint packages):** + - `conversation-store` — extend `reconcile` (Layer 1) + harden `load()`. + - `openai-stream` — harden `convertMessages` args (Layer 2). +- **Wave 2 (separate repo `../claude`, SEPARATE agent):** + - `provider-anthropic` — harden its `safeJson` (Layer 2 equivalent). + +**Key architectural insight that shapes the waves:** Layer 1 lives in +`conversation-store.reconcile`, which runs in `load()` BEFORE any provider sees +the messages. So the Layer 1 fix protects **every** provider (openai-compat AND +anthropic) — the Claude plugin needs **no** Layer 1 change. Layer 2 (malformed +tool-call args) is **per-provider** serialization safety, so it must be applied +in each provider's converter (openai-stream + provider-anthropic). + +## 1. The break (what actually happened in `77574596`) + +Production DB: `/var/lib/dispatch/dispatch.db` (systemd `dispatch.service`). +136 chunks; seq counter = 136; **all JSON valid; no orphaned tool-calls** — so +`reconcile()` finds nothing wrong, yet the chat is uncontinuable. The tail: + +| seq | role | type | note | +|---|---|---|---| +| 133 | assistant | text | "Wave 0 fully verified…" | +| 134 | assistant | tool-call | `todo_write`, `input` = **malformed JSON** (`json_type=text`, raw string) | +| 135 | tool | tool-result | isError: "todo_write args must be an object with a `todos` array" | +| 136 | assistant | **error** | `HTTP 400: unexpected character: line 1 column 1413 (char 1412). Received Model Group=glm-5.2` | + +### Root cause (confirmed byte-for-byte) +- seq 134's `input` is a raw string. Parsing it fails + `Expecting ':' delimiter: line 1 column 1413 (char 1412)` — an **exact match** to + the provider's `unexpected character: line 1 column 1413`. The **model emitted + malformed JSON as the `todo_write` arguments**. +- Chain: model emits text + malformed-args tool-call (step 5) → kernel dispatches + the tool, which returns an error result (seq 135) → kernel calls the provider + again (step 6); the request re-includes the assistant message carrying the + malformed `arguments` → provider 400s → persisted as an `error` chunk (seq 136). + +### Why it's "unrecoverable" +- `openai-stream` `convertAssistantMessage` serializes tool-call args as + `typeof c.input === "string" ? c.input : JSON.stringify(c.input)` — passes the + malformed string straight through as the OpenAI `arguments` field → provider + 400s on **every** continuation. +- The trailing `assistant` message whose only chunk is `error` serializes to + `content:""` + no tool_calls (error chunk is filtered out, leaving an empty + assistant message) → also uncontinuable. +- `reconcile()` touches neither. `load()` also has no try/catch on + `JSON.parse(value)` — a single corrupt row would throw and brick the chat. + +### Scope (production DB, 140 conversations) +- **6 conversations end in a trailing `error` chunk:** `102587c0`(seq2, HTTP 401 + model-not-supported), `2bf78252`(seq2), `61127511`(seq250), `77574596`(seq136), + `d0d85eca`(seq2), `e1ee0989`(seq20). +- **2 tool-calls total** carry a raw malformed-string `input`. +- `102587c0` has **only** the trailing-error break (no args, no tool-calls) — + proving Layer 1 is independently necessary. `77574596` has **both**. + +## 2. The fix + +### Layer 1 — `conversation-store` `reconcile.ts` (structural repair) +Extend `reconcileWithReport` to: +1. **Strip `error` chunks from assistant messages.** An `error` chunk is a + failed-generation marker, never valid provider content (no provider understands + an "error" content type) — provider-agnostic. +2. **Drop any assistant message left with no `text` and no `tool-call` chunks** + (the now-empty error-only message). This is what unblocks continuation. **Safe:** + an error-only step ends with no tool-calls, so it is never followed by a `tool` + message — no "tool-without-preceding-assistant-tool_calls" 400 can result. Keep + the existing orphaned-tool-call synthesis unchanged. +3. Extend `ReconcileReport` with counts of stripped error chunks / dropped messages + (for the existing `reconcile.repair` boot/log span). + +Why here: the constitution designates `reconcile` as "the pure function run on load +that repairs any partial turn." A trailing error-only assistant message IS a +partial/broken turn. Pure, provider-agnostic, runs on every `load()` → auto-repairs +all 6 broken chats. Repair is read-time only; storage (append-only) untouched. +`loadSince` (FE reads) is intentionally NOT reconciled, so the user still SEES the +error while the provider gets clean history. + +### Hardening — `conversation-store` `store.ts` `load()` (same unit) +Wrap the per-chunk `JSON.parse(value)` in try/catch: on a corrupt/unparseable row, +log + skip it (don't throw) so `reconcile` can still run on the rest. Today a single +bad row makes `load()` throw → unrecoverable. (0 such rows today; "never leave the +system broken" asks for it.) + +### Layer 2 — `openai-stream` `convert-messages.ts` (serialization safety) +In `convertAssistantMessage`, ensure a tool-call's `arguments` is **always a valid +JSON string**: if `input` is a string, `JSON.parse` it; on failure substitute a +valid fallback object (e.g. `JSON.stringify({})` or a wrapped +`{ _malformed_arguments: <truncated> }`). Objects pass through `JSON.stringify` as +today. This neutralizes already-stored malformed args (seq 134) so the provider +stops 400ing on continuation. Follow the SAME semantics as the Claude fix below +(isolation over DRY: each provider reimplements locally, same behavior). + +### Layer 2 (equivalent) — `../claude` `provider-anthropic` `convert.ts` (SEPARATE agent) +The Claude plugin already has a `safeJson(s)` helper (line ~115) used at +`input: typeof c.input === "string" ? safeJson(c.input) : c.input`. But its fallback +**returns the raw string `s` on parse failure** — for Anthropic, `tool_use.input` +must be an object, so a raw string can still 400 when a historical malformed tool_use +is re-sent. Fix: make `safeJson` return a **valid object fallback** (e.g. `{}`) on +parse failure instead of the raw string. (Layer 1 does NOT apply here — the +arch-rewrite `reconcile` already strips error chunks before the Claude provider sees +the messages, so the Claude converter never receives error-only assistant messages.) + +## 3. Waves & summoning + +- **Wave 1 (arch-rewrite, PARALLEL):** `conversation-store` (Layer 1 + `load()` + hardening) and `openai-stream` (Layer 2). Disjoint packages, no contract/type + change, both depend only on already-built `@dispatch/kernel` contracts. Standard + summon per ORCHESTRATOR §2/§3 (attach the scoped rules: conversation-store gets + `pure-core.md`+`no-internal-mocks.md`+`typed-handles.md`+`extension-logging.md`; + openai-stream gets `pure-core.md`+`no-internal-mocks.md`+`extension-logging.md`; + both get `one-owner.md`+`isolation-over-dry.md`+`biome-clean.md`+`package-agent.md`+ + `extension-agent.md`). +- **Wave 2 (separate repo, SEPARATE agent):** summon against + `--cwd /home/tradam/projects/dispatch/claude` for `packages/provider-anthropic` + (`convert.ts` `safeJson`). That repo has its own `AGENTS.md`; attach the + arch-rewrite `package-agent.md`+scoped rules as needed. Can run in parallel with + Wave 1 (different repo, no shared files). + +## 4. Why this auto-heals `77574596` (and the other 5) — no DB surgery +On next open/continue, `load()` returns history ending at seq 135 (the tool-result): +Layer 1 strips the seq-136 error message; Layer 2 sanitizes the seq-134 args to +valid JSON. The provider receives +`[…, assistant{text+tool-call(args:{})}, tool{error result}]` — a valid "continue +after a tool result" state. The model sees its `todo_write` failed and adjusts. +Chat continues. Same auto-repair applies to the other 5 (Layer 1 alone for the +401/empty cases; Layer 1+2 for any malformed-args case). + +## 5. Test requirements (regression scar tissue) + +**conversation-store `reconcile.test.ts`:** +- `reconcile strips error-only trailing assistant message` (the 77574596/102587c0 + shape: `[user, assistant{error}]` → `[user]`). +- `reconcile strips error chunk but keeps sibling text` + (`assistant{text,error}` → `assistant{text}`). +- `reconcile drops assistant message left empty after stripping error` + (`assistant{error}` only → dropped). +- `reconcile keeps tool-call + strips error` (`assistant{tool-call,error}` with a + matching result → `assistant{tool-call}`). +- existing orphaned-tool-call behavior unchanged (regression). +- (hardening) corrupt-JSON chunk row is skipped, rest load + reconcile. + +**openai-stream `convert-messages.test.ts`:** +- `arguments is valid JSON when input is a malformed string` (seed from seq 134's + raw string → output `JSON.parse`s, no throw). +- `arguments passes through valid string input` and `stringifies object input` + (regression). + +**provider-anthropic `convert.test.ts` (claude repo):** +- `safeJson returns a valid object fallback on malformed string` (raw malformed + string → `{}` or wrapped object, not the raw string). +- `safeJson parses valid string input` (regression). + +## 6. Verify (ORCHESTRATOR §4) +`bun run typecheck && bun run test && bun run check` whole-project green; both agents +in-lane (`git status --short`); zero internal mocks in the pure-core units. Live-spot: +open `77574596` against a probe/`bin/up` and confirm it now continues past the tool +result instead of 400-looping. + +## 7. Notes / out of scope +- **Parse-time prevention** (openai-stream / provider-anthropic could reject or + repair malformed args when the model emits them, instead of storing a raw + string) is a deeper follow-up; Layer 2 is the safety net that also repairs + already-stored data. +- Deploying the fix auto-repairs the 6 broken production chats on next load — no + migration needed. diff --git a/frontend-cwd-resolution-handoff.md b/frontend-cwd-resolution-handoff.md new file mode 100644 index 0000000..66c93b4 --- /dev/null +++ b/frontend-cwd-resolution-handoff.md @@ -0,0 +1,95 @@ +# Backend handoff — cwd resolution fixes (backend → FE) — courier doc + +> **From:** arch-rewrite orchestrator · **To:** dispatch-web orchestrator (b18a) · **Courier:** the user. +> Response to the cwd bug report you sent to backend agent ab13. The fixes are DONE and +> live-verified on the dev stack. + +## Version bumps + +| Package | From | To | Notes | +|---|---|---|---| +| `@dispatch/wire` | — | — | **Unchanged** | +| `@dispatch/transport-contract` | — | — | **Unchanged** | +| `@dispatch/ui-contract` | — | — | **Unchanged** | + +**This is a behavior-only change.** No wire/transport-contract types changed. No FE re-pin or +re-mirror needed. The FE needs NO contract change to benefit. + +--- + +## 1. The fix (what was broken → what now works) + +You reported: a workspace `defaultCwd` set, a conversation with no explicit cwd, and `pwd` ran in +the server default (`process.cwd()`) instead of the workspace `defaultCwd`. Plus your desired +behavior: a per-conversation cwd **relative to the workspace `defaultCwd`** unless absolute. + +**Root cause (backend-only):** the workspace-relative resolution lived in +`conversation-store.getEffectiveCwd`, which only resolved the *persisted* cwd. But the FE sends the +CwdField value as a **per-turn `cwd` on `chat.send`**, and `session-orchestrator` used a per-turn +`cwd` **as-is** — bypassing `getEffectiveCwd` entirely. So a relative `cwd` like `"arch-rewrite"` +reached `run_shell` raw → resolved against `process.cwd()` → a nonexistent path → `pwd` broke. + +**Three backend fixes (all live-verified):** + +1. **Per-turn `cwd` is now resolved.** `session-orchestrator` passes the per-turn `cwd` (on + `chat.send`/`POST /chat` AND on manual `POST /chat/warm`) through `getEffectiveCwd` as an + override, so it goes through the same workspace-relative algorithm as the persisted cwd. +2. **New-conversation timing.** A brand-new conversation's first turn previously ran + `getEffectiveCwd` *before* the workspace was assigned (so it saw `"default"`, not the request's + workspace). Now the workspace is assigned first. A relative per-turn `cwd` on the FIRST message + of a new conversation now resolves against the intended workspace. +3. **`DELETE /conversations/:id/cwd` was a stub** (returned `{cwd:null}` but did NOT clear the + persisted key). It now calls `clearCwd` and truly deletes the persisted cwd. + +## 2. The resolution algorithm (now applied to BOTH persisted and per-turn cwd) + +``` +workspaceId = persisted conversation workspaceId ("default" fallback) +workspaceCwd = workspace.defaultCwd ?? null +conversationCwd = the explicit cwd (persisted via GET /cwd, OR the per-turn chat.send cwd) + +if (conversationCwd == null) → workspaceCwd ?? serverDefaultCwd // process.cwd() +else if (conversationCwd absolute) → conversationCwd // starts with "/" +else → path.resolve(workspaceCwd ?? serverDefaultCwd, conversationCwd) +``` + +`serverDefaultCwd` = `process.cwd()` (the server's cwd). + +## 3. FE impact (minimal — no contract change) + +You do NOT need to change anything. Both FE patterns now work correctly: + +- **If you omit `cwd` on `chat.send`** (your current code): the backend resolves the persisted + conversation cwd (set via `PUT /conversations/:id/cwd`) through the algorithm. ✅ +- **If you send a relative `cwd` on `chat.send`**: it is resolved against the workspace + `defaultCwd`. ✅ (was broken — used raw) +- **If you send an absolute `cwd`** (starts `/`): overrides outright. ✅ + +### Endpoints (semantics — shapes unchanged) + +- `GET /conversations/:id/cwd` → **unchanged**: the RAW explicit conversation cwd (`null` = + inheriting workspace default). Your CwdField shows what the user typed. +- `GET /conversations/:id/lsp` → returns the **effective** (resolved) cwd. It now roots LSP at the + effective cwd INCLUDING the server-default fallthrough (when neither conversation nor workspace + cwd is set, LSP roots at `process.cwd()`). Previously returned `cwd: null` + empty `servers` when + no cwd was set. +- `DELETE /conversations/:id/cwd` → **now actually clears** the persisted cwd (was a no-op stub). + Returns `{ conversationId, cwd: null }` (unchanged shape). Use this to reset a conversation's cwd + to "inherit workspace default". +- `PUT /conversations/:id/cwd` → unchanged (persists the raw value). + +## 4. Optional FE simplification (not required) + +You MAY now safely **omit `cwd` on `chat.send`** entirely and rely on the backend resolving the +persisted conversation cwd (set via `PUT /conversations/:id/cwd`). This was the design you +described in the original report. Either path (send cwd, or omit it) is correct; the backend +resolves both consistently. Sending it is harmless; omitting it avoids sending redundant data. + +## 5. Live-verified (dev stack, workspace `test` defaultCwd `/home/tradam/projects/dispatch`) + +- Existing conversation, per-turn `cwd:"arch-rewrite"` → `pwd` = `/home/tradam/projects/dispatch/arch-rewrite` ✅ +- Brand-new conversation, per-turn `cwd:"arch-rewrite"` → `pwd` = `/home/tradam/projects/dispatch/arch-rewrite` ✅ +- Chat omitting `cwd` (persisted cwd `arch-rewrite`) → `pwd` = `/home/tradam/projects/dispatch/arch-rewrite` ✅ +- `PUT /tmp/test` → GET `/tmp/test` → DELETE → GET `null` (actually cleared) ✅ + +`tsc -b` EXIT 0, biome clean, 1311 vitest pass. diff --git a/frontend-model-persistence-handoff.md b/frontend-model-persistence-handoff.md new file mode 100644 index 0000000..912cea6 --- /dev/null +++ b/frontend-model-persistence-handoff.md @@ -0,0 +1,91 @@ +# Frontend handoff — per-conversation model persistence + +## What changed + +A chat's selected provider + model is now **persisted per conversation** +(like `cwd` and `reasoningEffort` already are). Opening a conversation in a new +browser session recalls the originally selected model instead of defaulting to +the server default. + +## Contract version bump + +`@dispatch/transport-contract` `0.19.0 → 0.20.0` — re-pin the `file:` dep and +re-mirror `.dispatch/transport-contract.reference.md`. + +## New types (additive) + +```ts +// GET /conversations/:id/model +export interface ModelResponse { + readonly conversationId: string; + readonly model: string | null; // <credentialName>/<model> form, or null +} + +// PUT /conversations/:id/model +export interface SetModelRequest { + readonly model: string | null; // null clears the persisted selection +} +``` + +## New endpoints + +### `GET /conversations/:id/model` +Returns `ModelResponse`. `model` is `null` when never set (the server then +resolves turns using the default provider + model). + +### `PUT /conversations/:id/model` +Body: `SetModelRequest`. Set `model` to a `<credentialName>/<model>` string +(one of the values from `GET /models`) to persist it. Set `model` to `null` +to clear the persisted selection. Returns `ModelResponse` with the resulting +value. + +## What the FE should do + +1. **On conversation open** — call `GET /conversations/:id/model` to fetch the + persisted model. If non-null, set the model selector to that value. If null, + use the global default (current behavior). + +2. **On model select** — call `PUT /conversations/:id/model` with the selected + model name (`<credentialName>/<model>` form). This persists it so future + turns (and new browser sessions) use the same model. + +3. **On model clear** (if the FE supports clearing back to default) — call + `PUT /conversations/:id/model` with `{ model: null }`. + +4. **No `ChatRequest.model` change needed** — the FE may continue sending + `model` on `chat.send` (per-turn override); the backend persists it. Or the + FE may omit `model` on `chat.send` and rely on the persisted value — the + backend resolves it. Either way works. + +## Backend behavior + +- **Per-turn override** (`ChatRequest.model` / `chat.send` model) takes + precedence and is persisted. +- **No per-turn override** → backend checks `getModel(conversationId)` → if + non-null, uses it; if null, falls through to the default provider. +- **Warm path** also resolves the model from persistence when no explicit + override is given (parity with real turns). + +## No FE handoff needed for tasks 1 & 2 + +- **Task 1** (workspace tab broadcast): already couriered to 29ae by a prior + orchestrator agent (`frontend-workspace-open-handoff.md`). +- **Task 2** (system-prompt cwd reconstruction): backend-only fix, no contract + version bump, no FE action needed. + +## Assumptions made (user was away) + +1. **Persist the model name string** (`<credentialName>/<model>` form), not + the provider/credential separately — the model name already encodes both + (the credential binds to a provider). This mirrors how the CLI sends + `--model` and how `ChatRequest.model` works. +2. **No model validation on PUT** — the backend doesn't validate the model + name on `PUT /conversations/:id/model` (it's just a string). The provider + resolves it at turn time; an unknown model → turn error, not a 400. This + matches the contract doc on `SetModelRequest`. +3. **Empty string clears** — `setModel(id, "")` deletes the key. The HTTP + `PUT` with `{ model: null }` maps to this. This is an implementation detail + the FE doesn't need to know about (it sends `null`). +4. **No `model` field on `ConversationMeta`** — following the precedent of `cwd` + and `reasoningEffort` (which are NOT on `ConversationMeta` but fetched via + dedicated endpoints). The FE calls `GET /conversations/:id/model` to read. diff --git a/frontend-system-prompt-handoff.md b/frontend-system-prompt-handoff.md index c0739b3..c135145 100644 --- a/frontend-system-prompt-handoff.md +++ b/frontend-system-prompt-handoff.md @@ -1,50 +1,18 @@ -# FE Courier Handoff: System Prompt Builder +# FE Courier Handoff: System Prompt Builder (Updated) -> Backend→FE courier. The user couriers this to `../dispatch-web` (FE agent `ffe3`). -> `@dispatch/transport-contract` bumped to `0.18.0` (additive types). +> Backend→FE courier. Send to FE agent `ffe3`. +> Supersedes the earlier `frontend-system-prompt-handoff.md` — adds `prompt:workspace_id`. -## Overview +## API endpoints -A template-based system prompt builder. The user defines a template with variable -placeholders (`[type:name]`) and conditionals (`[if]`/`[else]`/`[endif]`). Variables are -resolved at construction time (once per conversation, persisted for cache safety — -reconstructed only on compaction). +### `GET /system-prompt` → `{ template: string }` +Returns the current global template. When none is stored, returns the built-in default. -## API endpoints +### `PUT /system-prompt` ← `{ template: string }` → `{ template: string }` +Set the global template. Empty string = "no system prompt". 400 if `template` missing/wrong type. 503 if service unavailable. -### `GET /system-prompt` → `SystemPromptTemplateResponse` -```ts -{ template: string } -``` -Returns the current global template. When no template is stored (or the service is -unavailable), returns the built-in `DEFAULT_TEMPLATE`. - -### `PUT /system-prompt` ← `SetSystemPromptTemplateRequest` -```ts -// Request body: -{ template: string } -// Response: -{ template: string } // echoed back -``` -- `template` can be empty (means "no system prompt"). -- 400 if `template` is missing or not a string. -- 503 if the system-prompt service is unavailable. - -### `GET /system-prompt/variables` → `SystemPromptVariablesResponse` -```ts -{ - variables: readonly SystemPromptVariable[] -} -// SystemPromptVariable: -{ - type: string; // "system", "file", "prompt", "git" - name: string; // "time", "date", "os", "cwd", etc. - description: string; // human-readable - dynamic?: boolean; // true for file: (any path is valid) -} -``` -Static catalog — always available (no service dependency). Use this to render the -variable selector buttons in the builder UI. +### `GET /system-prompt/variables` → `{ variables: SystemPromptVariable[] }` +Static catalog — always available (no service dependency). Use this to render the variable selector buttons. ## Template format @@ -52,28 +20,25 @@ variable selector buttons in the builder UI. ``` [type:name] ``` -Resolves the variable at construction time. Unknown type → blank string. Non-existent -variable (e.g. file not found) → blank string. +Resolves at construction time. Unknown type → blank. Non-existent variable (e.g. file not found) → blank. ### Conditional blocks ``` [if type:name] - ...content if variable exists... + ...if variable exists... [else] - ...content if variable does NOT exist... + ...if not... [endif] ``` -Negated condition: +Negated: ``` [if !type:name] - ...content if variable does NOT exist... + ...if variable does NOT exist... [endif] ``` -- Nested `[if]` blocks: supported. -- Multi-line content: supported. -- Unmatched `[if]`/`[endif]`: treated as literal text. +Nested `[if]`: supported. Multi-line: supported. Unmatched `[if]`/`[endif]`: literal text. -## Available variables +## Available variables (updated) | Type:Name | Description | Dynamic? | |---|---|---| @@ -84,26 +49,19 @@ Negated condition: | `prompt:cwd` | Working directory | No | | `prompt:model` | Current model name | No | | `prompt:conversation_id` | Conversation ID | No | +| `prompt:workspace_id` | Workspace identifier — lets the AI know which workspace it's in, useful when summoning agents | No | | `git:branch` | Current git branch | No | | `git:status` | Short git status | No | | `file:<path>` | File contents (relative to cwd, or absolute if starts `/`) | **Yes** | -For `file:<path>`, the FE should allow free-text input for the path. Any filename is -valid — the backend resolves it relative to the conversation's cwd (or absolute if it -starts with `/`). - -## Caching behavior (important for FE) +For `file:<path>`, allow free-text input for the path. -The system prompt is **constructed once** (on the first turn of a new conversation) and -**persisted**. It is reused on all subsequent turns (no reconstruction — this preserves -the prompt cache). It is only **reconstructed on compaction** (fresh variable resolution). +## Caching behavior -Changing the template (via `PUT /system-prompt`) does NOT affect existing conversations -until they are compacted. New conversations use the new template on their first turn. +System prompt is **constructed once** (first turn of a new conversation) and **persisted**. Reused on all subsequent turns (cache-safe). Reconstructed only on **compaction**. Changing the template does NOT affect existing conversations until compacted. ## Default template -When no template is stored, the backend uses: ``` You are a helpful coding assistant. @@ -113,11 +71,3 @@ You are a helpful coding assistant. The current working directory is [prompt:cwd]. ``` - -## FE UI suggestion - -The builder is a full-page modal split into two: -1. **Text editor** (left/main): the template text. -2. **Variable selectors** (right/side): buttons grouped by type. Clicking a variable - inserts `[type:name]` at the cursor position. For `file:` (dynamic), show a text - input for the path + an "insert" button. diff --git a/packages/kernel/src/runtime/dispatch.test.ts b/packages/kernel/src/runtime/dispatch.test.ts new file mode 100644 index 0000000..afbfb39 --- /dev/null +++ b/packages/kernel/src/runtime/dispatch.test.ts @@ -0,0 +1,535 @@ +import { describe, expect, it } from "vitest"; +import type { ChatMessage } from "../contracts/conversation.js"; +import type { AgentEvent } from "../contracts/events.js"; +import type { ProviderContract, ProviderEvent } from "../contracts/provider.js"; +import type { ToolContract, ToolExecuteContext, ToolResult } from "../contracts/tool.js"; +import { executeToolCall } from "./dispatch.js"; +import { runTurn } from "./run-turn.js"; + +// --------------------------------------------------------------------------- +// Helpers (no internal mocks — kernel standard; fakes only) +// --------------------------------------------------------------------------- + +function delay(ms: number): Promise<void> { + return new Promise((resolve) => { + setTimeout(resolve, ms); + }); +} + +function createFakeProvider(script: ProviderEvent[][]): ProviderContract { + let callIndex = 0; + return { + id: "fake", + stream() { + const events = script[callIndex] ?? []; + callIndex++; + return (async function* () { + for (const event of events) { + yield event; + } + })(); + }, + }; +} + +function createFakeTool( + name: string, + handler?: (input: unknown, ctx: ToolExecuteContext) => Promise<ToolResult>, + opts?: { concurrencySafe?: boolean }, +): ToolContract { + return { + name, + description: `Fake tool: ${name}`, + parameters: { type: "object" }, + ...(opts?.concurrencySafe !== undefined ? { concurrencySafe: opts.concurrencySafe } : {}), + execute: handler ?? (async (input) => ({ content: `${name}: ${JSON.stringify(input)}` })), + }; +} + +function createCollectingEmit(): { events: AgentEvent[]; emit: (event: AgentEvent) => void } { + const events: AgentEvent[] = []; + return { events, emit: (event) => events.push(event) }; +} + +const noopEmit = () => {}; + +const userMessage: ChatMessage = { + role: "user", + chunks: [{ type: "text", text: "hello" }], +}; + +const ABORTED_RESULT: ToolResult = { content: "Aborted", isError: true }; + +// =========================================================================== +// executeToolCall — direct unit tests for the abort-signal race +// =========================================================================== + +describe("executeToolCall", () => { + it("returns the tool's result when the tool resolves before abort", async () => { + const ac = new AbortController(); + const tool = createFakeTool("echo", async (input) => ({ + content: `echo: ${JSON.stringify(input)}`, + })); + + const result = await executeToolCall( + { id: "tc1", name: "echo", input: { x: 1 } }, + tool, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + expect(result).toEqual({ content: 'echo: {"x":1}' }); + }); + + it("returns Aborted immediately when signal is already aborted at call time", async () => { + const ac = new AbortController(); + ac.abort(); + const tool = createFakeTool("echo", async () => ({ content: "should not run" })); + + const result = await executeToolCall( + { id: "tc1", name: "echo", input: {} }, + tool, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + expect(result).toEqual(ABORTED_RESULT); + }); + + it("returns Aborted when a hanging tool is raced against an abort signal", async () => { + const ac = new AbortController(); + // A tool that never resolves and ignores ctx.signal + const tool = createFakeTool("hang", () => new Promise<ToolResult>(() => {})); + + const promise = executeToolCall( + { id: "tc1", name: "hang", input: {} }, + tool, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + // Abort after the tool has started + await delay(10); + ac.abort(); + + const result = await promise; + expect(result).toEqual(ABORTED_RESULT); + }); + + it("returns the tool's own result when a signal-aware tool resolves on abort", async () => { + const ac = new AbortController(); + const toolResult: ToolResult = { content: "aborted by tool", isError: true }; + const tool = createFakeTool("aware", (_input, ctx) => { + return new Promise<ToolResult>((resolve) => { + ctx.signal.addEventListener("abort", () => resolve(toolResult), { once: true }); + }); + }); + + const promise = executeToolCall( + { id: "tc1", name: "aware", input: {} }, + tool, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + await delay(10); + ac.abort(); + + const result = await promise; + // The tool listens to the signal and resolves its own result. Whether + // the tool's result or the race's "Aborted" wins is timing-dependent; + // both are isError and let the turn seal with finishReason "aborted". + expect(result.isError).toBe(true); + expect(result.content).toBe("aborted by tool"); + }); + + it("swallows a late rejection from the orphaned tool promise after abort wins the race", async () => { + const ac = new AbortController(); + let rejectTool: ((err: Error) => void) | undefined; + const tool = createFakeTool("late-reject", () => { + return new Promise<ToolResult>((_resolve, reject) => { + rejectTool = reject; + }); + }); + + const promise = executeToolCall( + { id: "tc1", name: "late-reject", input: {} }, + tool, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + await delay(10); + ac.abort(); + + const result = await promise; + expect(result).toEqual(ABORTED_RESULT); + + // The tool rejects AFTER the race already resolved with "Aborted". + // The no-op catch must swallow this — no unhandled rejection. + rejectTool?.(new Error("late boom")); + // Give the microtask queue a tick to flush + await delay(5); + // If we reach here without an unhandledRejection crashing the process, + // the test passes. (vitest surfaces unhandled rejections as failures.) + }); + + it("returns an error result when the tool rejects before abort", async () => { + const ac = new AbortController(); + const tool = createFakeTool("boom", async () => { + throw new Error("tool exploded"); + }); + + const result = await executeToolCall( + { id: "tc1", name: "boom", input: {} }, + tool, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + expect(result.isError).toBe(true); + expect(result.content).toContain("tool exploded"); + }); + + it("returns Unknown tool when the tool is undefined", async () => { + const ac = new AbortController(); + const result = await executeToolCall( + { id: "tc1", name: "nonexistent", input: {} }, + undefined, + ac.signal, + noopEmit, + "conv-1", + "turn-1", + ); + + expect(result.isError).toBe(true); + expect(result.content).toContain("Unknown tool"); + }); +}); + +// =========================================================================== +// runTurn — integration tests for the abort-signal race (durability) +// =========================================================================== + +describe("runTurn abort-race durability", () => { + // Required test 1: A hanging tool (never resolves, ignores ctx.signal) + // must not keep runTurn from returning when the signal aborts. + it("hanging tool + abort → runTurn returns with finishReason aborted and emits done", async () => { + const ac = new AbortController(); + + // A tool whose execute returns a promise that NEVER resolves and + // ignores ctx.signal entirely. + const tool = createFakeTool("hang", () => new Promise<ToolResult>(() => {})); + + // Use eager: true so the tool starts BEFORE the signal aborts. + // This exercises the race (not the early signal.aborted return). + const provider: ProviderContract = { + id: "fake", + stream() { + return (async function* () { + yield { + type: "tool-call", + toolCallId: "tc1", + toolName: "hang", + input: {}, + } as ProviderEvent; + ac.abort(); + await delay(10); + yield { type: "finish", reason: "tool-calls" } as ProviderEvent; + })(); + }, + }; + + const { events, emit } = createCollectingEmit(); + + const result = await runTurn({ + provider, + messages: [userMessage], + tools: [tool], + dispatch: { maxConcurrent: 1, eager: true }, + conversationId: "conv-1", + turnId: "turn-1", + emit, + signal: ac.signal, + }); + + // runTurn returned (didn't hang) → the race worked. + expect(result.finishReason).toBe("aborted"); + + // A done event was emitted with reason "aborted". + const doneEvents = events.filter((e) => e.type === "done"); + expect(doneEvents).toHaveLength(1); + if (doneEvents[0]?.type === "done") { + expect(doneEvents[0].reason).toBe("aborted"); + } + }); + + // Required test 2: A signal-aware tool that resolves its own result on + // abort must also let runTurn return with finishReason "aborted". + it("signal-aware tool + abort → runTurn returns with finishReason aborted", async () => { + const ac = new AbortController(); + + const tool = createFakeTool("aware", (_input, ctx) => { + return new Promise<ToolResult>((resolve) => { + ctx.signal.addEventListener( + "abort", + () => resolve({ content: "aborted by tool", isError: true }), + { once: true }, + ); + }); + }); + + const provider: ProviderContract = { + id: "fake", + stream() { + return (async function* () { + yield { + type: "tool-call", + toolCallId: "tc1", + toolName: "aware", + input: {}, + } as ProviderEvent; + ac.abort(); + await delay(10); + yield { type: "finish", reason: "tool-calls" } as ProviderEvent; + })(); + }, + }; + + const { events, emit } = createCollectingEmit(); + + const result = await runTurn({ + provider, + messages: [userMessage], + tools: [tool], + dispatch: { maxConcurrent: 1, eager: true }, + conversationId: "conv-1", + turnId: "turn-1", + emit, + signal: ac.signal, + }); + + expect(result.finishReason).toBe("aborted"); + + const doneEvents = events.filter((e) => e.type === "done"); + expect(doneEvents).toHaveLength(1); + if (doneEvents[0]?.type === "done") { + expect(doneEvents[0].reason).toBe("aborted"); + } + + // When the step is aborted, tool-result MESSAGES are omitted from the + // result (the tool-result EVENT is still emitted by executeStep for + // live UI updates, but the message is not persisted). This prevents + // orphaned `tool` messages from breaking the next turn's provider + // request. The assistant message has its tool-call chunks stripped. + const toolResultMsg = result.messages.find((m) => m.role === "tool"); + expect(toolResultMsg).toBeUndefined(); + + // The assistant message should NOT contain tool-call chunks. + const assistantMsg = result.messages.find( + (m) => m.role === "assistant" && m.chunks.some((c) => c.type === "tool-call"), + ); + expect(assistantMsg).toBeUndefined(); + }); + + // Required test 3 (regression guard): Without abort, a normal tool runs + // and its result is used; finishReason reflects the model. + it("no abort → tool runs normally and its result is used (regression)", async () => { + const tool = createFakeTool("normal", async (input) => ({ + content: `result: ${JSON.stringify(input)}`, + })); + + const provider = createFakeProvider([ + [ + { type: "tool-call", toolCallId: "tc1", toolName: "normal", input: { x: 1 } }, + { type: "finish", reason: "tool-calls" }, + ], + [ + { type: "text-delta", delta: "done" }, + { type: "finish", reason: "stop" }, + ], + ]); + + const { events, emit } = createCollectingEmit(); + + const result = await runTurn({ + provider, + messages: [userMessage], + tools: [tool], + dispatch: { maxConcurrent: 1, eager: true }, + conversationId: "conv-1", + turnId: "turn-1", + emit, + }); + + // finishReason reflects the model (second step's "stop"). + expect(result.finishReason).toBe("stop"); + + // The tool's result was used (fed back, not "Aborted"). + const toolResultMsg = result.messages.find((m) => m.role === "tool"); + expect(toolResultMsg).toBeDefined(); + const trChunk = toolResultMsg?.chunks[0]; + expect(trChunk?.type).toBe("tool-result"); + if (trChunk?.type === "tool-result") { + expect(trChunk.content).toBe('result: {"x":1}'); + expect(trChunk.isError).toBe(false); + } + + // done event emitted with reason "stop". + const doneEvents = events.filter((e) => e.type === "done"); + expect(doneEvents).toHaveLength(1); + if (doneEvents[0]?.type === "done") { + expect(doneEvents[0].reason).toBe("stop"); + } + }); + + // Bonus: multiple hanging tools + abort → all resolve via the race, + // drain() doesn't deadlock, and runTurn returns. Tool-result messages + // are omitted from the result (aborted step); the turn seals cleanly. + it("multiple hanging tools + abort → drain completes and runTurn returns", async () => { + const ac = new AbortController(); + + // Two tools that never resolve and ignore ctx.signal. + const toolA = createFakeTool("hangA", () => new Promise<ToolResult>(() => {})); + const toolB = createFakeTool("hangB", () => new Promise<ToolResult>(() => {})); + + const provider: ProviderContract = { + id: "fake", + stream() { + return (async function* () { + yield { + type: "tool-call", + toolCallId: "tc1", + toolName: "hangA", + input: {}, + } as ProviderEvent; + yield { + type: "tool-call", + toolCallId: "tc2", + toolName: "hangB", + input: {}, + } as ProviderEvent; + ac.abort(); + await delay(10); + yield { type: "finish", reason: "tool-calls" } as ProviderEvent; + })(); + }, + }; + + const { events, emit } = createCollectingEmit(); + + const result = await runTurn({ + provider, + messages: [userMessage], + tools: [toolA, toolB], + dispatch: { maxConcurrent: 2, eager: true }, + conversationId: "conv-1", + turnId: "turn-1", + emit, + signal: ac.signal, + }); + + expect(result.finishReason).toBe("aborted"); + + // tool-result EVENTS are still emitted by executeStep (for live UI), + // but tool-result MESSAGES are omitted from the result (not persisted). + const toolResultEvents = events.filter((e) => e.type === "tool-result"); + expect(toolResultEvents).toHaveLength(2); + for (const tr of toolResultEvents) { + if (tr.type === "tool-result") { + expect(tr.isError).toBe(true); + } + } + + // No tool messages in the result (they would orphan on the next turn). + const toolMessages = result.messages.filter((m) => m.role === "tool"); + expect(toolMessages).toHaveLength(0); + + // Assistant message has no tool-call chunks. + const assistantMsgs = result.messages.filter((m) => m.role === "assistant"); + for (const msg of assistantMsgs) { + expect(msg.chunks.some((c) => c.type === "tool-call")).toBe(false); + } + + const doneEvents = events.filter((e) => e.type === "done"); + expect(doneEvents).toHaveLength(1); + if (doneEvents[0]?.type === "done") { + expect(doneEvents[0].reason).toBe("aborted"); + } + }); + + // Critical regression: after an aborted tool call, the result messages + // must NOT contain orphaned tool messages. If they did, the next turn + // would send a `tool` role message to the provider without a preceding + // `assistant` message carrying `tool_calls` → 400 error. + it("aborted step produces no tool messages and no tool-call chunks in result", async () => { + const ac = new AbortController(); + + // Tool that hangs forever + const tool = createFakeTool("hang", () => new Promise<ToolResult>(() => {})); + + const provider: ProviderContract = { + id: "fake", + stream() { + return (async function* () { + yield { type: "text-delta", delta: "Let me run that for you" } as ProviderEvent; + yield { + type: "tool-call", + toolCallId: "tc1", + toolName: "hang", + input: {}, + } as ProviderEvent; + ac.abort(); + await delay(10); + yield { type: "finish", reason: "tool-calls" } as ProviderEvent; + })(); + }, + }; + + const result = await runTurn({ + provider, + messages: [userMessage], + tools: [tool], + dispatch: { maxConcurrent: 1, eager: true }, + conversationId: "conv-1", + turnId: "turn-1", + emit: noopEmit, + signal: ac.signal, + }); + + expect(result.finishReason).toBe("aborted"); + + // No tool messages in the result + const toolMessages = result.messages.filter((m) => m.role === "tool"); + expect(toolMessages).toHaveLength(0); + + // The assistant message should preserve text but NOT tool-call chunks + const assistantMsg = result.messages.find((m) => m.role === "assistant"); + expect(assistantMsg).toBeDefined(); + if (assistantMsg !== undefined) { + const hasToolCall = assistantMsg.chunks.some((c) => c.type === "tool-call"); + expect(hasToolCall).toBe(false); + // Text content should be preserved + const hasText = assistantMsg.chunks.some((c) => c.type === "text"); + expect(hasText).toBe(true); + } + + // Simulate what the next turn would see: the result messages are the + // conversation history (minus the user message). If we feed these to + // a simple converter, there should be NO `tool` role messages. + const toolRoleCount = result.messages.filter((m) => m.role === "tool").length; + expect(toolRoleCount).toBe(0); + }); +}); diff --git a/packages/kernel/src/runtime/dispatch.ts b/packages/kernel/src/runtime/dispatch.ts index d1c46cb..e0be1b4 100644 --- a/packages/kernel/src/runtime/dispatch.ts +++ b/packages/kernel/src/runtime/dispatch.ts @@ -35,8 +35,24 @@ export async function executeToolCall( conversationId, ...(cwd !== undefined ? { cwd } : {}), }; + // Race the tool's execute promise against the abort signal so a tool + // that hangs (ignores ctx.signal, or blocks on something the signal + // can't interrupt) can't keep runTurn from returning. When the signal + // fires we RESOLVE (not reject) with an "Aborted" result so the step + // completes normally and the existing signal.aborted → finishReason = + // "aborted" path seals the turn cleanly (done event), letting the + // caller's finally clear active state and the FE clear its spinner. try { - return await tool.execute(call.input, ctx); + const toolPromise = tool.execute(call.input, ctx); + const abortPromise = new Promise<ToolResult>((resolve) => { + signal.addEventListener("abort", () => resolve({ content: "Aborted", isError: true }), { + once: true, + }); + }); + // Swallow late rejections from the orphaned tool promise: the tool + // may reject after the race already resolved with "Aborted". + void toolPromise.catch(() => {}); + return await Promise.race([toolPromise, abortPromise]); } catch (err) { const message = err instanceof Error ? err.message : String(err); return { content: `Tool execution error: ${message}`, isError: true }; diff --git a/packages/kernel/src/runtime/run-turn.ts b/packages/kernel/src/runtime/run-turn.ts index 4a12e28..f5d80d3 100644 --- a/packages/kernel/src/runtime/run-turn.ts +++ b/packages/kernel/src/runtime/run-turn.ts @@ -89,6 +89,19 @@ function appendThinkingDelta(chunks: Chunk[], delta: string): void { } } +/** + * Remove tool-call chunks from an assistant message, returning a new message + * with only the non-tool-call chunks (text, thinking, error). Returns + * `undefined` when all chunks were tool-calls (so the caller can omit the + * message entirely). Used when a step is aborted to avoid persisting + * incomplete tool calls whose placeholder "Aborted" results would create + * orphaned `tool` messages in the next turn's history. + */ +function stripToolCallChunks(msg: ChatMessage): ChatMessage | undefined { + const stripped = msg.chunks.filter((c) => c.type !== "tool-call"); + return stripped.length > 0 ? { role: msg.role, chunks: stripped } : undefined; +} + interface StepContext { readonly provider: ProviderContract; readonly messages: ChatMessage[]; @@ -516,12 +529,36 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> { totalUsage = addUsage(totalUsage, stepResult.usage); lastStepUsage = stepResult.usage; - if (stepResult.assistantMessage !== undefined) { - messages.push(stepResult.assistantMessage); - resultMessages.push(stepResult.assistantMessage); + // When the signal is aborted mid-step, the tool results are + // placeholders ({ content: "Aborted", isError: true }). If these + // are persisted and included in the next turn's message history, + // the provider sees a `tool` role message without a preceding + // `assistant` message carrying `tool_calls` → 400 error. + // + // To prevent this, when the signal is aborted we: + // 1. Strip tool-call chunks from the assistant message (keep + // text/thinking/error chunks so the partial response is + // preserved). + // 2. Omit tool-result messages entirely (they are not persisted, + // not added to resultMessages, and not passed to onStepComplete). + // + // This keeps the conversation history clean: the assistant's + // partial text is preserved, but no incomplete tool calls are + // left dangling. The `done` event still carries + // `reason: "aborted"`, so the turn seals cleanly. + const stepAborted = signal.aborted; + const assistantMessage = + stepAborted && stepResult.assistantMessage !== undefined + ? stripToolCallChunks(stepResult.assistantMessage) + : stepResult.assistantMessage; + const toolMessages = stepAborted ? [] : stepResult.toolMessages; + + if (assistantMessage !== undefined) { + messages.push(assistantMessage); + resultMessages.push(assistantMessage); } - for (const msg of stepResult.toolMessages) { + for (const msg of toolMessages) { messages.push(msg); resultMessages.push(msg); } @@ -532,10 +569,10 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> { // SAME objects in resultMessages — the caller must NOT double-persist. if (input.onStepComplete !== undefined) { const stepMessages: ChatMessage[] = []; - if (stepResult.assistantMessage !== undefined) { - stepMessages.push(stepResult.assistantMessage); + if (assistantMessage !== undefined) { + stepMessages.push(assistantMessage); } - for (const msg of stepResult.toolMessages) { + for (const msg of toolMessages) { stepMessages.push(msg); } if (stepMessages.length > 0) { @@ -543,7 +580,7 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> { } } - if (signal.aborted) { + if (stepAborted) { finishReason = "aborted"; break; } diff --git a/packages/tool-shell/src/shell.test.ts b/packages/tool-shell/src/shell.test.ts index a70693b..07e0af4 100644 --- a/packages/tool-shell/src/shell.test.ts +++ b/packages/tool-shell/src/shell.test.ts @@ -22,8 +22,12 @@ function stubCtx(overrides?: Partial<ToolExecuteContext>): ToolExecuteContext { }; } -function fakeSpawn(result: { exitCode: number | null; timedOut: boolean }): SpawnShell { - return async () => result; +function fakeSpawn(result: { + exitCode: number | null; + timedOut: boolean; + aborted?: boolean; +}): SpawnShell { + return async () => ({ aborted: false, ...result }); } describe("validateArgs", () => { @@ -201,7 +205,7 @@ describe("createRunShellTool", () => { workdir: "/tmp", spawn: async (_params) => { _params.onOutput("hello\n", "stdout"); - return { exitCode: 0, timedOut: false }; + return { exitCode: 0, timedOut: false, aborted: false }; }, }); const result = await tool.execute({ command: "echo hello" }, stubCtx()); @@ -214,7 +218,7 @@ describe("createRunShellTool", () => { workdir: "/tmp", spawn: async (_params) => { _params.onOutput("error output\n", "stderr"); - return { exitCode: 1, timedOut: false }; + return { exitCode: 1, timedOut: false, aborted: false }; }, }); const result = await tool.execute({ command: "false" }, stubCtx()); @@ -227,7 +231,7 @@ describe("createRunShellTool", () => { workdir: "/tmp", spawn: async (_params) => { _params.onOutput("partial\n", "stdout"); - return { exitCode: null, timedOut: true }; + return { exitCode: null, timedOut: true, aborted: false }; }, }); const result = await tool.execute({ command: "sleep 999" }, stubCtx()); @@ -242,7 +246,7 @@ describe("createRunShellTool", () => { outputCap: cap, spawn: async (_params) => { _params.onOutput("a".repeat(200), "stdout"); - return { exitCode: 0, timedOut: false }; + return { exitCode: 0, timedOut: false, aborted: false }; }, }); const result = await tool.execute({ command: "gen" }, stubCtx()); @@ -258,7 +262,7 @@ describe("createRunShellTool", () => { params.onOutput("line1\n", "stdout"); params.onOutput("err1\n", "stderr"); params.onOutput("line2\n", "stdout"); - return { exitCode: 0, timedOut: false }; + return { exitCode: 0, timedOut: false, aborted: false }; }, }); await tool.execute( @@ -280,7 +284,7 @@ describe("createRunShellTool", () => { workdir: "/baked", spawn: async (params) => { receivedCwd = params.cwd; - return { exitCode: 0, timedOut: false }; + return { exitCode: 0, timedOut: false, aborted: false }; }, }); await tool.execute({ command: "pwd" }, stubCtx({ cwd: "/custom" })); @@ -293,7 +297,7 @@ describe("createRunShellTool", () => { workdir: "/baked", spawn: async (params) => { receivedCwd = params.cwd; - return { exitCode: 0, timedOut: false }; + return { exitCode: 0, timedOut: false, aborted: false }; }, }); await tool.execute({ command: "pwd" }, stubCtx()); @@ -317,7 +321,7 @@ describe("createRunShellTool", () => { controller.abort(); const tool = createRunShellTool({ workdir: "/tmp", - spawn: async () => ({ exitCode: 0, timedOut: false }), + spawn: async () => ({ exitCode: 0, timedOut: false, aborted: false }), }); const result = await tool.execute({ command: "test" }, stubCtx({ signal: controller.signal })); expect(result.isError).toBe(true); @@ -329,7 +333,7 @@ describe("createRunShellTool", () => { workdir: "/tmp", spawn: async (params) => { receivedTimeout = params.timeout; - return { exitCode: 0, timedOut: false }; + return { exitCode: 0, timedOut: false, aborted: false }; }, }); await tool.execute({ command: "test", timeout: 5000 }, stubCtx()); @@ -355,3 +359,133 @@ describe("createRunShellTool (integration)", () => { expect(streamed).toContain("hello-from-shell"); }); }); + +describe("realSpawn — process-group kill on abort/timeout", () => { + it("aborts a command with a grandchild holding the pipes and resolves immediately", async () => { + const { realSpawn } = await import("./spawn.js"); + const controller = new AbortController(); + + // "sleep 30 & wait" spawns a grandchild (sleep) that inherits the stdio + // pipes. Killing just the sh parent does NOT close the pipes → close never + // fires. With detached:true + process-group kill, the grandchild dies too. + const promise = realSpawn({ + command: "sleep 30 & wait", + cwd: "/tmp", + signal: controller.signal, + timeout: 60_000, + onOutput: () => {}, + }); + + // Give the shell time to actually spawn the grandchild. + await new Promise((r) => setTimeout(r, 500)); + + controller.abort(); + + // Must resolve promptly (not wait 30s for the grandchild's sleep). + const result = await promise; + expect(result.aborted).toBe(true); + expect(result.timedOut).toBe(false); + + // Give the OS a moment to reap the killed processes. + await new Promise((r) => setTimeout(r, 200)); + + // The grandchild sleep process should be gone. Check via pgrep. + const { execSync } = await import("node:child_process"); + let sleeping: string[]; + try { + sleeping = execSync("pgrep -f 'sleep 30'", { encoding: "utf-8" }).trim().split("\n"); + } catch { + // pgrep returns non-zero when no processes match → all gone. + sleeping = []; + } + expect(sleeping.length).toBe(0); + }); + + it("times out a command with a grandchild holding the pipes and resolves promptly", async () => { + const { realSpawn } = await import("./spawn.js"); + const controller = new AbortController(); + + const promise = realSpawn({ + command: "sleep 30 & wait", + cwd: "/tmp", + signal: controller.signal, + timeout: 500, + onOutput: () => {}, + }); + + // Must resolve within a short window (not 30s). + const start = Date.now(); + const result = await promise; + const elapsed = Date.now() - start; + + expect(result.timedOut).toBe(true); + expect(result.aborted).toBe(false); + // Should resolve shortly after the 500ms timeout, well under 30s. + expect(elapsed).toBeLessThan(10_000); + + // Grandchild should be dead. + await new Promise((r) => setTimeout(r, 200)); + const { execSync } = await import("node:child_process"); + let sleeping: string[]; + try { + sleeping = execSync("pgrep -f 'sleep 30'", { encoding: "utf-8" }).trim().split("\n"); + } catch { + sleeping = []; + } + expect(sleeping.length).toBe(0); + }); + + it("captures stdout on normal completion (regression guard)", async () => { + const { realSpawn } = await import("./spawn.js"); + const controller = new AbortController(); + let output = ""; + + const result = await realSpawn({ + command: "echo hi", + cwd: "/tmp", + signal: controller.signal, + timeout: 5_000, + onOutput: (data) => { + output += data; + }, + }); + + expect(result.aborted).toBe(false); + expect(result.timedOut).toBe(false); + expect(result.exitCode).toBe(0); + expect(output).toContain("hi"); + }); + + it("aborts a simple single-process command and resolves with aborted: true", async () => { + const { realSpawn } = await import("./spawn.js"); + const controller = new AbortController(); + + const promise = realSpawn({ + command: "sleep 30", + cwd: "/tmp", + signal: controller.signal, + timeout: 60_000, + onOutput: () => {}, + }); + + // Let the sleep actually start. + await new Promise((r) => setTimeout(r, 300)); + + controller.abort(); + + const result = await promise; + expect(result.aborted).toBe(true); + expect(result.timedOut).toBe(false); + + // The sleep process should be gone. + await new Promise((r) => setTimeout(r, 200)); + const { execSync } = await import("node:child_process"); + let sleeping: string[]; + try { + sleeping = execSync("pgrep -f 'sleep 30'", { encoding: "utf-8" }).trim().split("\n"); + } catch { + sleeping = []; + } + expect(sleeping.length).toBe(0); + }); +}); diff --git a/packages/tool-shell/src/shell.ts b/packages/tool-shell/src/shell.ts index d96d73e..cc76bca 100644 --- a/packages/tool-shell/src/shell.ts +++ b/packages/tool-shell/src/shell.ts @@ -12,6 +12,7 @@ export interface ValidatedArgs { export interface SpawnResult { readonly exitCode: number | null; readonly timedOut: boolean; + readonly aborted: boolean; } export type SpawnShell = (params: { @@ -139,7 +140,6 @@ export function createRunShellTool(deps: { }; let spawnResult: SpawnResult; - let aborted = false; try { spawnResult = await deps.spawn({ @@ -154,7 +154,6 @@ export function createRunShellTool(deps: { }); } catch (err: unknown) { if (ctx.signal.aborted) { - aborted = true; return buildResult({ exitCode: null, timedOut: false, @@ -172,7 +171,7 @@ export function createRunShellTool(deps: { return buildResult({ exitCode: spawnResult.exitCode, timedOut: spawnResult.timedOut, - aborted, + aborted: spawnResult.aborted, output, cap, }); diff --git a/packages/tool-shell/src/spawn.ts b/packages/tool-shell/src/spawn.ts index 9025c26..9b1d7e4 100644 --- a/packages/tool-shell/src/spawn.ts +++ b/packages/tool-shell/src/spawn.ts @@ -3,24 +3,66 @@ import type { SpawnResult, SpawnShell } from "./shell.js"; export const realSpawn: SpawnShell = (params): Promise<SpawnResult> => { return new Promise<SpawnResult>((resolve) => { + // detached: true puts the child in its own process group (pgid = child.pid). + // This lets us kill the entire group (child + any grandchildren that inherit + // the pipes) via process.kill(-pgid, "SIGKILL") on abort/timeout, so a + // backgrounded grandchild can't keep the stdio pipes open and stall the + // promise on child.on("close"). const child = nodeSpawn("sh", ["-c", params.command], { cwd: params.cwd, stdio: ["ignore", "pipe", "pipe"], + detached: true, }); + let settled = false; let timedOut = false; - let killed = false; - const timer = setTimeout(() => { - timedOut = true; - child.kill("SIGKILL"); - }, params.timeout); + let timer: ReturnType<typeof setTimeout> | undefined; + + /** Kill the entire child process group (best-effort — group may be gone). */ + const killGroup = () => { + if (child.pid !== undefined) { + try { + process.kill(-child.pid, "SIGKILL"); + } catch { + // Process group may already be gone — ignore. + } + } + }; + + /** Remove the abort listener and clear the timeout timer (no leaks). */ + const cleanup = () => { + if (timer !== undefined) { + clearTimeout(timer); + timer = undefined; + } + params.signal.removeEventListener("abort", onAbort); + }; + + /** Resolve once, then clean up so listeners/timers never leak. */ + const settle = (result: SpawnResult) => { + if (settled) return; + settled = true; + cleanup(); + resolve(result); + }; const onAbort = () => { - killed = true; - child.kill("SIGKILL"); + if (settled) return; + killGroup(); + // Resolve immediately — do NOT wait for child.on("close"), which may + // never fire if a grandchild holds the pipes open. + settle({ exitCode: null, timedOut: false, aborted: true }); }; params.signal.addEventListener("abort", onAbort, { once: true }); + timer = setTimeout(() => { + if (settled) return; + timedOut = true; + killGroup(); + // Resolve immediately — same reasoning as abort. + settle({ exitCode: null, timedOut: true, aborted: false }); + }, params.timeout); + child.stdout.on("data", (chunk: Buffer) => { params.onOutput(chunk.toString(), "stdout"); }); @@ -29,18 +71,17 @@ export const realSpawn: SpawnShell = (params): Promise<SpawnResult> => { params.onOutput(chunk.toString(), "stderr"); }); + // Normal-completion path: wait for "close" so all stdout/stderr is captured. + // If abort/timeout already settled, this is a no-op (settled === true). child.on("close", (code) => { - clearTimeout(timer); - params.signal.removeEventListener("abort", onAbort); - resolve({ exitCode: code, timedOut }); + settle({ exitCode: code, timedOut, aborted: false }); }); + // Spawn error (e.g. bad cwd, sh not found). Kill the group just in case + // and resolve — never leave the promise pending. child.on("error", () => { - clearTimeout(timer); - params.signal.removeEventListener("abort", onAbort); - if (!killed && !timedOut) { - resolve({ exitCode: 1, timedOut: false }); - } + killGroup(); + settle({ exitCode: 1, timedOut: false, aborted: false }); }); }); }; |
