fix(kernel+tool-shell): abort hanging tool calls without bricking the conversation

kernel: executeToolCall now races tool.execute against the abort signal via Promise.race; on abort resolves (not rejects) with an "Aborted" result so the step completes normally → finishReason "aborted" → turn seals cleanly (done event) → finally clears activeTurns → conversation freed, next message accepted. run-turn strips tool-call chunks from the assistant message on abort (keeps text/thinking) and omits tool-result messages to avoid persisting dangling tool calls that would 400 the provider next turn. tool-shell: realSpawn spawns detached (own process group); on abort AND timeout kills the entire group (process.kill(-pgid, SIGKILL)) and resolves immediately — no child.on("close") dependency, so a grandchild holding the pipes can't stall the spawn promise or leak. Also: ORCHESTRATOR.md migrated to dispatch CLI summon mechanism; .skills summary; bin/sync-env PATH injection; frontend handoff docs. 1453 vitest pass · tsc -b EXIT 0 · biome clean.
author: Adam Malczewski <[email protected]> 2026-06-24 14:10:03 +0900
committer: Adam Malczewski <[email protected]> 2026-06-24 14:10:03 +0900
commit: dabcbc79831052effc6ce990021feee07d661f7e (patch)
tree: 3e74e16f36d6a675abe676f0d04ca169f65f0a71
parent: b58fb8373a1f7311cead23aa9a4d1fcd6927634f (diff)
download: dispatch-dabcbc79831052effc6ce990021feee07d661f7e.tar.gz
dispatch-dabcbc79831052effc6ce990021feee07d661f7e.zip
13 files changed, 1280 insertions, 182 deletions
diff --git a/.skills/ORCHESTRATOR.md b/.skills/ORCHESTRATOR.md
index 620b983..27e3c95 100644
--- a/.skills/ORCHESTRATOR.md
+++ b/.skills/ORCHESTRATOR.md
@@ -1,3 +1,5 @@
+Operating manual for the dispatch arch-rewrite orchestrator: plan topological waves of single-owner agents, summon via opencode run, verify from contracts + tests (never read implementation), resolve contract gaps. Project-specific to this repo.
+---
 # ORCHESTRATOR.md — how to drive this project
 
 > **You are the orchestrator.** You do NOT write feature code yourself. You plan,
diff --git a/ORCHESTRATOR.md b/ORCHESTRATOR.md
index 15393c7..8f6c156 100644
--- a/ORCHESTRATOR.md
+++ b/ORCHESTRATOR.md
@@ -52,10 +52,10 @@ they justify every rule below.
    existing one?" — surface it to the user; never decide granularity silently.
 4. **Write the prompt** to `prompts/<unit>.md` (gitignored). See §3 for the
    prompt recipe.
-5. **Summon the wave** via the Task tool (`subagent_type: "Opus 4.8"`, see §2); disjoint units
-   run in PARALLEL (§2a). RE-READ the §3 scoping map before each wave so you point each agent at
-   the right rule files. You MAY read the rules/briefs themselves (they are tiny) — what you must
-   NEVER do is INLINE their contents into a Task prompt (§2 TOKEN RULE).
+5. **Summon the wave** via the `dispatch` CLI (`umans/umans-glm-5.2`, see §2); disjoint units
+   run in PARALLEL (§2a). RE-READ the §3 scoping map before each wave so you attach the right
+   rule files via `--file` for each agent. You MAY read the rules/briefs themselves (they are
+   tiny) — what you must NEVER do is INLINE their contents into the `--text` (§2 TOKEN RULE).
 6. **Verify** the reports + independently re-run checks (see §4). Trust nothing
    until you've re-run `typecheck`/`test`/`check` yourself.
 7. **Resolve** any contract gaps / errors (see §5).
@@ -63,77 +63,85 @@ they justify every rule below.
 
 ---
 
-## 2. Summoning agents via the Task tool (Opus 4.8)
+## 2. Summoning agents via the `dispatch` CLI
 
-The **Task tool** is the summon mechanism; **`subagent_type: "Opus 4.8"`** is the BUILDING
-agent (Claude Opus 4.8 — capable coder). `deepseek-v4-flash` remains the *app's own runtime
-testbench*, never a builder. (The legacy `opencode run` CLI path is retired;
+The **`dispatch` CLI** is the summon mechanism; **`umans/umans-glm-5.2`** is the BUILDING
+agent (capable coder; browse alternatives with `dispatch models`). `$DISPATCH_MODEL` (`opencode/deepseek-v4-flash`) is the *app's own
+runtime testbench*, never a builder. (The legacy `opencode run` and Task-tool paths are retired;
 `notes/opencode-agents.md` is historical.)
 
-**Same session, full tools.** A Task subagent runs in THIS session's working directory (the repo
-root, `/home/tradam/projects/dispatch/arch-rewrite`) with the normal coding toolset
-(Read/Edit/Write/Bash/`lsp`) and the same permissions — so its `lsp`/typecheck work and there is
-NO headless cross-dir permission hang. The visibility/ownership rules (§6) are NOT enforced by a
-sandbox here — they hold only because the prompt states them (the briefs do).
-
-**THE TOKEN RULE — never INLINE the briefs/rules/TASK into the prompt.** This is the whole point
-of the file-based harness: pasting `.dispatch/*` or `prompts/<unit>.md` contents into the Task
-`prompt` burns YOUR context and duplicates what the agent can read itself — exactly what we avoid.
-(Reading the small rule/brief files for your OWN understanding is allowed — the rule is about
-prompt assembly, not about you being ignorant of the rules.) Instead the Task `prompt`
-is a SHORT pointer that tells the subagent to READ those files ITSELF. The guardrail bytes then
-land in the SUBAGENT's context, never the orchestrator's — the same property the old
-`"$(cat …)"` shell-concat gave us (the concat just moves from the shell to the agent's own Read).
-
-**Canonical summon** — ONE Task call per unit:
-- `description`: `"build <unit>"`
-- `subagent_type`: `"Opus 4.8"`
-- `prompt`: the SHORT pointer below — fill in `<unit>` + its scoped-rule files (§3 map). Do NOT
-  inline any file contents.
+**Prerequisites.** The Dispatch server must be up — probe with `dispatch models` ("Unable to
+connect" → boot it, §8); the CLI defaults to it, no `--server` needed. The summoned agent is a
+SEPARATE conversation on that server, NOT this session — so set `--cwd` to the repo root so its
+file tools operate on the repo. It has the Dispatch runtime's coding toolset (read_file /
+write_file / bash — the tool extensions `host-bin` wires); it runs `tsc -b`/vitest/biome via bash
+and writes its report to `reports/<unit>.md`. The visibility/ownership rules (§6) are NOT enforced
+by a sandbox — they hold only because the briefs state them.
+
+**THE TOKEN RULE — use `--file` to attach guardrails; never inline into `--text`.** The `dispatch`
+CLI's `--file <path>` flag (repeatable) reads a file from disk and attaches its contents to the
+agent's message — the guardrail bytes land in the SUBAGENT's context, never the orchestrator's.
+This is the whole point of the file-based harness: the briefs/rules/task are delivered to the agent
+by the CLI, not pasted by you. NEVER put `.dispatch/*` or `prompts/<unit>.md` contents into the
+`--text` message — that duplicates what `--file` already delivers AND burns YOUR context. (Reading
+the small rule/brief files for your OWN understanding is allowed — the rule is about prompt
+assembly, not about you being ignorant of the rules.) The `--text` is a SHORT instruction that
+tells the agent what to DO (implement, verify, report) — not what the rules SAY.
+
+**Canonical summon** — ONE `dispatch` call per unit (fill in `<unit>` + its scoped-rule `--file`s
+per the §3 map; the `--file` ORDER is the assembly order: constitution → brief → rules → task;
+omit `--file .dispatch/extension-agent.md` for non-extension units):
 
+```bash
+cd /home/tradam/projects/dispatch/arch-rewrite
+dispatch umans/umans-glm-5.2 \
+  --cwd /home/tradam/projects/dispatch/arch-rewrite \
+  --text "You are the single owner-agent for packages/<unit>/. The attached files are your constitution, brief, rules, and task — follow them exactly in the order given. Then IMPLEMENT the task now: edit ONLY files under packages/<unit>/, run tsc -b / vitest / biome for your package, and write your report to reports/<unit>.md. Reply with ONLY a one-line status + the path reports/<unit>.md — no diffs, no logs." \
+  --file AGENTS.md \
+  --file .dispatch/package-agent.md \
+  --file .dispatch/extension-agent.md \
+  --file .dispatch/rules/one-owner.md \
+  --file .dispatch/rules/isolation-over-dry.md \
+  --file .dispatch/rules/biome-clean.md \
+  --file prompts/<unit>.md \
+  > reports/<unit>.run.log 2>&1
 ```
-You are the single owner-agent for packages/<unit>/. Read these files IN FULL with your own tools
-and follow them exactly, in this order (do NOT skip — they are your constitution, brief, rules,
-and task; do NOT paste them back to me):
-  1. AGENTS.md                                   (project constitution; may already be in context)
-  2. .dispatch/package-agent.md                  (base owner brief)
-  3. .dispatch/extension-agent.md                (ONLY if your unit is an extension — else skip)
-  4. .dispatch/rules/one-owner.md .dispatch/rules/isolation-over-dry.md .dispatch/rules/biome-clean.md
-     (+ the unit's other scoped rules per the §3 map — list them here)
-  5. prompts/<unit>.md                           (YOUR task)
-Then IMPLEMENT it now: edit ONLY files under packages/<unit>/, run tsc -b / vitest / biome for
-your package, and write your report to reports/<unit>.md.
-Reply with ONLY a one-line status + the path reports/<unit>.md — no diffs, no logs.
-```
-
-The read list IS the old fixed assembly order (constitution → package brief → extension supplement
-→ scoped rules → TASK); the ONLY change is the AGENT reads it instead of the shell concatenating
-it. Name in line 4 ONLY the scoped-rule files matching the unit's layer (§3 map) — don't dump
-every rule. `prompts/<unit>.md` stays JUST the TASK block (§3).
-
-**Output discipline.** The Task tool returns ONE short final message (not the agent's stream), so
-the firehose can't flood you. Keep it that way: the agent replies tiny and writes the real report
-to `reports/<unit>.md`, which you then `Read` from disk. NEVER ask an agent to paste diffs/logs
-back; never `Read` a giant log into context — `Grep` it for a specific error if needed.
 
-**Parallel waves.** Launch a wave by emitting MULTIPLE Task calls IN ONE message (concurrent) —
-one per unit — but ONLY when their file sets are disjoint (single-writer, §6/§2a). Sequence
-dependent waves across separate messages (later waves compile against earlier ones). Log parallel
-runs in `tasks.md`. No timeout/backgrounding knobs to manage — the Task tool handles long runs and
-notifies on completion.
+The `--file` list IS the fixed assembly order (constitution → package brief → extension
+supplement → scoped rules → TASK); the CLI delivers their contents to the agent directly — no
+read_file round-trips needed. Attach ONLY the scoped-rule files matching the unit's layer
+(§3 map) — don't attach every rule. `prompts/<unit>.md` stays JUST the TASK block (§3).
+
+**Output discipline — capture the stream, never display it.** The `dispatch` summon STREAMS the
+agent's full response (reasoning + tool calls + results) to stdout — enormous for a building
+task. ALWAYS redirect to a log file (`> reports/<unit>.run.log 2>&1`) and do NOT `cat` it back
+wholesale. You don't need the raw stream: read the agent's `reports/<unit>.md` report, and, if you
+must, `grep`/`tail` the log for a specific error. The conversation ID prints at the end of the
+stream as `[conversation] <uuid>` — capture it so you can `dispatch read <short-id>` later (fetch
+the last message without re-streaming) or `dispatch send <short-id> --text "…" --queue` to steer
+mid-turn. Treat dumping a full run log into context as a hard failure.
+
+**Run discipline:**
+- **Do NOT background it. Use a large timeout** (e.g. 1800000 ms = 30 min) — these are long tasks.
+  The `dispatch` call blocks until the agent's turn completes; if a `run_shell` times out, the
+  conversation KEEPS RUNNING on the server — recover with `dispatch read <short-id>`.
+- One `run_shell` per summon (foreground, large timeout). For PARALLEL agents on disjoint files,
+  launch multiple summons as CONCURRENT `run_shell` tool calls — but ONLY when their file sets
+  do not overlap (single-writer rule, §6/§2a). Log parallel runs in `tasks.md`.
 
 **GOTCHAS:**
-- **Don't burn your own tokens.** Re-read THE TOKEN RULE above: point the agent at the files;
-  never inline them. The biggest failure mode here is the orchestrator pasting briefs/rules/
-  prompts into a fat Task prompt — that defeats the harness. (Reading them yourself is fine.)
-- **No sandbox = state the rules.** Because the subagent shares your tools/cwd, nothing stops it
-  editing out of its lane except the prompt. Always include the ownership/visibility briefs (they
-  tell the agent: never edit outside `packages/<unit>/`; read only OTHER units' contracts; if you
-  think you must read another unit's impl, REPORT and STOP).
-- **Make agents IMPLEMENT, not deliberate** (§3): the pointer says "IMPLEMENT it now … write the
-  report". A plan-only return → re-summon (§5a).
-- **Smoke check:** `Task(subagent_type:"Opus 4.8", prompt:"Reply with exactly SMOKE_OK")` should
-  return `SMOKE_OK`.
+- **Don't burn your own tokens.** Re-read THE TOKEN RULE above: attach the files via `--file`;
+  never paste their contents into `--text`. The biggest failure mode here is the orchestrator
+  inlining briefs/rules/prompts — that defeats the harness AND pollutes the orchestrator's
+  context. (Reading them yourself is fine.)
+- **No sandbox = state the rules.** Because the subagent shares the repo cwd (via `--cwd`),
+  nothing stops it editing out of its lane except the prompt. Always include the
+  ownership/visibility briefs (they tell the agent: never edit outside `packages/<unit>/`; read
+  only OTHER units' contracts; if you think you must read another unit's impl, REPORT and STOP).
+- **Make agents IMPLEMENT, not deliberate** (§3): the `--text` says "IMPLEMENT the task now …
+  write the report". A plan-only return → re-summon (§5a).
+- **Smoke check:** `dispatch umans/umans-glm-5.2 --text "Reply with exactly SMOKE_OK"` should
+  print `SMOKE_OK`.
 
 ---
 
@@ -142,7 +150,7 @@ notifies on completion.
 Throughput comes from running disjoint units at once. Organise it as waves:
 - **A wave = units that (a) touch DISJOINT files and (b) have no compile-time dependency
   on each other** (each imports only already-built packages + existing contracts). Launch a
-  wave by emitting one summon per unit as CONCURRENT tool calls (§2). Later waves depend on
+  wave by emitting one `dispatch` summon per unit as CONCURRENT `run_shell` calls (§2). Later waves depend on
   earlier ones; the composition root (`packages/host-bin/`) is almost always the LAST wave.
 - **Pre-author the seam to widen the wave.** Because the orchestrator OWNS contracts (§6),
   author the shared contract / typed handle in `packages/kernel/src/contracts/*` FIRST, then
@@ -164,7 +172,7 @@ Throughput comes from running disjoint units at once. Organise it as waves:
 
 The invariant guardrails — single-writer directory ownership, visibility, coupling, the
 engineering standard, isolated verification, and the report format — live ONCE in the
-standardized briefs the summon points the agent at (§2; the agent reads them itself):
+standardized briefs the summon attaches via `--file` (§2; the CLI delivers them to the agent):
 - **`.dispatch/package-agent.md`** — the base for EVERY package owner.
 - **`.dispatch/extension-agent.md`** — the extension-only supplement (added for extension summons).
 
@@ -262,8 +270,9 @@ live runs (§8 bracket trick), since a leak silently poisons the next run's coun
   barrel `index.ts`, a sibling conforming to a contract change), the orchestrator
   **summons the owning agent** — it does NOT edit implementation itself.
 - **Live API errors:** an HTTP 429 `GoUsageLimitError` is an UPSTREAM rate limit,
-  not a bug. The `opencode-2` key has a monthly cap; `opencode-1` is the backup.
-  Swap `DISPATCH_API_KEY` in `.env` (both keys are there).
+  not a bug — and now that building agents are dispatched THROUGH the server's provider (not a
+  host model), a mid-build 429 is just as likely as a runtime one. The provider key has a monthly
+  cap; swap `DISPATCH_API_KEY` in `.env` (backup keys noted there) and retry.
 
 ---
 
@@ -340,7 +349,7 @@ live runs (§8 bracket trick), since a leak silently poisons the next run's coun
 ```
 /home/tradam/projects/dispatch/arch-rewrite   # THE worktree (branch arch/rewrite)
 
-  AGENTS.md        the subagent constitution (auto-loaded by opencode; you enforce it)
+  AGENTS.md        the subagent constitution (the summon points each agent at it; you enforce it)
   ORCHESTRATOR.md  the orchestrator's operating manual (this file)
   GLOSSARY.md      canonical vocabulary + aliases-to-avoid (human-gated)
   tasks.md         live progress checklist / milestone log
@@ -358,7 +367,7 @@ live runs (§8 bracket trick), since a leak silently poisons the next run's coun
     observability-design.md  logging/spans/collector/trace-store design (Phase A–B)
     cli-design.md            CLI design decisions + unit plan (built; §3 = settled decisions)
     frontend-design.md       future web frontend design (IDEATION; separate repo)
-    opencode-agents.md       LEGACY — notes on the retired `opencode run` CLI summon path (§2 now uses the Task tool / Opus 4.8)
+    opencode-agents.md       LEGACY — notes on the retired `opencode run` CLI summon path (§2 now uses the `dispatch` CLI)
 
   prompts/   (gitignored — orchestrator→agent TASK blocks)
   reports/   (gitignored — agent→orchestrator reports)
diff --git a/bin/sync-env b/bin/sync-env
index 9a69076..4b7c64c 100755
--- a/bin/sync-env
+++ b/bin/sync-env
@@ -4,12 +4,21 @@ set -euo pipefail
 ENV_FILE="/etc/dispatch/env"
 OPENCODE_KEY="sk-P2DW7UFzQZgI8TnYzp6wWKxv6suzFvSGRrwewhUK0OjA4C7bLzsKHki7r7zZOAH1"
 UMANS_KEY="sk-bvI-mgUrp5qJzMlK8ndoDqJ3dK0U-ds_8KYS8pzpRLw"
+# PATH that includes user-installed tools (mise, bun, npm global, etc.)
+DISPATCH_PATH="/home/tradam/.local/bin:/home/tradam/.local/share/mise/shims:/home/tradam/.bun/bin:/usr/local/sbin:/usr/local/bin:/usr/bin"
 
 if [ ! -f "$ENV_FILE" ]; then
 	echo "Error: $ENV_FILE not found" >&2
 	exit 1
 fi
 
+# Update PATH (so tools like typescript-language-server, npm, bun are found)
+if grep -q "^PATH=" "$ENV_FILE"; then
+	sed -i "s|^PATH=.*|PATH=$DISPATCH_PATH|" "$ENV_FILE"
+else
+	echo "PATH=$DISPATCH_PATH" >> "$ENV_FILE"
+fi
+
 # Update DISPATCH_API_KEY
 if grep -q "^DISPATCH_API_KEY=" "$ENV_FILE"; then
 	sed -i "s|^DISPATCH_API_KEY=.*|DISPATCH_API_KEY=$OPENCODE_KEY|" "$ENV_FILE"
diff --git a/broken-chat-repair-handoff.md b/broken-chat-repair-handoff.md
new file mode 100644
index 0000000..12deec0
--- /dev/null
+++ b/broken-chat-repair-handoff.md
@@ -0,0 +1,180 @@
+# Handoff → orchestrator (bcb5): broken-chat self-repair
+
+> From: diagnostic session. Agent/conversation `77574596`
+> (`77574596-3e7b-46f8-8d67-c9e17a529dee`) "broke unrecoverably." User goal:
+> **chats must self-heal when broken so they can continue.** Implement the fixes
+> below. Full diagnosis + plan also in `reports/broken-chat-repair-diagnosis.md`.
+
+## 0. Your job (TL;DR)
+
+`reconcile()` only repairs orphaned tool-calls. The production DB has **two other
+broken states** it doesn't handle, and they make a chat uncontinuable. Implement a
+read-time repair so broken chats auto-heal on next open — **no DB surgery**
+(append-only durability preserved; repair is a turn-path transform that runs on
+every `load()`). Three units, two repos:
+
+- **Wave 1 (arch-rewrite, PARALLEL — disjoint packages):**
+  - `conversation-store` — extend `reconcile` (Layer 1) + harden `load()`.
+  - `openai-stream` — harden `convertMessages` args (Layer 2).
+- **Wave 2 (separate repo `../claude`, SEPARATE agent):**
+  - `provider-anthropic` — harden its `safeJson` (Layer 2 equivalent).
+
+**Key architectural insight that shapes the waves:** Layer 1 lives in
+`conversation-store.reconcile`, which runs in `load()` BEFORE any provider sees
+the messages. So the Layer 1 fix protects **every** provider (openai-compat AND
+anthropic) — the Claude plugin needs **no** Layer 1 change. Layer 2 (malformed
+tool-call args) is **per-provider** serialization safety, so it must be applied
+in each provider's converter (openai-stream + provider-anthropic).
+
+## 1. The break (what actually happened in `77574596`)
+
+Production DB: `/var/lib/dispatch/dispatch.db` (systemd `dispatch.service`).
+136 chunks; seq counter = 136; **all JSON valid; no orphaned tool-calls** — so
+`reconcile()` finds nothing wrong, yet the chat is uncontinuable. The tail:
+
+| seq | role | type | note |
+|---|---|---|---|
+| 133 | assistant | text | "Wave 0 fully verified…" |
+| 134 | assistant | tool-call | `todo_write`, `input` = **malformed JSON** (`json_type=text`, raw string) |
+| 135 | tool | tool-result | isError: "todo_write args must be an object with a `todos` array" |
+| 136 | assistant | **error** | `HTTP 400: unexpected character: line 1 column 1413 (char 1412). Received Model Group=glm-5.2` |
+
+### Root cause (confirmed byte-for-byte)
+- seq 134's `input` is a raw string. Parsing it fails
+  `Expecting ':' delimiter: line 1 column 1413 (char 1412)` — an **exact match** to
+  the provider's `unexpected character: line 1 column 1413`. The **model emitted
+  malformed JSON as the `todo_write` arguments**.
+- Chain: model emits text + malformed-args tool-call (step 5) → kernel dispatches
+  the tool, which returns an error result (seq 135) → kernel calls the provider
+  again (step 6); the request re-includes the assistant message carrying the
+  malformed `arguments` → provider 400s → persisted as an `error` chunk (seq 136).
+
+### Why it's "unrecoverable"
+- `openai-stream` `convertAssistantMessage` serializes tool-call args as
+  `typeof c.input === "string" ? c.input : JSON.stringify(c.input)` — passes the
+  malformed string straight through as the OpenAI `arguments` field → provider
+  400s on **every** continuation.
+- The trailing `assistant` message whose only chunk is `error` serializes to
+  `content:""` + no tool_calls (error chunk is filtered out, leaving an empty
+  assistant message) → also uncontinuable.
+- `reconcile()` touches neither. `load()` also has no try/catch on
+  `JSON.parse(value)` — a single corrupt row would throw and brick the chat.
+
+### Scope (production DB, 140 conversations)
+- **6 conversations end in a trailing `error` chunk:** `102587c0`(seq2, HTTP 401
+  model-not-supported), `2bf78252`(seq2), `61127511`(seq250), `77574596`(seq136),
+  `d0d85eca`(seq2), `e1ee0989`(seq20).
+- **2 tool-calls total** carry a raw malformed-string `input`.
+- `102587c0` has **only** the trailing-error break (no args, no tool-calls) —
+  proving Layer 1 is independently necessary. `77574596` has **both**.
+
+## 2. The fix
+
+### Layer 1 — `conversation-store` `reconcile.ts` (structural repair)
+Extend `reconcileWithReport` to:
+1. **Strip `error` chunks from assistant messages.** An `error` chunk is a
+   failed-generation marker, never valid provider content (no provider understands
+   an "error" content type) — provider-agnostic.
+2. **Drop any assistant message left with no `text` and no `tool-call` chunks**
+   (the now-empty error-only message). This is what unblocks continuation. **Safe:**
+   an error-only step ends with no tool-calls, so it is never followed by a `tool`
+   message — no "tool-without-preceding-assistant-tool_calls" 400 can result. Keep
+   the existing orphaned-tool-call synthesis unchanged.
+3. Extend `ReconcileReport` with counts of stripped error chunks / dropped messages
+   (for the existing `reconcile.repair` boot/log span).
+
+Why here: the constitution designates `reconcile` as "the pure function run on load
+that repairs any partial turn." A trailing error-only assistant message IS a
+partial/broken turn. Pure, provider-agnostic, runs on every `load()` → auto-repairs
+all 6 broken chats. Repair is read-time only; storage (append-only) untouched.
+`loadSince` (FE reads) is intentionally NOT reconciled, so the user still SEES the
+error while the provider gets clean history.
+
+### Hardening — `conversation-store` `store.ts` `load()` (same unit)
+Wrap the per-chunk `JSON.parse(value)` in try/catch: on a corrupt/unparseable row,
+log + skip it (don't throw) so `reconcile` can still run on the rest. Today a single
+bad row makes `load()` throw → unrecoverable. (0 such rows today; "never leave the
+system broken" asks for it.)
+
+### Layer 2 — `openai-stream` `convert-messages.ts` (serialization safety)
+In `convertAssistantMessage`, ensure a tool-call's `arguments` is **always a valid
+JSON string**: if `input` is a string, `JSON.parse` it; on failure substitute a
+valid fallback object (e.g. `JSON.stringify({})` or a wrapped
+`{ _malformed_arguments: <truncated> }`). Objects pass through `JSON.stringify` as
+today. This neutralizes already-stored malformed args (seq 134) so the provider
+stops 400ing on continuation. Follow the SAME semantics as the Claude fix below
+(isolation over DRY: each provider reimplements locally, same behavior).
+
+### Layer 2 (equivalent) — `../claude` `provider-anthropic` `convert.ts` (SEPARATE agent)
+The Claude plugin already has a `safeJson(s)` helper (line ~115) used at
+`input: typeof c.input === "string" ? safeJson(c.input) : c.input`. But its fallback
+**returns the raw string `s` on parse failure** — for Anthropic, `tool_use.input`
+must be an object, so a raw string can still 400 when a historical malformed tool_use
+is re-sent. Fix: make `safeJson` return a **valid object fallback** (e.g. `{}`) on
+parse failure instead of the raw string. (Layer 1 does NOT apply here — the
+arch-rewrite `reconcile` already strips error chunks before the Claude provider sees
+the messages, so the Claude converter never receives error-only assistant messages.)
+
+## 3. Waves & summoning
+
+- **Wave 1 (arch-rewrite, PARALLEL):** `conversation-store` (Layer 1 + `load()`
+  hardening) and `openai-stream` (Layer 2). Disjoint packages, no contract/type
+  change, both depend only on already-built `@dispatch/kernel` contracts. Standard
+  summon per ORCHESTRATOR §2/§3 (attach the scoped rules: conversation-store gets
+  `pure-core.md`+`no-internal-mocks.md`+`typed-handles.md`+`extension-logging.md`;
+  openai-stream gets `pure-core.md`+`no-internal-mocks.md`+`extension-logging.md`;
+  both get `one-owner.md`+`isolation-over-dry.md`+`biome-clean.md`+`package-agent.md`+
+  `extension-agent.md`).
+- **Wave 2 (separate repo, SEPARATE agent):** summon against
+  `--cwd /home/tradam/projects/dispatch/claude` for `packages/provider-anthropic`
+  (`convert.ts` `safeJson`). That repo has its own `AGENTS.md`; attach the
+  arch-rewrite `package-agent.md`+scoped rules as needed. Can run in parallel with
+  Wave 1 (different repo, no shared files).
+
+## 4. Why this auto-heals `77574596` (and the other 5) — no DB surgery
+On next open/continue, `load()` returns history ending at seq 135 (the tool-result):
+Layer 1 strips the seq-136 error message; Layer 2 sanitizes the seq-134 args to
+valid JSON. The provider receives
+`[…, assistant{text+tool-call(args:{})}, tool{error result}]` — a valid "continue
+after a tool result" state. The model sees its `todo_write` failed and adjusts.
+Chat continues. Same auto-repair applies to the other 5 (Layer 1 alone for the
+401/empty cases; Layer 1+2 for any malformed-args case).
+
+## 5. Test requirements (regression scar tissue)
+
+**conversation-store `reconcile.test.ts`:**
+- `reconcile strips error-only trailing assistant message` (the 77574596/102587c0
+  shape: `[user, assistant{error}]` → `[user]`).
+- `reconcile strips error chunk but keeps sibling text`
+  (`assistant{text,error}` → `assistant{text}`).
+- `reconcile drops assistant message left empty after stripping error`
+  (`assistant{error}` only → dropped).
+- `reconcile keeps tool-call + strips error` (`assistant{tool-call,error}` with a
+  matching result → `assistant{tool-call}`).
+- existing orphaned-tool-call behavior unchanged (regression).
+- (hardening) corrupt-JSON chunk row is skipped, rest load + reconcile.
+
+**openai-stream `convert-messages.test.ts`:**
+- `arguments is valid JSON when input is a malformed string` (seed from seq 134's
+  raw string → output `JSON.parse`s, no throw).
+- `arguments passes through valid string input` and `stringifies object input`
+  (regression).
+
+**provider-anthropic `convert.test.ts` (claude repo):**
+- `safeJson returns a valid object fallback on malformed string` (raw malformed
+  string → `{}` or wrapped object, not the raw string).
+- `safeJson parses valid string input` (regression).
+
+## 6. Verify (ORCHESTRATOR §4)
+`bun run typecheck && bun run test && bun run check` whole-project green; both agents
+in-lane (`git status --short`); zero internal mocks in the pure-core units. Live-spot:
+open `77574596` against a probe/`bin/up` and confirm it now continues past the tool
+result instead of 400-looping.
+
+## 7. Notes / out of scope
+- **Parse-time prevention** (openai-stream / provider-anthropic could reject or
+  repair malformed args when the model emits them, instead of storing a raw
+  string) is a deeper follow-up; Layer 2 is the safety net that also repairs
+  already-stored data.
+- Deploying the fix auto-repairs the 6 broken production chats on next load — no
+  migration needed.
diff --git a/frontend-cwd-resolution-handoff.md b/frontend-cwd-resolution-handoff.md
new file mode 100644
index 0000000..66c93b4
--- /dev/null
+++ b/frontend-cwd-resolution-handoff.md
@@ -0,0 +1,95 @@
+# Backend handoff — cwd resolution fixes (backend → FE) — courier doc
+
+> **From:** arch-rewrite orchestrator · **To:** dispatch-web orchestrator (b18a) · **Courier:** the user.
+> Response to the cwd bug report you sent to backend agent ab13. The fixes are DONE and
+> live-verified on the dev stack.
+
+## Version bumps
+
+| Package | From | To | Notes |
+|---|---|---|---|
+| `@dispatch/wire` | — | — | **Unchanged** |
+| `@dispatch/transport-contract` | — | — | **Unchanged** |
+| `@dispatch/ui-contract` | — | — | **Unchanged** |
+
+**This is a behavior-only change.** No wire/transport-contract types changed. No FE re-pin or
+re-mirror needed. The FE needs NO contract change to benefit.
+
+---
+
+## 1. The fix (what was broken → what now works)
+
+You reported: a workspace `defaultCwd` set, a conversation with no explicit cwd, and `pwd` ran in
+the server default (`process.cwd()`) instead of the workspace `defaultCwd`. Plus your desired
+behavior: a per-conversation cwd **relative to the workspace `defaultCwd`** unless absolute.
+
+**Root cause (backend-only):** the workspace-relative resolution lived in
+`conversation-store.getEffectiveCwd`, which only resolved the *persisted* cwd. But the FE sends the
+CwdField value as a **per-turn `cwd` on `chat.send`**, and `session-orchestrator` used a per-turn
+`cwd` **as-is** — bypassing `getEffectiveCwd` entirely. So a relative `cwd` like `"arch-rewrite"`
+reached `run_shell` raw → resolved against `process.cwd()` → a nonexistent path → `pwd` broke.
+
+**Three backend fixes (all live-verified):**
+
+1. **Per-turn `cwd` is now resolved.** `session-orchestrator` passes the per-turn `cwd` (on
+   `chat.send`/`POST /chat` AND on manual `POST /chat/warm`) through `getEffectiveCwd` as an
+   override, so it goes through the same workspace-relative algorithm as the persisted cwd.
+2. **New-conversation timing.** A brand-new conversation's first turn previously ran
+   `getEffectiveCwd` *before* the workspace was assigned (so it saw `"default"`, not the request's
+   workspace). Now the workspace is assigned first. A relative per-turn `cwd` on the FIRST message
+   of a new conversation now resolves against the intended workspace.
+3. **`DELETE /conversations/:id/cwd` was a stub** (returned `{cwd:null}` but did NOT clear the
+   persisted key). It now calls `clearCwd` and truly deletes the persisted cwd.
+
+## 2. The resolution algorithm (now applied to BOTH persisted and per-turn cwd)
+
+```
+workspaceId     = persisted conversation workspaceId ("default" fallback)
+workspaceCwd    = workspace.defaultCwd ?? null
+conversationCwd = the explicit cwd (persisted via GET /cwd, OR the per-turn chat.send cwd)
+
+if (conversationCwd == null)        → workspaceCwd ?? serverDefaultCwd   // process.cwd()
+else if (conversationCwd absolute) → conversationCwd                    // starts with "/"
+else                               → path.resolve(workspaceCwd ?? serverDefaultCwd, conversationCwd)
+```
+
+`serverDefaultCwd` = `process.cwd()` (the server's cwd).
+
+## 3. FE impact (minimal — no contract change)
+
+You do NOT need to change anything. Both FE patterns now work correctly:
+
+- **If you omit `cwd` on `chat.send`** (your current code): the backend resolves the persisted
+  conversation cwd (set via `PUT /conversations/:id/cwd`) through the algorithm. ✅
+- **If you send a relative `cwd` on `chat.send`**: it is resolved against the workspace
+  `defaultCwd`. ✅ (was broken — used raw)
+- **If you send an absolute `cwd`** (starts `/`): overrides outright. ✅
+
+### Endpoints (semantics — shapes unchanged)
+
+- `GET /conversations/:id/cwd` → **unchanged**: the RAW explicit conversation cwd (`null` =
+  inheriting workspace default). Your CwdField shows what the user typed.
+- `GET /conversations/:id/lsp` → returns the **effective** (resolved) cwd. It now roots LSP at the
+  effective cwd INCLUDING the server-default fallthrough (when neither conversation nor workspace
+  cwd is set, LSP roots at `process.cwd()`). Previously returned `cwd: null` + empty `servers` when
+  no cwd was set.
+- `DELETE /conversations/:id/cwd` → **now actually clears** the persisted cwd (was a no-op stub).
+  Returns `{ conversationId, cwd: null }` (unchanged shape). Use this to reset a conversation's cwd
+  to "inherit workspace default".
+- `PUT /conversations/:id/cwd` → unchanged (persists the raw value).
+
+## 4. Optional FE simplification (not required)
+
+You MAY now safely **omit `cwd` on `chat.send`** entirely and rely on the backend resolving the
+persisted conversation cwd (set via `PUT /conversations/:id/cwd`). This was the design you
+described in the original report. Either path (send cwd, or omit it) is correct; the backend
+resolves both consistently. Sending it is harmless; omitting it avoids sending redundant data.
+
+## 5. Live-verified (dev stack, workspace `test` defaultCwd `/home/tradam/projects/dispatch`)
+
+- Existing conversation, per-turn `cwd:"arch-rewrite"` → `pwd` = `/home/tradam/projects/dispatch/arch-rewrite` ✅
+- Brand-new conversation, per-turn `cwd:"arch-rewrite"` → `pwd` = `/home/tradam/projects/dispatch/arch-rewrite` ✅
+- Chat omitting `cwd` (persisted cwd `arch-rewrite`) → `pwd` = `/home/tradam/projects/dispatch/arch-rewrite` ✅
+- `PUT /tmp/test` → GET `/tmp/test` → DELETE → GET `null` (actually cleared) ✅
+
+`tsc -b` EXIT 0, biome clean, 1311 vitest pass.
diff --git a/frontend-model-persistence-handoff.md b/frontend-model-persistence-handoff.md
new file mode 100644
index 0000000..912cea6
--- /dev/null
+++ b/frontend-model-persistence-handoff.md
@@ -0,0 +1,91 @@
+# Frontend handoff — per-conversation model persistence
+
+## What changed
+
+A chat's selected provider + model is now **persisted per conversation**
+(like `cwd` and `reasoningEffort` already are). Opening a conversation in a new
+browser session recalls the originally selected model instead of defaulting to
+the server default.
+
+## Contract version bump
+
+`@dispatch/transport-contract` `0.19.0 → 0.20.0` — re-pin the `file:` dep and
+re-mirror `.dispatch/transport-contract.reference.md`.
+
+## New types (additive)
+
+```ts
+// GET /conversations/:id/model
+export interface ModelResponse {
+  readonly conversationId: string;
+  readonly model: string | null;  // <credentialName>/<model> form, or null
+}
+
+// PUT /conversations/:id/model
+export interface SetModelRequest {
+  readonly model: string | null;  // null clears the persisted selection
+}
+```
+
+## New endpoints
+
+### `GET /conversations/:id/model`
+Returns `ModelResponse`. `model` is `null` when never set (the server then
+resolves turns using the default provider + model).
+
+### `PUT /conversations/:id/model`
+Body: `SetModelRequest`. Set `model` to a `<credentialName>/<model>` string
+(one of the values from `GET /models`) to persist it. Set `model` to `null`
+to clear the persisted selection. Returns `ModelResponse` with the resulting
+value.
+
+## What the FE should do
+
+1. **On conversation open** — call `GET /conversations/:id/model` to fetch the
+   persisted model. If non-null, set the model selector to that value. If null,
+   use the global default (current behavior).
+
+2. **On model select** — call `PUT /conversations/:id/model` with the selected
+   model name (`<credentialName>/<model>` form). This persists it so future
+   turns (and new browser sessions) use the same model.
+
+3. **On model clear** (if the FE supports clearing back to default) — call
+   `PUT /conversations/:id/model` with `{ model: null }`.
+
+4. **No `ChatRequest.model` change needed** — the FE may continue sending
+   `model` on `chat.send` (per-turn override); the backend persists it. Or the
+   FE may omit `model` on `chat.send` and rely on the persisted value — the
+   backend resolves it. Either way works.
+
+## Backend behavior
+
+- **Per-turn override** (`ChatRequest.model` / `chat.send` model) takes
+  precedence and is persisted.
+- **No per-turn override** → backend checks `getModel(conversationId)` → if
+  non-null, uses it; if null, falls through to the default provider.
+- **Warm path** also resolves the model from persistence when no explicit
+  override is given (parity with real turns).
+
+## No FE handoff needed for tasks 1 & 2
+
+- **Task 1** (workspace tab broadcast): already couriered to 29ae by a prior
+  orchestrator agent (`frontend-workspace-open-handoff.md`).
+- **Task 2** (system-prompt cwd reconstruction): backend-only fix, no contract
+  version bump, no FE action needed.
+
+## Assumptions made (user was away)
+
+1. **Persist the model name string** (`<credentialName>/<model>` form), not
+   the provider/credential separately — the model name already encodes both
+   (the credential binds to a provider). This mirrors how the CLI sends
+   `--model` and how `ChatRequest.model` works.
+2. **No model validation on PUT** — the backend doesn't validate the model
+   name on `PUT /conversations/:id/model` (it's just a string). The provider
+   resolves it at turn time; an unknown model → turn error, not a 400. This
+   matches the contract doc on `SetModelRequest`.
+3. **Empty string clears** — `setModel(id, "")` deletes the key. The HTTP
+   `PUT` with `{ model: null }` maps to this. This is an implementation detail
+   the FE doesn't need to know about (it sends `null`).
+4. **No `model` field on `ConversationMeta`** — following the precedent of `cwd`
+   and `reasoningEffort` (which are NOT on `ConversationMeta` but fetched via
+   dedicated endpoints). The FE calls `GET /conversations/:id/model` to read.
diff --git a/frontend-system-prompt-handoff.md b/frontend-system-prompt-handoff.md
index c0739b3..c135145 100644
--- a/frontend-system-prompt-handoff.md
+++ b/frontend-system-prompt-handoff.md
@@ -1,50 +1,18 @@
-# FE Courier Handoff: System Prompt Builder
+# FE Courier Handoff: System Prompt Builder (Updated)
 
-> Backend→FE courier. The user couriers this to `../dispatch-web` (FE agent `ffe3`).
-> `@dispatch/transport-contract` bumped to `0.18.0` (additive types).
+> Backend→FE courier. Send to FE agent `ffe3`.
+> Supersedes the earlier `frontend-system-prompt-handoff.md` — adds `prompt:workspace_id`.
 
-## Overview
+## API endpoints
 
-A template-based system prompt builder. The user defines a template with variable
-placeholders (`[type:name]`) and conditionals (`[if]`/`[else]`/`[endif]`). Variables are
-resolved at construction time (once per conversation, persisted for cache safety —
-reconstructed only on compaction).
+### `GET /system-prompt` → `{ template: string }`
+Returns the current global template. When none is stored, returns the built-in default.
 
-## API endpoints
+### `PUT /system-prompt` ← `{ template: string }` → `{ template: string }`
+Set the global template. Empty string = "no system prompt". 400 if `template` missing/wrong type. 503 if service unavailable.
 
-### `GET /system-prompt` → `SystemPromptTemplateResponse`
-```ts
-{ template: string }
-```
-Returns the current global template. When no template is stored (or the service is
-unavailable), returns the built-in `DEFAULT_TEMPLATE`.
-
-### `PUT /system-prompt` ← `SetSystemPromptTemplateRequest`
-```ts
-// Request body:
-{ template: string }
-// Response:
-{ template: string }  // echoed back
-```
-- `template` can be empty (means "no system prompt").
-- 400 if `template` is missing or not a string.
-- 503 if the system-prompt service is unavailable.
-
-### `GET /system-prompt/variables` → `SystemPromptVariablesResponse`
-```ts
-{
-  variables: readonly SystemPromptVariable[]
-}
-// SystemPromptVariable:
-{
-  type: string;        // "system", "file", "prompt", "git"
-  name: string;        // "time", "date", "os", "cwd", etc.
-  description: string; // human-readable
-  dynamic?: boolean;   // true for file: (any path is valid)
-}
-```
-Static catalog — always available (no service dependency). Use this to render the
-variable selector buttons in the builder UI.
+### `GET /system-prompt/variables` → `{ variables: SystemPromptVariable[] }`
+Static catalog — always available (no service dependency). Use this to render the variable selector buttons.
 
 ## Template format
 
@@ -52,28 +20,25 @@ variable selector buttons in the builder UI.
 ```
 [type:name]
 ```
-Resolves the variable at construction time. Unknown type → blank string. Non-existent
-variable (e.g. file not found) → blank string.
+Resolves at construction time. Unknown type → blank. Non-existent variable (e.g. file not found) → blank.
 
 ### Conditional blocks
 ```
 [if type:name]
-  ...content if variable exists...
+  ...if variable exists...
 [else]
-  ...content if variable does NOT exist...
+  ...if not...
 [endif]
 ```
-Negated condition:
+Negated:
 ```
 [if !type:name]
-  ...content if variable does NOT exist...
+  ...if variable does NOT exist...
 [endif]
 ```
-- Nested `[if]` blocks: supported.
-- Multi-line content: supported.
-- Unmatched `[if]`/`[endif]`: treated as literal text.
+Nested `[if]`: supported. Multi-line: supported. Unmatched `[if]`/`[endif]`: literal text.
 
-## Available variables
+## Available variables (updated)
 
 | Type:Name | Description | Dynamic? |
 |---|---|---|
@@ -84,26 +49,19 @@ Negated condition:
 | `prompt:cwd` | Working directory | No |
 | `prompt:model` | Current model name | No |
 | `prompt:conversation_id` | Conversation ID | No |
+| `prompt:workspace_id` | Workspace identifier — lets the AI know which workspace it's in, useful when summoning agents | No |
 | `git:branch` | Current git branch | No |
 | `git:status` | Short git status | No |
 | `file:<path>` | File contents (relative to cwd, or absolute if starts `/`) | **Yes** |
 
-For `file:<path>`, the FE should allow free-text input for the path. Any filename is
-valid — the backend resolves it relative to the conversation's cwd (or absolute if it
-starts with `/`).
-
-## Caching behavior (important for FE)
+For `file:<path>`, allow free-text input for the path.
 
-The system prompt is **constructed once** (on the first turn of a new conversation) and
-**persisted**. It is reused on all subsequent turns (no reconstruction — this preserves
-the prompt cache). It is only **reconstructed on compaction** (fresh variable resolution).
+## Caching behavior
 
-Changing the template (via `PUT /system-prompt`) does NOT affect existing conversations
-until they are compacted. New conversations use the new template on their first turn.
+System prompt is **constructed once** (first turn of a new conversation) and **persisted**. Reused on all subsequent turns (cache-safe). Reconstructed only on **compaction**. Changing the template does NOT affect existing conversations until compacted.
 
 ## Default template
 
-When no template is stored, the backend uses:
 ```
 You are a helpful coding assistant.
 
@@ -113,11 +71,3 @@ You are a helpful coding assistant.
 
 The current working directory is [prompt:cwd].
 ```
-
-## FE UI suggestion
-
-The builder is a full-page modal split into two:
-1. **Text editor** (left/main): the template text.
-2. **Variable selectors** (right/side): buttons grouped by type. Clicking a variable
-   inserts `[type:name]` at the cursor position. For `file:` (dynamic), show a text
-   input for the path + an "insert" button.
diff --git a/packages/kernel/src/runtime/dispatch.test.ts b/packages/kernel/src/runtime/dispatch.test.ts
new file mode 100644
index 0000000..afbfb39
--- /dev/null
+++ b/packages/kernel/src/runtime/dispatch.test.ts
@@ -0,0 +1,535 @@
+import { describe, expect, it } from "vitest";
+import type { ChatMessage } from "../contracts/conversation.js";
+import type { AgentEvent } from "../contracts/events.js";
+import type { ProviderContract, ProviderEvent } from "../contracts/provider.js";
+import type { ToolContract, ToolExecuteContext, ToolResult } from "../contracts/tool.js";
+import { executeToolCall } from "./dispatch.js";
+import { runTurn } from "./run-turn.js";
+
+// ---------------------------------------------------------------------------
+// Helpers (no internal mocks — kernel standard; fakes only)
+// ---------------------------------------------------------------------------
+
+function delay(ms: number): Promise<void> {
+	return new Promise((resolve) => {
+		setTimeout(resolve, ms);
+	});
+}
+
+function createFakeProvider(script: ProviderEvent[][]): ProviderContract {
+	let callIndex = 0;
+	return {
+		id: "fake",
+		stream() {
+			const events = script[callIndex] ?? [];
+			callIndex++;
+			return (async function* () {
+				for (const event of events) {
+					yield event;
+				}
+			})();
+		},
+	};
+}
+
+function createFakeTool(
+	name: string,
+	handler?: (input: unknown, ctx: ToolExecuteContext) => Promise<ToolResult>,
+	opts?: { concurrencySafe?: boolean },
+): ToolContract {
+	return {
+		name,
+		description: `Fake tool: ${name}`,
+		parameters: { type: "object" },
+		...(opts?.concurrencySafe !== undefined ? { concurrencySafe: opts.concurrencySafe } : {}),
+		execute: handler ?? (async (input) => ({ content: `${name}: ${JSON.stringify(input)}` })),
+	};
+}
+
+function createCollectingEmit(): { events: AgentEvent[]; emit: (event: AgentEvent) => void } {
+	const events: AgentEvent[] = [];
+	return { events, emit: (event) => events.push(event) };
+}
+
+const noopEmit = () => {};
+
+const userMessage: ChatMessage = {
+	role: "user",
+	chunks: [{ type: "text", text: "hello" }],
+};
+
+const ABORTED_RESULT: ToolResult = { content: "Aborted", isError: true };
+
+// ===========================================================================
+// executeToolCall — direct unit tests for the abort-signal race
+// ===========================================================================
+
+describe("executeToolCall", () => {
+	it("returns the tool's result when the tool resolves before abort", async () => {
+		const ac = new AbortController();
+		const tool = createFakeTool("echo", async (input) => ({
+			content: `echo: ${JSON.stringify(input)}`,
+		}));
+
+		const result = await executeToolCall(
+			{ id: "tc1", name: "echo", input: { x: 1 } },
+			tool,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		expect(result).toEqual({ content: 'echo: {"x":1}' });
+	});
+
+	it("returns Aborted immediately when signal is already aborted at call time", async () => {
+		const ac = new AbortController();
+		ac.abort();
+		const tool = createFakeTool("echo", async () => ({ content: "should not run" }));
+
+		const result = await executeToolCall(
+			{ id: "tc1", name: "echo", input: {} },
+			tool,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		expect(result).toEqual(ABORTED_RESULT);
+	});
+
+	it("returns Aborted when a hanging tool is raced against an abort signal", async () => {
+		const ac = new AbortController();
+		// A tool that never resolves and ignores ctx.signal
+		const tool = createFakeTool("hang", () => new Promise<ToolResult>(() => {}));
+
+		const promise = executeToolCall(
+			{ id: "tc1", name: "hang", input: {} },
+			tool,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		// Abort after the tool has started
+		await delay(10);
+		ac.abort();
+
+		const result = await promise;
+		expect(result).toEqual(ABORTED_RESULT);
+	});
+
+	it("returns the tool's own result when a signal-aware tool resolves on abort", async () => {
+		const ac = new AbortController();
+		const toolResult: ToolResult = { content: "aborted by tool", isError: true };
+		const tool = createFakeTool("aware", (_input, ctx) => {
+			return new Promise<ToolResult>((resolve) => {
+				ctx.signal.addEventListener("abort", () => resolve(toolResult), { once: true });
+			});
+		});
+
+		const promise = executeToolCall(
+			{ id: "tc1", name: "aware", input: {} },
+			tool,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		await delay(10);
+		ac.abort();
+
+		const result = await promise;
+		// The tool listens to the signal and resolves its own result. Whether
+		// the tool's result or the race's "Aborted" wins is timing-dependent;
+		// both are isError and let the turn seal with finishReason "aborted".
+		expect(result.isError).toBe(true);
+		expect(result.content).toBe("aborted by tool");
+	});
+
+	it("swallows a late rejection from the orphaned tool promise after abort wins the race", async () => {
+		const ac = new AbortController();
+		let rejectTool: ((err: Error) => void) | undefined;
+		const tool = createFakeTool("late-reject", () => {
+			return new Promise<ToolResult>((_resolve, reject) => {
+				rejectTool = reject;
+			});
+		});
+
+		const promise = executeToolCall(
+			{ id: "tc1", name: "late-reject", input: {} },
+			tool,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		await delay(10);
+		ac.abort();
+
+		const result = await promise;
+		expect(result).toEqual(ABORTED_RESULT);
+
+		// The tool rejects AFTER the race already resolved with "Aborted".
+		// The no-op catch must swallow this — no unhandled rejection.
+		rejectTool?.(new Error("late boom"));
+		// Give the microtask queue a tick to flush
+		await delay(5);
+		// If we reach here without an unhandledRejection crashing the process,
+		// the test passes. (vitest surfaces unhandled rejections as failures.)
+	});
+
+	it("returns an error result when the tool rejects before abort", async () => {
+		const ac = new AbortController();
+		const tool = createFakeTool("boom", async () => {
+			throw new Error("tool exploded");
+		});
+
+		const result = await executeToolCall(
+			{ id: "tc1", name: "boom", input: {} },
+			tool,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		expect(result.isError).toBe(true);
+		expect(result.content).toContain("tool exploded");
+	});
+
+	it("returns Unknown tool when the tool is undefined", async () => {
+		const ac = new AbortController();
+		const result = await executeToolCall(
+			{ id: "tc1", name: "nonexistent", input: {} },
+			undefined,
+			ac.signal,
+			noopEmit,
+			"conv-1",
+			"turn-1",
+		);
+
+		expect(result.isError).toBe(true);
+		expect(result.content).toContain("Unknown tool");
+	});
+});
+
+// ===========================================================================
+// runTurn — integration tests for the abort-signal race (durability)
+// ===========================================================================
+
+describe("runTurn abort-race durability", () => {
+	// Required test 1: A hanging tool (never resolves, ignores ctx.signal)
+	// must not keep runTurn from returning when the signal aborts.
+	it("hanging tool + abort → runTurn returns with finishReason aborted and emits done", async () => {
+		const ac = new AbortController();
+
+		// A tool whose execute returns a promise that NEVER resolves and
+		// ignores ctx.signal entirely.
+		const tool = createFakeTool("hang", () => new Promise<ToolResult>(() => {}));
+
+		// Use eager: true so the tool starts BEFORE the signal aborts.
+		// This exercises the race (not the early signal.aborted return).
+		const provider: ProviderContract = {
+			id: "fake",
+			stream() {
+				return (async function* () {
+					yield {
+						type: "tool-call",
+						toolCallId: "tc1",
+						toolName: "hang",
+						input: {},
+					} as ProviderEvent;
+					ac.abort();
+					await delay(10);
+					yield { type: "finish", reason: "tool-calls" } as ProviderEvent;
+				})();
+			},
+		};
+
+		const { events, emit } = createCollectingEmit();
+
+		const result = await runTurn({
+			provider,
+			messages: [userMessage],
+			tools: [tool],
+			dispatch: { maxConcurrent: 1, eager: true },
+			conversationId: "conv-1",
+			turnId: "turn-1",
+			emit,
+			signal: ac.signal,
+		});
+
+		// runTurn returned (didn't hang) → the race worked.
+		expect(result.finishReason).toBe("aborted");
+
+		// A done event was emitted with reason "aborted".
+		const doneEvents = events.filter((e) => e.type === "done");
+		expect(doneEvents).toHaveLength(1);
+		if (doneEvents[0]?.type === "done") {
+			expect(doneEvents[0].reason).toBe("aborted");
+		}
+	});
+
+	// Required test 2: A signal-aware tool that resolves its own result on
+	// abort must also let runTurn return with finishReason "aborted".
+	it("signal-aware tool + abort → runTurn returns with finishReason aborted", async () => {
+		const ac = new AbortController();
+
+		const tool = createFakeTool("aware", (_input, ctx) => {
+			return new Promise<ToolResult>((resolve) => {
+				ctx.signal.addEventListener(
+					"abort",
+					() => resolve({ content: "aborted by tool", isError: true }),
+					{ once: true },
+				);
+			});
+		});
+
+		const provider: ProviderContract = {
+			id: "fake",
+			stream() {
+				return (async function* () {
+					yield {
+						type: "tool-call",
+						toolCallId: "tc1",
+						toolName: "aware",
+						input: {},
+					} as ProviderEvent;
+					ac.abort();
+					await delay(10);
+					yield { type: "finish", reason: "tool-calls" } as ProviderEvent;
+				})();
+			},
+		};
+
+		const { events, emit } = createCollectingEmit();
+
+		const result = await runTurn({
+			provider,
+			messages: [userMessage],
+			tools: [tool],
+			dispatch: { maxConcurrent: 1, eager: true },
+			conversationId: "conv-1",
+			turnId: "turn-1",
+			emit,
+			signal: ac.signal,
+		});
+
+		expect(result.finishReason).toBe("aborted");
+
+		const doneEvents = events.filter((e) => e.type === "done");
+		expect(doneEvents).toHaveLength(1);
+		if (doneEvents[0]?.type === "done") {
+			expect(doneEvents[0].reason).toBe("aborted");
+		}
+
+		// When the step is aborted, tool-result MESSAGES are omitted from the
+		// result (the tool-result EVENT is still emitted by executeStep for
+		// live UI updates, but the message is not persisted). This prevents
+		// orphaned `tool` messages from breaking the next turn's provider
+		// request. The assistant message has its tool-call chunks stripped.
+		const toolResultMsg = result.messages.find((m) => m.role === "tool");
+		expect(toolResultMsg).toBeUndefined();
+
+		// The assistant message should NOT contain tool-call chunks.
+		const assistantMsg = result.messages.find(
+			(m) => m.role === "assistant" && m.chunks.some((c) => c.type === "tool-call"),
+		);
+		expect(assistantMsg).toBeUndefined();
+	});
+
+	// Required test 3 (regression guard): Without abort, a normal tool runs
+	// and its result is used; finishReason reflects the model.
+	it("no abort → tool runs normally and its result is used (regression)", async () => {
+		const tool = createFakeTool("normal", async (input) => ({
+			content: `result: ${JSON.stringify(input)}`,
+		}));
+
+		const provider = createFakeProvider([
+			[
+				{ type: "tool-call", toolCallId: "tc1", toolName: "normal", input: { x: 1 } },
+				{ type: "finish", reason: "tool-calls" },
+			],
+			[
+				{ type: "text-delta", delta: "done" },
+				{ type: "finish", reason: "stop" },
+			],
+		]);
+
+		const { events, emit } = createCollectingEmit();
+
+		const result = await runTurn({
+			provider,
+			messages: [userMessage],
+			tools: [tool],
+			dispatch: { maxConcurrent: 1, eager: true },
+			conversationId: "conv-1",
+			turnId: "turn-1",
+			emit,
+		});
+
+		// finishReason reflects the model (second step's "stop").
+		expect(result.finishReason).toBe("stop");
+
+		// The tool's result was used (fed back, not "Aborted").
+		const toolResultMsg = result.messages.find((m) => m.role === "tool");
+		expect(toolResultMsg).toBeDefined();
+		const trChunk = toolResultMsg?.chunks[0];
+		expect(trChunk?.type).toBe("tool-result");
+		if (trChunk?.type === "tool-result") {
+			expect(trChunk.content).toBe('result: {"x":1}');
+			expect(trChunk.isError).toBe(false);
+		}
+
+		// done event emitted with reason "stop".
+		const doneEvents = events.filter((e) => e.type === "done");
+		expect(doneEvents).toHaveLength(1);
+		if (doneEvents[0]?.type === "done") {
+			expect(doneEvents[0].reason).toBe("stop");
+		}
+	});
+
+	// Bonus: multiple hanging tools + abort → all resolve via the race,
+	// drain() doesn't deadlock, and runTurn returns. Tool-result messages
+	// are omitted from the result (aborted step); the turn seals cleanly.
+	it("multiple hanging tools + abort → drain completes and runTurn returns", async () => {
+		const ac = new AbortController();
+
+		// Two tools that never resolve and ignore ctx.signal.
+		const toolA = createFakeTool("hangA", () => new Promise<ToolResult>(() => {}));
+		const toolB = createFakeTool("hangB", () => new Promise<ToolResult>(() => {}));
+
+		const provider: ProviderContract = {
+			id: "fake",
+			stream() {
+				return (async function* () {
+					yield {
+						type: "tool-call",
+						toolCallId: "tc1",
+						toolName: "hangA",
+						input: {},
+					} as ProviderEvent;
+					yield {
+						type: "tool-call",
+						toolCallId: "tc2",
+						toolName: "hangB",
+						input: {},
+					} as ProviderEvent;
+					ac.abort();
+					await delay(10);
+					yield { type: "finish", reason: "tool-calls" } as ProviderEvent;
+				})();
+			},
+		};
+
+		const { events, emit } = createCollectingEmit();
+
+		const result = await runTurn({
+			provider,
+			messages: [userMessage],
+			tools: [toolA, toolB],
+			dispatch: { maxConcurrent: 2, eager: true },
+			conversationId: "conv-1",
+			turnId: "turn-1",
+			emit,
+			signal: ac.signal,
+		});
+
+		expect(result.finishReason).toBe("aborted");
+
+		// tool-result EVENTS are still emitted by executeStep (for live UI),
+		// but tool-result MESSAGES are omitted from the result (not persisted).
+		const toolResultEvents = events.filter((e) => e.type === "tool-result");
+		expect(toolResultEvents).toHaveLength(2);
+		for (const tr of toolResultEvents) {
+			if (tr.type === "tool-result") {
+				expect(tr.isError).toBe(true);
+			}
+		}
+
+		// No tool messages in the result (they would orphan on the next turn).
+		const toolMessages = result.messages.filter((m) => m.role === "tool");
+		expect(toolMessages).toHaveLength(0);
+
+		// Assistant message has no tool-call chunks.
+		const assistantMsgs = result.messages.filter((m) => m.role === "assistant");
+		for (const msg of assistantMsgs) {
+			expect(msg.chunks.some((c) => c.type === "tool-call")).toBe(false);
+		}
+
+		const doneEvents = events.filter((e) => e.type === "done");
+		expect(doneEvents).toHaveLength(1);
+		if (doneEvents[0]?.type === "done") {
+			expect(doneEvents[0].reason).toBe("aborted");
+		}
+	});
+
+	// Critical regression: after an aborted tool call, the result messages
+	// must NOT contain orphaned tool messages. If they did, the next turn
+	// would send a `tool` role message to the provider without a preceding
+	// `assistant` message carrying `tool_calls` → 400 error.
+	it("aborted step produces no tool messages and no tool-call chunks in result", async () => {
+		const ac = new AbortController();
+
+		// Tool that hangs forever
+		const tool = createFakeTool("hang", () => new Promise<ToolResult>(() => {}));
+
+		const provider: ProviderContract = {
+			id: "fake",
+			stream() {
+				return (async function* () {
+					yield { type: "text-delta", delta: "Let me run that for you" } as ProviderEvent;
+					yield {
+						type: "tool-call",
+						toolCallId: "tc1",
+						toolName: "hang",
+						input: {},
+					} as ProviderEvent;
+					ac.abort();
+					await delay(10);
+					yield { type: "finish", reason: "tool-calls" } as ProviderEvent;
+				})();
+			},
+		};
+
+		const result = await runTurn({
+			provider,
+			messages: [userMessage],
+			tools: [tool],
+			dispatch: { maxConcurrent: 1, eager: true },
+			conversationId: "conv-1",
+			turnId: "turn-1",
+			emit: noopEmit,
+			signal: ac.signal,
+		});
+
+		expect(result.finishReason).toBe("aborted");
+
+		// No tool messages in the result
+		const toolMessages = result.messages.filter((m) => m.role === "tool");
+		expect(toolMessages).toHaveLength(0);
+
+		// The assistant message should preserve text but NOT tool-call chunks
+		const assistantMsg = result.messages.find((m) => m.role === "assistant");
+		expect(assistantMsg).toBeDefined();
+		if (assistantMsg !== undefined) {
+			const hasToolCall = assistantMsg.chunks.some((c) => c.type === "tool-call");
+			expect(hasToolCall).toBe(false);
+			// Text content should be preserved
+			const hasText = assistantMsg.chunks.some((c) => c.type === "text");
+			expect(hasText).toBe(true);
+		}
+
+		// Simulate what the next turn would see: the result messages are the
+		// conversation history (minus the user message). If we feed these to
+		// a simple converter, there should be NO `tool` role messages.
+		const toolRoleCount = result.messages.filter((m) => m.role === "tool").length;
+		expect(toolRoleCount).toBe(0);
+	});
+});
diff --git a/packages/kernel/src/runtime/dispatch.ts b/packages/kernel/src/runtime/dispatch.ts
index d1c46cb..e0be1b4 100644
--- a/packages/kernel/src/runtime/dispatch.ts
+++ b/packages/kernel/src/runtime/dispatch.ts
@@ -35,8 +35,24 @@ export async function executeToolCall(
 		conversationId,
 		...(cwd !== undefined ? { cwd } : {}),
 	};
+	// Race the tool's execute promise against the abort signal so a tool
+	// that hangs (ignores ctx.signal, or blocks on something the signal
+	// can't interrupt) can't keep runTurn from returning. When the signal
+	// fires we RESOLVE (not reject) with an "Aborted" result so the step
+	// completes normally and the existing signal.aborted → finishReason =
+	// "aborted" path seals the turn cleanly (done event), letting the
+	// caller's finally clear active state and the FE clear its spinner.
 	try {
-		return await tool.execute(call.input, ctx);
+		const toolPromise = tool.execute(call.input, ctx);
+		const abortPromise = new Promise<ToolResult>((resolve) => {
+			signal.addEventListener("abort", () => resolve({ content: "Aborted", isError: true }), {
+				once: true,
+			});
+		});
+		// Swallow late rejections from the orphaned tool promise: the tool
+		// may reject after the race already resolved with "Aborted".
+		void toolPromise.catch(() => {});
+		return await Promise.race([toolPromise, abortPromise]);
 	} catch (err) {
 		const message = err instanceof Error ? err.message : String(err);
 		return { content: `Tool execution error: ${message}`, isError: true };
diff --git a/packages/kernel/src/runtime/run-turn.ts b/packages/kernel/src/runtime/run-turn.ts
index 4a12e28..f5d80d3 100644
--- a/packages/kernel/src/runtime/run-turn.ts
+++ b/packages/kernel/src/runtime/run-turn.ts
@@ -89,6 +89,19 @@ function appendThinkingDelta(chunks: Chunk[], delta: string): void {
 	}
 }
 
+/**
+ * Remove tool-call chunks from an assistant message, returning a new message
+ * with only the non-tool-call chunks (text, thinking, error). Returns
+ * `undefined` when all chunks were tool-calls (so the caller can omit the
+ * message entirely). Used when a step is aborted to avoid persisting
+ * incomplete tool calls whose placeholder "Aborted" results would create
+ * orphaned `tool` messages in the next turn's history.
+ */
+function stripToolCallChunks(msg: ChatMessage): ChatMessage | undefined {
+	const stripped = msg.chunks.filter((c) => c.type !== "tool-call");
+	return stripped.length > 0 ? { role: msg.role, chunks: stripped } : undefined;
+}
+
 interface StepContext {
 	readonly provider: ProviderContract;
 	readonly messages: ChatMessage[];
@@ -516,12 +529,36 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
 			totalUsage = addUsage(totalUsage, stepResult.usage);
 			lastStepUsage = stepResult.usage;
 
-			if (stepResult.assistantMessage !== undefined) {
-				messages.push(stepResult.assistantMessage);
-				resultMessages.push(stepResult.assistantMessage);
+			// When the signal is aborted mid-step, the tool results are
+			// placeholders ({ content: "Aborted", isError: true }). If these
+			// are persisted and included in the next turn's message history,
+			// the provider sees a `tool` role message without a preceding
+			// `assistant` message carrying `tool_calls` → 400 error.
+			//
+			// To prevent this, when the signal is aborted we:
+			//  1. Strip tool-call chunks from the assistant message (keep
+			//     text/thinking/error chunks so the partial response is
+			//     preserved).
+			//  2. Omit tool-result messages entirely (they are not persisted,
+			//     not added to resultMessages, and not passed to onStepComplete).
+			//
+			// This keeps the conversation history clean: the assistant's
+			// partial text is preserved, but no incomplete tool calls are
+			// left dangling. The `done` event still carries
+			// `reason: "aborted"`, so the turn seals cleanly.
+			const stepAborted = signal.aborted;
+			const assistantMessage =
+				stepAborted && stepResult.assistantMessage !== undefined
+					? stripToolCallChunks(stepResult.assistantMessage)
+					: stepResult.assistantMessage;
+			const toolMessages = stepAborted ? [] : stepResult.toolMessages;
+
+			if (assistantMessage !== undefined) {
+				messages.push(assistantMessage);
+				resultMessages.push(assistantMessage);
 			}
 
-			for (const msg of stepResult.toolMessages) {
+			for (const msg of toolMessages) {
 				messages.push(msg);
 				resultMessages.push(msg);
 			}
@@ -532,10 +569,10 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
 			// SAME objects in resultMessages — the caller must NOT double-persist.
 			if (input.onStepComplete !== undefined) {
 				const stepMessages: ChatMessage[] = [];
-				if (stepResult.assistantMessage !== undefined) {
-					stepMessages.push(stepResult.assistantMessage);
+				if (assistantMessage !== undefined) {
+					stepMessages.push(assistantMessage);
 				}
-				for (const msg of stepResult.toolMessages) {
+				for (const msg of toolMessages) {
 					stepMessages.push(msg);
 				}
 				if (stepMessages.length > 0) {
@@ -543,7 +580,7 @@ export async function runTurn(input: RunTurnInput): Promise<RunTurnResult> {
 				}
 			}
 
-			if (signal.aborted) {
+			if (stepAborted) {
 				finishReason = "aborted";
 				break;
 			}
diff --git a/packages/tool-shell/src/shell.test.ts b/packages/tool-shell/src/shell.test.ts
index a70693b..07e0af4 100644
--- a/packages/tool-shell/src/shell.test.ts
+++ b/packages/tool-shell/src/shell.test.ts
@@ -22,8 +22,12 @@ function stubCtx(overrides?: Partial<ToolExecuteContext>): ToolExecuteContext {
 	};
 }
 
-function fakeSpawn(result: { exitCode: number | null; timedOut: boolean }): SpawnShell {
-	return async () => result;
+function fakeSpawn(result: {
+	exitCode: number | null;
+	timedOut: boolean;
+	aborted?: boolean;
+}): SpawnShell {
+	return async () => ({ aborted: false, ...result });
 }
 
 describe("validateArgs", () => {
@@ -201,7 +205,7 @@ describe("createRunShellTool", () => {
 			workdir: "/tmp",
 			spawn: async (_params) => {
 				_params.onOutput("hello\n", "stdout");
-				return { exitCode: 0, timedOut: false };
+				return { exitCode: 0, timedOut: false, aborted: false };
 			},
 		});
 		const result = await tool.execute({ command: "echo hello" }, stubCtx());
@@ -214,7 +218,7 @@ describe("createRunShellTool", () => {
 			workdir: "/tmp",
 			spawn: async (_params) => {
 				_params.onOutput("error output\n", "stderr");
-				return { exitCode: 1, timedOut: false };
+				return { exitCode: 1, timedOut: false, aborted: false };
 			},
 		});
 		const result = await tool.execute({ command: "false" }, stubCtx());
@@ -227,7 +231,7 @@ describe("createRunShellTool", () => {
 			workdir: "/tmp",
 			spawn: async (_params) => {
 				_params.onOutput("partial\n", "stdout");
-				return { exitCode: null, timedOut: true };
+				return { exitCode: null, timedOut: true, aborted: false };
 			},
 		});
 		const result = await tool.execute({ command: "sleep 999" }, stubCtx());
@@ -242,7 +246,7 @@ describe("createRunShellTool", () => {
 			outputCap: cap,
 			spawn: async (_params) => {
 				_params.onOutput("a".repeat(200), "stdout");
-				return { exitCode: 0, timedOut: false };
+				return { exitCode: 0, timedOut: false, aborted: false };
 			},
 		});
 		const result = await tool.execute({ command: "gen" }, stubCtx());
@@ -258,7 +262,7 @@ describe("createRunShellTool", () => {
 				params.onOutput("line1\n", "stdout");
 				params.onOutput("err1\n", "stderr");
 				params.onOutput("line2\n", "stdout");
-				return { exitCode: 0, timedOut: false };
+				return { exitCode: 0, timedOut: false, aborted: false };
 			},
 		});
 		await tool.execute(
@@ -280,7 +284,7 @@ describe("createRunShellTool", () => {
 			workdir: "/baked",
 			spawn: async (params) => {
 				receivedCwd = params.cwd;
-				return { exitCode: 0, timedOut: false };
+				return { exitCode: 0, timedOut: false, aborted: false };
 			},
 		});
 		await tool.execute({ command: "pwd" }, stubCtx({ cwd: "/custom" }));
@@ -293,7 +297,7 @@ describe("createRunShellTool", () => {
 			workdir: "/baked",
 			spawn: async (params) => {
 				receivedCwd = params.cwd;
-				return { exitCode: 0, timedOut: false };
+				return { exitCode: 0, timedOut: false, aborted: false };
 			},
 		});
 		await tool.execute({ command: "pwd" }, stubCtx());
@@ -317,7 +321,7 @@ describe("createRunShellTool", () => {
 		controller.abort();
 		const tool = createRunShellTool({
 			workdir: "/tmp",
-			spawn: async () => ({ exitCode: 0, timedOut: false }),
+			spawn: async () => ({ exitCode: 0, timedOut: false, aborted: false }),
 		});
 		const result = await tool.execute({ command: "test" }, stubCtx({ signal: controller.signal }));
 		expect(result.isError).toBe(true);
@@ -329,7 +333,7 @@ describe("createRunShellTool", () => {
 			workdir: "/tmp",
 			spawn: async (params) => {
 				receivedTimeout = params.timeout;
-				return { exitCode: 0, timedOut: false };
+				return { exitCode: 0, timedOut: false, aborted: false };
 			},
 		});
 		await tool.execute({ command: "test", timeout: 5000 }, stubCtx());
@@ -355,3 +359,133 @@ describe("createRunShellTool (integration)", () => {
 		expect(streamed).toContain("hello-from-shell");
 	});
 });
+
+describe("realSpawn — process-group kill on abort/timeout", () => {
+	it("aborts a command with a grandchild holding the pipes and resolves immediately", async () => {
+		const { realSpawn } = await import("./spawn.js");
+		const controller = new AbortController();
+
+		// "sleep 30 & wait" spawns a grandchild (sleep) that inherits the stdio
+		// pipes. Killing just the sh parent does NOT close the pipes → close never
+		// fires. With detached:true + process-group kill, the grandchild dies too.
+		const promise = realSpawn({
+			command: "sleep 30 & wait",
+			cwd: "/tmp",
+			signal: controller.signal,
+			timeout: 60_000,
+			onOutput: () => {},
+		});
+
+		// Give the shell time to actually spawn the grandchild.
+		await new Promise((r) => setTimeout(r, 500));
+
+		controller.abort();
+
+		// Must resolve promptly (not wait 30s for the grandchild's sleep).
+		const result = await promise;
+		expect(result.aborted).toBe(true);
+		expect(result.timedOut).toBe(false);
+
+		// Give the OS a moment to reap the killed processes.
+		await new Promise((r) => setTimeout(r, 200));
+
+		// The grandchild sleep process should be gone. Check via pgrep.
+		const { execSync } = await import("node:child_process");
+		let sleeping: string[];
+		try {
+			sleeping = execSync("pgrep -f 'sleep 30'", { encoding: "utf-8" }).trim().split("\n");
+		} catch {
+			// pgrep returns non-zero when no processes match → all gone.
+			sleeping = [];
+		}
+		expect(sleeping.length).toBe(0);
+	});
+
+	it("times out a command with a grandchild holding the pipes and resolves promptly", async () => {
+		const { realSpawn } = await import("./spawn.js");
+		const controller = new AbortController();
+
+		const promise = realSpawn({
+			command: "sleep 30 & wait",
+			cwd: "/tmp",
+			signal: controller.signal,
+			timeout: 500,
+			onOutput: () => {},
+		});
+
+		// Must resolve within a short window (not 30s).
+		const start = Date.now();
+		const result = await promise;
+		const elapsed = Date.now() - start;
+
+		expect(result.timedOut).toBe(true);
+		expect(result.aborted).toBe(false);
+		// Should resolve shortly after the 500ms timeout, well under 30s.
+		expect(elapsed).toBeLessThan(10_000);
+
+		// Grandchild should be dead.
+		await new Promise((r) => setTimeout(r, 200));
+		const { execSync } = await import("node:child_process");
+		let sleeping: string[];
+		try {
+			sleeping = execSync("pgrep -f 'sleep 30'", { encoding: "utf-8" }).trim().split("\n");
+		} catch {
+			sleeping = [];
+		}
+		expect(sleeping.length).toBe(0);
+	});
+
+	it("captures stdout on normal completion (regression guard)", async () => {
+		const { realSpawn } = await import("./spawn.js");
+		const controller = new AbortController();
+		let output = "";
+
+		const result = await realSpawn({
+			command: "echo hi",
+			cwd: "/tmp",
+			signal: controller.signal,
+			timeout: 5_000,
+			onOutput: (data) => {
+				output += data;
+			},
+		});
+
+		expect(result.aborted).toBe(false);
+		expect(result.timedOut).toBe(false);
+		expect(result.exitCode).toBe(0);
+		expect(output).toContain("hi");
+	});
+
+	it("aborts a simple single-process command and resolves with aborted: true", async () => {
+		const { realSpawn } = await import("./spawn.js");
+		const controller = new AbortController();
+
+		const promise = realSpawn({
+			command: "sleep 30",
+			cwd: "/tmp",
+			signal: controller.signal,
+			timeout: 60_000,
+			onOutput: () => {},
+		});
+
+		// Let the sleep actually start.
+		await new Promise((r) => setTimeout(r, 300));
+
+		controller.abort();
+
+		const result = await promise;
+		expect(result.aborted).toBe(true);
+		expect(result.timedOut).toBe(false);
+
+		// The sleep process should be gone.
+		await new Promise((r) => setTimeout(r, 200));
+		const { execSync } = await import("node:child_process");
+		let sleeping: string[];
+		try {
+			sleeping = execSync("pgrep -f 'sleep 30'", { encoding: "utf-8" }).trim().split("\n");
+		} catch {
+			sleeping = [];
+		}
+		expect(sleeping.length).toBe(0);
+	});
+});
diff --git a/packages/tool-shell/src/shell.ts b/packages/tool-shell/src/shell.ts
index d96d73e..cc76bca 100644
--- a/packages/tool-shell/src/shell.ts
+++ b/packages/tool-shell/src/shell.ts
@@ -12,6 +12,7 @@ export interface ValidatedArgs {
 export interface SpawnResult {
 	readonly exitCode: number | null;
 	readonly timedOut: boolean;
+	readonly aborted: boolean;
 }
 
 export type SpawnShell = (params: {
@@ -139,7 +140,6 @@ export function createRunShellTool(deps: {
 			};
 
 			let spawnResult: SpawnResult;
-			let aborted = false;
 
 			try {
 				spawnResult = await deps.spawn({
@@ -154,7 +154,6 @@ export function createRunShellTool(deps: {
 				});
 			} catch (err: unknown) {
 				if (ctx.signal.aborted) {
-					aborted = true;
 					return buildResult({
 						exitCode: null,
 						timedOut: false,
@@ -172,7 +171,7 @@ export function createRunShellTool(deps: {
 			return buildResult({
 				exitCode: spawnResult.exitCode,
 				timedOut: spawnResult.timedOut,
-				aborted,
+				aborted: spawnResult.aborted,
 				output,
 				cap,
 			});
diff --git a/packages/tool-shell/src/spawn.ts b/packages/tool-shell/src/spawn.ts
index 9025c26..9b1d7e4 100644
--- a/packages/tool-shell/src/spawn.ts
+++ b/packages/tool-shell/src/spawn.ts
@@ -3,24 +3,66 @@ import type { SpawnResult, SpawnShell } from "./shell.js";
 
 export const realSpawn: SpawnShell = (params): Promise<SpawnResult> => {
 	return new Promise<SpawnResult>((resolve) => {
+		// detached: true puts the child in its own process group (pgid = child.pid).
+		// This lets us kill the entire group (child + any grandchildren that inherit
+		// the pipes) via process.kill(-pgid, "SIGKILL") on abort/timeout, so a
+		// backgrounded grandchild can't keep the stdio pipes open and stall the
+		// promise on child.on("close").
 		const child = nodeSpawn("sh", ["-c", params.command], {
 			cwd: params.cwd,
 			stdio: ["ignore", "pipe", "pipe"],
+			detached: true,
 		});
 
+		let settled = false;
 		let timedOut = false;
-		let killed = false;
-		const timer = setTimeout(() => {
-			timedOut = true;
-			child.kill("SIGKILL");
-		}, params.timeout);
+		let timer: ReturnType<typeof setTimeout> | undefined;
+
+		/** Kill the entire child process group (best-effort — group may be gone). */
+		const killGroup = () => {
+			if (child.pid !== undefined) {
+				try {
+					process.kill(-child.pid, "SIGKILL");
+				} catch {
+					// Process group may already be gone — ignore.
+				}
+			}
+		};
+
+		/** Remove the abort listener and clear the timeout timer (no leaks). */
+		const cleanup = () => {
+			if (timer !== undefined) {
+				clearTimeout(timer);
+				timer = undefined;
+			}
+			params.signal.removeEventListener("abort", onAbort);
+		};
+
+		/** Resolve once, then clean up so listeners/timers never leak. */
+		const settle = (result: SpawnResult) => {
+			if (settled) return;
+			settled = true;
+			cleanup();
+			resolve(result);
+		};
 
 		const onAbort = () => {
-			killed = true;
-			child.kill("SIGKILL");
+			if (settled) return;
+			killGroup();
+			// Resolve immediately — do NOT wait for child.on("close"), which may
+			// never fire if a grandchild holds the pipes open.
+			settle({ exitCode: null, timedOut: false, aborted: true });
 		};
 		params.signal.addEventListener("abort", onAbort, { once: true });
 
+		timer = setTimeout(() => {
+			if (settled) return;
+			timedOut = true;
+			killGroup();
+			// Resolve immediately — same reasoning as abort.
+			settle({ exitCode: null, timedOut: true, aborted: false });
+		}, params.timeout);
+
 		child.stdout.on("data", (chunk: Buffer) => {
 			params.onOutput(chunk.toString(), "stdout");
 		});
@@ -29,18 +71,17 @@ export const realSpawn: SpawnShell = (params): Promise<SpawnResult> => {
 			params.onOutput(chunk.toString(), "stderr");
 		});
 
+		// Normal-completion path: wait for "close" so all stdout/stderr is captured.
+		// If abort/timeout already settled, this is a no-op (settled === true).
 		child.on("close", (code) => {
-			clearTimeout(timer);
-			params.signal.removeEventListener("abort", onAbort);
-			resolve({ exitCode: code, timedOut });
+			settle({ exitCode: code, timedOut, aborted: false });
 		});
 
+		// Spawn error (e.g. bad cwd, sh not found). Kill the group just in case
+		// and resolve — never leave the promise pending.
 		child.on("error", () => {
-			clearTimeout(timer);
-			params.signal.removeEventListener("abort", onAbort);
-			if (!killed && !timedOut) {
-				resolve({ exitCode: 1, timedOut: false });
-			}
+			killGroup();
+			settle({ exitCode: 1, timedOut: false, aborted: false });
 		});
 	});
 };
author	Adam Malczewski <[email protected]>	2026-06-24 14:10:03 +0900
committer	Adam Malczewski <[email protected]>	2026-06-24 14:10:03 +0900
commit	dabcbc79831052effc6ce990021feee07d661f7e (patch)
tree	3e74e16f36d6a675abe676f0d04ca169f65f0a71
parent	b58fb8373a1f7311cead23aa9a4d1fcd6927634f (diff)
download	dispatch-dabcbc79831052effc6ce990021feee07d661f7e.tar.gz dispatch-dabcbc79831052effc6ce990021feee07d661f7e.zip