# Dispatch Restructure — Living Plan

> **Status:** Planning only. No implementation has begun.
> **Purpose:** Capture the target architecture, the engineering principles that
> govern it, and the current-state map — so any agent or human picking this up
> has the full picture in one place. This is a *living* document: update it as
> decisions are made and pieces land.

---

## 0. The goal in one paragraph

Restructure Dispatch so the **kernel is the absolute minimum** — just enough to run an agent turn and host extensions — and **every feature is an extension**. Extensions must be creatable and loadable *from outside this project* (custom / third-party extensions), with identical contracts to the bundled ones. For now we are planning the **backend only**; the frontend will be reworked separately and modularly later, so **no design decision here should be driven by the current frontend**.

---

## 1. Engineering principles (the standard for this project)

These are adopted because each solves a **specific, named problem in this codebase** — not because they are popular. Each carries its stopping point so we don't over-apply it.

### P1 — Feature-as-a-library

Every feature is independently importable with a clean, documented, minimal API. The acceptance test: *can you import just this feature and use it standalone, without dragging in the whole app?*

- **Evidence:** `agent-manager.ts` is ~2,453 lines where no single behavior (queueing, tool-assembly, fallback) can be extracted or reasoned about in isolation. By contrast `chunks/transform.ts` is deliberately DB-free so the backend *and* frontend share the same pure logic — feature-as-a-library done right, already in the repo.
- **Stopping point:** Do **not** over-split into dozens of micro npm packages with version-skew and `package.json` ceremony. Internal import-cleanliness first; a separately *publishable* package only when there's a genuine outside consumer.

### P2 — Functional core / imperative shell

Pure *decision* logic ("given this state + event, what should happen?") as pure functions; the actual I/O (shell, fs, LLM, SQLite) lives in thin adapters **injected** at the edges.

- **Evidence:** `wake-scheduler.ts` already does this and says so: "Pure helpers… side-effect-free so the logic can be unit-tested without spinning up Hono or touching SQLite." The giant `vi.mock("@dispatch/core")` blocks in `agent-manager.test.ts` exist *because* effects are reached for instead of passed in.
- **The honest framing:** An agent system *is* side effects — running shell, writing files, calling the LLM are the product. The goal is **testability and predictability, not purity for its own sake.**
- **Stopping point:** Where separating decision from effect makes a unit obviously testable, do it. Where it would only add ceremony (DI containers, effect-wrapper types) around an unavoidable `await spawn(cmd)`, don't. Purity is a means; if it stops paying for itself, drop it.

### P3 — No ambient / hidden state

State is **owned and passed explicitly**, never reached for as a hidden global or stateful singleton.

- **Evidence:** Wishlist bugs #16 ("agent tools leak across tabs") and #17 ("agent/model setting changes on tab switch") are *caused by* shared mutable singletons / frontend-held state. Explicit per-tab state ownership fixes them structurally.
- **Stopping point:** Stateless classes-as-namespaces are fine. Stateful god-objects (today's managers) are the thing we're killing. The tool-set for a turn must be reproducible from `(agent profile + capabilities + active extensions)` — pure input → output.

### P4 — Don't adopt by reputation (meta-principle)

Every pattern, library, or methodology — **including the "minimal kernel + extensions" architecture itself** — earns its place by solving a specific, named problem in *this* codebase, and we note where it stops paying off. "It's a known good practice" is a hypothesis to test, not a justification.

### P5 — The repo is a harness, not just code

Meta-information that guides future agents is a **first-class deliverable**, maintained like code. Modeled as a *tiered cache* of context: small always-loaded files + larger on-demand files, so an agent gets the right info at the right moment without burning context. (Source: "The AI Harness" — see §7. Bounded to our scale in §7.4.)

### P6 — Document only the non-inferable

Harness docs contain **tribal knowledge and scar tissue only** — never generic best-practice the model already knows. Test: *"Could a fresh frontier model figure this out by reading the code? If yes, leave it out."* (This is P4 applied to documentation — it self-limits harness bloat.)

### P7 — The harness is extension-scoped

Every extension ships **its own** constitution snippet, safety rules, feature doc, glossary terms, and skills — portable with the code. This is P1 (feature-as-a-library) applied to documentation: import the extension, get its harness too. Better than a repo-global harness for a modular system.

### P8 — One canonical vocabulary

A `GLOSSARY.md` with an **"aliases to avoid"** column governs naming. New code reuses existing terms; it never invents a synonym for an existing concept.

- **Evidence:** This codebase overloads **tab / session / conversation** and **chunk / message / turn / step** — the chunk-log refactor notes exist precisely because those terms got tangled.
- **Live application:** "core" now has a precise meaning (the extension tier in §2.6) — it must NOT be reused for the kernel. Kernel ≠ core.

---

## 2. Target architecture — minimal kernel + extensions

### 2.1 Layered picture

```
┌───────────────────────────────────────────────────────────────┐
│ Clients (any frontend — reworked later, out of scope now)     │
└───────────────────────────────────────────────────────────────┘
        ▲ typed events / commands (via a transport extension)
┌───────────────────────────────────────────────────────────────┐
│ STANDARD extensions (the features people think of as Dispatch)│
│ tools (read_file, run_shell…) · agents · skills · lsp ·       │
│ compaction · notifications · scheduler · attachments · …      │
└───────────────────────────────────────────────────────────────┘
        ▲ depend on kernel + core (never upward)
┌───────────────────────────────────────────────────────────────┐
│ CORE extensions (minimum glue to run ONE turn end-to-end)     │
│ transport · provider · auth · session-orchestrator            │
└───────────────────────────────────────────────────────────────┘
        ▲ register contributions        ▲ receive Host API
┌───────────────────────────────────────────────────────────────┐
│ KERNEL (minimal; not an extension)                            │
│                                                               │
│ Extension Host        Agent Runtime       Event/Hook Bus      │
│ (discover/resolve/    (the turn loop,     (typed pub/sub      │
│  activate/registries)  provider+tool       + filters)         │
│                        agnostic)                              │
│                                                               │
│ Kernel Services (exposed through Host API):                   │
│  • Capability/Permission gate   • Config (merge + schema)     │
│  • Storage + migration runner   • Secret/credential vault     │
│  • Conversation/chunk store     • Logger                      │
│                                                               │
│ Contracts (the stable ABI every extension compiles against)   │
└───────────────────────────────────────────────────────────────┘
```

### 2.2 The Kernel — the "absolute minimum"

Five things, nothing more:

1. **Contracts (the stable ABI).** The only types extensions depend on, versioned independently from implementations. Seeded from today's `types/index.ts`:
   - `ToolContract` (today's `ToolDefinition`: `{ name, description, parameters, execute(args, ctx) }`) — see §3.3 for the `ctx` requirements concurrency forces.
   - `ProviderContract` (model factory + streaming + catalog/capability entries)
   - `AuthContract` (credential sources / OAuth flows feeding the vault)
   - `Extension` + `Manifest` (id, version, apiVersion range, deps, activation, contributions, capabilities)
   - `HostAPI` (what an extension receives on activate — see §2.3)
   - `Hook`/event taxonomy (the lifecycle surface)
   - Conversation model (`ChatMessage`, `Chunk`, turn/step)
2. **Extension Host.** Discover → validate manifest → resolve dependency DAG → check apiVersion compat → run migrations → activate (topological) → register contributions → dispose on shutdown/reload. Owns the **registries** (tools, providers, hooks, routes/commands, services, settings, migrations, jobs).
3. **Agent Runtime (the turn loop).** The refactored heart of today's `agent.ts`: takes *a resolved provider + a tool set + messages + a dispatch policy*, streams, dispatches tool calls (see §3.3), dedups, truncates/spills, emits events. **Provider-agnostic and tool-agnostic** — knows only the contracts. Names no concrete tool or provider.
4. **Event / Hook Bus.** Typed pub/sub plus *filters*:
   - **Observers** react (notifications, persistence, usage accounting).
   - **Filters** transform in a chain (system-prompt assembly, message pre-send, tool-result transform, tool-set filtering).
5. **Kernel Services (via Host API).** The kernel exposes *interfaces* and pure logic here — **never concrete I/O backends** (those are `core` extensions; see §2.8). This keeps "kernel touches no I/O" (§2.7) literally true.
   - **Config loader** — merged loader (global → project) + per-extension settings schema/validation. **Must be in the kernel** (not an extension): it's needed at boot to *find and resolve* extensions — a chicken-and-egg the extension system itself can't solve. Seeded from today's `config/`.
   - **Logger** — always-on, available before any extension activates.
   - **Permission rule *evaluation*** — the pure `evaluate(rules, request) → decision` function (today's `permission/evaluate.ts`): rules in, decision out, no I/O. The *interactive prompting* (asking a human, today's `permission-manager.ts`) is a transport/UI concern owned by a `core`/`standard` extension, not the kernel.
   - **Storage interface + migration runner** — the kernel defines the storage *contract* (namespaced KV/SQL + per-extension migration registration) and exposes `host.storage(ns)`, but the **concrete backend (SQLite) is a `core` extension** (`storage-sqlite`), swappable for an in-memory store in tests (serves P2 directly). Bootstrap ordering: the storage backend activates first (no deps) so later extensions can run their migrations.
   - **Secret/credential vault interface** — `host.secrets` (capability-gated); the concrete store and the *auth flows* that fill it are extensions.
   - **Conversation/chunk store** — NOTE: the kernel owns only the **conversation model TYPES** (`Chunk`/`ChatMessage` in contracts) and the pure explode/group transforms (today's DB-free `chunks/transform.ts`). The **persistent store itself is a `core` extension** built on `host.storage` — because persistence is I/O. The runtime reads/writes history *through the orchestrator*, which calls the store; the kernel's `runTurn` takes `messages` as a plain input and returns result messages (it never touches the DB).

> **Deliberately NOT in the kernel:** any concrete tool, any provider, any
> concrete persistence/secret backend, the persona/system-prompt text, the HTTP
> server, interactive permission prompting, tab/queue orchestration, sub-agents,
> skills, LSP, notifications, compaction, scheduling.

### 2.3 The extension model

- **What it is:** a directory or npm package with a **manifest** + entry module exporting `activate(host)` and optional `deactivate()`.
- **Manifest shape:** `id, name, version, apiVersion (semver range), dependsOn[], activation ("eager" | lazy event triggers), contributes {tools, providers, routes, commands, hooks, settings, migrations, scheduledJobs, services}, capabilities {fs, shell, network, secrets, db, spawn…}, settingsSchema`.
- **Each extension's contract is two-sided (provides + expects):** what it *exposes* (its contributions/services) and what it *expects exposed to it* (its `dependsOn` services + `capabilities`). This two-sided contract is what the host uses to resolve load order and what makes an extension portable.
- **Host API (what `activate(host)` receives):**
  - `host.defineTool/defineProvider/defineAuth(...)`
  - `host.defineRoute/defineCommand(...)` — for transports & UI actions
  - `host.on(hook, handler)` / `host.addFilter(hook, fn)`
  - `host.provideService(handle, impl)` / `host.getService(handle)` — typed DI via **typed service handles** (an exported symbol, NOT a raw string — so `lsp references` can compute a service's consumers; see §5)
  - `host.storage(namespace)` — scoped KV/SQL + migrations (interface; backed by the `storage-sqlite` core extension — see §2.8)
  - `host.config` / `host.settings`
  - `host.secrets` (capability-gated)
  - `host.permissions.check(request)`
  - `host.events.emit(...)` / `host.logger`
  - `host.scheduler.register(job)`
- **Contribution points** (replacing today's wiring):

| Point | Replaces today's | Examples |
|---|---|---|
| tools | per-turn assembly in `agent-manager` | read_file, run_shell, web_search |
| providers | `llm/provider.ts`, `models/registry` | anthropic, opencode, google |
| auth | `credentials/*` | claude OAuth, api-keys |
| context filters | `buildSystemPrompt`, skills/agents injection | persona, skills, agent profiles |
| hooks/observers | scattered wiring | notifications, usage accounting |
| routes/commands | `api/routes/*` | `/chat`, `/tabs`, `/models` |
| scheduled jobs | `wake-scheduler.ts` | cache-warm, wake probes |
| migrations | `db/index.ts` table block | each extension owns its tables |
| services | implicit singletons | LSP manager, model registry |

- **Loading / lifecycle:** search paths (precedence high→low) = project `.dispatch/extensions` → global `~/.config/dispatch/extensions` → installed npm packages (naming convention) → bundled first-party. Resolve DAG → verify apiVersion → run migrations → activate topologically (lazy ones defer to their activation event) → ready. Hot-reload via watchers (config already does this); deactivate disposes everything the extension registered.

### 2.4 Extension catalog (current code → extensions, with tier)

- **core tier (the minimum to complete one turn — see §2.8):** `storage-sqlite` (concrete backend behind `host.storage`), `conversation-store` (append-only turn/chunk persistence on top of `host.storage`; today's `db/chunks.ts` + `db/tabs.ts`), `transport` (accept message, stream events — HTTP/WS, or even stdio), `provider-×1` (one LLM provider), `auth-×1` (that provider's credentials), `session-orchestrator` (the turn-driver carved out of `agent-manager.ts`).
- **standard tier — tools:** `tools-fs` (read_file, read_file_slice, write_file, list_files), `tool-shell` (run_shell + background store + shell-analyze), `tool-search` (search_code), `tool-web`, `tool-youtube`, `tool-todo`, `tool-key-usage`.
- **standard tier — providers & auth beyond the minimum:** `provider-anthropic`, `provider-opencode`, `provider-google`, `provider-copilot`; `auth-claude` (OAuth), `auth-apikeys`, `models-catalog` (registry + capabilities). *(Note: the single provider/auth required to boot is "core"; additional ones are "standard".
  Which specific one is the core default is a §8 decision.)*
- **standard tier — subsystems:** `lsp` (manager service + `lsp` tool + diagnostics-on-write filter), `agents` (sub/user-agent system + `summon`/`retrieve`), `skills` (loader + context-filter), `session-features` (tabs, queue, deliverMessage, auto-wake budget, `send_to_tab`/`read_tab` — the parts beyond the minimal orchestrator), `compaction`, `notifications-ntfy`, `wake-scheduler`, `attachments` (multimodal validation/limits).

> Result: **`agent-manager.ts` dissolves** into the kernel's turn loop + the
> core `session-orchestrator` + standard-tier contributions.

### 2.5 Proposed package layout

```
packages/
  kernel/            # the kernel ONLY (NOT named "core" — see P8 / §2.6)
    contracts/       # the KERNEL ABI ONLY (turn loop, HostAPI, hook/event
                     # mechanism, conversation model) — versioned.
                     # Per-extension contracts are NOT here — they live
                     # co-located in each extension package (see §5).
    host/            # discovery/resolve/activate + registries
    runtime/         # the agent turn loop (incl. tool dispatch, §3.3)
    bus/             # events + filters
    services/        # config loader, logger, permission eval, storage IFACE + migration runner, secrets IFACE
  extensions/
    core/            # core-tier: storage-sqlite, conversation-store, transport,
                     # provider-×1, auth-×1, session-orchestrator
    standard/        # standard-tier: tools, agents, skills, lsp, compaction, …
                     # each extension package owns its OWN contract
                     # (what it exposes/requires + its hook & service
                     # handles) co-located inside it — see §5
  host-bin/          # thin bootstrapper: make kernel, point at ext dirs, activate
  sdk/               # helper toolkit + types for THIRD-PARTY ext authors
  frontend/          # reworked later
```

### 2.6 Tiers: kernel → core → standard

We classify extensions into tiers. **Tiers are labels over the dependency DAG, not a second enforcement mechanism** — the host resolves load order from each extension's declared deps, and the capability gate enforces access. Tiers describe *what ships in which distribution*.
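
The claim that load order falls out of declared deps alone can be sketched as a topological sort. This is a minimal illustration, assuming a hypothetical `ManifestLite` subset of the manifest (only `id` and `dependsOn` come from §2.3); the function name `activationOrder` and the dependency edges in the demo are invented for the example, not taken from the codebase.

```typescript
// Hypothetical manifest subset; only `id` and `dependsOn` are named in §2.3.
interface ManifestLite {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns ids so that every extension appears after all of
// its dependencies. Throws on an unknown dependency or a dependency cycle.
function activationOrder(manifests: ManifestLite[]): string[] {
  const known = new Set(manifests.map((m) => m.id));
  const unmet = new Map<string, number>();        // id -> unmet dependency count
  const dependents = new Map<string, string[]>(); // id -> extensions waiting on it

  for (const m of manifests) {
    unmet.set(m.id, m.dependsOn.length);
    for (const dep of m.dependsOn) {
      if (!known.has(dep)) {
        throw new Error(`${m.id} depends on unknown extension ${dep}`);
      }
      dependents.set(dep, [...(dependents.get(dep) ?? []), m.id]);
    }
  }

  // Extensions with no deps (e.g. a storage backend) are ready immediately.
  const ready = manifests.filter((m) => m.dependsOn.length === 0).map((m) => m.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const waiting of dependents.get(id) ?? []) {
      const left = unmet.get(waiting)! - 1;
      unmet.set(waiting, left);
      if (left === 0) ready.push(waiting);
    }
  }

  if (order.length !== manifests.length) {
    throw new Error("dependency cycle detected");
  }
  return order;
}

// Illustrative edges only; the real dependency graph is not specified here.
const demoOrder = activationOrder([
  { id: "session-orchestrator", dependsOn: ["conversation-store", "transport"] },
  { id: "conversation-store", dependsOn: ["storage-sqlite"] },
  { id: "transport", dependsOn: [] },
  { id: "storage-sqlite", dependsOn: [] },
]);
console.log(demoOrder.join(" -> "));
```

Nothing in the sort reads a tier, which is the point of the paragraph above: a tier label can be attached afterwards without changing the order.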

| Tier | Objective test | Distribution |
|---|---|---|
| **kernel** | the ABI + turn loop; *not* an extension | always |
| **core** | required to complete one turn end-to-end | "minimal Dispatch" |
| **standard** | ships on by default; defines Dispatch-as-known | "default Dispatch" |
| *(external)* | not in this repo | community / custom |

- **No "extras" tier yet.** Empty categories are over-planning. A fourth tier (bundled-but-off-by-default) earns existence only when a real feature is genuinely opt-in — not by demoting an existing feature to fill a slot.
- **The one invariant that gives tiers teeth — no upward dependencies.** A `core` extension may depend on the kernel and other `core` extensions, never on `standard`. Checkable straight from manifests (a lint). This is what makes "the minimal distribution still boots" *true* rather than aspirational.
- **Naming (P8):** "core" is the extension tier; the runtime primitive is the **kernel**. Never reuse "core" for the kernel.

**Placement test in action — `read_file` is `standard`, not `core`.** Apply the test: remove `read_file` → the agent just replies with text; the turn still completes. So it fails the core test → it's `standard`. The surprise that validates the model: **tools are not the minimum.** A turn can happen with zero tools. `read_file` being *important* is why it ships on-by-default in `standard` — not why it's `core` (resisting "important ⇒ core" keeps `core` from regrowing into a god-object; P4).

### 2.7 Kernel vs core boundary + how a tool plugs in

**Boundary rule (one sentence):**

> **Kernel = the pure turn mechanism** (decides nothing, touches no I/O, names no
> feature). **Core = the minimum glue** that wires real inputs into that
> mechanism and handles the results — opinionated and effectful, which is exactly
> why it can't live in the kernel.
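
The no-upward-dependencies invariant from §2.6 is described as checkable straight from manifests. A minimal sketch of such a lint follows, assuming a hypothetical `tier` field on each manifest (the plan treats tiers as labels and does not say where the label is stored); `TieredManifest` and `findUpwardDependencies` are illustrative names.

```typescript
type Tier = "core" | "standard";

// Hypothetical manifest subset. The `tier` field is an assumption: the plan
// does not specify whether the label lives in the manifest or elsewhere.
interface TieredManifest {
  id: string;
  tier: Tier;
  dependsOn: string[];
}

interface UpwardDependency {
  from: string; // the core extension that breaks the rule
  to: string;   // the standard extension it depends on
}

// Reports every core extension that depends on a standard extension.
// An empty result means the minimal distribution can boot without standard.
function findUpwardDependencies(manifests: TieredManifest[]): UpwardDependency[] {
  const tierOf = new Map<string, Tier>();
  for (const m of manifests) tierOf.set(m.id, m.tier);

  const violations: UpwardDependency[] = [];
  for (const m of manifests) {
    if (m.tier !== "core") continue;
    for (const dep of m.dependsOn) {
      if (tierOf.get(dep) === "standard") {
        violations.push({ from: m.id, to: dep });
      }
    }
  }
  return violations;
}

// Illustrative input: the orchestrator wrongly reaching up to a tool package.
const demoViolations = findUpwardDependencies([
  { id: "storage-sqlite", tier: "core", dependsOn: [] },
  { id: "session-orchestrator", tier: "core", dependsOn: ["storage-sqlite", "tools-fs"] },
  { id: "tools-fs", tier: "standard", dependsOn: ["storage-sqlite"] },
]);
console.log(demoViolations);
```

Because the check only reads declared data, it could run in CI without loading any extension code.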

**Example — the `session-orchestrator` (core), carved out of `agent-manager.ts`:**

```ts
host.on("message.received", async (msg) => {
  const conversation = await host.conversation.load(msg.tabId);  // effect: read state
  const provider = host.providers.resolve(msg.model);            // decision: pick LLM
  const tools = host.tools.resolveFor(msg.tabId);                // decision: gather/filter
  const dispatch = resolveDispatchPolicy(msg);                   // decision: §3.3 toggle
  const result = await kernel.runTurn({                          // ← call the kernel
    provider, messages: conversation.messages, tools, dispatch,
    emit: host.events.emit,
  });
  await host.conversation.append(msg.tabId, result.messages);    // effect: persist
});
```

Every line is a **decision** (which provider/tools/policy) or an **effect** (load/persist) — neither belongs in the kernel.

**How a tool builds "on top of" the kernel (inversion of control).** The kernel never *finds* tools; it *receives* them. The dependency arrow points tool → contract → kernel, never the reverse:

1. A tool conforms to `ToolContract` (owned by the kernel) — importing only the contract, not the kernel internals or other tools.
2. It registers at activation: `host.defineTool(createReadFileTool(workdir))`.
3. The orchestrator gathers them: `host.tools.resolveFor(tabId)`.
4. They're handed into `runTurn`, which calls them blindly by shape (`byName.get(call.name).execute(...)`). The kernel never knows `read_file` exists. 0, 1, or 50 tools — the loop is identical.

### 2.8 The Minimum Viable Turn (what "core" must contain)

Derived by tracing the **real** end-to-end path of a single message in today's code — `POST /chat` → `deliverMessage` → `processMessage` → `getOrCreateAgentForTab` (`new Agent`) → `for await (event of agent.run())` → `emit(event)` → `/ws` fan-out — and stripping everything not load-bearing.

**Two readings of "send a message, get a response":**

- **(A) Absolute minimum mechanism** — one stateless request→response; needs *no DB at all*. (Useful as the testing/embedded floor.)
- **(B) Minimum useful chat** — real multi-turn, so turn 2 sees turn 1. Adds conversation persistence.

**DECIDED: `core` targets (B).** "Minimal Dispatch" is a usable multi-turn chat. The single piece separating (B) from (A) is the **conversation store + storage backend** — drop those two and you have the stateless (A) floor (which is exactly the in-memory test configuration).

**Stripped from the real path → all of these are `standard`, NOT core** (each confirmed removable without breaking a basic turn): key/model **fallback chain** (`buildFallbackSequence`, rate-limit retry), **tools** entirely (empty tool list → turn still completes as text), **interactive permission prompting** (only exercised *by* tools), **reasoningEffort / attachments / workingDirectory** overrides, **skills, agents/summon, lsp, notifications, compaction, queue / auto-wake, usage telemetry, prompt-cache warming**, and the system-prompt **TOOL_DESCRIPTIONS + task-management** assembly (minimal = a plain/empty system string).
This concretely confirms §2.6's surprise: **tools, persona, and permissions are all riders — the turn loop needs none of them.**

**KERNEL exposes (for the minimal turn):**

| Thing | Why kernel | From today |
|---|---|---|
| Contracts (ABI): `ProviderContract`, `ToolContract`, `AuthContract`, `Extension`/`Manifest`, `HostAPI`, event taxonomy, conversation model (`Chunk`/`ChatMessage`) | shared types everything compiles against | `types/index.ts` |
| Extension Host + registries | nothing runs without discover/resolve/activate | (new) |
| `runTurn({ provider, messages, tools, dispatch, emit, signal })` | the pure turn loop (§3.3); takes `messages` as input, returns result messages, touches no DB | `agent.ts` |
| Event bus | how the turn talks to the outside | `onEvent`/`emit` |
| Config loader | needed at boot to find extensions (chicken-and-egg) | `config/` |
| Logger | always-on, pre-extension | — |
| Permission rule *evaluation* (pure) | rules in → decision out | `permission/evaluate.ts` |
| `host.storage` / `host.secrets` *interfaces* | exposes the shape; backend injected | — |

**CORE provides (the minimum extensions to complete one turn):**

| Extension | Job on the minimal path |
|---|---|
| `storage-sqlite` | concrete backend behind `host.storage` (the (A)↔(B) piece; swap for in-memory in tests) |
| `conversation-store` | append-only turn/chunk persistence on `host.storage` (so turn 2 sees turn 1) |
| `transport` | accept the message; stream events back (HTTP/WS, or stdio) |
| `provider-×1` | call an LLM and stream tokens |
| `auth-×1` | supply that provider's credentials |
| `session-orchestrator` | wire it together (below) |

**The minimal turn, end to end (target):**

```
transport.receive(msg)
  → orchestrator: history = conversationStore.load(convId)   // core (skip → (A) stateless)
  → orchestrator: provider = providers.resolve(model)        // core ext + auth
  → kernel.runTurn({ provider, messages: [...history, msg], tools: [], dispatch, emit })
      → emit(events) → transport.stream(events)              // core ext
  → orchestrator: conversationStore.append(convId, result)   // core ext
```

Note `tools: []` — a turn completes with zero tools (text reply). Every capability beyond this is a `standard` extension that contributes tools / filters / hooks.

### 2.9 Contract versioning (convention now, machinery deferred)

**Reframe first (P4):** semver's machinery exists to coordinate **independent release timelines** (a producer ships v2; consumers upgrade whenever). That *temporal decoupling* is the problem it solves — and we mostly don't have it:

- **Internal extensions** (bundled, in-repo): no decoupling. A contract change is found via `lsp references` (§5.3) and fixed atomically in one change set. **The type system IS the version check** — a breaking change is a compile error.
- **External/custom extensions** (out-of-repo): decoupling is real — the compiler can't see their code. A declared version compatibility gate earns its place **only here.** *(And we don't support external extensions yet — see below.)*

So versioning is **asymmetric**, like §3.6 / §3.7: *internal = the type system is the version; external = a declared version is the contract.*

**Two different "versionings" — keep them separate:**

- **Data/schema migration** (persisted-data evolution) — already decided (§2.2: each extension owns its migrations). NOT this section.
- **Contract/API-surface versioning** — this section. Independent: a contract can change with no migration, and vice-versa.

**DECISION — convention-only and dormant in 0.x.** Because everything is **developed in-house today** (no external extensions), we adopt the *vocabulary* of versioning, not the *bureaucracy*:

- **Every package self-versions.** No enforced lockstep / single repo version: the kernel bumps when the ABI changes; an extension bumps when *its* contract changes. Independent versioning matches one-agent-per-unit (§5) — each owner manages its own.
- **Semver *meaning* as disciplined changelog hygiene** (and the §5.3 fan-out signal), using the standard terms:
  - **major** — removing or modifying the contract surface (incl. a hook/service payload shape change). *Breaking.* This bump is the orchestrator's cue to fan out to **all** consumers (found via `lsp references`).
  - **minor** — adding to the contract surface. Existing consumers unaffected.
  - **patch** — internal change only; no surface/payload change.
- **Right now the version is COMMUNICATION, not ENFORCEMENT.** With no external consumers, the type system + `lsp references` are the actual mechanism; the number is a changelog/fan-out signal for humans and agents — not load-bearing.
- **Stay in `0.x`** (conventionally: "no stability promised") through the rewrite, while the ABI churns. `1.0.0` is reserved for "stable enough to invite external extensions" — and **that** decision is the trigger to build the deferred machinery below. We worry about it when we get there, not before.

**Deliberately NOT built now (deferred until external extensions exist):**

- A load-time **version-compat gate** (external manifest pins an `apiVersion` range; host disables+surfaces on mismatch per §3.7 fault containment).
- A mechanical **`.d.ts`-surface-diff** in CI to flag breaking changes automatically (removes semver's human-judgment weakness).

**Harness rule this generates (scoped to contract-defining agents only; written into agent files when those agents exist, not now):** "Follow semver on your contract: **major** = removed/renamed/retyped export or changed hook/service payload (and signals the orchestrator to fan out to all `lsp references` consumers); **minor** = additive; **patch** = no surface change. Internal consumers are caught by the compiler — the version is for the fan-out signal (and, later, external consumers)."
*(The term "patch" is training-standard vocabulary, so it needs no glossary entry — P6.)*

---

### 2.10 Core-default provider/auth (the boot minimum + primary testbench)

**Criterion (not "best provider" — leanest, most-testable core per §2.8/§3.6):** the one provider+auth that makes "minimal Dispatch" boot with the smallest auth surface and the lightest test setup.

**DECISION: OpenAI-compatible provider + API-key auth is the core default** — `provider-openai-compat` + `auth-apikey`. This is *also* the primary testbench: **OpenCode Go (flash) IS this path.**

- In today's code it is `createProvider`'s **default branch** (`createOpenAICompatible`, name `"opencode-zen"`) with the hardcoded defaults `model: "deepseek-v4-flash"`, `baseURL: "https://opencode.ai/zen/go/v1"`, and a plain **API key** — the simplest possible `AuthContract`.
- **Why it's the right core default (grounded, P4):**
  1. **Simplest auth = leanest core.** `apiKey` + `baseURL`, nothing else. Claude OAuth (token refresh, billing/beta headers, session id, account discovery) would bloat the *minimum* tier and contradict §2.8.
  2. **Most generic contract shape.** OpenAI-compatible is a near-universal wire format (dozens of providers + local Ollama/LM Studio), so the core's one provider is really "the protocol most of the world implements."
  3. **Already the literal default** in `createProvider` — core encodes a decision the codebase already made.
  4. **Best for §3.6 testability.** API-key auth fakes trivially (a string + a base URL at a mock server); OAuth would force token-refresh mocking — the exact mock-sprawl we're fighting.
- **Project fit (the deciding constraint):** the two available subscriptions are **Claude** and **OpenCode Go**. OpenCode Go has the most generous limits/API (especially the **flash** agents) → it is the **primary test bench**. The lean core default and the testbench are therefore the *same* path — no tension.
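
To make the "API-key auth fakes trivially" point concrete, here is a minimal sketch. The plan names `AuthContract` but does not define its fields, so the shape below (`id`, `resolve()`, `ResolvedCredentials`) and the factory name `createApiKeyAuth` are assumptions for illustration. The only external fact relied on is that OpenAI-compatible endpoints conventionally accept the key as an `Authorization: Bearer` header.

```typescript
// Hypothetical shapes: the plan does not specify AuthContract's fields.
interface ResolvedCredentials {
  baseURL: string;
  headers: Record<string, string>;
}

interface AuthContract {
  id: string;
  resolve(): ResolvedCredentials;
}

// The whole auth surface is two strings; there is no refresh flow to model.
function createApiKeyAuth(opts: { apiKey: string; baseURL: string }): AuthContract {
  if (opts.apiKey.length === 0) {
    throw new Error("apiKey must not be empty");
  }
  return {
    id: "auth-apikey",
    resolve() {
      return {
        baseURL: opts.baseURL,
        // OpenAI-compatible servers conventionally read a bearer token.
        headers: { Authorization: `Bearer ${opts.apiKey}` },
      };
    },
  };
}

// In a test, the "fake" is just different inputs pointed at a mock server.
// The URL below is a placeholder, not a real endpoint.
const fakeAuth = createApiKeyAuth({
  apiKey: "test-key",
  baseURL: "http://127.0.0.1:8080/v1",
});
console.log(fakeAuth.resolve().headers.Authorization);
```

An OAuth-backed implementation of the same hypothetical interface would need token storage and refresh behind `resolve()`, which is the extra surface the decision above keeps out of `core`.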

**Tier placement that follows:**

- **core:** `provider-openai-compat` + `auth-apikey` (boots minimal Dispatch; = OpenCode Go flash via `/zen/go/v1`).
- **standard:** `provider-anthropic` + `auth-claude` (OAuth — your daily driver, rides on top), plus the **Anthropic-format OpenCode Go models** (MiniMax/Qwen via `isOpencodeGoAnthropicModel`, a different endpoint than flash), `provider-google`, `provider-copilot`, etc.
- Mirrors every prior decision: the rich/preferred providers ride on top as standard extensions; core proves the architecture with the simplest path.

**Naming (P8):** `provider-openai-compat`, `auth-apikey` — descriptive, training-adjacent; no glossary entry needed.

---

## 3. Runtime flow

### 3.1 Boot

1. Host process starts kernel with config + extension search paths.
2. Kernel opens DB, loads merged config, builds the capability gate.
3. Extension host discovers manifests → resolves DAG → checks apiVersion → runs migrations.
4. Activates extensions topologically; each registers tools / providers / hooks / routes / services / jobs.
5. `transport-http` listens; `session-orchestrator` subscribes to message intake; scheduler arms jobs. Ready.

### 3.2 A turn

1. Inbound message hits a `transport` route → emits `message.received`.
2. `session-orchestrator` resolves conversation, working dir, the **provider+model+key** (provider registry + auth vault), the agent profile, and the **tool-dispatch policy** (§3.3).
3. **Context-assembly filter chain** runs: persona + skills + agent profile contribute system prompt and a tool-name filter.
4. Tool set = tool registry filtered by the **capability gate** + agent whitelist.
5. **Agent runtime loop:** `provider.stream(messages, tools)` → dispatch tool calls per the policy (§3.3) → gate check → `tool.before` filter → execute (exec context: shell-output streaming, cancellation, queued-message injection) → `tool.after` filter → feed results back; repeat until done.
6. Events stream on the bus → transport pushes to clients; `notifications` reacts; conversation store appends chunks; usage recorded.
7. `turn.sealed` hook → `compaction` may trigger; scheduler may schedule cache-warm.

### 3.3 Kernel internals — tool dispatch (togglable: `maxConcurrent` + `eager`)

**Mechanism.** The model streams tool calls *incrementally*: each `tool-call` event is fully formed (parsed `input`) **before** the step's `finish-step`. So the kernel can launch a call the moment it arrives. Tool calls batched in one step are **independent by construction** — the model sees no result until the next step — so running them concurrently/eagerly is *semantically safe*, not a reordering risk.

**Today (for contrast):** `agent.ts` collects all `tool-call`s during the stream, then executes them **after** the loop, **sequentially** (`for … await execute`). That is `{ maxConcurrent: 1, eager: false }` — the safe baseline we keep available.

**Two orthogonal axes — the toggle.** A single enum conflated two independent questions; we split them so every combination is coherent (no invalid states):

- `maxConcurrent` (a number) = *how many tools run at once*: `0` → unlimited, `1` → sequential (a concurrency limit of 1 is exactly serial), `2+` → that cap.
- `eager` (a boolean) = *when execution starts*: `true` → launch each call the instant its `tool-call` streams in (overlaps with the rest of generation); `false` → wait until the step's `finish-step`, then dispatch the batch.
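
The two axes can be sketched as a small per-step dispatcher: `eager` decides whether queued calls may start before `finish-step`, and `maxConcurrent` bounds how many run at once. This is an illustration under stated assumptions, not the kernel's implementation: `createStepDispatcher`, `onToolCall`, `finishStep`, and the `Outcome` type are hypothetical names, and dedup, cancellation, and output attribution are deliberately left out.

```typescript
interface ToolDispatchPolicy {
  maxConcurrent: number; // 0 = unlimited, 1 = sequential, 2+ = cap
  eager: boolean;        // true = launch on arrival; false = after finish-step
}

// Hypothetical result wrapper so a failing tool never rejects the batch.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

function createStepDispatcher<T>(policy: ToolDispatchPolicy) {
  const limit = policy.maxConcurrent === 0 ? Infinity : policy.maxConcurrent;
  const queue: Array<() => void> = [];
  const outcomes: Array<Promise<Outcome<T>>> = [];
  let running = 0;
  let released = policy.eager; // non-eager dispatch is held until finishStep()

  // Start queued calls while the policy allows it.
  function pump(): void {
    while (released && running < limit && queue.length > 0) {
      const start = queue.shift()!;
      running++;
      start();
    }
  }

  return {
    // Called when a fully formed tool call streams in.
    onToolCall(task: () => Promise<T>): void {
      outcomes.push(
        new Promise<Outcome<T>>((resolve) => {
          queue.push(() => {
            // Wrapping in a Promise turns a synchronous throw into a rejection.
            new Promise<T>((r) => r(task()))
              .then(
                (value) => resolve({ ok: true, value }),
                (error) => resolve({ ok: false, error }),
              )
              .then(() => {
                running--;
                pump(); // a slot freed up: start the next queued call
              });
          });
        }),
      );
      pump();
    },
    // Called at the step's finish-step; outcomes come back in arrival order.
    finishStep(): Promise<Array<Outcome<T>>> {
      released = true;
      pump();
      return Promise.all(outcomes);
    },
  };
}
```

With `{ maxConcurrent: 1, eager: true }` the first call starts as soon as `onToolCall` is invoked and later calls wait their turn; with `eager: false` nothing starts until `finishStep()`. Returning outcomes in arrival order rather than completion order keeps the batch deterministic even when tools finish out of order.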

| `maxConcurrent` | `eager` | Meaning |
|---|---|---|
| 1 | false | One at a time, after the stream ends → **previous (pre-rework) behavior** |
| 1 | true | **DEFAULT.** Start the first tool the instant it arrives (overlap with generation), but never run two tools at once — safe for any tool |
| 2+ | false | Up to N in parallel, after the stream ends |
| 2+ | true | Up to N in parallel, launched as they stream in |
| 0 | false | All in parallel, after the stream ends |
| 0 | true | All in parallel, launched as they stream in |

**The policy is a KERNEL INPUT, never ambient (P3):**

```ts
interface ToolDispatchPolicy {
  maxConcurrent: number; // 0 = unlimited, 1 = sequential, 2+ = cap
  eager: boolean;        // true = launch on arrival; false = after finish-step
}

runTurn({ provider, messages, tools, dispatch /* : ToolDispatchPolicy */, emit })
```

The kernel receives a *resolved* policy; it never reads config itself.

**`eager` + a limit — exact semantics.** A streaming semaphore: launch on arrival until `maxConcurrent` is reached, then queue; as each tool finishes, the next (queued or newly-arrived) call starts. Well-defined for every combination above.

**Resolution (who sets it)** — mirrors the existing `reasoningEffort` precedence: per-turn/tab override → agent definition → global config (`dispatch.toml`) → built-in default. The `session-orchestrator` (core) resolves this and hands the final value to the kernel.

**Default — DECIDED: `{ maxConcurrent: 1, eager: true }`.** Never two tools at once (safe for any tool, incl. non-concurrency-safe ones), yet still overlaps the first tool's execution with the rest of generation — zero risk, free latency. Raising `maxConcurrent` (e.g. 4) is the opt-in throughput win; `0` (unlimited) is a deliberate, footgun-aware opt-in (see complication #2).
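
The streaming-semaphore semantics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the kernel's implementation: `createToolDispatcher`, `ToolCall`, `ToolResult`, `ExecuteTool`, `onToolCall`, and `onFinishStep` are hypothetical names introduced here; only `ToolDispatchPolicy`, `toolCallId`, and the `tool-call` / `finish-step` events come from this plan. Cancellation (`ctx.signal`), dedup, and output attribution are deliberately left out.

```typescript
interface ToolDispatchPolicy {
  maxConcurrent: number; // 0 = unlimited, 1 = sequential, 2+ = cap
  eager: boolean;        // true = launch on arrival; false = after finish-step
}

// Hypothetical shapes, for illustration only.
interface ToolCall { toolCallId: string; name: string; input: unknown }
interface ToolResult { toolCallId: string; ok: boolean; output: unknown }
type ExecuteTool = (call: ToolCall) => Promise<unknown>;

function createToolDispatcher(policy: ToolDispatchPolicy, execute: ExecuteTool) {
  const limit =
    policy.maxConcurrent === 0 ? Number.POSITIVE_INFINITY : policy.maxConcurrent;
  const waiting: Array<() => void> = [];          // start thunks, in arrival order
  const settled: Array<Promise<ToolResult>> = []; // one per call, in arrival order
  let running = 0;
  let stepFinished = false;

  // Start queued calls for as long as the policy allows it.
  const pump = (): void => {
    if (!policy.eager && !stepFinished) return; // lazy mode: hold until finish-step
    while (running < limit && waiting.length > 0) {
      running += 1;
      const start = waiting.shift()!;
      start();
    }
  };

  return {
    // Call once per streamed `tool-call` event.
    onToolCall(call: ToolCall): void {
      settled.push(
        new Promise<ToolResult>((resolve) => {
          waiting.push(() => {
            Promise.resolve()
              .then(() => execute(call))
              .then(
                (output): ToolResult => ({ toolCallId: call.toolCallId, ok: true, output }),
                (error): ToolResult => ({ toolCallId: call.toolCallId, ok: false, output: String(error) }),
              )
              .then((result) => {
                running -= 1;
                resolve(result);
                pump(); // a slot freed up: start the next queued call, if any
              });
          });
        }),
      );
      pump();
    },
    // Call on `finish-step`; resolves once every call in the batch has settled.
    onFinishStep(): Promise<ToolResult[]> {
      stepFinished = true;
      pump();
      return Promise.all(settled);
    },
  };
}
```

In this sketch, results come back in arrival (batch-index) order regardless of completion order, and a failing tool becomes an error result instead of rejecting the whole batch. Both are design choices of the sketch, chosen to line up with the complications checklist rather than taken from existing code.
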
**Contract requirements this forces (must be in `ToolContract`/`ctx` on day one — retrofitting later is painful):**

- `ctx.onOutput(data, stream)` — streaming output the **kernel attributes by `toolCallId`**, so concurrent shell output doesn't interleave ambiguously (today's `shell-output` event carries no id — fine only because exec is sequential).
- `ctx.signal` — cancellation, so an aborted turn doesn't leak in-flight tool work.
- **`execute` must be safe to run concurrently** with other tools (no shared ambient state — this is just P3 paying off).

**Optional refinement (note, don't build yet):** a tool may declare `concurrencySafe: false` in its contract; the kernel serializes *those* even when `maxConcurrent` allows parallelism — so one mutating tool doesn't force the whole batch sequential. This overrides the global setting **downward only** (never widens parallelism).

**Complications checklist (carried from today's sequential code):**

1. **Shell-output attribution** → tag by `toolCallId` (above).
2. **Concurrency cap + dedup** → bound parallelism; populate the byte-identical-call dedup map in emission order (the "150 identical calls" incident — do not fire 150 effects at once). `maxConcurrent: 0` (unlimited) re-opens this footgun for *distinct* calls, so it must stay a deliberate opt-in, never the default.
3. **User-interrupt injection** → target the last call by **batch index**, not completion time (results return nondeterministically under concurrency).
4. **Abort / error cleanup** → await or cancel in-flight tools via `ctx.signal`; synthesize residual results for orphaned tool-call IDs (today's safety nets).
5. **Wasted effects on abort** → eager exec may complete a side-effecting tool (`run_shell`) *before* an abort; the effect already happened, result discarded. Accepted consciously for non-idempotent tools.

**Scope boundary.** This is **within a step's batch only**.
Next-step tools can't start early — they don't exist until the model sees this step's results. So "before the turn ends" = "across the multiple tool calls in one step," which is exactly the multi-tool-call case. ### 3.4 State, durability & crash recovery **The worry (context):** a chat must survive *any* interruption — random shutdown, token exhaustion, tool error — and the user just resumes with the same history, never facing a "wipe it clean and start over" broken state. **What today's code already gets right (keep this):** - `appendChunks` wraps a whole turn's rows in **one SQLite transaction** + WAL → **atomic**: a hard crash mid-write yields *all* those rows or *none*. No half rows, no DB corruption. This is the most important property and it already holds. - History is an **append-only chunk log** keyed by monotonic per-tab `seq`. Prior history is never mutated, so a crash can't corrupt what's already written. **The real danger window (what to fix):** the whole assistant turn is accumulated **in memory** (`chunks: Chunk[]`) and written **once at the end** (`flushAssistant` on seal). A mid-turn crash loses the *entire* assistant turn. Two latent issues compound it: 1. **Orphaned `running` status** — `status` is persisted to `tabs`; a crash leaves it `running` forever (no boot reconciliation resets stale `running → idle`). 2. **Orphaned tool-call IDs** — a crash between an assistant `tool_call` and its `tool_result` leaves a dangling call. Anthropic **rejects** such a history (`MissingToolResultsError`). Today's `synthesizeResidualToolResults` guards this *in memory* only — useless once the process is dead. 
**This is the exact "history the provider refuses to accept → start over" failure.**

```
user message ──► [persisted immediately ✓]
     │
     ├─ assistant streams text/thinking/tool-calls ──► accumulates IN MEMORY ONLY
     │     (50 steps, tool runs, minutes…)
     │         ◄── CRASH HERE ──► entire assistant turn GONE; maybe a dangling tool_call
     │
     └─ turn seals ──► flushAssistant() ──► [persisted ✓]
```

**The design — make broken state *unreachable*, not just recoverable.** Four rules, each tied to a real failure above:

- **R1 — Persist incrementally, append-only (kill the in-memory window).** Write each step (not each delta) to the log as it completes, in its own transaction. A crash then loses at most the *last in-flight step*, not the whole turn. Granularity = per **step**, not per **delta** (a handful of writes per turn, not hundreds) — keeps IO modest. Make granularity configurable.
- **R2 — Recovery is a pure function of the log (the keystone).** On load, run a pure **`reconcile(rows) → cleanHistory`** that deterministically repairs any partial turn:
  - `tool_call` with no matching `tool_result` → synthesize an error result ("interrupted by shutdown"). This is today's `synthesizeResidualToolResults` logic **moved to the READ path** so it runs on *every* load, not just live.
  - a turn with no terminal assistant content → mark interrupted; user simply sends the next message to continue.
  - **Functional-core (P2):** rows in → clean history out, no I/O, exhaustively unit-testable with crafted "crash-shaped" inputs.

  **Guarantee: whatever a crash leaves, `reconcile` always yields a provider-acceptable history.** "Broken state" becomes a state the rest of the system never observes — it's repaired at the boundary.
- **R3 — Status is derived, never authoritative.** A persisted `running` flag is a lie waiting to happen. On boot, sweep all `running → interrupted`; AND treat live status as runtime-only (derive "is this tab live?" from "is there an in-process turn driving it?").
A crash can't leave a tab stuck running. - **R4 — Resume = load → reconcile → continue.** Because history is append-only and `reconcile` guarantees validity, resuming after *any* failure is identical and invisible to the user — no special "recovery mode". Token-exhaustion and tool-errors already end the turn cleanly and persist (the error becomes a chunk), so they are *already* resumable once R1 closes the crash window. **Where it lives (fits the architecture):** almost entirely in the `conversation-store` **core extension** (R1 incremental write, R2 reconcile-on-load) + a tiny **boot sweep** (R3). The **kernel stays pure** — `runTurn` still just takes `messages` and emits events; it knows nothing about crashes. `reconcile` is the canonical **functional-core** unit (P2) and the highest-value test target in the system (feed it every crash shape). **Cost / boundary (P4):** - R1 trades IO for safety (more, smaller transactions vs. one-per-turn — the current code chose one-fsync-per-turn for "constrained backends"). Per-step batching is the mitigation; granularity configurable. - **Out of scope here:** resuming a half-finished assistant message *mid-sentence* (wishlist #1 "resume mid-generation" — needs in-flight streaming state). The promise here is narrower and is what's actually wanted: **the history is never broken, and the user can always continue the conversation.** Mid-stream resumption can build on this foundation later. ### 3.5 The hook system (extensible without prediction) **The goal:** features react to actions in other features (e.g. *"user sent a message → reset the cache-warming timer"*). Hooks must be **part of the contracts** (typed, stable, exposed) *and* **easy to add later** without predicting features that may never exist. Those only conflict if hooks live in a central kernel registry — so they don't. 
**What today's code already does (the patterns to generalize):**

- **Observer stream.** `NotificationDispatcher` depends not on `AgentManager` but on a minimal interface — `interface AgentEventSource { onEvent(listener): () => void }` — and wraps every handler so *"a transport bug can never propagate into the agent loop."* That's already a primitive hook contract (subscribe → react → unsubscribe, errors isolated).
- **Semantic lifecycle calls (a hook in disguise).** Cache-warming exposes `onUserMessage(tabId)` (cancel timer) and `onTurnEnded(tabId)` (re-arm), *called explicitly* from `tabs.svelte.ts`. Hand-wired coupling we want to dissolve into subscriptions.

**The keystone decision — decentralized hook catalog:**

> The **kernel owns the hook *mechanism*** (`emit`, `on`, `applyFilters`). Each
> **extension declares the hooks it emits** as part of its own contract. The hook
> catalog is the *union* of all extensions' declarations — never a central list.

The kernel never enumerates "the hooks that exist." This is what makes "add a hook as required" a **local, additive** change instead of a kernel edit.

**The typed descriptor (the contract surface).** A hook is an exported, typed descriptor — not a loose string:

```ts
// owned by the session-orchestrator (it performs message intake)
export const MessageReceived =
  defineHook<{ tabId: string; text: string }>("session/message.received");

// owned by the KERNEL (it owns the turn loop)
export const TurnSealed =
  defineHook<{ tabId: string; turnId: string }>("kernel/turn.sealed");
```

Consumers get full type inference, no central enum to edit:

```ts
// cache-warming extension (dependsOn session-orchestrator)
host.on(MessageReceived, ({ tabId }) => cancelTimer(tabId)); // payload inferred
host.on(TurnSealed, ({ tabId }) => armTimer(tabId));
```

The descriptor **is** the contract: importing it gives the id + payload type. Adding a hook = exporting one more descriptor from its owner.
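
To make the descriptor idea concrete, here is a minimal sketch of what `defineHook` and the event half of the host mechanism could look like. It is an illustration under assumptions, not the planned Host API: `HookDescriptor`'s phantom `__payload` field, `createHookHost`, the `logError` callback, and the `owner/name` regex check are all hypothetical; only `defineHook`, `host.on`, `host.emit`, the `owner/name` id shape, and the per-handler error isolation rule come from this plan. Filters (`applyFilters`) and timeouts are omitted.

```typescript
// The id is the only runtime data; the payload type P lives purely at the type level.
interface HookDescriptor<P> {
  readonly id: string;
  readonly __payload?: P; // phantom field (assumption): never set, only carries P
}

function defineHook<P>(id: string): HookDescriptor<P> {
  // Assumed validation of the `owner/name` namespacing convention.
  if (!/^[^/]+\/.+$/.test(id)) {
    throw new Error(`hook id must look like "owner/name", got "${id}"`);
  }
  return { id };
}

type Listener<P> = (payload: P) => void | Promise<void>;

function createHookHost(
  logError: (hookId: string, error: unknown) => void = () => {},
) {
  const listeners = new Map<string, Set<Listener<any>>>();

  return {
    // Subscribe; returns an unsubscribe function.
    on<P>(hook: HookDescriptor<P>, listener: Listener<P>): () => void {
      const set = listeners.get(hook.id) ?? new Set<Listener<any>>();
      listeners.set(hook.id, set);
      set.add(listener);
      return () => {
        set.delete(listener);
      };
    },

    // Fire-and-forget: handlers are not awaited, and a throwing or rejecting
    // handler is caught and logged so it can never fail the caller.
    emit<P>(hook: HookDescriptor<P>, payload: P): void {
      for (const listener of Array.from(listeners.get(hook.id) ?? [])) {
        try {
          Promise.resolve(listener(payload)).catch((error) => logError(hook.id, error));
        } catch (error) {
          logError(hook.id, error);
        }
      }
    },
  };
}
```

Because `on` and `emit` accept only a `HookDescriptor`, passing a raw string at a consumer site does not compile in this sketch, and the payload type is inferred from the descriptor alone.
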
**Two hook kinds (and one thing that is NOT a hook):**

| Kind | Shape | Changes outcome? | Errors | Awaited by turn? | Example |
|---|---|---|---|---|---|
| **Event** | fire-and-forget, N listeners | No | **isolated per-handler — never breaks the turn** (today's rule) | No (optional bounded timeout) | `message.received`, `turn.sealed`, `tool.after` |
| **Filter** | chain, value in → value out, ordered | Yes (in-band) | fail-open + log by default; owner may mark a chain fail-closed | Yes (in-band; a slow filter slows the turn, by design) | system-prompt assembly, tool-result transform |

> **NOT a hook: request/response with exactly one responder** (e.g. "ask the
> human for permission"). That's a **service** (`host.provideService` /
> `getService`) — one responder, returns a value. Modeling it as a hook invites
> "which of N handlers wins?" ambiguity. (Permission-prompting is the tempting
> thing to mis-call a hook — it isn't one.)

**The workflow you actually care about — "add a hook later":**

1. Find the **owner** (the extension that performs the action).
2. Export one descriptor from its contract: `defineHook("owner/the.action")`.
3. Emit at the action site: `host.emit(TheAction, payload)`.
4. The consumer `dependsOn` the owner and subscribes.

**Kernel unchanged.** The kernel changes *only* when the action is a kernel-intrinsic turn-loop moment (e.g. a new `tool.before` phase) — and even then it's **+1 exported descriptor + 1 emit line**, never a structural change, because the mechanism is generic.

**Decisions baked in now (all grounded, P4):**

- **Namespacing (P8):** every hook id is `owner/name` (`kernel/turn.sealed`, `session/message.received`) — prevents third-party collisions.
- **Event error isolation is a hard contract rule** (lifted from `NotificationDispatcher`): a thrown/rejected event handler is caught, logged, dropped — it can *never* fail the turn.
- **Filter ordering is deterministic:** dependency-topological registration order, with an optional numeric `priority` escape hatch.
- **Async semantics:** events are not awaited (fire-and-forget, optional bounded timeout); filters *are* awaited (in-band).

**Deliberately NOT built yet (P4 / P6):**

- No wildcard/pattern subscriptions (`turn.*`) until something needs them.
- No hook-to-hook dependency graph — registration order + `priority` suffices.
- **Don't hook every internal function.** A hook exists only where *cross-extension* reaction is a real need (mirrors P6 — expose only what's needed). Over-hooking turns the codebase into spaghetti-by-events.

**The cache-warming example, fully mapped:**

| Today (coupled) | Target (hooked) |
|---|---|
| `tabs.svelte.ts` calls `cacheWarming.onUserMessage(tabId)` | cache-warming does `host.on(MessageReceived, …)`; orchestrator emits it |
| `tabs.svelte.ts` calls `cacheWarming.onTurnEnded(tabId)` | cache-warming does `host.on(TurnSealed, …)`; kernel emits it |
| frontend hard-wires the dependency | cache-warming `dependsOn` session-orchestrator; zero call-site coupling |

Both hooks it needs (`message.received`, `turn.sealed`) already have natural owners — **no prediction required**, which is the test that the model holds up.

### 3.6 Testability enforcement (design for tests, don't just write them)

**The principle:** don't merely write tests for code — write code *specifically so it is testable*. Crucially, this is **not directly machine-enforceable**: a tool can catch the *symptoms* of untestable code, never the intent. So the strategy is two-pronged — **make the testable path the path of least resistance, then mechanically catch the worst regressions.**

**Testability is an OUTPUT of principles we already adopted** — enforce the *causes*, not the slogan:

- **P2 (inject effects)** → code becomes input→output → testable without mocks.
- **P3 (no ambient state)** → nothing hidden to stub → testable in isolation.
- **P1 (feature-as-a-library)** → small importable surface → testable standalone. **Evidence in today's code (the disease we enforce against):** `packages/api/tests/agent-manager.test.ts` is **2,142 lines** with a large `vi.mock("@dispatch/core")` block — which exists *solely because* `agent-manager.ts` reaches for its dependencies instead of receiving them. That is not a testing failure; it's a P2/P3 failure that *manifested* in the tests. **Mock count is a proxy metric for design quality** — that's the lever. (Today: ~14 test files use `vi.mock`; the kernel + each pure-core must reach **zero internal mocks**.) **The enforcement ladder (cheapest/strongest first):** - **Tier 1 — Structural (free, mechanical, highest leverage).** The package boundaries we're already building *are* testability enforcement. A feature's decision logic lives in a package with **zero effectful imports** (no `bun:sqlite`, `node:fs`, `node:child_process`) → it is *structurally impossible* to write untestable effectful code there; the imports don't exist. Proven by today's deliberately DB-free `chunks/transform.ts`. **Enforce via a dependency-direction lint** (Biome `noRestrictedImports` forbidding effect modules in pure files). The untestable version *doesn't typecheck* — this is the real answer to "how do we enforce it." - **Tier 2 — The no-mock smell test (the proxy metric).** Stated, reviewable rule: *a unit test that needs to mock OUR OWN modules is a design bug, not a test to write.* Allowed: mocking the **outermost edge** (real network, real clock). Banned: mocking `@dispatch/*` internals. Mechanical proxy: a CI grep hard-fails if a **kernel/core** test introduces an internal mock; the global count must trend toward zero. - **Tier 3 — Coverage as a FLOOR, not a target (with a caveat).** No coverage tooling exists today — add `@vitest/coverage-v8`. 
But (P4): coverage is a bad *target* (gameable — 100% of mock-heavy untestable code proves nothing) and a useful *floor* **only on pure-core/kernel packages**, where high coverage is cheap *because* the code is pure. **No global coverage gate** — it would incentivize mock-heavy shell tests, the exact thing we're fighting.
- **Tier 4 — The harness layer (P5/P6 — teach the agents).** Encode the rule so future agents inherit it: a `rules/` safety reflex (below) + a **testable-by-default extension scaffold** in `sdk/` shipping the split pre-made: `logic.ts` (pure, no deps) + `adapter.ts` (effects) + `logic.test.ts` (mock-free). When the *template* is testable, the default output is testable.

**THE KEY CAVEAT — asymmetric enforcement (strict core, lenient shell).** This is itself an application of the AI-harness thesis (P5/P6): **scoped rules beat general rules** — models already know "write testable code"; what they need is *"this kind of code, in this layer, gets tested this way."*

- **Pure core / kernel:** strict — zero internal mocks, dependency-direction lint, coverage floor. High coverage is *cheap* here, so demand it.
- **Imperative shell (orchestrator, transport, real SQLite adapter):** lenient — it will *never* hit high pure-unit coverage, and **forcing it to is the anti-pattern** (you'd do it by mocking everything, recreating today's mess). The shell gets a *thin layer of integration tests* against real / in-memory backends. A blanket rule would backfire — enforcement is asymmetric **by design**.

**`rules/` safety reflexes to ship (Tier 4, scoped per the asymmetry):**

- *Pure-core/kernel rule:* "Writing a unit test that mocks an internal module? The code is wrong, not the test. Move the decision logic to a pure function and inject the effect."
- *Pure-core/kernel rule:* "This package must have zero effectful imports (`node:fs`, `bun:sqlite`, `node:child_process`, network). Need an effect? It belongs in the adapter/shell, injected."
- *Shell rule:* "Don't chase pure-unit coverage here. Write a few integration tests against a real or in-memory backend; do NOT mock sibling extensions." - *General (all):* "Mocking the outermost edge (real network/clock) is fine; mocking `@dispatch/*` is a smell — fix the boundary." **The enforced standard (commit to this):** 1. Every extension has a **pure core with zero effect-imports**, lint-enforced (Tier 1) — *the load-bearing one.* 2. **No internal mocks in kernel/core tests** — CI grep; proxy metric → zero (T2). 3. **Coverage floor on pure packages only**, never global (Tier 3). 4. **Scoped `rules/` reflexes + a testable-by-default scaffold** (Tier 4). **Tooling actions (when we start):** add `@vitest/coverage-v8`; add the dependency-direction lint (Biome `noRestrictedImports`) scoped to pure packages; add the CI internal-mock grep for kernel/core; ship the `sdk/` scaffold. ### 3.7 Trust & isolation model (fault containment, not adversary sandboxing) **Threat model first (P4 — defend a real threat, not an imported one).** Dispatch is **personal, self-hosted, single-operator** today. So: - **Malicious extension** (data theft, host attack) — **NOT the current threat.** You run the host and choose the extensions; an installed extension is already as trusted as code you write. The "untrusted plugin marketplace" justification for sandboxing does not apply *yet* (revisit if Dispatch goes multi-tenant or ships a public registry). - **Buggy extension** (infinite loop, unhandled rejection, leak, bad migration) taking down every other tab/agent — **REAL and present**, especially since we want external/custom extensions. This directly threatens the §3.4 "never leave the system broken" guarantee. **So we defend against FAULTS, not ADVERSARIES** — until the project's nature changes. That distinction collapses the decision. **Options considered:** - **A — In-process, trusted (no isolation):** simplest/fastest, rich live-object API. 
But one throw / `process.exit` / leak hits everyone; capabilities are only advisory. *Too little — contradicts §3.4.* - **C — Hard isolation (worker/subprocess/VM per extension):** real fault *and* adversary isolation, enforceable capabilities. But **forces the entire Host API to be serializable** — no live `provider` handed to `runTurn`, no closure handlers, no streaming `ctx.onOutput` without marshalling — fighting *every* contract we designed, at real per-call IPC cost. *Too much, too early; defends a threat we don't have, and deforms the contracts (the P4 anti-pattern).* - **B — Soft isolation (in-process, defensively wrapped):** keep the rich in-process API, but the host wraps every extension boundary. **CHOSEN.** **DECISION: adopt B now; design contracts so C remains *possible* later without a rewrite.** Concretely: - **Host API stays rich/in-process** — live handlers, streams, objects. All prior design holds unchanged. - **Every extension boundary is defensively wrapped:** handler try/catch (already §3.5), **mandatory timeouts on awaited filters** (§3.5 makes filters in-band, so a runaway filter must be time-bounded), and **per-extension fault tracking → auto-disable a repeatedly-faulting extension** (contains the fault instead of letting it recur; ties to §3.4). - **Tier-aware auto-disable (mirrors the §3.6 asymmetry — strict core, graceful edge):** `standard`/`external` extensions *may* be auto-disabled on repeated faults; **`core`/`kernel` faults are fatal-and-surfaced, never silently degraded** — you want to know storage/transport is broken, not limp on. (Tools also get a deterministic residual result per §3.4 R2, so a tool fault never orphans a turn.) - **Capabilities are declared + gate-enforced at the Host-API surface** (advisory-but-checked), NOT OS-sandboxed. Honest scope: this catches accidental overreach and documents intent; it does not stop determined native code. 
- **Cheap future-proofing for optional C later:** keep contract payloads **structured and in-principle serializable** (the typed hook/service handles of §5.4 already push this way) — don't pass arbitrary live object *graphs* between extensions via services. Then moving one untrusted extension into a worker is a localized change, not an architecture rewrite. - **Manifest `trust` field** (`bundled` | `local` | `external`) recorded now even though all three behave identically under B — so the *policy hook* exists when we later want to treat `external` differently (e.g. worker isolation) without inventing the concept then. **Harness rules this decision generates (scoped per §5.1 layered knowledge; write into the agent files when those agents are built — NOT now, per §7.4):** - *All extension-author agents (shared knowledge):* "Your hook/filter handlers must never throw uncaught — the host wraps them, but a throw burns your fault budget and can auto-disable your extension." / "Filters are awaited and time-bounded — no unbounded work in a filter." / "Assume your extension can be disabled/reloaded independently; don't rely on ambient process state surviving (§3.4)." - *Service/contract-defining agents only:* "Keep service/contract payloads structured and serializable-friendly — no passing live object graphs across the extension boundary (preserves the option to isolate later)." - *Kernel/core agents only (strict):* "Core/kernel faults are fatal-and-surfaced, NOT auto-disabled — never write graceful-degradation code that hides a storage/transport failure." - *Tooling-enforced → deliberately NOT in agent files (P6):* the typed-handle rule (§5.4) is a compile error, and capability over-declaration is caught at manifest load — neither is written down as prose. --- ## 4. Cross-cutting decisions to lock down (when we start) - **Contract versioning:** convention-only & dormant in `0.x` (§2.9). 
Each package self-versions; semver *meaning* is changelog hygiene + the §5.3 fan-out signal. Internal safety = the type system; the compat gate / `.d.ts`-diff are deferred until external extensions exist.
- **Trust & isolation:** **soft isolation (B)** — rich in-process Host API + defensively-wrapped extension boundaries (handler try/catch, filter timeouts, tier-aware auto-disable). Defends FAULTS not adversaries; contracts kept serializable-friendly so hard isolation (C) stays possible later (§3.7).
- **System prompt / persona:** becomes a context-filter contribution, not a hard-coded string — so the assistant's "feel" is swappable.
- **Migrations ownership:** each extension owns its tables; the kernel only runs the migration runner. Defines a clean uninstall story.
- **Deterministic tool-set per turn:** reproducible from `(agent profile + capabilities + active extensions)` — this is P3 made concrete and kills wishlist bugs #16/#17.
- **Tool-dispatch policy:** togglable per §3.3 (`maxConcurrent` + `eager`); the default is decided as `{ maxConcurrent: 1, eager: true }` (§3.3).
- **Durability / crash recovery:** incremental append + pure `reconcile()` on load + derived status (§3.4). Design rule: no persisted state a crash can leave may be unrepairable — recovery is deterministic and invisible to the user.
- **Hooks:** decentralized catalog — kernel owns the mechanism, each extension declares the hooks it emits via typed descriptors (§3.5). Events are error-isolated; filters are in-band; single-responder request/response is a service, not a hook.
- **Testability enforcement:** asymmetric — strict on pure core (zero effect-imports lint, no internal mocks, coverage floor), lenient on the shell (thin integration tests) (§3.6). Mock-of-internals count is the proxy metric.
- **Agent workflow:** one owner-agent per extension/kernel; agents see only others' contracts, never implementation; contract changes fan out mechanically via `lsp references`; non-static cross-extension coupling is forbidden; glossary terms are human-gated (§5).

---

## 5. Repo & agent workflow conventions (one agent per unit)

The repo's **agent-team structure is isomorphic to its module structure**: agents communicate through exactly the same contracts the code communicates through. This is Conway's Law made intentional, and it yields a diagnostic property:

> **Friction between agents is a signal of bad architecture.** Constant
> agent-to-agent messaging ⇒ the contract boundary is wrong. An agent needing to
> read another's implementation ⇒ that contract is underspecified. The workflow
> *surfaces* design smells instead of hiding them.

It is not a bolt-on — every row below already exists in this plan:

| This model needs… | …already provided by |
|---|---|
| Contracts as the only cross-agent surface | ABI (kernel) + two-sided per-extension contracts (§2.3) |
| One agent per unit | P1 feature-as-a-library — one library, one owner |
| Per-agent scoped knowledge | **P7 extension-scoped harness** — an extension's AGENTS.md/rules/glossary *is* its owner-agent's knowledge |
| Layered knowledge (group → file) | P5 tiered-cache layering (§7.1) |
| Persistent, messageable agents | Dispatch's own tabs + `send_to_tab` + `summon`/`retrieve` |
| Bounded cross-agent chatter | the existing `MAX_AGENT_AUTO_WAKES` budget |
| Orchestrator confirms without reading code | **§3.6 testability** — tests-at-boundaries are the trust mechanism |

The last row is the deepest synthesis: **§3.6 is the orchestrator's verification protocol.** It can't read code, so it confirms "everything works" from *contracts + test results + build/diagnostics output* — which only works because we made the boundaries testable.
The keystone equivalence: **P7 harness docs ARE the agents' scoped knowledge** — the same artifact, two views; you don't design knowledge-scoping separately. ### 5.1 The ownership model - **One owner-agent per unit** (each extension, and the kernel). Its file(s) are edited by no one else → single-writer, so a (future) sleeping agent wakes knowing its own code is current. - **Knowledge is scoped & layered** (P5/P7): shared group knowledge (e.g. all "frontend" agents) → per-extension knowledge → per-file specifics. An owner loads only its layer, so it is a narrow-domain expert with lean context. - **Visibility rule:** an agent sees **only what other extensions expose/require** (their contracts) — never their implementation. Implementation is **not provided by default** (P6/§3.6 caveat #3); *needing* it is a signal the contract is incomplete — fix the contract (or ask the owner), don't grant code access. Corollary: **a contract documents behavior & guarantees a consumer can rely on, not just types** (P6 applied to contracts). - **Phase note (P4):** start by **summoning fresh agents per task** — files aren't complex enough to justify warm/persistent agents yet. Persistent *waking* agents (and the wake-time "contract-delta since last active" sync they require) are deferred to **after the rewrite**. ### 5.2 The workflow (build a feature) 1. User asks the **orchestrator** for feature X. (Orchestrator sees all *contracts*, no implementation.) 2. **Overlap check first (anti-webhook-reimplementation, §7):** orchestrator consults the GLOSSARY + feature-docs to see whether the capability already exists under a canonical term. 3. **Boundary decision is the USER's, never silent (resolves §3.6 #5):** if X maps to a new capability, the orchestrator **surfaces "new extension vs. extend an existing one?" 
to the user** and waits — it never decides granularity itself (this is the exact failure the article warns about; the glossary/feature-docs are the defense, the user is the authority). 4. Orchestrator **summons the owner-agent(s)** to do the work and **messages any extensions needing changes** (via their owner-agents). 5. Owners report back; orchestrator confirms via contracts + tests + build. 6. Clarification questions agent↔agent are *allowed but rare* — everything an agent needs (contracts) is already exposed; a needed question usually means a contract gap. ### 5.3 Contract changes — mechanical blast radius (resolves §3.6 #2) A contract change is the one event that legitimately fans out. It is handled **mechanically, not by guessing**, via the existing `lsp` tool: 1. The contract's owner edits it, then runs **`lsp references`** on the changed symbol(s) → the complete set of consuming files. 2. The owner **reports that file list up to the orchestrator** (it can't see other extensions itself); the **orchestrator dispatches** the affected owner-agents to update to the new contract. - **Ownership:** kernel-intrinsic ABI → kernel agent (most conservative, changes rarely). Per-extension contracts → that extension's agent, **co-located in its package** (not a central dir — see §2.5). - **Prerequisite:** a **TypeScript language server** wired into `dispatch.toml` (today's LSP config only has the Luau example). ### 5.4 Static-reference rule — non-static cross-extension coupling is forbidden For §5.3 to be *sound*, `lsp references` must see every coupling. So: > **Every cross-extension coupling is anchored to an exported typed symbol.** > Dynamic/string-keyed cross-feature references are forbidden. - **Enforced by the type system, not a lint:** the Host API *accepts only typed handles* — `host.on(HookDescriptor, …)`, `host.getService(ServiceHandle)` — so a raw string at a consumer site is a **compile error** (surfaced via `lsp diagnostics`). 
  The raw string exists in exactly one place: the owner's `defineHook`/`defineService` declaration. `lsp references` on that exported symbol therefore returns the true, complete blast radius. This is *why* typed descriptors (§3.5) + typed service handles (§2.3) beat string lookups — not aesthetics, but making the agent workflow mechanically sound.
- **Scope (P4 — don't overclaim):** this bans cross-extension **code** coupling. Two dynamic lookups are *legitimate and stay*, because they are **data flow / discovery inside the kernel-host, not feature-to-feature references**: (a) the kernel routing a model's tool-call by name (`byName.get(name)`) — the name is the LLM's runtime choice, i.e. data; (b) the host loading extensions by scanning manifests (traced by the manifest DAG, not symbol refs).
- **The one escape hatch (named, restricted):** generic observability (e.g. a logger wanting *every* hook) may use a single `host.onAny(listener)` firehose, explicitly marked "observability only, never feature code."

### 5.5 Integration bugs — the temporary multi-knowledge agent

A bug where X and Y each honor the contract yet don't work together belongs to no single file. Resolution (resolves §3.6 #4):

- The orchestrator dispatches a **temporary multi-knowledge agent** loaded with the **scoped knowledge AND read/write access to the 2–3 relevant files** — unlike normal agents it *does* see implementation, because fixing integration requires it.
- It becomes the **temporary exclusive owner** of those files for its lifetime (the orchestrator must not let the normal owners edit them concurrently → preserves single-writer).
- **Both trigger paths:** the orchestrator dispatches it proactively, OR a file-owner who spots the bug **requests one from the orchestrator** (reuses the §3.5 agent→orchestrator message path; no new mechanism).
- It leverages the existing knowledge-scoping so the agent gets *exactly* the context to fix the seam and no more.
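The typed-handle rule from §5.4 can be illustrated with a small sketch. Only `defineHook`, `host.on`, and `host.onAny` are named in this plan; every signature below, the `emit` method, the `ExtensionHost` class name, and the `fs.fileWritten` hook are assumptions made for illustration, not the decided Host API.

```typescript
// Hypothetical sketch of typed hook handles (signatures are assumptions).

// A typed handle. `_payload` is a phantom field that only carries the type.
interface HookDescriptor<Payload> {
  readonly name: string;
  readonly _payload?: Payload;
}

// The raw string appears only here, in the owner's declaration.
function defineHook<Payload>(name: string): HookDescriptor<Payload> {
  return { name };
}

type Listener<Payload> = (payload: Payload) => void;
type AnyListener = (hookName: string, payload: unknown) => void;

class ExtensionHost {
  // Listeners are keyed by the descriptor object, not by its string name.
  private readonly listeners = new Map<HookDescriptor<unknown>, Listener<unknown>[]>();
  private readonly firehose: AnyListener[] = [];

  on<Payload>(hook: HookDescriptor<Payload>, listener: Listener<Payload>): void {
    const list = this.listeners.get(hook) ?? [];
    list.push(listener as Listener<unknown>);
    this.listeners.set(hook, list);
  }

  // Observability only, never feature code (the §5.4 escape hatch).
  onAny(listener: AnyListener): void {
    this.firehose.push(listener);
  }

  // Assumed emit path: typed listeners first, then the firehose.
  emit<Payload>(hook: HookDescriptor<Payload>, payload: Payload): void {
    for (const listener of this.listeners.get(hook) ?? []) listener(payload);
    for (const listener of this.firehose) listener(hook.name, payload);
  }
}

// Owner side: in a real package this constant would be the exported symbol.
const fileWritten = defineHook<{ path: string }>("fs.fileWritten");

// Consumer side: the payload type is inferred from the handle.
const host = new ExtensionHost();
const writtenPaths: string[] = [];
host.on(fileWritten, (event) => {
  writtenPaths.push(event.path);
});

// host.on("fs.fileWritten", () => {}); // would not compile: a string is not a HookDescriptor
host.emit(fileWritten, { path: "src/index.ts" });
```

Because consumers can only obtain the handle by importing `fileWritten`, a reference search on that one symbol finds every subscriber, which is the property §5.3 depends on.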
### 5.6 The glossary is a human-gated checkpoint (strengthens P8)

This is the article's central anti-synonym-drift mechanism: the GLOSSARY's **"aliases to avoid" column** exists so the agent never reinvents a concept under a new name (the article's `WebhookEvent` / `WebhookHook` / `HookedWebhook` problem), and the §5.2 step-2 overlap check is *when* it runs ("mandatory feature overlap detection before any new feature"). The orchestrator may **never silently coin a term.** Two cases:

**Case A — concept already exists (synonym-drift defense — the priority).** When a request *describes* an existing concept — even by behavior, under a different name — the orchestrator must **recognize the match and steer to the existing canonical term, creating NO new entry.**

- *Example (the user's):* request = "implement a **web-notifier**: accept a request from an HTTP endpoint requiring no password, then log it." The orchestrator recognizes this *is* a **webhook** (already in the glossary) and responds "that's a `webhook` — I'll use that name," rather than adding "web-notifier".
- Recognition is powered by the glossary's aliases + overlap check, and works on **behavioral descriptions**, not just name matches.
- **Still suggest-then-confirm (P4):** recognition can misfire (the user may mean something subtly different). The orchestrator *proposes* the match ("this looks like a `webhook` — shall I call it that?"); the user has final say. It never silently collapses a possibly-distinct concept into an existing term. If the user confirms it's a new alias for an existing term, add it to that term's "aliases to avoid" column (don't make a new entry).

**Case B — genuinely new concept (name it well).** When the concept is actually new, before adding the entry the orchestrator must:

1. State the new term and its understanding of what it means.
2. **Propose a name, defaulting to the standardized / training-baked term** (e.g. "patch" not "Bugfix"; "debounce" not "cooldown-wait").
   Rationale (P6): a name models already know costs **zero agent-file/glossary space**, so the glossary only grows entries for genuinely project-specific concepts — it actively fights its own bloat.
3. **Ask the user** to approve or rename. The user is the final authority: if they prefer a different name, **always go with the user's choice** (record the standard term, if any, under "aliases to avoid"). The "suggest the standard name" rule applies only to a *not-yet-decided* term — never to override a name the user already set.

This keeps the user the authority on the project's vocabulary and makes synonym drift impossible at the source — P8 with a mandatory human in the loop, biased toward (A) reusing existing terms and (B) names the model already knows.

---

## 6. Current-state map (as of this plan)

Dependency direction is one-way: **`frontend → api → core`**. `core` is already framework-agnostic (no Hono/HTTP) — the cleanest existing seam. *(Note: "core" here is the **current** package name; under the new model the runtime primitive is the kernel and "core" becomes the extension tier — see §2.6.)*

```
packages/
│
├── core/                       → @dispatch/core — shared domain logic (the "brain"), framework-agnostic
│   │                             (exported via src/index.ts barrel)
│   ├── agent/agent.ts          agentic LLM loop (streamText + manual tool-call dispatch,
│   │                           dedup, per-line/spill truncation, user-interrupt injection,
│   │                           reasoning-effort, multimodal user content)
│   ├── llm/
│   │   ├── provider.ts         createProvider() — Anthropic + OpenAI-compatible factories,
│   │   │                       mcp_ tool-name prefix/unprefix
│   │   ├── anthropic-oauth-transform.ts   Claude OAuth request-body transform
│   │   └── debug-logger.ts     DISPATCH_DEBUG_LLM stream/loop/fetch logging
│   ├── tools/                  tool implementations (each createXTool → ToolDefinition)
│   │   ├── registry.ts         createToolRegistry; Zod→JSONSchema + Anthropic normalize
│   │   ├── read-file.ts, read-file-slice.ts, write-file.ts, list-files.ts
│   │   ├── run-shell.ts (+ BackgroundShellStore), shell-analyze.ts, bash-arity.ts
│   │   ├── search-code.ts, web-search.ts, youtube-transcribe.ts (+ BackgroundTranscriptStore)
│   │   ├── summon.ts, retrieve.ts         sub-agent spawn / result collection
│   │   ├── send-to-tab.ts, read-tab.ts    tab-to-tab comms
│   │   ├── task-list.ts (todo), key-usage.ts, lsp.ts
│   │   ├── truncate.ts         universal tool-output truncator + /tmp spill
│   │   └── path-utils.ts       canonicalize / workdir-containment guard
│   ├── db/                     SQLite (bun:sqlite, XDG data dir)
│   │   ├── index.ts            singleton DB + table DDL/migrations (credentials, api_keys,
│   │   │                       usage_cache, wake_schedule, tabs, chunks, settings)
│   │   ├── tabs.ts             tabs CRUD, short-prefix resolution, positions/status/title
│   │   ├── chunks.ts           append-only chunk log: explode/group rows ↔ messages, usage
│   │   └── settings.ts         key/value settings
│   ├── chunks/                 pure conversation-model transforms (no DB import — shared w/ frontend)
│   │   ├── append.ts           appendEventToChunks / applySystemEvent (stream → Chunk[])
│   │   └── transform.ts        explode/group between Chunk[] and flat ChunkRow log
│   ├── compaction/index.ts     head/tail selection, summary prompt + transcript render
│   ├── config/                 dispatch.toml (global ~/.config + project merge)
│   │   ├── loader.ts, schema.ts, watcher.ts, index.ts   load/validate/hot-reload; configToRuleset
│   ├── credentials/            claude.ts (OAuth identity/billing), api-keys.ts, opencode.ts,
│   │                           copilot.ts, google.ts, anthropic-betas.ts, store.ts, index.ts
│   ├── models/                 registry.ts (ModelRegistry, key states), catalog.ts,
│   │                           attachments.ts (image/pdf validation + limits), index.ts
│   ├── skills/                 parser.ts, loader.ts, index.ts (skill files → agent injection)
│   ├── agents/                 loader.ts, index.ts (global + .dispatch/agents defs, tool-group expand)
│   ├── permission/             rules engine: evaluate.ts, service.ts, wildcard.ts, index.ts
│   ├── lsp/                    manager.ts, client.ts, server.ts, language.ts, diagnostic.ts, index.ts
│   ├── notifications/          ntfy.sh: dispatcher.ts, ntfy.ts, config.ts, types.ts, index.ts
│   ├── types/index.ts          ALL shared contracts: Chunk/ChatMessage, AgentEvent, AgentConfig,
│   │                           ToolDefinition, ToolExecuteContext, DispatchConfig, ReasoningEffort…
│   └── index.ts                public barrel (entire core API surface)
│
├── api/                        → @dispatch/api — backend HTTP + WebSocket server (Hono on Bun)
│   ├── index.ts                Bun.serve (+ EADDRINUSE port-fallback) + /ws WebSocket
│   │                           (statuses snapshot, event fan-out, permission replies)
│   ├── app.ts                  Hono app + CORS; /health, /status, /chat (main entry),
│   │                           /chat/cancel, /chat/stop, /chat/warm; mounts routes;
│   │                           constructs agentManager + permissionManager + notificationDispatcher
│   ├── agent-manager.ts        THE orchestrator (~2.4k lines): per-tab turns, message queue,
│   │                           key/model fallback chain, system-prompt assembly (buildSystemPrompt
│   │                           + TOOL_DESCRIPTIONS), per-turn tool assembly (perm/whitelist gated),
│   │                           sub-agent spawning, LSP-on-write hook, auto-wake budget, compaction
│   ├── permission-manager.ts   tool-permission prompts/replies over WS
│   ├── wake-scheduler.ts       pure Claude wake-probe scheduling helpers (4 slots/hour, recovery)
│   ├── types.ts                thin re-export of AgentEvent/AgentStatus from core
│   ├── routes/                 /config, /tabs, /models (+ startWakeScheduler), /skills,
│   │                           /agents, /notifications (each uses a setXGetter injection seam)
│   └── tests/                  agent-manager, routes, permission-manager, wake-scheduler
│
└── frontend/                   → Svelte 5 SPA (Vite); morphable, reworked later
    ├── main.ts, App.svelte, app.css
    └── lib/
        ├── tabs.svelte.ts      central store: sendMessage + WS event handling
        ├── ws.svelte.ts        WebSocket client (auto-reconnect)
        ├── router.svelte.ts, config.ts, types.ts, theme.ts, settings.svelte.ts
        ├── context-window.ts, attachment-tokens.ts, snapshot-sequencer.ts
        ├── cache-warming.svelte.ts, cache-warm-storage.ts, sidebar-storage.ts
        └── components/         ChatInput, ChatPanel, ChatMessage, ToolCallDisplay, TabBar,
                                ModelSelector, ConfigPanel, AgentBuilder, SystemPromptPanel,
                                SkillsBrowser, ToolPermissions, PermissionPrompt, TaskListPanel,
                                KeyUsage, CacheRatePanel, ContextWindowPanel, SettingsPanel,
                                MarkdownRenderer, … (23 total)
```

### 6.1 Key facts that matter for the rework

- **`agent-manager.ts` is the center of gravity** (~2,453 lines): per-turn tool assembly, system-prompt building, provider/key resolution, sub-agents, queueing all fused. This is what dissolves into kernel + core orchestrator + standard contributions.
- **`types/index.ts` is the de-facto contract layer today** — `ToolDefinition`, `AgentConfig`, `AgentEvent`, `DispatchConfig` all live here. Natural seed for a real `contracts` package (kernel).
- **Routes already use a `setXGetter` injection pattern** (`setSkillsGetter`, `setModelsGetter`, …) — a primitive form of the DI seam the extension host would formalize.
- **Per-turn tool assembly is a giant duplicated if/else** in `agent-manager` (parent-perms path + child-whitelist path) — prime candidate for a registry populated by extensions.
- **Tool execution today is post-stream + sequential** (`agent.ts` ~line 1426) — see §3.3 for the eager/concurrent redesign.

---

## 7. The AI Harness (meta-information layer)

From "The AI Harness: why your AI coding agent is only as smart as the repo you put it in" (Louai Boumediene, Activepieces). Thesis: the model is rarely the bottleneck — the structured meta-information around the code is. Agent context is a **tiered cache**: tiny files always loaded, big files on demand.
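To make the "tiered cache" idea concrete, here is a minimal sketch of how a session's context might be selected. The type names, the `selectContext` function, and the example paths are all hypothetical; the plan does not define a loader, only the load tiers.

```typescript
// Hypothetical sketch: which harness docs enter an agent's context.

type LoadPolicy = "every-session" | "when-working-there" | "on-demand";

interface HarnessDoc {
  path: string;
  policy: LoadPolicy;
  scope?: string; // package or extension the doc belongs to, if any
}

interface SessionRequest {
  workingIn: string; // the package or extension being edited
  requested: string[]; // doc paths the agent explicitly asked for
}

// Pure selection: same docs + same request always yields the same context.
function selectContext(docs: HarnessDoc[], request: SessionRequest): string[] {
  return docs
    .filter((doc) => {
      if (doc.policy === "every-session") return true;
      if (doc.policy === "when-working-there") return doc.scope === request.workingIn;
      return request.requested.includes(doc.path);
    })
    .map((doc) => doc.path);
}

// Illustrative inventory; these paths are made up for the example.
const exampleDocs: HarnessDoc[] = [
  { path: "AGENTS.md", policy: "every-session" },
  { path: "rules/workdir-containment.md", policy: "every-session" },
  { path: "extensions/webhook/AGENTS.md", policy: "when-working-there", scope: "webhook" },
  { path: "extensions/lsp/AGENTS.md", policy: "when-working-there", scope: "lsp" },
  { path: "features/webhook.md", policy: "on-demand", scope: "webhook" },
];

const exampleContext = selectContext(exampleDocs, { workingIn: "webhook", requested: [] });
```

In this sketch the always-loaded tier stays small by construction, and the larger feature docs only cost context when an agent asks for them.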
### 7.1 The layering (governing test: P6 — only the non-inferable)

| Layer | Size / load | Purpose |
|---|---|---|
| Root `AGENTS.md` — "constitution" | ~55 lines, **every session** | Non-obvious architecture rules only |
| Per-package/extension `AGENTS.md` | ~30–55 lines, when working there | Package-specific patterns |
| `rules/` — "safety reflexes" | 3–5 lines each, every session | Crystallized scar tissue (bugs you've reverted) |
| `features/*` — "module encyclopedia" | ~60 lines each, on demand | Entity schemas, data flow, gotchas per module |
| `skills/*` — codified workflows | slash commands, progressive disclosure | Fixed procedures for repeated tasks |
| `GLOSSARY.md` | term table + "aliases to avoid" | Fights synonym drift |

### 7.2 Why it applies strongly to us (evidence, not fashion)

- **The layering maps 1:1 onto minimal-kernel + extensions.** "One ~60-line doc per module" *is* "one doc per extension" — the extension boundary is the doc boundary. The architecture gives us the harness structure for free.
- **We already have the scar tissue that becomes `rules/`:** Anthropic schema normalization in `registry.ts` ("Claude never sees the tool and thinks forever"), workdir-containment in `path-utils.ts`, tool-call dedup ("150+ identical calls"), `[USER INTERRUPT]` stripping, the no-`execute` tool pattern. These are postmortems-as-comments — promote them to 3–5 line rules.
- **Real synonym-drift problem** (P8): tab/session/conversation, chunk/message/turn/step. A glossary with "aliases to avoid" is warranted.

### 7.3 The special angle for this project (synthesis)

Dispatch is **recursive** — an AI-agent platform that itself *has* skills, agents, and permissions. Two consequences:

- **The harness is extension-scoped (P7):** each extension carries its own constitution snippet, rules, feature doc, glossary terms, and skills, portable with the code. Feature-as-a-library applied to documentation.
- **"Tiered context as a cache" is already Dispatch's product behavior** (prompt-caching, on-demand skills, compaction). The article describes from the outside the thing we build from the inside — a strong signal the layering is sound.

### 7.4 What we bound or reject (P4 applied)

- **Volume (40+ docs, 9 skills) and the 5-features/week cadence** — scale artifacts of a 12-engineer, 1.6M-LOC monorepo. Our version: write a doc the moment we touch an extension that lacks one (doc-first as the plan brief), grow organically.
- **Worktrees / parallel sessions / weekly rhythm / MCPs** — that's *workflow*, not *architecture*; out of scope for the structure we're designing. (Amusingly, Dispatch's parallel tabs are its own take on parallel sessions.)

---

## 8. Open questions / where we start (TBD)

- **Starting point (proposed):** lock the **Contracts** + **Extension Host**, then prove the whole stack with one vertical slice — e.g. extract `read_file` into a standalone, independently-importable `standard` extension with pure-core / injected-shell tests. That single slice validates the architecture (P1, the contracts, the host, the tier model) and the engineering constraints (P2, P3) before scaling out.
- **Open decisions before we begin:** none remaining — all resolved (see below).
- **Deferred to after the rewrite (P4):**
  - Persistent *waking* agents + their wake-time "contract-delta since last active" sync (§5.1) — start with fresh-summoned agents.
  - TypeScript language server wired into `dispatch.toml` is a **prerequisite** for §5.3's `lsp references` workflow (today only Luau is configured).
  - **Vocabulary unification — `command` → `action` (P8; raised during the frontend design, `notes/frontend-design.md` §9):** the frontend names a backend-invokable action `action` / `action ref`; the backend's existing contribution point is `command`. Review renaming `command` → `action` so both sides share ONE term.
    Until this review the backend keeps `command` and the frontend uses `action`. Cheap today (the `command` contribution is design-stage, lightly built); if pursued, fan out via `lsp references`.
- **Decided so far:**
  - ~~Tool-dispatch default policy~~ — **DECIDED** (§3.3): default `{ maxConcurrent: 1, eager: true }`.
  - ~~Who drives the multi-step loop~~ — **DECIDED**: the **kernel** drives it (the loop is the kernel's reason to exist); tools stay dumb objects it calls.
  - ~~Conversation-store boundary~~ — **DECIDED** (§2.2, §2.8): the kernel keeps only the conversation **model types** + pure transforms; the persistent store and SQLite backend are **`core` extensions** (fixes the §2.2/§2.7 I/O inconsistency).
  - ~~"Minimum viable turn" target~~ — **DECIDED** (§2.8): `core` targets **(B)** a usable multi-turn chat; the storage backend is the single swappable piece that drops it to the **(A)** stateless floor (= the in-memory test config).
  - ~~Crash-recovery strategy~~ — **DECIDED** (§3.4): incremental append-only persistence (R1), pure `reconcile(rows)` repair on load (R2), derived/boot-swept status (R3), resume = load→reconcile→continue (R4). Mid-stream resumption (wishlist #1) explicitly deferred.
  - ~~Hook system shape~~ — **DECIDED** (§3.5): decentralized typed-descriptor catalog (kernel owns mechanism, owners declare hooks); events vs filters; single-responder = service, not hook. Wildcards/pattern-subs deferred.
  - ~~Testability enforcement~~ — **DECIDED** (§3.6): structural (zero effect-imports in pure packages, lint-enforced) + no-internal-mocks proxy metric + coverage floor on pure packages only + scoped `rules/` reflexes; enforcement is **asymmetric** (strict core / lenient shell).
  - ~~Agent workflow / repo conventions~~ — **DECIDED** (§5): one owner-agent per unit; contracts are the only cross-agent surface (implementation hidden by default; needing it = contract gap); contract changes fan out via `lsp references` (orchestrator dispatches); **non-static cross-extension coupling forbidden** (typed handles, type-system-enforced, `onAny` escape hatch); temporary multi-knowledge agent for integration bugs; **glossary is human-gated** (orchestrator must ask before coining a term).
  - ~~Per-extension contract location~~ — **DECIDED** (§2.5, §5): co-located in each extension package; only the kernel ABI is centralized in `kernel/contracts/`.
  - ~~Boundary granularity (new ext vs extend)~~ — **DECIDED** (§5.2): the **user** decides; the orchestrator surfaces it after a glossary/feature-doc overlap check, never silently.
  - ~~Trust & isolation model~~ — **DECIDED** (§3.7): **soft isolation (B)** — rich in-process API + defensively-wrapped boundaries; defends faults not adversaries (single-operator threat model); tier-aware auto-disable (strict core / graceful edge); contracts kept serializable-friendly + manifest `trust` field so hard isolation (C) stays possible without a rewrite.
  - ~~Contract-versioning policy~~ — **DECIDED** (§2.9): convention-only & dormant in `0.x`; each package self-versions; semver meaning (major=break/fan-out, minor=additive, patch=internal) as changelog hygiene + §5.3 signal; type system is the internal mechanism; compat gate + `.d.ts`-diff deferred until external extensions exist.
  - ~~Core-default provider/auth~~ — **DECIDED** (§2.10): **OpenAI-compatible + API-key** (`provider-openai-compat` + `auth-apikey`) — leanest auth surface, most-testable, and = the **OpenCode Go flash** testbench. Claude/OAuth and the Anthropic-format OpenCode models are `standard` extensions.
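
The semver meaning recorded under the contract-versioning decision (major=break/fan-out, minor=additive, patch=internal) could be encoded roughly as follows. This is a sketch only: the function names are hypothetical, and it ignores the plan's note that the policy is convention-only and dormant during `0.x`.

```typescript
// Hypothetical sketch of the §2.9 semver convention as pure functions.

type ContractChange = "breaking" | "additive" | "internal";
type Bump = "major" | "minor" | "patch";

// major = break, minor = additive, patch = internal (per the decision above).
function bumpFor(change: ContractChange): Bump {
  if (change === "breaking") return "major";
  if (change === "additive") return "minor";
  return "patch";
}

// Only a major bump signals that the §5.3 fan-out should run.
function requiresFanOut(bump: Bump): boolean {
  return bump === "major";
}

// Applies a bump to a plain MAJOR.MINOR.PATCH string (no pre-release tags).
function nextVersion(current: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (match === null) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${current}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Example: a contract owner removes a field from an exported type.
const exampleBump = bumpFor("breaking");
const exampleNext = nextVersion("1.4.2", exampleBump);
const exampleFanOut = requiresFanOut(exampleBump);
```

Keeping the mapping in pure functions matches P2: the decision is testable without touching any package manifest.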
---

## Appendix — Principle quick-reference

- **P1** Feature-as-a-library (importable, minimal API; don't over-split)
- **P2** Functional core / imperative shell (testability not purity; inject effects)
- **P3** No ambient state (own and pass explicitly; reproducible tool-sets)
- **P4** Don't adopt by reputation (earn each pattern against real evidence)
- **P5** The repo is a harness (meta-info is a first-class, tiered deliverable)
- **P6** Document only the non-inferable (tribal knowledge / scar tissue only)
- **P7** The harness is extension-scoped (docs portable with the code)
- **P8** One canonical vocabulary (glossary + aliases-to-avoid; no synonym drift)