| author | Adam Malczewski <[email protected]> | 2026-06-21 14:34:22 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-21 14:34:22 +0900 |
| commit | d56fe9cf64719bb330c17b2daee58c0bafa057c9 | |
| tree | b80a25aaee57f959454d468e03f100c38e224b82 | |
| parent | 8a4a624d16422467a8e85434c674bb591877e8ea | |

feat(todo): per-conversation task list tool + surface

New standard tool extension with a single todo_write tool (opencode
todowrite pattern: full-list replace, returns JSON, no business-rule
enforcement; the description guides the model). Per-conversation in-memory
state + per-conversation surface (rendererId: todo, scope: conversation)
via subscriber-notify (message-queue pattern).

Wave 0 (kernel contract): added conversationId?: string to ToolExecuteContext
(additive, backward-compatible). Wired in dispatch.ts: the kernel already
had it but wasn't passing it through to tools.

Wave 1 (todo extension): pure core (validateTodos: shape only; getTodos/
setTodos/clearTodos; buildTodoSpec; formatTodoResult). Shell:
createTodoWriteTool + surface provider. Tool description matches opencode's
todowrite.txt depth (when-to-use, examples, task states). Priority field
removed (bloats the tool with little value). 25 tests.

Wave 2 (host-bin): registered todo in CORE_EXTENSIONS + dep + root tsconfig ref.

Verified: tsc EXIT 0, 1123 vitest, biome clean (314 files). Boot smoke clean.

FE handoff: frontend-todo-handoff.md.

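
The full-list-replace and subscriber-notify behaviour described in the commit message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in this commit: `createTodoStore`, `write`, and `read` are hypothetical names chosen for the example, whereas the commit itself splits the same responsibilities across `pure.ts`, `tool.ts`, and `extension.ts`.

```typescript
// Simplified stand-ins for the TodoItem / TodoStatus types in this commit.
type TodoStatus = "pending" | "in_progress" | "completed" | "cancelled";

interface TodoItem {
  readonly content: string;
  readonly status: TodoStatus;
}

// Hypothetical helper: bundles per-conversation state with a subscriber set.
function createTodoStore() {
  const state = new Map<string, TodoItem[]>();
  const subscribers = new Set<() => void>();

  return {
    // Full-list replace: the caller passes the complete list on every write.
    write(conversationId: string, todos: readonly TodoItem[]): string {
      state.set(conversationId, [...todos]);
      // Subscriber-notify: listeners re-fetch state, they receive no payload.
      subscribers.forEach((sub) => sub());
      return JSON.stringify(todos, null, 2);
    },
    // Returns a fresh copy so callers cannot mutate the live array.
    read(conversationId: string): TodoItem[] {
      return [...(state.get(conversationId) ?? [])];
    },
    // Returns an unsubscribe function.
    subscribe(onChange: () => void): () => void {
      subscribers.add(onChange);
      return () => {
        subscribers.delete(onChange);
      };
    },
  };
}

// Usage: two writes to the same conversation; the second replaces the first.
const store = createTodoStore();
let notifications = 0;
const unsubscribe = store.subscribe(() => {
  notifications += 1;
});
store.write("c1", [{ content: "a", status: "pending" }]);
store.write("c1", [{ content: "b", status: "in_progress" }]);
unsubscribe();
```

Because each write carries the whole list, array position can serve as item identity and no separate read tool is needed; the trade-off is that the writer must always resend unchanged items.
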
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | bun.lock | 12 |
| -rw-r--r-- | frontend-todo-handoff.md | 91 |
| -rw-r--r-- | packages/host-bin/package.json | 1 |
| -rw-r--r-- | packages/host-bin/src/main.ts | 2 |
| -rw-r--r-- | packages/kernel/src/contracts/tool.ts | 8 |
| -rw-r--r-- | packages/kernel/src/runtime/dispatch.ts | 1 |
| -rw-r--r-- | packages/todo/package.json | 13 |
| -rw-r--r-- | packages/todo/src/extension.test.ts | 138 |
| -rw-r--r-- | packages/todo/src/extension.ts | 81 |
| -rw-r--r-- | packages/todo/src/format.test.ts | 18 |
| -rw-r--r-- | packages/todo/src/index.ts | 24 |
| -rw-r--r-- | packages/todo/src/pure.ts | 148 |
| -rw-r--r-- | packages/todo/src/store.test.ts | 47 |
| -rw-r--r-- | packages/todo/src/tool.test.ts | 101 |
| -rw-r--r-- | packages/todo/src/tool.ts | 240 |
| -rw-r--r-- | packages/todo/src/validate.test.ts | 66 |
| -rw-r--r-- | packages/todo/tsconfig.json | 10 |
| -rw-r--r-- | tasks.md | 31 |
| -rw-r--r-- | tsconfig.json | 1 |

19 files changed, 1029 insertions, 4 deletions
@@ -70,6 +70,7 @@ "@dispatch/surface-loaded-extensions": "workspace:*", "@dispatch/surface-registry": "workspace:*", "@dispatch/throughput-store": "workspace:*", + "@dispatch/todo": "workspace:*", "@dispatch/tool-edit-file": "workspace:*", "@dispatch/tool-read-file": "workspace:*", "@dispatch/tool-shell": "workspace:*", @@ -193,6 +194,15 @@ "@dispatch/kernel": "workspace:*", }, }, + "packages/todo": { + "name": "@dispatch/todo", + "version": "0.0.0", + "dependencies": { + "@dispatch/kernel": "workspace:*", + "@dispatch/surface-registry": "workspace:*", + "@dispatch/ui-contract": "workspace:*", + }, + }, "packages/tool-edit-file": { "name": "@dispatch/tool-edit-file", "version": "0.0.0", @@ -340,6 +350,8 @@ "@dispatch/throughput-store": ["@dispatch/throughput-store@workspace:packages/throughput-store"], + "@dispatch/todo": ["@dispatch/todo@workspace:packages/todo"], + "@dispatch/tool-edit-file": ["@dispatch/tool-edit-file@workspace:packages/tool-edit-file"], "@dispatch/tool-read-file": ["@dispatch/tool-read-file@workspace:packages/tool-read-file"], diff --git a/frontend-todo-handoff.md b/frontend-todo-handoff.md new file mode 100644 index 0000000..0edfa60 --- /dev/null +++ b/frontend-todo-handoff.md @@ -0,0 +1,91 @@ +# FE handoff — todo task list surface + +Courier this to `../dispatch-web` (cross-repo contract change; `lsp references` does +not span repos — ORCHESTRATOR §7). All changes are ADDITIVE — nothing existing breaks. + +## What shipped (backend) + +A per-conversation **task list** the AI model maintains via a `todo_write` tool. The +list is exposed to the frontend as a per-conversation **surface** (read-only). The +model creates/updates the list during a turn; the surface updates live so the FE can +render the current state. + +- **`todo_write` tool** — the model passes the FULL list each call (replaces the + existing list). Returns the list as JSON. The tool description guides the model on + when to use it (3+ step tasks, planning, etc.). 
+- **State** — in-memory, per-conversation. No persistence (the list lives for the + process lifetime of the conversation). +- **No new wire types, no version bumps.** The todo surface uses the existing + `custom` surface field kind (`ui-contract` unchanged). The `TodoItem` type is + defined by the `todo` extension and carried in the surface payload — it is NOT + in `@dispatch/wire` or `@dispatch/transport-contract`. + +## The surface + +The `todo` extension contributes a per-conversation surface: + +- **Surface id:** `"todo"` +- **Scope:** `"conversation"` (subscribe with the `conversationId`) +- **Region:** `"side"` +- **Title:** `"Tasks"` +- **One `custom` field**, `rendererId: "todo"`, `payload: TodoPayload` + +```ts +interface TodoPayload { + todos: readonly TodoItem[]; +} + +interface TodoItem { + content: string; + status: "pending" | "in_progress" | "completed" | "cancelled"; +} +``` + +- **Read-only** — no `invoke` actions. The model mutates the list via the + `todo_write` tool; the FE only renders. +- **Updates** on every `todo_write` call (subscriber-notify → full new spec with the + updated `todos` array). +- **Empty list** — an idle conversation (no todo list created yet, or the model + cleared it with an empty array) renders `todos: []`. Hide the panel when empty. + +## What the FE needs to do + +1. **Subscribe** to the `todo` surface per conversation (same pattern as + `message-queue` and `cache-warming` — `scope: "conversation"`, pass + `conversationId` on subscribe). + +2. **Custom renderer** for `rendererId: "todo"` — render the `payload.todos` array + as a task list. Suggested UI: + - Each item shows `content` with a status indicator: + - `pending` — empty circle / checkbox + - `in_progress` — spinner / filled circle (highlight) + - `completed` — checkmark (strikethrough or dim the content) + - `cancelled` — X / dash (dim/strikethrough) + - Order is significant — items are in the order the model provided them (array + index = identity). 
+ - Only one item should be `in_progress` at a time (the tool description enforces + this via guidance, not validation — but the model should comply). + +3. **Live updates** — the surface pushes a new spec on every `todo_write` call. No + polling needed. Just re-render from the new `payload.todos`. + +4. **Empty state** — when `todos` is `[]`, hide the panel (the model hasn't created + a list yet, or cleared it). + +## No other integration points + +- No new WS ops (no `chat.queue` equivalent — the model is the only writer). +- No new HTTP endpoints (the list is tool-driven, not API-driven). +- No new `AgentEvent` types (the list is not on the chat stream). +- No version bumps in `@dispatch/wire` or `@dispatch/transport-contract`. + +## Notes + +- **In-memory only** — the todo list does NOT persist across server restarts. If + the server restarts, the list is cleared. The model recreates it on the next + `todo_write` call. This mirrors the message-queue behavior. +- **Per-conversation** — each conversation has its own list. Switching conversations + means subscribing to a different `conversationId` and rendering that conversation's + list. +- **Model-driven** — the FE has no control over the list (read-only surface). The + model creates, updates, and clears items. The FE just displays the current state. 
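
The rendering rules in the handoff document above (one status indicator per item, array order preserved, panel hidden when `todos` is empty) can be sketched as a pure function. This is only an illustration of those rules, assuming a plain-text rendering: `renderTodoPanel` and the glyphs in `STATUS_GLYPH` are hypothetical and are not part of this commit or of `dispatch-web`.

```typescript
// Payload shape as described in the handoff document.
type TodoStatus = "pending" | "in_progress" | "completed" | "cancelled";

interface TodoItem {
  content: string;
  status: TodoStatus;
}

interface TodoPayload {
  todos: readonly TodoItem[];
}

// Hypothetical text glyphs standing in for the suggested status indicators.
const STATUS_GLYPH: Record<TodoStatus, string> = {
  pending: "[ ]",
  in_progress: "[~]",
  completed: "[x]",
  cancelled: "[-]",
};

// Returns null when the list is empty (the "hide the panel" case); otherwise
// one line per item, in the order the model provided them.
function renderTodoPanel(payload: TodoPayload): string[] | null {
  if (payload.todos.length === 0) return null;
  return payload.todos.map(
    (item, index) => `${index + 1}. ${STATUS_GLYPH[item.status]} ${item.content}`,
  );
}

// Usage: re-run on every pushed spec, since each push carries the full list.
const example = renderTodoPanel({
  todos: [
    { content: "Write validator", status: "completed" },
    { content: "Wire surface", status: "in_progress" },
  ],
});
// example is ["1. [x] Write validator", "2. [~] Wire surface"]
```

Keeping the renderer a pure function of the payload fits the read-only, full-snapshot design: there is no local list state to reconcile between updates.
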
diff --git a/packages/host-bin/package.json b/packages/host-bin/package.json index 5f7d0e7..be530b5 100644 --- a/packages/host-bin/package.json +++ b/packages/host-bin/package.json @@ -16,6 +16,7 @@ "@dispatch/session-orchestrator": "workspace:*", "@dispatch/skills": "workspace:*", "@dispatch/throughput-store": "workspace:*", + "@dispatch/todo": "workspace:*", "@dispatch/transport-http": "workspace:*", "@dispatch/tool-read-file": "workspace:*", "@dispatch/tool-shell": "workspace:*", diff --git a/packages/host-bin/src/main.ts b/packages/host-bin/src/main.ts index 1c122de..5a7fcae 100644 --- a/packages/host-bin/src/main.ts +++ b/packages/host-bin/src/main.ts @@ -29,6 +29,7 @@ import { createSqliteStorage, extension as storageSqliteExt } from "@dispatch/st import { createLoadedExtensionsExtension } from "@dispatch/surface-loaded-extensions"; import { createSurfaceRegistryExtension } from "@dispatch/surface-registry"; import { extension as throughputStoreExt } from "@dispatch/throughput-store"; +import { extension as todoExt } from "@dispatch/todo"; import { extension as toolEditFileExt } from "@dispatch/tool-edit-file"; import { extension as toolReadFileExt } from "@dispatch/tool-read-file"; import { extension as toolShellExt } from "@dispatch/tool-shell"; @@ -78,6 +79,7 @@ const CORE_EXTENSIONS: readonly Extension[] = [ toolWriteFileExt, toolWebSearchExt, throughputStoreExt, + todoExt, messageQueueExt, sessionOrchestratorExt, skillsExt, diff --git a/packages/kernel/src/contracts/tool.ts b/packages/kernel/src/contracts/tool.ts index f617f42..d5a835c 100644 --- a/packages/kernel/src/contracts/tool.ts +++ b/packages/kernel/src/contracts/tool.ts @@ -70,6 +70,14 @@ export interface ToolExecuteContext { * workdir. The kernel never interprets it. */ readonly cwd?: string; + + /** + * The conversation this tool-call belongs to. Tools that maintain + * per-conversation state (e.g. a todo list) key on this. Forwarded + * verbatim from `RunTurnInput.conversationId`. 
Optional: when omitted, + * a tool has no conversation scope (e.g. a global tool). + */ + readonly conversationId?: string; } /** diff --git a/packages/kernel/src/runtime/dispatch.ts b/packages/kernel/src/runtime/dispatch.ts index d168319..d1c46cb 100644 --- a/packages/kernel/src/runtime/dispatch.ts +++ b/packages/kernel/src/runtime/dispatch.ts @@ -32,6 +32,7 @@ export async function executeToolCall( emit(toolOutputEvent(conversationId, turnId, call.id, data, stream)); }, log: toolSpan?.log ?? createNoopLogger(), + conversationId, ...(cwd !== undefined ? { cwd } : {}), }; try { diff --git a/packages/todo/package.json b/packages/todo/package.json new file mode 100644 index 0000000..14b8e55 --- /dev/null +++ b/packages/todo/package.json @@ -0,0 +1,13 @@ +{ + "name": "@dispatch/todo", + "version": "0.0.0", + "type": "module", + "private": true, + "main": "dist/index.js", + "types": "dist/index.d.ts", + "dependencies": { + "@dispatch/kernel": "workspace:*", + "@dispatch/surface-registry": "workspace:*", + "@dispatch/ui-contract": "workspace:*" + } +} diff --git a/packages/todo/src/extension.test.ts b/packages/todo/src/extension.test.ts new file mode 100644 index 0000000..8b9e84b --- /dev/null +++ b/packages/todo/src/extension.test.ts @@ -0,0 +1,138 @@ +import { createLogger, type HostAPI, type ToolExecuteContext } from "@dispatch/kernel"; +import type { SurfaceProvider, SurfaceRegistry } from "@dispatch/surface-registry"; +import { describe, expect, it, vi } from "vitest"; +import { activate, extension, manifest } from "./extension.js"; + +function stubCtx(overrides?: Partial<ToolExecuteContext>): ToolExecuteContext { + return { + toolCallId: "test-call-1", + onOutput: () => {}, + signal: new AbortController().signal, + log: createLogger( + { extensionId: "test" }, + { emit: () => {} }, + { now: () => 0, newId: () => "id" }, + ), + ...overrides, + }; +} + +interface FakeHost { + readonly host: HostAPI; + readonly registry: SurfaceRegistry; + readonly defineTool: 
ReturnType<typeof vi.fn>; + readonly getProvider: () => SurfaceProvider | undefined; +} + +function makeFakeHost(): FakeHost { + const defineTool = vi.fn(); + let provider: SurfaceProvider | undefined; + const registry: SurfaceRegistry = { + register(p) { + provider = p; + return () => { + provider = undefined; + }; + }, + getCatalog() { + return provider === undefined ? [] : [provider.catalogEntry]; + }, + getSurface(id) { + if (provider === undefined) return undefined; + return provider.catalogEntry.id === id ? provider : undefined; + }, + }; + const host = { + defineTool, + getService: () => registry, + logger: { + debug: vi.fn(), + info: vi.fn(), + warn: vi.fn(), + error: vi.fn(), + span: vi.fn(() => ({ end: vi.fn() })), + }, + } as unknown as HostAPI; + return { host, registry, defineTool, getProvider: () => provider }; +} + +describe("todo manifest", () => { + it("declares todo_write contribution + surface-registry dependency", () => { + expect(manifest.id).toBe("todo"); + expect(manifest.activation).toBe("eager"); + expect(manifest.trust).toBe("bundled"); + expect(manifest.dependsOn).toEqual(["surface-registry"]); + expect(manifest.contributes).toEqual({ tools: ["todo_write"] }); + expect(manifest.capabilities).toEqual({}); + }); + + it("extension bundles the manifest + activate", () => { + expect(extension.manifest).toBe(manifest); + expect(typeof extension.activate).toBe("function"); + }); +}); + +describe("todo activation", () => { + it("activate registers the todo_write tool", () => { + const { host, defineTool } = makeFakeHost(); + activate(host); + expect(defineTool).toHaveBeenCalledTimes(1); + const tool = defineTool.mock.calls[0]?.[0]; + if (!tool) throw new Error("no tool registered"); + expect(tool.name).toBe("todo_write"); + expect(tool.concurrencySafe).toBe(false); + }); + + it("activate registers a surface with scope 'conversation'", () => { + const { host, getProvider } = makeFakeHost(); + activate(host); + const provider = getProvider(); + if 
(!provider) throw new Error("no surface provider registered"); + expect(provider.catalogEntry.id).toBe("todo"); + expect(provider.catalogEntry.scope).toBe("conversation"); + expect(provider.catalogEntry.region).toBe("side"); + expect(provider.catalogEntry.title).toBe("Tasks"); + }); + + it("surface getSpec returns todos for the conversation", async () => { + const { host, defineTool, getProvider } = makeFakeHost(); + activate(host); + const tool = defineTool.mock.calls[0]?.[0]; + if (!tool) throw new Error("no tool registered"); + await tool.execute( + { todos: [{ content: "a", status: "pending" }] }, + stubCtx({ conversationId: "c1" }), + ); + const provider = getProvider(); + if (!provider) throw new Error("no surface provider registered"); + const spec = await provider.getSpec({ conversationId: "c1" }); + expect(spec.id).toBe("todo"); + expect(spec.fields).toHaveLength(1); + const field = spec.fields[0]; + if (field === undefined || field.kind !== "custom") { + throw new Error("expected a custom field"); + } + const payload = field.payload as { todos: readonly { content: string }[] }; + expect(payload.todos).toHaveLength(1); + expect(payload.todos[0]?.content).toBe("a"); + }); + + it("surface subscribe notifies on todo_write", async () => { + const { host, defineTool, getProvider } = makeFakeHost(); + activate(host); + const provider = getProvider(); + if (!provider) throw new Error("no surface provider registered"); + const calls = { value: 0 }; + const unsub = provider.subscribe?.(() => { + calls.value += 1; + }); + const tool = defineTool.mock.calls[0]?.[0]; + if (!tool) throw new Error("no tool registered"); + await tool.execute( + { todos: [{ content: "x", status: "pending" }] }, + stubCtx({ conversationId: "c1" }), + ); + expect(calls.value).toBe(1); + if (unsub) unsub(); + }); +}); diff --git a/packages/todo/src/extension.ts b/packages/todo/src/extension.ts new file mode 100644 index 0000000..80af7aa --- /dev/null +++ b/packages/todo/src/extension.ts @@ 
-0,0 +1,81 @@ +/** + * todo extension — owns the per-conversation task list (state + surface + the + * `todo_write` tool). Plugs into the kernel host via a manifest + + * `activate(host)`; the model maintains the list via the tool, and the surface + * renders the live list for the frontend (subscriber-notify, same pattern as + * message-queue). State is in-memory (per-process, per-conversation) — no + * persistence, mirroring message-queue. + */ + +import type { Extension, HostAPI, Manifest } from "@dispatch/kernel"; +import type { SurfaceContext, SurfaceProvider } from "@dispatch/surface-registry"; +import { surfaceRegistryHandle } from "@dispatch/surface-registry"; +import type { SurfaceSpec } from "@dispatch/ui-contract"; +import { buildTodoSpec, getTodos, TODO_SURFACE_ID, type TodoState } from "./pure.js"; +import { createTodoWriteTool } from "./tool.js"; + +export const manifest: Manifest = { + id: "todo", + name: "Todo Tool", + version: "0.0.0", + apiVersion: "^0.1.0", + trust: "bundled", + activation: "eager", + dependsOn: ["surface-registry"], + capabilities: {}, + contributes: { + tools: ["todo_write"], + }, +}; + +export function activate(host: HostAPI): void { + const registry = host.getService(surfaceRegistryHandle); + + const state: TodoState = new Map(); + const subscribers = new Set<() => void>(); + + function notify(): void { + for (const sub of subscribers) { + sub(); + } + } + + host.defineTool(createTodoWriteTool({ state, notify })); + + function getSpec(context?: SurfaceContext): SurfaceSpec { + const convId = context?.conversationId; + const todos = convId === undefined ? [] : getTodos(state, convId); + return buildTodoSpec(todos); + } + + function invoke(_actionId: string, _payload?: unknown, _context?: SurfaceContext): void { + // The todo surface is read-only: the model mutates the list via the + // `todo_write` tool; no client-facing surface actions. 
+ } + + const provider: SurfaceProvider = { + catalogEntry: { + id: TODO_SURFACE_ID, + region: "side", + title: "Tasks", + scope: "conversation", + }, + getSpec, + invoke, + subscribe(onChange) { + subscribers.add(onChange); + return () => { + subscribers.delete(onChange); + }; + }, + }; + + registry.register(provider); + + host.logger.info("todo: registered"); +} + +export const extension: Extension = { + manifest, + activate, +}; diff --git a/packages/todo/src/format.test.ts b/packages/todo/src/format.test.ts new file mode 100644 index 0000000..bffb8a3 --- /dev/null +++ b/packages/todo/src/format.test.ts @@ -0,0 +1,18 @@ +import { describe, expect, it } from "vitest"; +import { formatTodoResult, type TodoItem } from "./pure.js"; + +describe("formatTodoResult", () => { + it("formatTodoResult: returns JSON string of the todos", () => { + const todos: TodoItem[] = [ + { content: "alpha", status: "in_progress" }, + { content: "beta", status: "pending" }, + ]; + expect(formatTodoResult(todos)).toBe(JSON.stringify(todos, null, 2)); + // spot-check it is pretty-printed JSON (indented key) + expect(formatTodoResult(todos)).toContain('"content": "alpha"'); + }); + + it('formatTodoResult: empty array returns "[]"', () => { + expect(formatTodoResult([])).toBe("[]"); + }); +}); diff --git a/packages/todo/src/index.ts b/packages/todo/src/index.ts new file mode 100644 index 0000000..320b674 --- /dev/null +++ b/packages/todo/src/index.ts @@ -0,0 +1,24 @@ +/** + * @dispatch/todo — the per-conversation task list. Owns the list state + a + * per-conversation `custom` surface; the model maintains the list via the + * `todo_write` tool (full-list replace, opencode pattern). The list is + * transient (in-memory, per-conversation); the surface is the ONLY way the + * frontend reads todo state. 
+ */ + +export { extension, manifest } from "./extension.js"; +export { + buildTodoSpec, + clearTodos, + formatTodoResult, + getTodos, + setTodos, + TODO_RENDERER_ID, + TODO_SURFACE_ID, + type TodoItem, + type TodoState, + type TodoStatus, + type ValidationResult, + validateTodos, +} from "./pure.js"; +export { createTodoWriteTool, type TodoWriteToolDeps } from "./tool.js"; diff --git a/packages/todo/src/pure.ts b/packages/todo/src/pure.ts new file mode 100644 index 0000000..b6a6a32 --- /dev/null +++ b/packages/todo/src/pure.ts @@ -0,0 +1,148 @@ +/** + * Pure core for todo — zero I/O, zero ambient state. + * + * Every function is input → output; testable without mocks. State is a plain + * `Map<conversationId, TodoItem[]>` OWNED by the caller (the extension + * shell); the pure functions mutate it in place and return snapshots (fresh + * array copies), so a caller can never reach into or mutate live state through + * a returned value. Mirrors message-queue's `pure.ts`. + * + * The `todo_write` tool VALIDATES SHAPE ONLY — it does not enforce business + * rules like "exactly one in_progress". The tool description guides the model; + * the validator checks that `args` is an object with a `todos` array whose + * items each have a non-empty `content` and valid `status`/`priority` enums. + * (Same posture as opencode's `todowrite`.) + */ + +import type { CustomField, SurfaceSpec } from "@dispatch/ui-contract"; + +/** A todo item's lifecycle state. */ +export type TodoStatus = "pending" | "in_progress" | "completed" | "cancelled"; + +/** + * A single todo item. Identity is the array index (no `id`), per the opencode + * pattern — the model passes the FULL list each call, so position is identity. + */ +export interface TodoItem { + readonly content: string; + readonly status: TodoStatus; +} + +/** The todo store: a per-conversation map of todo lists. 
*/ +export type TodoState = Map<string, TodoItem[]>; + +/** Surface id this extension contributes (also the manifest + catalog id). */ +export const TODO_SURFACE_ID = "todo"; +/** The custom renderer id a frontend switches on to render the todo list. */ +export const TODO_RENDERER_ID = "todo"; + +const VALID_STATUSES: ReadonlySet<string> = new Set([ + "pending", + "in_progress", + "completed", + "cancelled", +]); + +/** Result of `validateTodos`: the validated list, or an error message. */ +export type ValidationResult = TodoItem[] | { readonly error: string }; + +/** + * Validate the `todo_write` tool's input SHAPE only — not business rules (the + * tool description guides the model: "one in_progress at a time" etc.). Accepts + * an empty array (the model clears the list). Pure: no I/O, no ambient state. + */ +export function validateTodos(args: unknown): ValidationResult { + if (args === null || typeof args !== "object" || Array.isArray(args)) { + return { error: "Error: todo_write args must be an object with a `todos` array." }; + } + const todos = (args as { todos?: unknown }).todos; + if (!Array.isArray(todos)) { + return { error: "Error: `todos` must be an array." 
}; + } + const validated: TodoItem[] = []; + for (let i = 0; i < todos.length; i++) { + const item = todos[i]; + if (item === null || typeof item !== "object" || Array.isArray(item)) { + return { error: `Error: todos[${i}] must be an object.` }; + } + const { content, status } = item as { + content?: unknown; + status?: unknown; + }; + if (typeof content !== "string" || content.trim().length === 0) { + return { error: `Error: todos[${i}].content must be a non-empty string.` }; + } + if (typeof status !== "string" || !VALID_STATUSES.has(status)) { + return { + error: `Error: todos[${i}].status must be one of pending|in_progress|completed|cancelled.`, + }; + } + validated.push({ + content, + status: status as TodoStatus, + }); + } + return validated; +} + +/** + * Current todo-list snapshot for a conversation — a fresh array copy. Empty + * array if the conversation has no list / is unknown. Mutating the returned + * array does not affect live state (items are readonly). + */ +export function getTodos(state: TodoState, conversationId: string): TodoItem[] { + const existing = state.get(conversationId); + if (existing === undefined) return []; + return [...existing]; +} + +/** + * Replace a conversation's todo list with `todos` (full-list replace — the + * opencode pattern: the model passes the complete list each call). Mutates + * `state` and returns a fresh array copy of the new list. The caller cannot + * mutate live state through the returned value. + */ +export function setTodos( + state: TodoState, + conversationId: string, + todos: readonly TodoItem[], +): TodoItem[] { + state.set(conversationId, [...todos]); + return getTodos(state, conversationId); +} + +/** Delete a conversation's todo list. No-op if the conversation has none. 
*/ +export function clearTodos(state: TodoState, conversationId: string): void { + state.delete(conversationId); +} + +/** + * Build the per-conversation surface spec: a single `custom` field whose + * payload is the current todo snapshot (`{ todos }`). An empty `todos` array + * (idle / cleared conversation) renders as an empty list. Pure — no I/O; the + * surface-registry re-fetches this on every notify. Mirrors `buildQueueSpec`. + */ +export function buildTodoSpec(todos: readonly TodoItem[]): SurfaceSpec { + const payload: { todos: readonly TodoItem[] } = { todos }; + const field: CustomField = { + kind: "custom", + rendererId: TODO_RENDERER_ID, + payload, + }; + return { + id: TODO_SURFACE_ID, + region: "side", + title: "Tasks", + fields: [field], + }; +} + +/** + * Format the todo list as the `todo_write` tool's result content: raw + * pretty-printed JSON (the opencode pattern — `JSON.stringify(todos, null, 2)`), + * not a custom human-readable format. The model parses prior writes from + * conversation history, so it needs no separate read tool. 
+ */ +export function formatTodoResult(todos: readonly TodoItem[]): string { + return JSON.stringify(todos, null, 2); +} diff --git a/packages/todo/src/store.test.ts b/packages/todo/src/store.test.ts new file mode 100644 index 0000000..92130a3 --- /dev/null +++ b/packages/todo/src/store.test.ts @@ -0,0 +1,47 @@ +import { describe, expect, it } from "vitest"; +import { clearTodos, getTodos, setTodos, type TodoState } from "./pure.js"; + +describe("getTodos", () => { + it("getTodos: returns fresh array copy (not the live array)", () => { + const state: TodoState = new Map(); + setTodos(state, "c1", [{ content: "a", status: "pending" }]); + const snap = getTodos(state, "c1"); + expect(snap).toHaveLength(1); + // mutating the snapshot array does not affect live state + snap.push({ content: "evil", status: "completed" }); + expect(getTodos(state, "c1")).toHaveLength(1); + }); + + it("getTodos: empty array for unknown conversation", () => { + const state: TodoState = new Map(); + expect(getTodos(state, "nope")).toEqual([]); + }); +}); + +describe("setTodos", () => { + it("setTodos: replaces the list and returns a copy", () => { + const state: TodoState = new Map(); + setTodos(state, "c1", [{ content: "first", status: "pending" }]); + // replace with a different list + const snap = setTodos(state, "c1", [{ content: "second", status: "in_progress" }]); + expect(snap).toHaveLength(1); + const first = snap[0]; + if (first === undefined) throw new Error("expected an item"); + expect(first.content).toBe("second"); + // the previous list is gone + expect(getTodos(state, "c1").map((t) => t.content)).toEqual(["second"]); + }); +}); + +describe("clearTodos", () => { + it("clearTodos: deletes the conversation's list", () => { + const state: TodoState = new Map(); + setTodos(state, "c1", [{ content: "x", status: "pending" }]); + expect(getTodos(state, "c1")).toHaveLength(1); + clearTodos(state, "c1"); + expect(getTodos(state, "c1")).toEqual([]); + // a second clear is a no-op + 
clearTodos(state, "c1"); + expect(getTodos(state, "c1")).toEqual([]); + }); +}); diff --git a/packages/todo/src/tool.test.ts b/packages/todo/src/tool.test.ts new file mode 100644 index 0000000..a125786 --- /dev/null +++ b/packages/todo/src/tool.test.ts @@ -0,0 +1,101 @@ +import { createLogger, type ToolExecuteContext } from "@dispatch/kernel"; +import { describe, expect, it, vi } from "vitest"; +import { getTodos, type TodoState } from "./pure.js"; +import { createTodoWriteTool } from "./tool.js"; + +function stubCtx(overrides?: Partial<ToolExecuteContext>): ToolExecuteContext { + return { + toolCallId: "test-call-1", + onOutput: () => {}, + signal: new AbortController().signal, + log: createLogger( + { extensionId: "test" }, + { emit: () => {} }, + { now: () => 0, newId: () => "id" }, + ), + ...overrides, + }; +} + +describe("todo_write", () => { + it("todo_write: replaces list + returns JSON result", async () => { + const state: TodoState = new Map(); + const notify = vi.fn(); + const tool = createTodoWriteTool({ state, notify }); + const todos = [ + { content: "a", status: "pending" }, + { content: "b", status: "in_progress" }, + ]; + const result = await tool.execute({ todos }, stubCtx({ conversationId: "c1" })); + expect(result.isError).toBeUndefined(); + expect(result.content).toBe(JSON.stringify(todos, null, 2)); + expect(getTodos(state, "c1")).toEqual(todos); + }); + + it("todo_write: calls notify after write", async () => { + const state: TodoState = new Map(); + const notify = vi.fn(); + const tool = createTodoWriteTool({ state, notify }); + expect(notify).not.toHaveBeenCalled(); + await tool.execute( + { todos: [{ content: "x", status: "pending" }] }, + stubCtx({ conversationId: "c1" }), + ); + expect(notify).toHaveBeenCalledTimes(1); + }); + + it("todo_write: validation error returns isError", async () => { + const state: TodoState = new Map(); + const notify = vi.fn(); + const tool = createTodoWriteTool({ state, notify }); + const result = await 
tool.execute(
+      { todos: [{ content: "x", status: "bogus" }] },
+      stubCtx({ conversationId: "c1" }),
+    );
+    expect(result.isError).toBe(true);
+    expect(result.content).toContain("Error:");
+    expect(notify).not.toHaveBeenCalled();
+  });
+
+  it("todo_write: uses conversationId from ctx", async () => {
+    const state: TodoState = new Map();
+    const notify = vi.fn();
+    const tool = createTodoWriteTool({ state, notify });
+    await tool.execute(
+      { todos: [{ content: "x", status: "pending" }] },
+      stubCtx({ conversationId: "conv-42" }),
+    );
+    expect(getTodos(state, "conv-42")).toHaveLength(1);
+    // a different conversation is unaffected
+    expect(getTodos(state, "conv-other")).toEqual([]);
+  });
+
+  it("todo_write: errors when conversationId is absent", async () => {
+    const state: TodoState = new Map();
+    const notify = vi.fn();
+    const tool = createTodoWriteTool({ state, notify });
+    const result = await tool.execute({ todos: [{ content: "x", status: "pending" }] }, stubCtx());
+    expect(result.isError).toBe(true);
+    expect(result.content).toBe("Error: no conversation context for todo.");
+    expect(notify).not.toHaveBeenCalled();
+    expect(state.size).toBe(0);
+  });
+
+  it("todo_write: accepts empty array (clears list)", async () => {
+    const state: TodoState = new Map();
+    const notify = vi.fn();
+    const tool = createTodoWriteTool({ state, notify });
+    // seed
+    await tool.execute(
+      { todos: [{ content: "seed", status: "pending" }] },
+      stubCtx({ conversationId: "c1" }),
+    );
+    expect(getTodos(state, "c1")).toHaveLength(1);
+    // clear via empty list
+    const result = await tool.execute({ todos: [] }, stubCtx({ conversationId: "c1" }));
+    expect(result.isError).toBeUndefined();
+    expect(result.content).toBe("[]");
+    expect(getTodos(state, "c1")).toEqual([]);
+    expect(notify).toHaveBeenCalledTimes(2);
+  });
+});
diff --git a/packages/todo/src/tool.ts b/packages/todo/src/tool.ts
new file mode 100644
index 0000000..d95e949
--- /dev/null
+++ b/packages/todo/src/tool.ts
@@ -0,0 +1,240 @@
+/**
+ * `todo_write` tool factory — the imperative shell that binds the pure
+ * validate/set/format functions to the per-conversation state + a `notify`
+ * callback (which fires surface subscribers). The `conversationId` comes from
+ * `ctx.conversationId` (kernel contract — Wave 0 wired it through). Errors
+ * surface as `{ isError: true }` rather than thrown, so the model can react to
+ * the message (same posture as tool-shell / tool-web-search).
+ *
+ * `concurrencySafe: false` — the tool mutates per-conversation shared state
+ * (the in-memory list), so the kernel serializes its calls even under parallel
+ * dispatch.
+ */
+
+import type { ToolContract, ToolExecuteContext, ToolResult } from "@dispatch/kernel";
+import { formatTodoResult, setTodos, type TodoState, validateTodos } from "./pure.js";
+
+export interface TodoWriteToolDeps {
+  /** Per-conversation todo store (owned by the extension shell). */
+  readonly state: TodoState;
+  /** Fire surface subscribers after a successful write. */
+  readonly notify: () => void;
+}
+
+const TODO_WRITE_DESCRIPTION = `Use this tool to create and manage a structured task list for your current session. This helps you track progress, organize complex tasks, and demonstrate thoroughness to the user.
+It also helps the user understand the progress of the task and overall progress of their requests.
+
+## When to Use This Tool
+Use this tool proactively in these scenarios:
+
+1. Complex multistep tasks - When a task requires 3 or more distinct steps or actions
+2. Non-trivial and complex tasks - Tasks that require careful planning or multiple operations
+3. User explicitly requests todo list - When the user directly asks you to use the todo list
+4. User provides multiple tasks - When users provide a list of things to be done (numbered or comma-separated)
+5. After receiving new instructions - Immediately capture user requirements as todos. Feel free to edit the todo list based on new information.
+6. After completing a task - Mark it complete and add any new follow-up tasks
+7. When you start working on a new task, mark the todo as in_progress. Ideally you should only have one todo as in_progress at a time. Complete existing tasks before starting new ones.
+
+## When NOT to Use This Tool
+
+Skip using this tool when:
+1. There is only a single, straightforward task
+2. The task is trivial and tracking it provides no organizational benefit
+3. The task can be completed in less than 3 trivial steps
+4. The task is purely conversational or informational
+
+NOTE that you should not use this tool if there is only one trivial task to do. In this case you are better off just doing the task directly.
+
+## Examples of When to Use the Todo List
+
+<example>
+User: I want to add a dark mode toggle to the application settings. Make sure you run the tests and build when you're done!
+Assistant: I'll help add a dark mode toggle to your application settings. Let me create a todo list to track this implementation.
+*Creates todo list with the following items:*
+1. Create dark mode toggle component in Settings page
+2. Add dark mode state management (context/store)
+3. Implement CSS-in-JS styles for dark theme
+4. Update existing components to support theme switching
+5. Run tests and build process, addressing any failures or errors that occur
+*Begins working on the first task*
+
+<reasoning>
+The assistant used the todo list because:
+1. Adding dark mode is a multi-step feature requiring UI, state management, and styling changes
+2. The user explicitly requested tests and build be run afterward
+3. The assistant inferred that tests and build need to pass by adding "Ensure tests and build succeed" as the final task
+</reasoning>
+</example>
+
+<example>
+User: Help me rename the function getCwd to getCurrentWorkingDirectory across my project
+Assistant: Let me first search through your codebase to find all occurrences of getCwd.
+*Uses search tools to locate all instances of getCwd in the codebase*
+Assistant: I've found 15 instances of getCwd across 8 files. Let me create a todo list to track these changes.
+*Creates todo list with specific items for each file that needs updating*
+
+<reasoning>
+The assistant used the todo list because:
+1. First, the assistant searched to understand the scope of the task
+2. Upon finding multiple occurrences across different files, it determined this was a complex task with multiple steps
+3. The todo list helps ensure every instance is tracked and updated systematically
+4. This approach prevents missing any occurrences and maintains code consistency
+</reasoning>
+</example>
+
+<example>
+User: I need to implement these features for my e-commerce site: user registration, product catalog, shopping cart, and checkout flow.
+Assistant: I'll help implement these features. First, let's add all the features to the todo list.
+*Creates a todo list breaking down each feature into specific tasks based on the project architecture*
+Assistant: Let's start with user registration. This will involve creating database models, API endpoints, and frontend forms that integrate with your existing authentication system.
+
+<reasoning>
+The assistant used the todo list because:
+1. The user provided multiple features to implement in a comma separated list
+2. The todo list helps organize these large features into manageable tasks
+3. This approach allows for tracking progress across the entire implementation
+</reasoning>
+</example>
+
+<example>
+User: Can you help optimize my React application? It's rendering slowly and has performance issues.
+Assistant: I'll help optimize your React application. First, let me examine your codebase to identify potential performance bottlenecks.
+*Reviews component structure, render patterns, state management, and data fetching*
+Assistant: After analyzing your codebase, I've identified several performance issues. Let me create a todo list to track our optimization efforts.
+*Creates todo list with items like: 1) Implement memoization for expensive calculations in ProductList, 2) Add virtualization for long lists in Dashboard, 3) Optimize image loading in Gallery component, 4) Fix state update loops in ShoppingCart, 5) Review bundle size and implement code splitting*
+Let's start by implementing memoization for the expensive calculations in your ProductList component.
+
+<reasoning>
+The assistant used the todo list because:
+1. First, the assistant examined the codebase to identify specific performance issues
+2. Based on this analysis, it identified multiple optimization opportunities
+3. Performance optimization is a non-trivial task requiring multiple steps
+4. The todo list helps methodically track improvements across different components
+5. This systematic approach ensures all performance bottlenecks are addressed
+</reasoning>
+</example>
+
+## Examples of When NOT to Use the Todo List
+
+<example>
+User: How do I print 'Hello World' in Python?
+Assistant: In Python, you can print "Hello World" with this simple code:
+
+python
+print("Hello World")
+
+This will output the text "Hello World" to the console when executed.
+
+<reasoning>
+The assistant did not use the todo list because this is a single, trivial task that can be completed in one step. There's no need to track multiple tasks or steps for such a straightforward request.
+</reasoning>
+</example>
+
+<example>
+User: What does the git status command do?
+Assistant: The git status command shows the current state of your working directory and staging area. It displays which changes have been staged, which haven't, and which files aren't being tracked by Git. This is a fundamental command for understanding the current state of your repository before making commits.
+
+<reasoning>
+This is an informational request with no actual coding task to complete. The user is simply asking for an explanation, not for the assistant to perform multiple steps or tasks.
+</reasoning>
+</example>
+
+<example>
+User: Can you add a comment to the calculateTotal function to explain what it does?
+Assistant: Sure, let me add a comment to the calculateTotal function to explain what it does.
+* Uses the Edit tool to add a comment to the calculateTotal function *
+
+<reasoning>
+This is a single, straightforward task confined to one location in the code. Adding a comment doesn't require tracking multiple steps or systematic organization.
+</reasoning>
+</example>
+
+<example>
+User: Run npm install for me and tell me what happens.
+Assistant: I'll run the npm install command for you.
+
+*Executes: npm install*
+
+The command completed successfully. Here's the output:
+[Output of npm install]
+
+All dependencies have been installed according to your package.json file.
+
+<reasoning>
+This is a single command execution with immediate results. There are no multiple steps to track or organize, making the todo list unnecessary for this straightforward task.
+</reasoning>
+</example>
+
+## Task States and Management
+
+1. **Task States**: Use these states to track progress:
+   - pending: Task not yet started
+   - in_progress: Currently working on (limit to ONE task at a time)
+   - completed: Task finished successfully
+   - cancelled: Task no longer needed
+
+2. **Task Management**:
+   - Update task status in real-time as you work
+   - Mark tasks complete IMMEDIATELY after finishing (don't batch)
+   - Only have ONE task in_progress at any time
+   - Complete current tasks before starting new ones
+   - Cancel tasks that become irrelevant
+
+3. **Task Breakdown**:
+   - Create specific, actionable items
+   - Break complex tasks into smaller, manageable steps
+   - Use clear, descriptive task names
+
+When in doubt, use this tool. Being proactive with task management demonstrates attentiveness and ensures you complete all requirements successfully.`;
+
+/** Create the `todo_write` tool, closing over the shared state + notifier. */
+export function createTodoWriteTool(deps: TodoWriteToolDeps): ToolContract {
+  const { state, notify } = deps;
+  return {
+    name: "todo_write",
+    description: TODO_WRITE_DESCRIPTION,
+    parameters: {
+      type: "object",
+      properties: {
+        todos: {
+          type: "array",
+          description: "The updated todo list (replaces the existing list).",
+          items: {
+            type: "object",
+            properties: {
+              content: {
+                type: "string",
+                description: "Brief description of the task.",
+              },
+              status: {
+                type: "string",
+                enum: ["pending", "in_progress", "completed", "cancelled"],
+                description: "Current status of the task.",
+              },
+            },
+            required: ["content", "status"],
+          },
+        },
+      },
+      required: ["todos"],
+    },
+    concurrencySafe: false,
+    async execute(args: unknown, ctx: ToolExecuteContext): Promise<ToolResult> {
+      const conversationId = ctx.conversationId;
+      if (conversationId === undefined) {
+        return { content: "Error: no conversation context for todo.", isError: true };
+      }
+      const validated = validateTodos(args);
+      if (!Array.isArray(validated)) {
+        return { content: validated.error, isError: true };
+      }
+      const snapshot = setTodos(state, conversationId, validated);
+      notify();
+      ctx.log.debug("todo_write: replaced list", {
+        conversationId,
+        count: snapshot.length,
+      });
+      return { content: formatTodoResult(snapshot) };
+    },
+  };
+}
diff --git a/packages/todo/src/validate.test.ts b/packages/todo/src/validate.test.ts
new file mode 100644
index 0000000..c518041
--- /dev/null
+++ b/packages/todo/src/validate.test.ts
@@ -0,0 +1,66 @@
+import { describe, expect, it } from "vitest";
+import { validateTodos } from "./pure.js";
+
+describe("validateTodos", () => {
+  it("validateTodos: accepts valid list", () => {
+    const result = validateTodos({
+      todos: [
+        { content: "Do thing A", status: "pending" },
+        { content: "Do thing B", status: "in_progress" },
+        { content: "Do thing C", status: "completed" },
+      ],
+    });
+    expect(result).toEqual([
+      { content: "Do thing A", status: "pending" },
+      { content: "Do thing B", status: "in_progress" },
+      { content: "Do thing C", status: "completed" },
+    ]);
+  });
+
+  it("validateTodos: accepts empty array (clears the list)", () => {
+    const result = validateTodos({ todos: [] });
+    expect(result).toEqual([]);
+  });
+
+  it("validateTodos: rejects invalid status", () => {
+    const result = validateTodos({
+      todos: [{ content: "x", status: "bogus" }],
+    });
+    expect(result).toHaveProperty("error");
+  });
+
+  it("validateTodos: rejects empty content", () => {
+    const empty = validateTodos({
+      todos: [{ content: "", status: "pending" }],
+    });
+    expect(empty).toHaveProperty("error");
+    const whitespace = validateTodos({
+      todos: [{ content: " ", status: "pending" }],
+    });
+    expect(whitespace).toHaveProperty("error");
+  });
+
+  it("validateTodos: rejects non-array todos", () => {
+    const result = validateTodos({ todos: "not-an-array" });
+    expect(result).toHaveProperty("error");
+  });
+
+  it("validateTodos: rejects null/non-object args", () => {
+    expect(validateTodos(null)).toHaveProperty("error");
+    expect(validateTodos(undefined)).toHaveProperty("error");
+    expect(validateTodos("string")).toHaveProperty("error");
+    expect(validateTodos(42)).toHaveProperty("error");
+    expect(validateTodos([])).toHaveProperty("error");
+  });
+
+  it("validateTodos: does NOT enforce one in_progress (allows multiple — description guides the model)", () => {
+    const result = validateTodos({
+      todos: [
+        { content: "a", status: "in_progress" },
+        { content: "b", status: "in_progress" },
+      ],
+    });
+    expect(Array.isArray(result)).toBe(true);
+    expect(result).toHaveLength(2);
+  });
+});
diff --git a/packages/todo/tsconfig.json b/packages/todo/tsconfig.json
new file mode 100644
index 0000000..102c8f0
--- /dev/null
+++ 
b/packages/todo/tsconfig.json
@@ -0,0 +1,10 @@
+{
+  "extends": "../../tsconfig.base.json",
+  "compilerOptions": { "rootDir": "src", "outDir": "dist", "composite": true },
+  "include": ["src/**/*.ts"],
+  "references": [
+    { "path": "../kernel" },
+    { "path": "../surface-registry" },
+    { "path": "../ui-contract" }
+  ]
+}
@@ -5,7 +5,7 @@
 > Keep this lean and current; do not let it re-accrete a step-by-step changelog.
 ## Status (current)
-`tsc -b` EXIT 0 · biome clean · **1097 vitest + 199 transport bun green**.
+`tsc -b` EXIT 0 · biome clean · **1123 vitest + 199 transport bun green**.
 Built and verified live (full-fidelity: every feature is a manifest-loaded extension through the host):
@@ -468,6 +468,26 @@ for per-request timeout + caller cancellation. `concurrencySafe: true`, `capabil
 - **LIVE-VERIFIED:** the dev stack (umans-glm-5.2) called `web_search` → Firecrawl returned real results (Paris, France) — first live Umans API call too.
+## todo tool — per-conversation task list + surface (DONE)
+Standard tool extension with a single `todo_write` tool (opencode `todowrite` pattern:
+full-list replace, returns JSON, no business-rule enforcement — the description guides
+the model). Per-conversation in-memory state (`Map<conversationId, TodoItem[]>`). Per-
+conversation surface (`rendererId: "todo"`, `scope: "conversation"`) via subscriber-notify
+(message-queue pattern). `concurrencySafe: false` (mutates shared state).
+- **Wave 0 (orchestrator, kernel contract):** added `conversationId?: string` to
+  `ToolExecuteContext` (additive, backward-compatible). Wired in `dispatch.ts` — the
+  kernel already had `conversationId` as a parameter, just wasn't passing it through to
+  the tool context. 170 kernel tests pass.
+- **Wave 1 (todo extension):** pure core (`validateTodos` — shape only; `getTodos`/
+  `setTodos`/`clearTodos` — fresh array copies; `buildTodoSpec`; `formatTodoResult` →
+  `JSON.stringify`). Shell: `createTodoWriteTool({ state, notify })` + surface provider.
+  26 tests. Report: `reports/todo.md`.
+- **Wave 2 (host-bin wiring):** registered `todo` in `CORE_EXTENSIONS` + dep + root tsconfig
+  ref. 28 host-bin tests.
+- Verified: full-graph `tsc -b` EXIT 0, biome clean (314 files), **1123 vitest** pass.
+  **Boot smoke:** `"todo: registered"` + activated.
+- [ ] Live-verify (model uses `todo_write` in a real turn — the dev stack has it loaded).
+
 ## Open items
 - **Context window LIMIT (deferred, sibling of context size):** expose the selected model's max context-window token limit so the FE can render `contextSize / limit` (e.g. `1286 / 200000`).
@@ -484,6 +504,11 @@ for per-request timeout + caller cancellation. `concurrencySafe: true`, `capabil
 §12). The scheduler mechanism (`host.scheduler.register`) already exists.
 - **D8 `prompt.assembly` segments:** deferred-by-design (await the context-filter chain).
+- **In-memory state persistence (message queue + todo list):** both the message
+  queue and the todo list are in-memory only (`Map<conversationId, …>` in the
+  extension's `activate`). Neither persists across server restarts. If persistence
+  is needed later, both would write through `host.storage` (the conversation-store
+  pattern: separate key space per feature, append/write per conversation).
 ## Roadmap
 1. **Web frontend** (in progress, SEPARATE repo `../dispatch-web`; Svelte +
@@ -516,9 +541,7 @@ for per-request timeout + caller cancellation. `concurrencySafe: true`, `capabil
 conversation" push (only the inverse, `POST /conversations/:id/close`). Needs a new broadcast WS op + endpoint/flag here, and FE handling couriered to `../dispatch-web`.
-5. **`todo` tool** — a per-conversation task-list tool the model maintains
- (like opencode's todowrite/todoread), as a standard tool extension; likely a
- surface so the FE can render the live list.
+ 5. ~~**`todo` tool**~~ — **DONE** (see milestone section above).
 6. ~~**`web_search` tool**~~ — **DONE** (see milestone section above).
+ 7. **Message queue — close-with-queued-messages (deferred product decision):** if a client closes a conversation (`POST /conversations/:id/close`) while the
diff --git a/tsconfig.json b/tsconfig.json
index d084acb..2b94438 100644
--- a/tsconfig.json
+++ b/tsconfig.json
@@ -16,6 +16,7 @@
     { "path": "./packages/credential-store" },
     { "path": "./packages/conversation-store" },
     { "path": "./packages/throughput-store" },
+    { "path": "./packages/todo" },
     { "path": "./packages/session-orchestrator" },
     { "path": "./packages/transport-http" },
     { "path": "./packages/tool-read-file" }, |
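
The commit describes `todo_write` as a full-list replace over per-conversation in-memory state, with shape-only validation and fresh array copies on read and write. A minimal sketch of that pattern is below, for orientation only. It is not the commit's `pure.ts`: the function bodies and error strings are assumptions, and only the names `validateTodos`, `setTodos`, `getTodos`, and `TodoState` are borrowed from the diff.

```typescript
// Illustrative sketch only: bodies and error messages are assumed, not taken from the commit.
type TodoStatus = "pending" | "in_progress" | "completed" | "cancelled";

interface TodoItem {
  readonly content: string;
  readonly status: TodoStatus;
}

// One list per conversation, keyed by conversationId.
type TodoState = Map<string, TodoItem[]>;

const STATUSES: readonly string[] = ["pending", "in_progress", "completed", "cancelled"];

// Shape-only validation: no business rules such as "at most one in_progress".
function validateTodos(args: unknown): TodoItem[] | { error: string } {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { error: "Error: arguments must be an object." };
  }
  const todos = (args as { todos?: unknown }).todos;
  if (!Array.isArray(todos)) {
    return { error: "Error: todos must be an array." };
  }
  const out: TodoItem[] = [];
  for (const item of todos) {
    if (typeof item !== "object" || item === null) {
      return { error: "Error: each todo must be an object." };
    }
    const { content, status } = item as { content?: unknown; status?: unknown };
    if (typeof content !== "string" || content.trim() === "") {
      return { error: "Error: content must be a non-empty string." };
    }
    if (typeof status !== "string" || !STATUSES.includes(status)) {
      return { error: "Error: invalid status." };
    }
    out.push({ content, status: status as TodoStatus });
  }
  return out;
}

// Full-list replace: the stored list is overwritten, never merged.
function setTodos(state: TodoState, conversationId: string, todos: readonly TodoItem[]): TodoItem[] {
  state.set(conversationId, [...todos]);
  return [...todos]; // fresh copy, so callers cannot mutate the stored list
}

function getTodos(state: TodoState, conversationId: string): TodoItem[] {
  return [...(state.get(conversationId) ?? [])];
}

// Usage: validate first, then replace the list for one conversation.
const demoState: TodoState = new Map();
const parsed = validateTodos({ todos: [{ content: "write tests", status: "pending" }] });
if (Array.isArray(parsed)) {
  setTodos(demoState, "conv-1", parsed);
}
```

Returning a union of the validated list or an `{ error }` object, rather than throwing, mirrors the `Array.isArray(validated)` check in `tool.ts` and lets the tool surface the message to the model as `{ isError: true }`.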
