path: root/packages/kernel/src/runtime/run-turn.ts
Age | Commit message | Author
3 days | fix(kernel): disable MAX_STEPS limit (0 = unlimited) | Adam Malczewski
Agents were being cut off mid-task at 50 steps. The MAX_STEPS=50 hardcoded limit was silently terminating turns while the model was actively making tool calls, leaving conversations idle with a dangling tool-result as the last chunk. Setting MAX_STEPS to 0 disables the limit — the loop runs until the model stops making tool calls naturally or the abort signal fires. The max-steps code path is preserved for when MAX_STEPS > 0.
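The "0 = unlimited" convention described above can be sketched as a small loop guard. This is a minimal illustration, not the actual run-turn.ts code: `runSteps`, `stepMakesToolCalls`, and `isAborted` are hypothetical names; only `MAX_STEPS` and the finish reasons come from the commit messages in this log.

```typescript
// Hypothetical sketch of a step loop where maxSteps = 0 means "no limit".
const MAX_STEPS = 0; // 0 disables the limit; a value > 0 restores the cap

type FinishReason = "stop" | "max-steps" | "aborted";

function runSteps(
  maxSteps: number,
  stepMakesToolCalls: (stepIndex: number) => boolean, // stand-in for one model step
  isAborted: () => boolean = () => false,
): { finishReason: FinishReason; stepsRun: number } {
  let stepIndex = 0;
  while (true) {
    if (isAborted()) return { finishReason: "aborted", stepsRun: stepIndex };
    // The cap is only consulted when maxSteps > 0.
    if (maxSteps > 0 && stepIndex >= maxSteps) {
      return { finishReason: "max-steps", stepsRun: stepIndex };
    }
    const moreToolCalls = stepMakesToolCalls(stepIndex);
    stepIndex += 1;
    // The model stopped calling tools, so the turn ends naturally.
    if (!moreToolCalls) return { finishReason: "stop", stepsRun: stepIndex };
  }
}

// A task that needs 60 steps: cut off at 50 with the old cap, completes with 0.
const capped = runSteps(50, (i) => i < 59);
const unlimited = runSteps(MAX_STEPS, (i) => i < 59);
```

With the cap at 50 the simulated turn ends with `max-steps` after 50 steps; with 0 it runs all 60 steps and ends with `stop`.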
4 days | Merge branch 'dev' into feature/ssh-support | Adam Malczewski
Brings dev's retry-with-backoff (the transient `provider-retry` AgentEvent the web frontend consumes) + the LSP-dead-server per-edit-hang fix into the SSH feature branch, alongside the SSH waves 0-5c. All code files auto-merged cleanly (run-turn.ts, orchestrator.ts, runtime.ts, wire/index.ts, tool-edit-file/extension.ts, run-turn.test.ts — both computerId threading and retry-with-backoff coexist). Only tasks.md conflicted (status section — orchestrator-resolved; both feature sections kept). Verified post-merge: tsc -b EXIT 0, biome clean (391 files), 1730 vitest pass +6 sshd-integration skipped (was 1690; +40 from dev's retry/LSP tests). Wire dist rebuilt so the FE can re-sync the pinned @dispatch/wire dep and pick up BOTH provider-retry AND the SSH Computer/defaultComputerId types. No merge or push (into dev or otherwise).
4 days | feat(kernel): retry-with-backoff on retryable provider errors | Adam Malczewski
When the upstream LLM API returns a retryable error (HTTP 429 / 5xx "overloaded"), the kernel now retries provider.stream() with a stepped backoff, visibly, until the 8h cumulative-sleep budget is exhausted, then emits the final error and seals the turn. Retries fire only when no content was emitted yet this step (safety invariant: never duplicate partial output).
- wire: new transient TurnProviderRetryEvent AgentEvent variant (emitted before each sleep; not persisted to model history).
- kernel contracts: RetryStrategy (pure delayFor + injected sleep) + optional retry? on RunTurnInput (omit = no retry, backward-compatible).
- kernel run-turn: retry loop in executeStep; providerRetryEvent constructor. Kernel imports no timer (sleep injected).
- session-orchestrator: concrete schedule (5s..30m, repeat 30m, 8h budget) + abortable setTimeout sleep, wired into RunTurnInput.retry.
tsc -b EXIT 0; biome clean; 1574 vitest pass (+16 new: 11 kernel retry tests with injected fake sleep + pure delayFor, zero @dispatch/* mocks; 5 schedule tests). Transports unchanged (transport-ws forwards AgentEvent verbatim in chat.delta; transport-http is generic JSON.stringify). Plan: notes/retry-with-backoff-plan.md. tasks.md updated with milestone + optional CLI-renderer roadmap follow-up.
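A stepped schedule with a cumulative-sleep budget, as described above, might look roughly like the following. This is a sketch under stated assumptions: only the 5s start, the 30m ceiling, and the 8h budget come from the commit message; the intermediate steps, the `delayFor` signature, and the exact budget check are invented for illustration and may differ from the real RetryStrategy contract.

```typescript
const SEC = 1_000;
const MIN = 60 * SEC;
const HOUR = 60 * MIN;

// ASSUMPTION: the intermediate steps are made up; the source only says "5s..30m".
const STEPS_MS = [5 * SEC, 30 * SEC, 2 * MIN, 10 * MIN, 30 * MIN];
const BUDGET_MS = 8 * HOUR;

// Pure: delay before retry number `attempt` (0-based), or null once the
// next sleep would exceed the cumulative budget (hypothetical semantics).
function delayFor(attempt: number, sleptMs: number): number | null {
  const step = STEPS_MS[Math.min(attempt, STEPS_MS.length - 1)];
  return sleptMs + step > BUDGET_MS ? null : step;
}

// Walk the schedule without sleeping, the way a test with a fake sleep could.
function plannedDelays(): number[] {
  const delays: number[] = [];
  let slept = 0;
  for (let attempt = 0; ; attempt++) {
    const delay = delayFor(attempt, slept);
    if (delay === null) return delays;
    delays.push(delay);
    slept += delay;
  }
}

const delays = plannedDelays();
const totalSleep = delays.reduce((sum, d) => sum + d, 0);
```

Keeping `delayFor` pure and leaving the actual sleep to the caller mirrors the split the commit describes (pure delayFor + injected sleep), which is what makes the schedule testable without timers.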
4 days | feat(ssh): wave 1 - ExecBackend + computer data model + runtime threading | Adam Malczewski
Wave 1 of transparent SSH support (parallel owner-agents on disjoint packages, plus the orchestrator-authored kernel contract seam from wave 0):
- packages/wire: + Computer/ComputerEntry (read-only view over ~/.ssh/config Host aliases) + Workspace.defaultComputerId (string|null, null=local). Types only; 3 conformance tests.
- packages/exec-backend (NEW core extension): the ExecBackend abstraction (spawn + minimal fs surface) the bundled tools will program against instead of node:fs/child_process. LocalExecBackend wraps today's node calls (behavior-identical; node:fs-style .code errors). execBackendHandle + ExecBackendResolver (sync; computerId undefined -> local; set -> throws until the ssh package wires remote resolution in wave 5). 20 tests.
- packages/kernel (runtime only): thread computerId through dispatch.ts + run-turn.ts exactly as cwd is threaded (opaque, forwarded to ToolExecuteContext; absent = local = byte-identical to today). +2 tests.
- packages/conversation-store: computer (SSH alias) assignment + resolution mirroring cwd: WorkspaceRow.defaultComputerId + setWorkspaceDefaultComputerId + getComputerId/setComputerId/clearComputerId + getEffectiveComputer (override -> per-conv -> workspace default -> null/local). Fixes the 3 Workspace literal sites the new required wire field broke. +18 tests.
- orchestrator: root tsconfig.json ref for exec-backend + bun install.
Verified: tsc -b EXIT 0, biome clean, 1592 vitest pass (was 1549, +43). Refs: notes/ssh-support-plan.md (decisions §0.5/§13). No merge or push.
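The resolution order named above (override -> per-conv -> workspace default -> null/local) can be illustrated as a simple fallback chain. The function below is a hypothetical stand-in: the real getEffectiveComputer presumably reads from the store and may treat an explicit null differently. Here null and undefined both mean "not set at this level".

```typescript
// A computer id is an SSH Host alias; null means "run locally".
interface ComputerSources {
  override?: string | null; // hypothetical per-call override
  conversationComputerId?: string | null; // per-conversation assignment
  workspaceDefaultComputerId: string | null; // Workspace.defaultComputerId
}

// ASSUMPTION: an unset level falls through to the next one.
function resolveEffectiveComputer(sources: ComputerSources): string | null {
  return (
    sources.override ??
    sources.conversationComputerId ??
    sources.workspaceDefaultComputerId ??
    null
  );
}

const fromOverride = resolveEffectiveComputer({
  override: "build-box",
  conversationComputerId: "dev-vm",
  workspaceDefaultComputerId: "staging",
});
const fromConversation = resolveEffectiveComputer({
  conversationComputerId: "dev-vm",
  workspaceDefaultComputerId: "staging",
});
const fromWorkspace = resolveEffectiveComputer({ workspaceDefaultComputerId: "staging" });
const local = resolveEffectiveComputer({ workspaceDefaultComputerId: null });
```

The host aliases (`build-box`, `dev-vm`, `staging`) are placeholders for entries in ~/.ssh/config.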
5 days | fix(kernel+tool-shell): abort hanging tool calls without bricking the conversation | Adam Malczewski
kernel: executeToolCall now races tool.execute against the abort signal via Promise.race; on abort resolves (not rejects) with an "Aborted" result so the step completes normally → finishReason "aborted" → turn seals cleanly (done event) → finally clears activeTurns → conversation freed, next message accepted. run-turn strips tool-call chunks from the assistant message on abort (keeps text/thinking) and omits tool-result messages to avoid persisting dangling tool calls that would 400 the provider next turn.
tool-shell: realSpawn spawns detached (own process group); on abort AND timeout kills the entire group (process.kill(-pgid, SIGKILL)) and resolves immediately, with no child.on("close") dependency, so a grandchild holding the pipes can't stall the spawn promise or leak.
Also: ORCHESTRATOR.md migrated to dispatch CLI summon mechanism; .skills summary; bin/sync-env PATH injection; frontend handoff docs.
1453 vitest pass · tsc -b EXIT 0 · biome clean.
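The race-against-abort pattern described for executeToolCall can be sketched as follows. This is an illustration of the technique only: `raceAgainstAbort` and the `ToolResult` shape (including `isError`) are hypothetical, not the kernel's actual types.

```typescript
// Hypothetical result shape for illustration.
type ToolResult = { content: string; isError: boolean };

// Resolve (never reject) with an "Aborted" result when the signal fires first,
// so the caller's step can complete normally instead of hanging.
function raceAgainstAbort(
  execute: () => Promise<ToolResult>,
  signal: AbortSignal,
): Promise<ToolResult> {
  const aborted: ToolResult = { content: "Aborted", isError: true };
  if (signal.aborted) return Promise.resolve(aborted);

  let onAbort: () => void = () => {};
  const abortPromise = new Promise<ToolResult>((resolve) => {
    onAbort = () => resolve(aborted);
    signal.addEventListener("abort", onAbort, { once: true });
  });

  // Whichever settles first wins; the listener is removed either way.
  return Promise.race([execute(), abortPromise]).finally(() => {
    signal.removeEventListener("abort", onAbort);
  });
}

// A tool that never settles, standing in for a hung tool call.
const controller = new AbortController();
const hung = raceAgainstAbort(() => new Promise<ToolResult>(() => {}), controller.signal);
controller.abort();

// A tool that completes normally is unaffected by the race.
const finished = raceAgainstAbort(
  () => Promise.resolve({ content: "ok", isError: false }),
  new AbortController().signal,
);
```

Note that the race only stops waiting on the tool; it does not stop the tool's underlying work, which is why the same commit also has tool-shell kill the process group on abort.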
7 days | feat: incremental seq assignment during generation (CR-6) | Adam Malczewski
The backend now persists chunks at step boundaries during generation, not only at turn-seal. This enables the FE to syncTail mid-turn and pick up committed, seq'd chunks (eliminating the provisional state).
Changes:
- RunTurnInput: add onStepComplete callback (kernel contract)
- runTurn: call onStepComplete after each step's messages are finalized
- Orchestrator: persist userMsg at turn start + each step's messages via onStepComplete. Falls back to batch persist if callback isn't called (backward compatible with test fakes).
The user message gets seq numbers before the first step generates. Each step's assistant + tool messages get seq numbers as they complete. The FE's existing syncTail (?sinceSeq=N) picks them up during generation.
Also adds backend-to-fe-handoff.md with CR-6 response + full endpoint list.
9 days | feat(message-queue): per-conversation queue + steering injection | Adam Malczewski
A per-conversation message queue (new message-queue extension) holds user messages enqueued while a turn generates; delivered mid-turn as steering at the tool-result boundary (or carried to a new turn if no tool call fires).
- kernel: RunTurnInput.drainSteering callback (generic; kernel stays pure)
- wire 0.7.0->0.8.0: QueuedMessage, QueuePayload, TurnSteeringEvent (additive)
- transport-contract 0.11.0->0.12.0: POST /conversations/:id/queue + chat.queue WS op
- message-queue ext: queue state + per-conversation custom surface (rendererId message-queue)
- session-orchestrator: enqueue facade + drainSteering wiring + post-seal carry
- transport-http/ws: queue endpoint + chat.queue op (fixes WsClientMessage exhaustive switch)
- host-bin: register message-queue
1043 vitest + 199 transport bun pass; tsc/biome clean; boot smoke clean. FE courier: frontend-message-queue-handoff.md.
2026-06-12 | feat(metrics): expose current context size to the frontend | Adam Malczewski
contextSize = the turn's FINAL step inputTokens+outputTokens (true context occupancy; NOT the aggregate usage, which sums per-step prompts and overcounts multi-step turns). Stamped on both the live done event (kernel) and persisted TurnMetrics (session-orchestrator); a client reads the latest turn's value.
- @dispatch/wire 0.4.0->0.5.0: optional contextSize on TurnDoneEvent + TurnMetrics
- @dispatch/transport-contract 0.5.0->0.6.0 (re-export only)
- glossary: context size (reserve 'context window' for the model limit, later)
- FE courier: frontend-context-size-handoff.md
881 vitest pass; tsc -b EXIT 0; biome clean.
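The difference between context size and aggregate usage can be made concrete with a small worked example. The helper names and the token counts below are invented for illustration; only the rule (final step's inputTokens + outputTokens) comes from the commit message.

```typescript
type StepUsage = { inputTokens: number; outputTokens: number };

// Context size: occupancy after the FINAL step only.
function contextSizeOf(steps: StepUsage[]): number | undefined {
  const last = steps[steps.length - 1];
  return last ? last.inputTokens + last.outputTokens : undefined;
}

// Aggregate usage: sums every step, so each step's prompt is counted again.
function aggregateTokens(steps: StepUsage[]): number {
  return steps.reduce((sum, s) => sum + s.inputTokens + s.outputTokens, 0);
}

// Two-step turn: the second prompt already contains the first step's
// prompt and output plus a tool result, so it is larger than the first.
const steps: StepUsage[] = [
  { inputTokens: 1000, outputTokens: 200 },
  { inputTokens: 1300, outputTokens: 150 },
];

const size = contextSizeOf(steps); // 1300 + 150
const aggregate = aggregateTokens(steps); // (1000 + 200) + (1300 + 150)
```

Here the context size is 1450 tokens while the aggregate is 2650, which shows how summing per-step prompts overcounts a multi-step turn.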
2026-06-10 | kernel/run-turn: thread providerOpts (model) into provider.stream | Adam Malczewski
executeStep built the stream opts with only the logger, so providerOpts.model (the selected model) never reached any provider — each fell back to its own default. Carry providerOpts through StepContext into the per-step stream opts, plus a regression test asserting the model is forwarded.
2026-06-10 | feat(metrics): durable per-turn/step token+timing metrics (observability spans + persisted replay) | Adam Malczewski
Two-part token-data improvement:
#2 Observability spans (kernel run-turn): turn & step span-close now stamp ALL four Usage fields: added usage.cacheReadTokens/cacheWriteTokens (were silently dropped) and normalized usage_* -> usage.* to match the provider.request span (consistent D9 GROUP BY). No contract change.
#3 Persisted replay metrics (conversation-store + read endpoint): new StepMetrics/TurnMetrics wire types; conversation-store persists per-turn metrics in a separate key space (appendMetrics/loadMetrics, turn-append order); session-orchestrator accumulates per-step+turn metrics from the event stream (pure metrics.ts) and persists after seal; transport-http serves GET /conversations/:id/metrics -> ConversationMetricsResponse.
Contracts: @dispatch/wire + @dispatch/transport-contract bumped 0.3.0->0.4.0 (additive). GLOSSARY: turn metrics / step metrics. typecheck EXIT 0, biome clean, 546 vitest + 89 bun = 635 tests.
2026-06-07 | feat(wire,kernel,session-orchestrator): live turn metrics on the stream | Adam Malczewski
Expose the backend's authoritative token+timing metrics on the live AgentEvent stream (observability-only -> now also client-facing). All additive/optional.
- wire: new TurnStepCompleteEvent (type:step-complete) with per-step ttftMs/decodeMs/genTotalMs; usage += stepId; tool-result += durationMs (exec); done += durationMs (turn wall-clock) + usage (turn total). RunTurnInput += now?. transport-contract (re-export bump).
- kernel-runtime: when now injected, measures + emits the above (reuses the ttft/decode first-token detection); omits timing gracefully without a clock.
- session-orchestrator: adds now? to deps, threads into RunTurnInput; extension activate injects () => Date.now().
- transport/cli/host-bin: untouched (verbatim pass-through; additive fields).
FE handoff: frontend-metrics-handoff.md. typecheck clean; 520 vitest + 89 bun; biome 0/0. Replay/persistence = deferred Pass 2 (documented in tasks.md).
2026-06-07 | feat(kernel-runtime): per-step TTFT + decode timing spans (observability) | Adam Malczewski
Split each step's generation into a ttft span (stream start -> first text|reasoning token) and a decode span (first token -> stream end), children of the step span. decode = generation total - TTFT; both retrievable from the trace-store. First token counts reasoning deltas; a step with no content token ends ttft with firstToken:false (no misleading decode). Span-based (no clock injection), no wire/contract change. +3 runtime tests. GLOSSARY: TTFT + decode time. typecheck clean; 512 vitest; biome 0/0.
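The TTFT/decode split above reduces to simple arithmetic, sketched here with explicit timestamps. This is an assumption-laden illustration: the commit itself is span-based with no clock injection, and `splitStepTiming` plus the choice to let ttft run to stream end when no token arrives are hypothetical.

```typescript
type StepTiming =
  | { firstToken: true; ttftMs: number; decodeMs: number; genTotalMs: number }
  | { firstToken: false; ttftMs: number; genTotalMs: number };

// firstTokenMs is the time of the first text or reasoning token, or null
// when the step produced no content token at all.
function splitStepTiming(
  streamStartMs: number,
  firstTokenMs: number | null,
  streamEndMs: number,
): StepTiming {
  const genTotalMs = streamEndMs - streamStartMs;
  if (firstTokenMs === null) {
    // ASSUMPTION: ttft covers the whole stream; no decode time is reported.
    return { firstToken: false, ttftMs: genTotalMs, genTotalMs };
  }
  const ttftMs = firstTokenMs - streamStartMs;
  // decode = generation total - TTFT
  return { firstToken: true, ttftMs, decodeMs: genTotalMs - ttftMs, genTotalMs };
}

const withTokens = splitStepTiming(1000, 1400, 2500);
const withoutTokens = splitStepTiming(1000, null, 2500);
```

For a stream that starts at 1000 ms, emits its first token at 1400 ms, and ends at 2500 ms, this yields a TTFT of 400 ms and a decode time of 1100 ms.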
2026-06-07 | feat(wire,kernel,conversation-store): step grouping via stepId for batched tool calls | Adam Malczewski
Expose a per-step grouping key so a client can render a model's batched/parallel tool calls (those emitted in one step) as one unit, on both the live stream and replayed history. Key = branded StepId, derived turnId#stepIndex (0-based).
- wire: required stepId on Turn{Tool,ToolResult}Event; optional stepId on Tool{Call,Result}Chunk (generation provenance on the chunk, not the StoredChunk envelope; StoredChunk unchanged). transport-contract (re-export bump).
- kernel-runtime: mint stepId per step; stamp on tool chunks + tool events.
- conversation-store: chunk-carried stepId round-trips append/load/loadSince for free; reconcile copies it onto synthesized (interrupted) results.
- cli: stepId added to event test fixtures (renderer unchanged).
typecheck clean; 509 vitest + 89 bun; biome 0/0. FE courier reply + reference snapshots regenerated in ../dispatch-web.
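A branded StepId derived as turnId#stepIndex, and the client-side grouping it enables, could look something like this. The brand encoding, `mintStepId`, `groupByStep`, and the event shape are all hypothetical; only the turnId#stepIndex (0-based) derivation comes from the commit message.

```typescript
// Branding keeps a StepId from being confused with an arbitrary string.
type StepId = string & { readonly __brand: "StepId" };

function mintStepId(turnId: string, stepIndex: number): StepId {
  return `${turnId}#${stepIndex}` as StepId;
}

// Group tool events so calls batched in one step render as one unit.
function groupByStep<T extends { stepId: StepId }>(events: T[]): Map<StepId, T[]> {
  const groups = new Map<StepId, T[]>();
  for (const event of events) {
    const group = groups.get(event.stepId) ?? [];
    group.push(event);
    groups.set(event.stepId, group);
  }
  return groups;
}

// Illustrative events: two tool calls batched in step 0, one in step 1.
const events = [
  { stepId: mintStepId("turn-1", 0), toolName: "read_file" },
  { stepId: mintStepId("turn-1", 0), toolName: "grep" },
  { stepId: mintStepId("turn-1", 1), toolName: "edit_file" },
];
const grouped = groupByStep(events);
```

Because the key is a plain string under the brand, it can round-trip through JSON and storage unchanged, which fits the note that the chunk-carried stepId survives append/load for free.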
2026-06-06 | feat(kernel-runtime,session-orchestrator): emit turn lifecycle events | Adam Malczewski
Close a gap found live: neither transport emitted turn-start/done/turn-sealed (the wire defined them; nothing fired them). turn-sealed is the FE's cache-commit signal (frontend-design §6.3); done ends the stream.
- kernel-runtime: runTurn emits turn-start first and done (with finishReason) last, on every exit path (stop/tool-calls/max-steps/error/aborted).
- session-orchestrator: emits turn-sealed after conversationStore.append succeeds (the kernel touches no DB, so the post-persist seal is the orchestrator's). Not emitted if append throws.
No contract change (all three wire types already existed). Verified live: HTTP /chat and WS chat both stream turn-start … done turn-sealed. typecheck clean, 494 vitest + 80 bun, biome clean.
2026-06-05 | feat(kernel): listModels/ModelInfo + per-turn cwd contracts; add transport-contract wire package | Adam Malczewski
2026-06-05 | fix(observability): nest turn/step/prompt/provider.request spans into a tree (+ buildSpanOpen parent propagation) | Adam Malczewski
run-turn: step is now turnSpan.child; prompt/provider.request/tool-call are step's children (stepSpan.log passed into provider.stream).
logger.ts: buildSpanOpen now propagates the child's computed parentSpanId onto the span-open record. This fixes a latent bug where span.child(...) never set parentSpanId on open (close was already correct).
Verified: tsc -b clean, 279 tests, biome 0/0. Live: span tree turn->step->{prompt,provider.request}; the trace CLI easy-view renders the nesting.
2026-06-05 | refactor(observability): pure-types contracts/logging + Span body channel; verbatim before/after -> LogRecord.body (273 tests) | Adam Malczewski
contracts/logging.ts reduced to pure types; createLogger (+ helpers) moved to kernel/src/logging/ (@dispatch/kernel still exports it, so host-bin/tool-read-file are unaffected).
Span body channel (Option A): Logger.span / Span.child / Span.end accept an optional body string -> SpanOpenRecord.body / SpanCloseRecord.body. Large verbatim payloads now use body, not stringified attributes (store-fat-serve-thin; attributes stay thin/queryable for D9).
before: run-turn emits a 'prompt' span with the verbatim messages+tools in body (small scalars in attrs).
after: provider.request span carries the verbatim request in body; attrs thin, auth self-redacted.
Verified: tsc -b clean, 273 tests, biome 0 warnings/0 infos. Live boot: prompt + provider.request bodies present and correlated (shared turnId); request.body no longer in attributes; auth-key leak count = 0.
2026-06-05 | feat(observability): Phase A.2 - verbatim provider.request "after" capture + self-redaction (267 tests) | Adam Malczewski
Threads the step span's correlated logger into provider.stream (new optional ProviderStreamOptions.logger) so provider-openai-compat opens a child provider.request span at the fetch edge, capturing the verbatim post-transform request + response status/cache-tokens/raw-error. Auth header self-redacted in the provider's OWN code (graduated mask tiers; no shared helper). Capture is fail-safe (never throws into the turn). Adds the first hermetic provider HTTP test (stream.test.ts: fetch mocked, 15 cases). Large payloads use attributes for now; the LogRecord.body channel is a deferred ABI design (notes §10).
Verified: tsc -b clean, 267 tests (250->+17), biome 0 warnings/0 infos. Live boot: provider.request shares turnId with prompt:before (before<->after diffable); auth-key leak count = 0 (self-redaction proven on a real request).
2026-06-05 | feat(observability): Phase A logging substrate - Logger/Span ABI + journal sink (250 tests) | Adam Malczewski
Structured, agent-first logging captured durably to an append-only journal file.
Kernel (contracts/logging.ts): leveled/attributed Logger + Span, auto-scoped per extension (host stamps manifest.id, unspoofable), incremental span records (open/close) for crash-reconstructable traces, injected LogSink (pure record-builder). ctx.log on ToolContract; runTurn opens turn/step/tool-call spans and captures the verbatim pre-mutation prompt (the 'before') on the step span.
journal-sink (new package, bootstrap dep, not an extension): LogSink appending NDJSON to a rotating journal; pure serialize + thin fs edge; fail-safe drop, never blocks a turn. host-bin injects it via HostDeps; session-orchestrator threads host.logger (childed per turn) into runTurn.
Redaction is per-extension self-redaction (no shared helper; isolation over DRY). The out-of-process collector + SQLite store + the verbatim 'after' provider.request capture are Phase B / next (notes/observability-design.md §10/§11).
Verified: tsc -b clean, 250 tests (218→+32), biome clean. Live boot: a turn's journal holds host logs + turn/step spans (open+close) + the prompt:before record with the verbatim messages array.
Harness: ORCHESTRATOR §3 rule-scoping map; .dispatch/rules/isolation-over-dry.md; notes/observability-design.md (design D1–D10 + Phase A/B plan).
2026-06-05 | refactor(kernel): rename tabId → conversationId across contracts + consumers (218 tests) | Adam Malczewski
Step 4 of the post-MVP backlog: resolve the last vocab drift. The canonical term for a thread of turns is `conversationId` (GLOSSARY), but `AgentEvent` variants and `RunTurnInput` still used the legacy `tabId` from the old frontend "tab" concept, with session-orchestrator bridging `conversationId → tabId`.
Atomic, type-driven rename across the full 10-file consumer set:
- contracts/events.ts: all 11 AgentEvent variants tabId → conversationId
- contracts/runtime.ts: RunTurnInput.tabId → conversationId
- runtime/{events,run-turn,dispatch}.ts: factory params, ctx field, locals
- session-orchestrator: drop the redundant `tabId: conversationId` bridge line
- transport-http: emit wiring; external /chat field + X-Conversation-Id header unchanged (already canonical); only the emitted NDJSON event field flips
- tests (run-turn, app, logic): inputs + assertions now use conversationId
Pure rename, zero behavior change: typecheck clean, 218 tests pass (unchanged count), biome clean, `grep tabId packages/` → zero matches. Verified live: multi-turn curl emits conversationId-keyed NDJSON and threads history correctly. GLOSSARY drift note removed. Closes the post-MVP backlog (Steps 1–4).
2026-06-04 | fix(kernel): expose getProviders/getTools on HostAPI (CR-2) + runTurn uses input tabId/turnId (CR-3); simplify orchestrator wiring (167 tests) | Adam Malczewski
2026-06-04 | feat(kernel): runTurn turn loop - tool dispatch policy (eager/semaphore/dedup/concurrencySafe/abort), 16 tests | Adam Malczewski