author: Adam Malczewski <[email protected]> 2026-05-19 19:40:21 +0900
committer: Adam Malczewski <[email protected]> 2026-05-19 19:40:21 +0900
commit: f78a91c20f658dd404277919a0b872b352c99bb6
tree: 58cfffb655da4443f4b7a39543b86f988f15239f /research
download: dispatch-f78a91c20f658dd404277919a0b872b352c99bb6.tar.gz, dispatch-f78a91c20f658dd404277919a0b872b352c99bb6.zip
Phase 1: single agent + basic UI (HEAD, main)

- Bun monorepo with @dispatch/core, @dispatch/api, @dispatch/frontend
- Agent runtime with Vercel AI SDK, streaming via WebSocket
- Tools: read_file, write_file, list_files (scoped to working directory)
- Hono API server with POST /chat, GET /status, GET /health, WS /ws
- Svelte 5 + DaisyUI frontend with chat UI, theme switcher, copy button
- OpenCode Go (Zen) as LLM provider, deepseek-v4-flash-free model
- Docker setup (dev + prod) with bin/ scripts and gopass secrets
- Biome v2 linting/formatting, Vitest tests (44 passing)
- Debug info attached to error messages for diagnostics
Diffstat (limited to 'research')
-rw-r--r-- research/ai-coding-assistants-established.md 345
-rw-r--r-- research/ai-coding-assistants-newer.md 451
-rw-r--r-- research/emerging-specialized.md 473
-rw-r--r-- research/general-agent-platforms.md 299
-rw-r--r-- research/multi-agent-orchestration.md 235
-rw-r--r-- research/multi-agent-roleplay.md 439
-rw-r--r-- research/pi-dev-harness.md 322
7 files changed, 2564 insertions, 0 deletions
diff --git a/research/ai-coding-assistants-established.md b/research/ai-coding-assistants-established.md
new file mode 100644
index 0000000..7833798
--- /dev/null
+++ b/research/ai-coding-assistants-established.md
@@ -0,0 +1,345 @@
+# Subagent Report: Established AI Coding Assistants (Aider, OpenHands, SWE-agent)
+
+## Research Summary
+
+This report evaluates three open-source AI coding assistants — **Aider**, **OpenHands** (formerly OpenDevin), and **SWE-agent** — against the Dispatch agent harness requirements. Aider is a mature, single-agent CLI tool focused on AI pair programming within git repos. OpenHands is a comprehensive SDK and platform for building coding agents, offering the richest feature set (skills system, security policies, state persistence, provider abstraction). SWE-agent is an academic research tool optimized for autonomous GitHub issue resolution on SWE-bench, now superseded by mini-SWE-agent. None of the three frameworks natively support the three-layer dispatch->orchestrator->subagent hierarchy that Dispatch requires, but OpenHands' SDK architecture provides the most extensible foundation for building such a system.
+
+---
+
+## Findings
+
+## 1. Aider
+
+### Core Architecture
+
+Aider is a **single-agent** AI pair programming tool that operates as an interactive CLI. It supports no multi-agent hierarchy — the user talks to one instance, which uses one LLM (or two in architect mode). Its architecture consists of a `Coder` class that manages file editing, a `Model` class for LLM interaction, and an `InputOutput` class for user I/O. Aider runs in the user's terminal connected to their local git repository.
+
+- Language: **Python** (80%), with CSS, Shell, Tree-sitter Query, JavaScript, HTML
+- GitHub: **45k stars**, 4.4k forks, **13,135 commits**, 93 releases (latest v0.86.0, Aug 9, 2025)
+- Primary use case: AI pair programming in the terminal — edit code through natural language conversations
+- Architect mode: Uses a two-model sequential pipeline (architect proposes changes, editor applies them), but this is not a hierarchy — it's a sequential two-step within a single session
+- [Source: Aider GitHub repo](https://github.com/Aider-AI/aider)
+
+### Key Features Identified
+
+- **Provider-agnostic LLM**: Supports OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Ollama, local models, and 15+ more through model aliases and `--model` flag
+- **Git integration**: Automatically commits all AI changes with sensible commit messages, supports `/undo`
+- **Repository map**: Builds a compressed map of the codebase for better context in larger projects
+- **Lint/test integration**: Runs linters (per-language configurable via `--lint-cmd`) and tests after edits; automatically fixes detected errors
+- **Chat modes**: `code`, `ask`, `architect`, `help`, `context` modes for different interaction styles
+- **In-chat commands**: 30+ slash commands including `/model`, `/editor-model`, `/add`, `/drop`, `/clear`, `/reset`, `/run`, `/test`, `/save`
+- **IDE integration**: `--watch-files` mode watches for AI-prefixed comments in any editor
+- **Conventions system**: Markdown files loaded as read-only context (e.g., `CONVENTIONS.md`) to guide LLM behavior
+- **Scripting**: Python API via `Coder.create()` + `coder.run()` and CLI with `--message` flag
+- **Model switching mid-conversation**: `/model` and `/editor-model` commands allow switching LLMs mid-session
+
+### Checklist Evaluation
+
+| # | Requirement | Status | Evidence |
+|---|-------------|--------|----------|
+| 1 | **Three-layer hierarchy** | Not supported | Aider is a single-agent system. Architect mode uses 2 models sequentially, not as a hierarchy. No dispatch/orchestrator/subagent layers exist. |
+| 2 | **Config-driven orchestrators** | Not supported | Orchestrators do not exist. Configuration (`.aider.conf.yml`) is for tool-level settings, not agent type definitions. |
+| 3 | **Parallel subagent execution** | Not supported | Single agent, single-threaded. No concept of subagents or parallel execution. |
+| 4 | **Strict hierarchy communication** | Not supported | No hierarchy exists. Communication is user<->aider only. |
+| 5 | **User-to-agent messaging mid-execution** | Partial | The user can interrupt with Ctrl-C and send new messages, but cannot route messages to specific agents in a hierarchy (no hierarchy exists). Messages are delivered immediately to the single active agent. |
+| 6 | **Conflict prevention** | Not supported | Single-agent design — no parallel agents to conflict with. |
+| 7 | **Role-scoped tooling** | Not supported | The single agent instance has one fixed tool set. No role-based tool differentiation. |
+| 8 | **Skills system** | Partial | Conventions files (`CONVENTIONS.md`) provide markdown instructions loaded into context, but there is no formal directory-based organizational system (no `~/.skills/` or `<project>/.skills/` structure). Skills are manual file references. |
+| 9 | **LSP integration** | Partial | Aider has linting via `--lint-cmd` (per-language linters) but this is shell-based linting, not LSP protocol integration. No Language Server Protocol support for real-time diagnostics. [Source: Linting and testing docs](https://aider.chat/docs/usage/lint-test.html) |
+| 10 | **Shell access with directory permissions** | Not supported | `/run` command provides shell access but with no directory-level permission controls. No auto-allow lists or out-of-scope prompts. |
+| 11 | **Session management** | Partial | `/model` and `/editor-model` allow model switching mid-conversation. `/save` saves file list. But no chat forking, no loading/resuming old chats. History is ephemeral per session. |
+| 12 | **Human-in-the-loop checkpoints** | Partial | The `--yes-always` flag auto-accepts all confirmations, and `/undo` reverts changes, but there are no configurable checkpoints (e.g., "pause after planning for approval"). |
+| 13 | **State persistence** | Not supported | Git commits persist code changes. But session state, conversation history, and plans are NOT persisted across restarts. |
+| 14 | **Provider-agnostic LLM** | Full | Supports OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Ollama, 15+ others via `--model` flag and model aliases. Abstract model interface. [Source: LLMs docs](https://aider.chat/docs/llms.html) |
+| 15 | **Multiple interfaces** | Partial | CLI only. Can be scripted via Python API and `--message` flag for non-interactive use, but no API server, no TUI (beyond the terminal prompt), no web UI. |
+
+### Key Quotes
+
+> "Aider lets you pair program with LLMs to start a new project or build on your existing codebase." [Source: Aider README](https://github.com/Aider-AI/aider)
+
+> "Aider can connect to almost any LLM, including local models." [Source: Aider README](https://github.com/Aider-AI/aider)
+
+---
+
+## 2. OpenHands (formerly OpenDevin)
+
+### Core Architecture
+
+OpenHands has evolved from a monolithic application into a **four-package SDK architecture**. The Software Agent SDK (`openhands.sdk`) is the foundation, providing core agent framework, LLM abstraction, tool system, workspace management, conversation state, skills system, and security analysis. On top of this sit the OpenHands CLI, GUI (React), Cloud, and Enterprise offerings.
+
+The architecture is: User Interface (CLI/GUI/Cloud) -> Conversation -> Agent (reasoning-action loop) -> Tools/LLM. The agent is **stateless and event-driven**: state is held by the conversation rather than the agent, and each agent step is atomic and interruptible.
+
+- Language: **Python (62.3%)** + TypeScript (35.9%) for frontend
+- GitHub: **74.1k stars**, 9.4k forks, **6,737 commits**, 102 releases (latest v1.7.0, May 1, 2026)
+- Primary use case: AI-driven development platform for building and running coding agents at scale
+- Four packages: `openhands.sdk` (core), `openhands.tools` (pre-built tools), `openhands.workspace` (Docker/remote exec), `openhands.agent_server` (FastAPI + WebSocket server)
+- Two deployment modes: Local (in-process) and Production (containerized with agent-server)
+- [Source: OpenHands GitHub repo](https://github.com/OpenHands/OpenHands)
+- [Source: SDK Architecture docs](https://docs.openhands.dev/sdk/arch/overview)
+
+### Key Features Identified
+
+- **Skills system**: Sophisticated three-type system — Repository skills (always-active, from `AGENTS.md`), Knowledge skills (keyword-triggered), Task skills (triggered with structured inputs). Supports MCP tool integration and dynamic content via inline shell commands. Skills are markdown files with YAML frontmatter. [Source: Skill docs](https://docs.openhands.dev/sdk/arch/skill)
+- **Provider-agnostic LLM**: Via LiteLLM, supporting 100+ providers. Configurable via environment variables, JSON, or programmatic. Supports both Chat Completions and OpenAI Responses API. [Source: LLM docs](https://docs.openhands.dev/sdk/arch/llm)
+- **Security system**: Pluggable security analyzers (NoOp or LLM-based) with risk levels (LOW/MEDIUM/HIGH/UNKNOWN) and configurable confirmation policies (AlwaysConfirm, NeverConfirm, ConfirmRisky). Actions can be blocked, require confirmation, or auto-execute based on risk. [Source: Security docs](https://docs.openhands.dev/sdk/arch/security)
+- **Custom tools**: Typed Action/Observation/Executor pattern for creating custom tools. Factory functions support shared executors across tools. [Source: Custom Tools docs](https://docs.openhands.dev/sdk/guides/custom-tools)
+- **State persistence**: Auto-save and resume with debounced writes and incremental events. [Source: Conversation docs](https://docs.openhands.dev/sdk/arch/conversation)
+- **Conversation management**: Two conversation types — LocalConversation (in-process) and RemoteConversation (via HTTP/WebSocket). Factory pattern auto-selects based on workspace type.
+- **Event-driven architecture**: Typed event framework with ActionEvent, ObservationEvent, MessageEvent, StateUpdateEvent. Immutable append-only event log.
+- **Context management**: Condenser system for compressing conversation history when token limits are approached.
+- **MCP integration**: Model Context Protocol servers can be spawned and managed through repository skills.
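+
+The risk-based confirmation flow described under the security system can be pictured with a minimal sketch. The class names mirror the policy names listed above, but the `needs_confirmation` method, the `gate` helper, the default threshold, and the choice to rank `UNKNOWN` above `HIGH` are assumptions made for illustration; they are not taken from the OpenHands SDK's actual API.
+
+```python
+from dataclasses import dataclass
+from enum import IntEnum
+
+
+class Risk(IntEnum):
+    LOW = 1
+    MEDIUM = 2
+    HIGH = 3
+    UNKNOWN = 4  # assumption: unknown risk is treated as the riskiest level
+
+
+class AlwaysConfirm:
+    def needs_confirmation(self, risk: Risk) -> bool:
+        return True
+
+
+class NeverConfirm:
+    def needs_confirmation(self, risk: Risk) -> bool:
+        return False
+
+
+@dataclass
+class ConfirmRisky:
+    threshold: Risk = Risk.HIGH  # hypothetical default threshold
+
+    def needs_confirmation(self, risk: Risk) -> bool:
+        # Confirm only when the assessed risk reaches the threshold.
+        return risk >= self.threshold
+
+
+def gate(risk: Risk, policy) -> str:
+    """Decide whether an action pauses for the user or runs automatically."""
+    return "ask user" if policy.needs_confirmation(risk) else "auto-execute"
+```
+
+Under these assumptions, `gate(Risk.MEDIUM, ConfirmRisky())` lets the action run, while `gate(Risk.HIGH, ConfirmRisky())` pauses for approval. Keeping the policy separate from the risk assessment is what would let a harness swap confirmation behavior per agent without touching the analyzer.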
+
+### Checklist Evaluation
+
+| # | Requirement | Status | Evidence |
+|---|-------------|--------|----------|
+| 1 | **Three-layer hierarchy** | Not supported | OpenHands has a single Conversation->Agent->Tool pipeline, not a dispatch->orchestrator->subagent hierarchy. The SDK docs mention "Major tasks that involve multiple agents, like refactors and rewrites" as a use case, but this must be built manually using the SDK; it is not a native framework feature. |
+| 2 | **Config-driven orchestrators** | Partial | Agents are defined programmatically in Python via the SDK (code-driven, not config-driven). However, the config template file (`config.template.toml`) provides some structure. Orchestrators as config-defined entities do not exist. |
+| 3 | **Parallel subagent execution** | Not supported | The SDK executes one agent step at a time per conversation. No native mechanism for spawning parallel subagents. Would need to be built on top of the SDK. |
+| 4 | **Strict hierarchy communication** | Not supported | No hierarchy enforced. The SDK's event system could be used to build one, but it's not a native constraint. |
+| 5 | **User-to-agent messaging mid-execution** | Full | The Conversation system supports `send_message()` at any time. In the GUI, users can inject messages mid-execution. The RemoteConversation uses WebSocket for real-time communication. The agent's step() loop checks for pending messages on each iteration. [Source: Agent architecture](https://docs.openhands.dev/sdk/arch/agent) |
+| 6 | **Conflict prevention** | Not supported | No native mechanism for assigning non-overlapping file scopes to parallel agents. Would need custom implementation. |
+| 7 | **Role-scoped tooling** | Full | Each agent instance is created with its own tool set (`Agent(llm=llm, tools=[...])`). Different agents can have entirely different tools. Factory functions and shared executors support complex tool topologies. [Source: Custom Tools docs](https://docs.openhands.dev/sdk/guides/custom-tools) |
+| 8 | **Skills system** | Full | Rich skills system with three skill types (Repository/Knowledge/Task), YAML frontmatter in markdown files, keyword triggers, dynamic content execution, MCP integration. Supports `AGENTS.md`, `.cursorrules`, and `.agents/skills/*.md` formats. However, the directory layout differs from the Dispatch spec (`~/.skills/`, `<project>/.skills/`): OpenHands uses `AGENTS.md` at the repo root and an `.agents/skills/` directory. [Source: Skill docs](https://docs.openhands.dev/sdk/arch/skill) |
+| 9 | **LSP integration** | Not supported | No LSP integration found in the SDK documentation. The tools package includes `FileEditorTool` and `BashTool` but no LSP-based diagnostic tools. |
+| 10 | **Shell access with directory permissions** | Partial | Full shell access via `BashTool` and `TerminalTool`. SecurityAnalyzer provides risk-based action validation (LOW/MEDIUM/HIGH risk levels with configurable confirmation thresholds). However, this is a general action risk system, NOT a directory-based permission system (no auto-allow lists for specific paths, no prompting for out-of-scope directories). [Source: Security docs](https://docs.openhands.dev/sdk/arch/security) |
+| 11 | **Session management** | Partial | State persistence with auto-save and resume exists. Model switching is configurable at agent creation but there is no `/model` command for mid-conversation switching. No chat forking feature documented. |
+| 12 | **Human-in-the-loop checkpoints** | Full | ConfirmationPolicy supports AlwaysConfirm, NeverConfirm, and ConfirmRisky modes. The Agent step() loop checks for pending confirmations before executing actions. SecurityAnalyzer provides risk assessment for each action. Configurable thresholds. [Source: Agent architecture](https://docs.openhands.dev/sdk/arch/agent) |
+| 13 | **State persistence** | Full | Persistence service with auto-save and resume. Debounced writes, incremental events. Conversation state, event history, and workspace state are persisted. [Source: Conversation docs](https://docs.openhands.dev/sdk/arch/conversation) |
+| 14 | **Provider-agnostic LLM** | Full | Via LiteLLM backend supporting 100+ providers. LLM class with environment variable, JSON, and programmatic configuration. Dual API support (Chat Completions + Responses API). Telemetry and cost tracking. [Source: LLM docs](https://docs.openhands.dev/sdk/arch/llm) |
+| 15 | **Multiple interfaces** | Full | CLI (OpenHands CLI), GUI (React single-page app), Cloud (hosted deployment), Enterprise (self-hosted VPC), SDK (Python + REST API via agent-server). WebSocket support for real-time communication. [Source: OpenHands README](https://github.com/OpenHands/OpenHands) |
+
+### Key Quotes
+
+> "The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below." [Source: OpenHands README](https://github.com/OpenHands/OpenHands)
+
+> "You can use the OpenHands Software Agent SDK for: ... Major tasks that involve multiple agents, like refactors and rewrites." [Source: SDK docs](https://docs.openhands.dev/sdk)
+
+> "The Software Agent SDK serves as the source of truth for agents in OpenHands." [Source: SDK Architecture docs](https://docs.openhands.dev/sdk/arch/overview)
+
+---
+
+## 3. SWE-agent
+
+### Core Architecture
+
+SWE-agent is a **single-agent** system designed for autonomous resolution of GitHub issues. Its architecture: `sweagent` CLI -> `Agent` (reasoning loop with LLM) + `SWEEnv` (environment manager wrapping SWE-ReX). SWE-ReX manages a Docker container running a shell session with custom tool implementations. The agent prompts an LLM, the LLM outputs tool calls, which are parsed and executed in the Docker sandbox.
+
+The tool system is organized as "tool bundles" — directories containing executable scripts, a `config.yaml`, and an `install.sh`. Tools are configured via YAML config files.
+
+- Language: **Python (94.8%)**
+- GitHub: **19.2k stars**, 2.1k forks, **2,158 commits**, 10 releases (latest v1.1.0, May 22, 2025)
+- Primary use case: Academic research benchmark for autonomous SWE-bench issue resolution
+- **Superseded**: The project now recommends mini-SWE-agent (65% on SWE-bench verified in 100 lines of Python)
+- [Source: SWE-agent GitHub repo](https://github.com/SWE-agent/SWE-agent)
+- [Source: Architecture docs](https://swe-agent.com/latest/background/architecture/)
+
+### Key Features Identified
+
+- **YAML-driven configuration**: Single config file defines tools, prompts, history processors, model settings, environment. Multiple config files can be merged with `--config`.
+- **Tool bundles**: Extensible tool system. Tools are executables in a `bin/` directory with a `config.yaml` defining signatures, arguments, and docstrings. State commands return JSON after each action.
+- **History processors**: Pluggable context compression, including `cache_control` (last N messages) and `image_parsing` for multimodal support.
+- **Batch mode**: `sweagent run-batch` with `--num_workers` for parallel execution across multiple instances. Supports SWE-bench, HuggingFace, file-based, and expert instance sources.
+- **Multimodal support**: Config for processing images from GitHub issues, extended observation lengths, web browsing tools.
+- **Demonstrations**: Pre-recorded trajectories can guide agent behavior.
+- **Template system**: Three template types — system_template (initial prompt), instance_template (per-task context), next_step_template (per-turn prompt). Templates use Jinja-like syntax.
+- **SWE-ReX integration**: Separate package for managing remote execution environments (Docker, Modal, AWS).
+
+### Checklist Evaluation
+
+| # | Requirement | Status | Evidence |
+|---|-------------|--------|----------|
+| 1 | **Three-layer hierarchy** | Not supported | Single agent resolving one issue at a time. No dispatch/orchestrator/subagent layers. |
+| 2 | **Config-driven orchestrators** | Not supported | SWE-agent is configured via YAML, but the config defines a single agent type. Orchestrators as config-defined entities do not exist. |
+| 3 | **Parallel subagent execution** | Partial | Batch mode (`sweagent run-batch --num_workers N`) runs multiple independent agent instances in parallel on separate issues. This is horizontal parallelism of independent tasks, NOT hierarchical subagent parallelism. [Source: Batch mode docs](https://swe-agent.com/latest/usage/batch_mode/) |
+| 4 | **Strict hierarchy communication** | Not supported | No hierarchy exists. Each agent operates independently in its own Docker container. |
+| 5 | **User-to-agent messaging mid-execution** | Not supported | SWE-agent is designed for autonomous execution. No mechanism for injecting user messages to a running agent mid-task. |
+| 6 | **Conflict prevention** | Not supported | Single agent per task. Batch mode tasks are independent (different GitHub issues), so no file conflicts. |
+| 7 | **Role-scoped tooling** | Not supported | Single agent type per config file. All agents in a batch use the same tool set. |
+| 8 | **Skills system** | Partial | Template system supports system prompts, instance prompts, and next-step prompts defined in YAML config. Demonstrations provide trajectory-based guidance. However, there is NO directory-based markdown skills system with auto-loading (no `~/.skills/` or `<project>/.skills/`). [Source: Templates docs](https://swe-agent.com/latest/config/templates/) |
+| 9 | **LSP integration** | Not supported | No LSP integration found. Tools are shell-based (file viewer, editor, bash). No compiler diagnostic integration. |
+| 10 | **Shell access with directory permissions** | Not supported | Full shell access inside Docker container via bash tool. No directory-level permission system. Docker provides container-level isolation but not fine-grained directory permissions. |
+| 11 | **Session management** | Not supported | No chat forking, no model switching mid-conversation, no loading/resuming old conversations. Trajectories are saved as output files for post-hoc analysis. |
+| 12 | **Human-in-the-loop checkpoints** | Not supported | Designed for fully autonomous execution. No configurable checkpoints for user approval. |
+| 13 | **State persistence** | Partial | Trajectories and predictions are saved to files (`preds.json`, trajectory files). But no formal state persistence for resuming interrupted sessions. The `sweagent merge-preds` utility can recover partial batch results. |
+| 14 | **Provider-agnostic LLM** | Full | Model configured in YAML (`agent.model.name`). Supports any LM via config. Model config includes per-instance cost limits, temperature, and other parameters. [Source: Config docs](https://swe-agent.com/latest/config/config/) |
+| 15 | **Multiple interfaces** | Partial | CLI only (`sweagent run`, `sweagent run-batch`, `sweagent merge-preds`). No API, no TUI, no web UI. |
+
+### Key Quotes
+
+> "SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice." [Source: SWE-agent README](https://github.com/SWE-agent/SWE-agent)
+
+> "Most of our current development effort is on mini-swe-agent, which has superseded SWE-agent." [Source: SWE-agent README](https://github.com/SWE-agent/SWE-agent)
+
+> "Configurable & fully documented: Governed by a single yaml file." [Source: SWE-agent README](https://github.com/SWE-agent/SWE-agent)
+
+---
+
+## Key Questions
+
+### 1. What is each framework's core architecture? How many layers of agent hierarchy does it support?
+
+- **Aider**: Single-layer (User <-> Agent). No hierarchy. The "architect mode" uses two models sequentially but is still a single-agent session.
+- **OpenHands**: Single-layer (Conversation -> Agent -> Tools). The SDK can be used to build multi-agent systems programmatically, but no hierarchy is natively enforced. The four-package architecture provides building blocks.
+- **SWE-agent**: Single-layer (CLI -> Agent + Environment). One agent per instance. Batch mode runs independent agents in parallel but no hierarchy.
+
+### 2. How extensible/configurable is each framework without modifying source code?
+
+- **Aider**: Configurable via command-line flags, YAML config file, environment variables, and `.env` files. Can be scripted via Python API. Adding new LLM providers requires no code changes (model aliases). No plugin system for tools or new behaviors.
+- **OpenHands**: Highly extensible via the SDK. Custom tools (typed Action/Observation/Executor pattern), custom agents, custom security analyzers, MCP server integration, workspace implementations. Requires Python code for configuration but has clean extension points. Config template file provides some non-code configuration.
+- **SWE-agent**: Extensible via YAML config files (tools, prompts, models, environments, history processors). Custom tools are executables in tool bundles with a config YAML. Adding tools requires creating executable scripts but no modification of core source code. Now in maintenance-only mode.
+
+### 3. What is the primary use case each framework was designed for?
+
+- **Aider**: Interactive AI pair programming — a developer chatting with an LLM to edit code in a git repository.
+- **OpenHands**: General-purpose AI-driven development platform — from simple one-off tasks to complex multi-agent workflows, with CLI, GUI, Cloud, and Enterprise deployments.
+- **SWE-agent**: Academic research benchmark for autonomous SWE-bench issue resolution. Designed to evaluate LLMs on real-world GitHub issues.
+
+### 4. How active is each project?
+
+| Metric | Aider | OpenHands | SWE-agent |
+|--------|-------|-----------|-----------|
+| GitHub Stars | 45k | 74.1k | 19.2k |
+| Forks | 4.4k | 9.4k | 2.1k |
+| Commits | 13,135 | 6,737 | 2,158 |
+| Releases | 93 | 102 | 10 |
+| Latest Release | v0.86.0 (Aug 2025) | v1.7.0 (May 2026) | v1.1.0 (May 2025) |
+| Status | **Active development** | **Active development** | **Maintenance-only** (superseded by mini-SWE-agent) |
+
+### 5. What language is each framework written in?
+
+- **Aider**: Python (80%), CSS, Shell, Tree-sitter Query, JavaScript, HTML
+- **OpenHands**: Python (62.3%), TypeScript (35.9%), Go Template, Jinja, Makefile, CSS
+- **SWE-agent**: Python (94.8%), JavaScript, CSS, Shell, C++, Perl
+
+---
+
+## Summary Comparison Table
+
+| # | Requirement | Aider | OpenHands | SWE-agent |
+|---|-------------|-------|-----------|-----------|
+| 1 | Three-layer hierarchy | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 2 | Config-driven orchestrators | ❌ Not supported | ⚠️ Partial (SDK is code-driven) | ❌ Not supported |
+| 3 | Parallel subagent execution | ❌ Not supported | ❌ Not supported | ⚠️ Partial (batch mode parallelism) |
+| 4 | Strict hierarchy communication | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 5 | User-to-agent messaging mid-execution | ⚠️ Partial (single agent) | ✅ Full (WebSocket, send_message) | ❌ Not supported |
+| 6 | Conflict prevention | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 7 | Role-scoped tooling | ❌ Not supported | ✅ Full (per-agent tool sets) | ❌ Not supported |
+| 8 | Skills system | ⚠️ Partial (conventions files) | ✅ Full (3 skill types, triggers, MCP) | ⚠️ Partial (templates + demonstrations) |
+| 9 | LSP integration | ⚠️ Partial (shell linting) | ❌ Not supported | ❌ Not supported |
+| 10 | Shell access w/ directory permissions | ❌ Not supported | ⚠️ Partial (risk-based security) | ❌ Not supported |
+| 11 | Session management | ⚠️ Partial (/model, /save) | ⚠️ Partial (persistence, no forking) | ❌ Not supported |
+| 12 | Human-in-the-loop checkpoints | ⚠️ Partial (--yes-always, /undo) | ✅ Full (ConfirmationPolicy) | ❌ Not supported |
+| 13 | State persistence | ❌ Not supported | ✅ Full (auto-save & resume) | ⚠️ Partial (trajectory files) |
+| 14 | Provider-agnostic LLM | ✅ Full (15+ providers) | ✅ Full (100+ via LiteLLM) | ✅ Full (model config in YAML) |
+| 15 | Multiple interfaces | ⚠️ Partial (CLI + Python API) | ✅ Full (CLI, GUI, Cloud, API) | ⚠️ Partial (CLI only) |
+
+**Scoring:**
+- **Aider**: 1 Full / 6 Partial / 8 Not supported
+- **OpenHands**: 7 Full / 3 Partial / 5 Not supported
+- **SWE-agent**: 1 Full / 4 Partial / 10 Not supported
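+
+The tallies can be recomputed directly from the Summary Comparison Table. The sketch below transcribes each framework's column as a 15-character string (one letter per requirement, in row order) and counts the statuses; the `STATUSES` encoding and the `tally` helper are illustrative names, not part of any of the three frameworks.
+
+```python
+from collections import Counter
+
+# One letter per checklist row (1-15), transcribed from the summary table.
+# F = Full, P = Partial, N = Not supported.
+STATUSES = {
+    "Aider":     "NNNNPNNPPNPPNFP",
+    "OpenHands": "NPNNFNFFNPPFFFF",
+    "SWE-agent": "NNPNNNNPNNNNPFP",
+}
+
+
+def tally(row: str) -> tuple[int, int, int]:
+    """Return (full, partial, not_supported) counts for one framework."""
+    counts = Counter(row)
+    return counts["F"], counts["P"], counts["N"]
+
+
+for name, row in STATUSES.items():
+    assert len(row) == 15  # one status per checklist requirement
+    full, partial, unsupported = tally(row)
+    print(f"{name}: {full} Full / {partial} Partial / {unsupported} Not supported")
+```
+
+Each framework's three counts must add up to 15, which gives a quick consistency check on any hand-written score line.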
+
+---
+
+## Source List
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [Aider GitHub Repository](https://github.com/Aider-AI/aider) | GitHub |
+| 2 | [Aider Configuration Docs](https://aider.chat/docs/config.html) | Official docs |
+| 3 | [Aider Chat Modes Docs](https://aider.chat/docs/usage/modes.html) | Official docs |
+| 4 | [Aider In-Chat Commands](https://aider.chat/docs/usage/commands.html) | Official docs |
+| 5 | [Aider Linting and Testing](https://aider.chat/docs/usage/lint-test.html) | Official docs |
+| 6 | [Aider IDE Integration](https://aider.chat/docs/usage/watch.html) | Official docs |
+| 7 | [Aider Scripting API](https://aider.chat/docs/scripting.html) | Official docs |
+| 8 | [Aider Coding Conventions](https://aider.chat/docs/usage/conventions.html) | Official docs |
+| 9 | [OpenHands GitHub Repository](https://github.com/OpenHands/OpenHands) | GitHub |
+| 10 | [OpenHands SDK Overview](https://docs.openhands.dev/sdk) | Official docs |
+| 11 | [OpenHands SDK Architecture](https://docs.openhands.dev/sdk/arch/overview) | Official docs |
+| 12 | [OpenHands Agent Architecture](https://docs.openhands.dev/sdk/arch/agent) | Official docs |
+| 13 | [OpenHands Conversation Architecture](https://docs.openhands.dev/sdk/arch/conversation) | Official docs |
+| 14 | [OpenHands LLM Architecture](https://docs.openhands.dev/sdk/arch/llm) | Official docs |
+| 15 | [OpenHands Skill System](https://docs.openhands.dev/sdk/arch/skill) | Official docs |
+| 16 | [OpenHands Security System](https://docs.openhands.dev/sdk/arch/security) | Official docs |
+| 17 | [OpenHands Custom Tools Guide](https://docs.openhands.dev/sdk/guides/custom-tools) | Official docs |
+| 18 | [OpenHands Hello World](https://docs.openhands.dev/sdk/guides/hello-world) | Official docs |
+| 19 | [SWE-agent GitHub Repository](https://github.com/SWE-agent/SWE-agent) | GitHub |
+| 20 | [SWE-agent Architecture](https://swe-agent.com/latest/background/architecture/) | Official docs |
+| 21 | [SWE-agent Config Files](https://swe-agent.com/latest/config/config/) | Official docs |
+| 22 | [SWE-agent Templates](https://swe-agent.com/latest/config/templates/) | Official docs |
+| 23 | [SWE-agent Tools](https://swe-agent.com/latest/config/tools/) | Official docs |
+| 24 | [SWE-agent Batch Mode](https://swe-agent.com/latest/usage/batch_mode/) | Official docs |
+
+---
+
+## Verbatim Quotes
+
+- "aider is AI pair programming in your terminal" — [Source: Aider README](https://github.com/Aider-AI/aider)
+- "Aider can connect to almost any LLM, including local models." — [Source: Aider README](https://github.com/Aider-AI/aider)
+- "The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below." — [Source: OpenHands README](https://github.com/OpenHands/OpenHands)
+- "The Software Agent SDK serves as the source of truth for agents in OpenHands." — [Source: SDK Architecture docs](https://docs.openhands.dev/sdk/arch/overview)
+- "You can use the OpenHands Software Agent SDK for: ... Major tasks that involve multiple agents, like refactors and rewrites." — [Source: SDK docs](https://docs.openhands.dev/sdk)
+- "SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice." — [Source: SWE-agent README](https://github.com/SWE-agent/SWE-agent)
+- "Most of our current development effort is on mini-swe-agent, which has superseded SWE-agent." — [Source: SWE-agent README](https://github.com/SWE-agent/SWE-agent)
+- "Configurable & fully documented: Governed by a single yaml file." — [Source: SWE-agent README](https://github.com/SWE-agent/SWE-agent)
+- "OpenHands is also the leading open source framework for coding agents. It's MIT-licensed, and can work with any LLM." — [Source: SDK docs](https://docs.openhands.dev/sdk)
+- "The agent operates through a single-step execution model where each step() call processes one reasoning cycle." — [Source: Agent Architecture](https://docs.openhands.dev/sdk/arch/agent)
+
+---
+
+## Source Quality Flags
+
+- Sources 2-8 (Aider docs): Official documentation, high quality, maintained by project maintainers.
+- Sources 10-18 (OpenHands docs): Official SDK documentation that is comprehensive and well-structured, with architecture diagrams and runnable code examples.
+- Sources 20-24 (SWE-agent docs): Official documentation that includes architecture diagrams and configuration references. However, the project is now in maintenance-only mode, with a recommendation to use mini-SWE-agent.
+
+---
+
+## Confidence: High
+
+All information was gathered from primary sources (GitHub repositories, official documentation sites). No AI-generated summaries or marketing materials were used. The frameworks' architectures and capabilities are well-documented and verifiable.
+
+---
+
+## Gaps and Open Questions
+
+1. **Multi-agent hierarchy in OpenHands**: The OpenHands SDK docs list "Major tasks that involve multiple agents" as a use case, but the public SDK documentation does not yet show concrete multi-agent orchestration examples. An attempt to fetch the multi-agent example directory (`examples/02_multi_agent_hello_world/`) returned a 404, suggesting it may not exist yet. A follow-up investigation should check whether multi-agent orchestration patterns exist in the SDK source code.
+
+2. **OpenHands SDK vs Application distinction**: The OpenHands ecosystem has been restructured. The monolithic GitHub repo (`OpenHands/OpenHands`) now points to the SDK, CLI, and GUI as separate packages. The relationship between the legacy OpenDevin application repo and the new SDK was not fully explored and could affect feature availability.
+
+3. **SWE-agent and mini-SWE-agent relationship**: SWE-agent now recommends mini-SWE-agent, which achieves comparable performance in "100 lines of Python." This research covered SWE-agent 1.0, but mini-SWE-agent may have a substantially different architecture worth evaluating separately.
+
+4. **None of the three frameworks natively support the three-layer hierarchy Dispatch requires.** The closest foundation is OpenHands' SDK, which provides the building blocks (per-agent tool scoping, event system, security policies, skills system, state persistence) but lacks the hierarchical orchestration layer. Building Dispatch on top of OpenHands SDK would require implementing the orchestrator and dispatcher layers as custom Python code.
+
+---
+
+## Tool Calls Made
+
+1. `webfetch` https://github.com/Aider-AI/aider
+2. `webfetch` https://github.com/All-Hands-AI/OpenHands
+3. `webfetch` https://github.com/princeton-nlp/SWE-agent
+4. `webfetch` https://aider.chat/docs/config.html
+5. `webfetch` https://aider.chat/docs/usage.html
+6. `webfetch` https://swe-agent.com/latest/
+7. `webfetch` https://docs.openhands.dev/sdk
+8. `webfetch` https://swe-agent.com/latest/background/architecture/
+9. `webfetch` https://docs.openhands.dev/sdk/arch/overview
+10. `webfetch` https://aider.chat/docs/usage/modes.html
+11. `webfetch` https://swe-agent.com/latest/config/config/
+12. `webfetch` https://docs.openhands.dev/sdk/arch/agent
+13. `webfetch` https://docs.openhands.dev/sdk/arch/skill
+14. `webfetch` https://aider.chat/docs/scripting.html
+15. `webfetch` https://swe-agent.com/latest/usage/batch_mode/
+16. `webfetch` https://aider.chat/docs/usage/lint-test.html
+17. `webfetch` https://aider.chat/docs/usage/watch.html
+18. `webfetch` https://docs.openhands.dev/sdk/arch/conversation
+19. `webfetch` https://docs.openhands.dev/sdk/arch/llm
+20. `webfetch` https://aider.chat/docs/usage/conventions.html
+21. `webfetch` https://docs.openhands.dev/sdk/arch/security
+22. `webfetch` https://swe-agent.com/latest/config/tools/
+23. `webfetch` https://aider.chat/docs/usage/commands.html
+24. `webfetch` https://docs.openhands.dev/sdk/guides/hello-world
+25. `webfetch` https://swe-agent.com/latest/config/templates/
+26. `webfetch` https://raw.githubusercontent.com/OpenHands/software-agent-sdk/main/examples/02_multi_agent_hello_world/01_multi_agent_basic.py
+27. `webfetch` https://docs.openhands.dev/sdk/guides/custom-tools
+28. `webfetch` https://github.com/SWE-agent/SWE-agent?tab=readme-ov-file
diff --git a/research/ai-coding-assistants-newer.md b/research/ai-coding-assistants-newer.md
new file mode 100644
index 0000000..932e244
--- /dev/null
+++ b/research/ai-coding-assistants-newer.md
@@ -0,0 +1,451 @@
+# Subagent Report: AI Coding Assistants — Goose, Mentat, Cline, Continue
+
+## Research summary
+
+This report evaluates four open-source AI coding assistants — Goose (Block/AAIF), Mentat (AbanteAI, archived), Cline, and Continue — against a 15-point requirements checklist focused on agent hierarchy, configurability, parallelism, security, and integration capabilities. Goose and Cline are the most full-featured frameworks, with multi-agent support, provider-agnostic LLM interfaces, plugin/skills systems, and human-in-the-loop controls. Continue has pivoted from an IDE assistant to a CI-focused "AI checks" product. Mentat (the original AbanteAI CLI tool) has been archived since January 2025 and is no longer maintained. Confidence is high for Goose, Cline, and Continue based on current docs and repos; Mentat information reflects its archived state.
+
+---
+
+## Findings
+
+### 1. Goose (by Block / AAIF)
+
+**GitHub**: [github.com/aaif-goose/goose](https://github.com/aaif-goose/goose) — 45.5k stars, 4,541 commits
+**Language**: Rust (49.8%) + TypeScript (44.6%)
+**Latest release**: v1.34.1 (May 15, 2026)
+**License**: Apache 2.0
+**Governance**: Now under Agentic AI Foundation (AAIF) at Linux Foundation
+
+**Core Architecture**: Goose has a three-component architecture: (1) **Interface** (desktop app, CLI, API), (2) **Agent** (core loop managing LLM interaction), and (3) **Extensions** (MCP-based tools). It supports spawning **subagents** — independent agent instances that execute tasks with process isolation, running sequentially or in parallel. Subagents inherit extensions from the parent but can be restricted.
+
+Key features:
+- Desktop app + CLI + API modes
+- 15+ LLM providers via abstract interface (Anthropic, OpenAI, Google, Ollama, OpenRouter, Azure, Bedrock, etc.)
+- MCP (Model Context Protocol) extension system with 70+ extensions
+- ACP (Agent Client Protocol) support for interoperability
+- Subagent system with parallel/sequential execution, recipe-based reusable configs, and external subagents (Codex, Claude Code)
+- Skills system: `~/.agents/skills/` and `.agents/skills/` directories with `SKILL.md` files in named subdirectories; also compatible with `.claude/skills/`
+- `.goosehints` files for project context (global at `~/.config/goose/.goosehints`, local per-directory, hierarchical)
+- Config-driven via YAML config files (`config.yaml`, `permission.yaml`, `secrets.yaml`)
+- Session management: start, resume, search sessions; smart context management with auto-compaction
+- Permission modes: auto, approve, chat, smart_approve
+- Prompt injection detection, adversary mode, sandboxing for desktop app
+- Extension allowlist for access control
+
+**Primary Use Case**: General-purpose AI agent for code, research, writing, automation, data analysis.
+
+**Activity**: Very active — 4,541 commits, 134 releases, moved to AAIF in April 2026.
+
+---
+
+### 2. Mentat (by AbanteAI)
+
+**Original GitHub**: [github.com/AbanteAI/archive-old-cli-mentat](https://github.com/AbanteAI/archive-old-cli-mentat) — 2.6k stars (archived)
+**Language**: Python (87.9%) + TypeScript (8.4%)
+**Status**: **Archived January 7, 2025** — no longer maintained or supported
+
+The original Mentat was a command-line AI coding assistant with the following characteristics:
+
+- Single-agent CLI tool (no multi-agent hierarchy)
+- Used GPT-4 models via OpenAI API (later supported alternatives via litellm proxy)
+- Worked within git repositories, editing multiple files
+- Had a VS Code extension (`mentat-vscode/`)
+- Supported auto-context via RAG (retrieval-augmented generation) using universal-ctags
+- File exclusion via glob patterns
+- Configuration via config files
+- Used the OpenAI Python SDK client under the hood
+
+**Key limitations for requirements checklist**:
+- No multi-agent hierarchy (single agent, no orchestrator/subagent concept)
+- No config-driven orchestrators
+- No parallel execution
+- No hierarchical communication restrictions
+- No user-to-agent mid-execution messaging
+- No conflict prevention for parallel agents
+- No role-scoped tooling
+- No skills system (basic context via auto-context only)
+- No LSP integration
+- Shell access without directory permissions
+- Basic session management (no forking and no model switching mid-conversation)
+- No human-in-the-loop checkpoints
+- State persistence limited to git context
+- Provider-agnostic only via a litellm proxy workaround, not a native abstract interface
+- CLI only (no TUI, no API mode documented)
+
+**Note**: The name "Mentat" is now used by a different project (an AI-powered GitHub bot at mentat.ai), but this report evaluates the original AbanteAI Mentat per the task scope.
+
+**Activity**: Archived project with 560 commits, last release v1.0.18 (Apr 2024), last PyPI release v1.0.19 (Jan 2025 — marked as archived).
+
+---
+
+### 3. Cline (formerly Claude Dev)
+
+**GitHub**: [github.com/cline/cline](https://github.com/cline/cline) — 62k stars, 5,919 commits
+**Language**: TypeScript (97.7%)
+**Latest CLI release**: v3.0.7 (May 18, 2026)
+**License**: Apache 2.0
+**Organization**: Cline Bot Inc.
+
+**Core Architecture**: Cline is built on a layered SDK architecture:
+- `@cline/sdk` — Public SDK surface
+- `@cline/core` — Node runtime for sessions, built-in tools, persistence, hub support, automation
+- `@cline/agents` — Browser-compatible stateless agent execution loop
+- `@cline/llms` — Provider gateway and model catalogs
+- `@cline/shared` — Types, schemas, tool helpers
+
+Cline operates in multiple form factors: **CLI** (terminal with interactive or headless modes), **VS Code Extension**, **JetBrains Plugin**, and **Kanban** (web-based multi-agent task board).
+
+Key features:
+- **Multi-agent teams**: Coordinator agent breaks work into subtasks and delegates to specialist agents, each with their own tools and context. Team state persists across sessions.
+- **Kanban**: Run many agents in parallel from a web-based task board; each card gets its own worktree and auto-commit
+- **Rules system**: `.clinerules` files for project-specific guidance
+- **Skills system**: `.agents/skills/` directory with SKILL.md files (compatible with Goose/Claude skill format)
+- **Plugin system**: `AgentPlugin` with hooks into lifecycle stages (beforeRun, afterRun, beforeTool, afterTool, etc.)
+- **MCP support**: Load MCP settings through runtime/config extension path
+- **Provider-agnostic**: 200+ models via OpenRouter, plus Anthropic, OpenAI, Google, AWS Bedrock, Azure, Ollama, LM Studio, and any OpenAI-compatible API
+- **Tool policies**: Per-tool auto-approve/require-approval/disable
+- **Checkpoints and state persistence**: Sessions persist across restarts; snapshot/restore capability
+- **CLI commands**: `cline auth`, `cline config`, `cline mcp`, `cline history`, `cline schedule`, `cline hub`, `cline kanban`
+- **Scheduled agents**: Cron-based recurring automations
+- **Headless mode**: For CI/CD with JSON output and auto-approve
+- **Human-in-the-loop**: Approval required per action (configurable); Plan mode vs Act mode
+- **Chat integrations**: Slack, Telegram, Discord, Google Chat, WhatsApp, Linear
+- **Enterprise features**: SSO, RBAC, observability (OpenTelemetry, Datadog, Grafana)
+
+**Primary Use Case**: AI coding agent in IDE and terminal; also used as SDK for building custom agent applications.
+
+**Activity**: Very active — 5,919 commits, 260 releases, 62k stars, 3.3k dependents.
+
+---
+
+### 4. Continue (Continue Dev, Inc.)
+
+**GitHub**: [github.com/continuedev/continue](https://github.com/continuedev/continue) — 33.3k stars, 21,498 commits
+**Language**: TypeScript (84.4%) + Kotlin (3.8%) + Python (2.2%)
+**Latest release**: v1.2.22-vscode (Mar 27, 2026)
+**License**: Apache 2.0
+
+**Note**: Continue has undergone a significant pivot. The project was originally an open-source IDE code assistant (VS Code + JetBrains) with chat, edit, autocomplete, and agent modes. It now positions itself as **Source-controlled AI checks, enforceable in CI**: a CI-focused product in which checks are defined as markdown files in `.continue/checks/` and run as GitHub status checks on every PR.
+
+**Current Architecture** (post-pivot):
+- Checks defined as markdown files with YAML frontmatter in `.continue/checks/` or `.agents/checks/`
+- Each check file has `name`, `description`, optional `model`, and a markdown body prompt
+- CLI (`cn`) runs checks locally and in CI
+- GitHub integration for PR status checks
+- "Agents" for event-triggered automation (schedules, new issues, webhooks)
+- Cloud agent gallery with pre-configured agents
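+
+As an illustration of the check-file format described above, a check might look like the following sketch. Only the field names (`name`, `description`, optional `model`) and the markdown-body prompt come from the sources; the file path `.continue/checks/no-hardcoded-secrets.md`, the field values, and the prompt wording are hypothetical.
+
+```markdown
+---
+name: No hardcoded secrets
+description: Flags credentials that appear in plain text in the PR diff
+# model: optional; omitted here so the default model applies (assumption)
+---
+
+Review the pull request diff. Fail the check if any API key, password,
+or token is committed in plain text, and point to the offending lines.
+```
+
+Because each check is an ordinary file in the repository, it is versioned and reviewed like any other source change.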
+
+**Legacy IDE extension** (still available):
+- VS Code Agent, Chat, Edit, Autocomplete modes
+- JetBrains plugin
+- MCP support
+- Multiple LLM provider support (via config)
+
+**Primary Use Case (current)**: AI-powered code review checks in CI/CD pipelines.
+
+**Activity**: Active — 21,498 commits (the highest commit count), 822 releases. However, recent focus appears to be on the CI/checks product rather than the agent framework.
+
+---
+
+## Requirements Checklist
+
+### Requirement 1: Three-layer hierarchy (dispatch -> orchestrator -> subagent)
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Partial** | Supports subagents spawned by the main agent, but has no formal three-layer hierarchy. The structure is main agent → subagent (one level deep), and subagents cannot spawn further subagents. Recipes enable reusable subagent configs. External subagents (Codex, Claude Code) are supported via MCP. |
+| **Mentat** | **Not at all** | Single-agent CLI tool. No notion of orchestrator or subagents. |
+| **Cline** | **Partial** | Supports multi-agent teams: coordinator agent delegates to specialist agents. CLI supports `--team-name` flag. Kanban enables parallel agent execution. SDK enables building custom multi-agent systems. However, no formal three-layer dispatch→orchestrator→subagent hierarchy documented. |
+| **Continue** | **Not at all** | Current product is single-agent checks per PR. No multi-agent hierarchy. Each check runs independently. |
+
+### Requirement 2: Config-driven orchestrators
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Recipes (YAML files) define subagent behavior: instructions, extensions, parameters, prompts. `GOOSE_RECIPE_PATH` env var configures recipe locations. Subagent settings configurable via natural language, env vars, or recipes. |
+| **Mentat** | **Not at all** | No orchestrator concept. Basic config file for model settings. |
+| **Cline** | **Partial** | Orchestrator behavior is primarily code-driven via SDK (TypeScript). Plugin system allows packaging configuration. CLI flags (`--team-name`, `--plan`, `--auto-approve`) provide some config-driven behavior. No dedicated YAML orchestration config. |
+| **Continue** | **Not at all** | No orchestrator concept. Check files are markdown with YAML frontmatter, but these are individual check configs, not orchestrator definitions. |
+
+### Requirement 3: Parallel subagent execution
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Subagents run in parallel natively when the prompt uses trigger keywords ("parallel", "simultaneously", "concurrently", "at the same time"). |
+| **Mentat** | **Not at all** | Single-threaded CLI, no parallel execution. |
+| **Cline** | **Yes** | Kanban enables parallel agent execution from a web-based task board. SDK multi-agent example demonstrates parallel agents streaming to web UI. Multi-agent teams with coordinator delegating work. |
+| **Continue** | **Not at all** | Checks run sequentially as part of CI pipeline. No parallel subagent execution. |
+
+### Requirement 4: Strict hierarchy communication
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Subagents have restricted operations: cannot spawn further subagents (prevents infinite recursion), cannot manage extensions, cannot manage schedules. Communication flows through the parent agent. |
+| **Mentat** | **Not at all** | No hierarchy exists. |
+| **Cline** | **Partial** | Multi-agent team communication pattern: coordinator → specialist agents. SDK plugin hooks enable lifecycle-based communication control. No explicit peer-to-peer blocking documented but the architecture implies parent-mediated communication. |
+| **Continue** | **Not at all** | No hierarchy exists. |
+
+### Requirement 5: User-to-agent messaging mid-execution
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Sessions are continuous conversations. Users can interrupt and provide input during execution. In-session actions allow sharing information mid-session. Subagent activity is visible in real-time. |
+| **Mentat** | **Partial** | Interactive CLI sessions allow text input during conversation, but no structured mid-execution message injection. |
+| **Cline** | **Yes** | Interactive CLI mode and VS Code extension support continuous chat. Users can interrupt agent execution. `ask_question` tool allows agent to request user input mid-execution. Plan/Act mode switching. |
+| **Continue** | **Not at all** | Checks run autonomously in CI. No user interaction mid-execution. The legacy VS Code extension supports chat but the current CI product does not. |
+
+### Requirement 6: Conflict prevention (non-overlapping file scopes)
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Partial** | Subagents operate with process isolation. Sandbox for desktop app controls file access. No explicit file-scoping mechanism for parallel subagents documented. |
+| **Mentat** | **Not at all** | No parallel execution, no conflict prevention needed. |
+| **Cline** | **Partial** | Kanban gives each agent its own worktree (Git worktree) with auto-commit. This provides file-level isolation. No explicit lock-based conflict prevention. |
+| **Continue** | **Not at all** | No parallel execution. |
+
+### Requirement 7: Role-scoped tooling
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Subagents can be given specific extension sets (e.g., "Create a subagent with only the developer extension"). Extension control via recipes and natural language prompts. `available_tools` field in extension config filters tools. |
+| **Mentat** | **Not at all** | No role-based tool differentiation. |
+| **Cline** | **Yes** | Plugin system allows registering tools per agent. Multi-agent teams: specialist agents get their own tools and context. Tool policies per agent (`autoApprove`, `enabled`). SDK enables fine-grained tool assignment. |
+| **Continue** | **Not at all** | Each check gets the same toolset (PR diff reading). No role-scoped tooling. |
+
+### Requirement 8: Skills system (injectable markdown/text instructions per agent type)
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Skills stored in `~/.agents/skills/<name>/SKILL.md` (global) or `.agents/skills/<name>/SKILL.md` (project-level). YAML frontmatter with `name` and `description`. A skill directory can include supporting files (scripts, templates). Also compatible with `.claude/skills/`. Skill Marketplace available. |
+| **Mentat** | **Not at all** | No skills system. Auto-context uses RAG to find relevant code snippets. |
+| **Cline** | **Yes** | Skills in `.agents/skills/` directory with SKILL.md files (same format as Goose). `.clinerules` files for project rules. Plugin system allows bundling rules. Skills system mentioned in built-in tools list. |
+| **Continue** | **Not at all** | Check files are similar to skills in format (markdown + frontmatter) but serve a different purpose (CI review prompts, not injectable agent instructions). No skills directory concept. |
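+
+A minimal sketch of the shared skill format reported for Goose and Cline, assuming a project-level skill at `.agents/skills/code-review/SKILL.md`. The directory layout and the `name` and `description` frontmatter fields come from the sources; the skill name `code-review` and the instruction text are hypothetical.
+
+```markdown
+---
+name: code-review
+description: Checklist to follow when reviewing changes in this repository
+---
+
+# Code review
+
+1. Read the full diff before commenting.
+2. Run the project's lint and test commands and report any failures.
+3. Summarize findings as a short list, most severe first.
+```
+
+Since both frameworks read the same directory convention, a skill written this way should in principle be usable by either without changes, though this was not tested.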
+
+### Requirement 9: LSP integration
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Not at all** | No LSP integration documented. Uses shell commands for compilation checks. |
+| **Mentat** | **Not at all** | No LSP integration. |
+| **Cline** | **Yes** | VS Code extension integrates with editor LSP for compiler diagnostics. Monitors linter and compiler errors as it works. SDK plugin examples include TypeScript LSP tools. `.clinerules` files can reference LSP diagnostics. |
+| **Continue** | **Partial** | Legacy VS Code extension integrates with editor LSP. Current CI-focused product does not use LSP (analyzes PR diffs, not live diagnostics). |
+
+### Requirement 10: Shell access with directory permissions
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Extension allowlist for controlling which extensions/tools are permitted. Sandbox for Desktop app (macOS sandbox). Permission modes: auto, approve, chat, smart_approve. `GOOSE_ALLOWLIST` env var. Prompt injection detection. Adversary mode for monitoring. |
+| **Mentat** | **Not at all** | Runs commands directly via the shell. No permission controls beyond git tracking. |
+| **Cline** | **Yes** | `CLINE_COMMAND_PERMISSIONS` env var with allow/deny lists for shell commands. Tool policies per tool (autoApprove/require approval/disable). Auto-approve can be toggled. In the VS Code extension, every action requires explicit approval by default. |
+| **Continue** | **Not at all** | No shell access in current CI product. Legacy extension uses VS Code sandbox. |
+
+### Requirement 11: Session management (forking, model switching, loading/resuming)
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Session management with start, resume (`-r`), search, and history. Smart context management with auto-compaction. Sessions are single continuous conversations. Model switching supported via multi-model config. |
+| **Mentat** | **Partial** | Basic CLI sessions. No session forking, no model switching mid-conversation, no resume documented (beyond git context). |
+| **Cline** | **Yes** | `cline history` command for managing task history. Session state persists across restarts (snapshot/restore). Model can be overridden per run with `-m` flag. Task history queryable. |
+| **Continue** | **Not at all** | No session concept in current CI product. Each check run is stateless. |
+
+### Requirement 12: Human-in-the-loop checkpoints
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Permission modes: "approve" mode requires user approval for each action, "smart_approve" for selective approval. Configuration via `GOOSE_MODE`. Session-level approval flow. |
+| **Mentat** | **Not at all** | No checkpoint/approval system. Mentat edits files directly. |
+| **Cline** | **Yes** | Each file edit and terminal command requires approval by default. Plan mode vs Act mode. Auto-approve can be toggled. Checkpoints track all changes for easy undo. VS Code extension shows diffs for review. |
+| **Continue** | **Not at all** | Runs autonomously in CI. No human-in-the-loop during execution. Results are reviewed after the fact. |
+
+### Requirement 13: State persistence across restarts
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Sessions can be resumed across restarts (`goose session -r`). Configuration persists in `config.yaml`. Session history maintained with smart context management. OpenTelemetry for observability. |
+| **Mentat** | **Not at all** | No session persistence. Each session starts fresh (though git provides code history). |
+| **Cline** | **Yes** | Session state persists across restarts. Snapshot/restore capability via SDK. Schedules persist across restarts. `cline history` for past sessions. Config in `.clinerules` and plugin configs. |
+| **Continue** | **Not at all** | Each check run is stateless. Check files are persistent (in git repo) but execution state is not. |
+
+### Requirement 14: Provider-agnostic LLM
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | 15+ providers: Anthropic, OpenAI, Google, Ollama, OpenRouter, Azure, Bedrock, SageMaker, etc. ACP providers (Claude Code, Codex). Provider abstraction in config. Multi-model configuration supported. |
+| **Mentat** | **Partial** | Primarily OpenAI GPT-4. Alternative models via litellm proxy (workaround, not native abstraction). |
+| **Cline** | **Yes** | Provider gateway via `@cline/llms` package. Anthropic, OpenAI, Google, OpenRouter (200+ models), AWS Bedrock, Azure, GCP Vertex, Cerebras, Groq, Ollama, LM Studio, any OpenAI-compatible API. Provider catalog with `getAllProviders()`, `getModelsForProvider()`. |
+| **Continue** | **Yes** | Multiple providers supported via config. Uses `@continuedev/openai-adapters` for provider abstraction. Supports Anthropic, OpenAI, Google, AWS Bedrock, Ollama, etc. Model configurable per check file. |
+
+### Requirement 15: Multiple interfaces (CLI, TUI, API)
+
+| Framework | Support | Explanation |
+|-----------|---------|-------------|
+| **Goose** | **Yes** | Desktop app (macOS, Linux, Windows), CLI, API (via ACP server mode). `goose acp` starts as ACP server. Multiple interfaces documented. |
+| **Mentat** | **Partial** | CLI only. Had a VS Code extension (mentat-vscode) but it's part of the archived project. No API, no TUI beyond basic CLI. |
+| **Cline** | **Yes** | CLI (interactive + headless), VS Code Extension, JetBrains Plugin, Kanban (web-based), SDK for custom integrations. ACP mode for other editors (Neovim, Zed). |
+| **Continue** | **Partial** | CLI (`cn`), VS Code Extension, JetBrains Plugin. API is cloud-based (continue.dev). No TUI. Current focus is CLI + CI integration. |
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [Goose GitHub Repository](https://github.com/aaif-goose/goose) | official (GitHub) |
+| 2 | [Goose Architecture Docs](https://goose-docs.ai/docs/goose-architecture/) | official (docs) |
+| 3 | [Goose Subagents Guide](https://goose-docs.ai/docs/guides/context-engineering/subagents) | official (docs) |
+| 4 | [Goose Skills Guide](https://goose-docs.ai/docs/guides/context-engineering/using-skills) | official (docs) |
+| 5 | [Goose Goosehints Guide](https://goose-docs.ai/docs/guides/context-engineering/using-goosehints) | official (docs) |
+| 6 | [Goose Config Files Guide](https://goose-docs.ai/docs/guides/config-files) | official (docs) |
+| 7 | [Goose Sessions Guide](https://goose-docs.ai/docs/guides/sessions/) | official (docs) |
+| 8 | [Goose Security Guide](https://goose-docs.ai/docs/guides/security/) | official (docs) |
+| 9 | [Mentat Archived Repository](https://github.com/AbanteAI/archive-old-cli-mentat) | official (GitHub, archived) |
+| 10 | [Mentat PyPI Page](https://pypi.org/project/mentat/) | official (PyPI) |
+| 11 | [Mentat Web Archive README (Jan 2024)](https://web.archive.org/web/20240105225309/https://github.com/AbanteAI/mentat) | official (archived) |
+| 12 | [Cline GitHub Repository](https://github.com/cline/cline) | official (GitHub) |
+| 13 | [Cline SDK Overview Docs](https://docs.cline.bot/sdk/overview) | official (docs) |
+| 14 | [Cline SDK Architecture Docs](https://docs.cline.bot/sdk/architecture/overview) | official (docs) |
+| 15 | [Cline CLI Overview Docs](https://docs.cline.bot/usage/cli-overview) | official (docs) |
+| 16 | [Cline Tools Docs](https://docs.cline.bot/sdk/tools) | official (docs) |
+| 17 | [Cline Plugins Docs](https://docs.cline.bot/sdk/plugins) | official (docs) |
+| 18 | [Cline Building an Agent Guide](https://docs.cline.bot/sdk/guides/building-an-agent) | official (docs) |
+| 19 | [Cline Creating Custom Tools Guide](https://docs.cline.bot/sdk/guides/creating-custom-tools) | official (docs) |
+| 20 | [Continue GitHub Repository](https://github.com/continuedev/continue) | official (GitHub) |
+| 21 | [Continue Docs — What is Continue](https://docs.continue.dev/) | official (docs) |
+| 22 | [Continue Check File Reference](https://docs.continue.dev/checks/reference) | official (docs) |
+| 23 | [Continue Beyond Checks — Agents](https://docs.continue.dev/mission-control/beyond-checks) | official (docs) |
+| 24 | [Continue core/package.json](https://github.com/continuedev/continue/blob/main/core/package.json) | official (source) |
+
+---
+
+## Verbatim quotes
+
+- "goose is an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM" — [Goose GitHub](https://github.com/aaif-goose/goose)
+- "Subagents are independent instances that execute tasks while keeping your main conversation clean and focused." — [Goose Subagents Guide](https://goose-docs.ai/docs/guides/context-engineering/subagents)
+- "Skills are reusable sets of instructions and resources that teach goose how to perform specific tasks." — [Goose Skills Guide](https://goose-docs.ai/docs/guides/context-engineering/using-skills)
+- "⚠️ ARCHIVED PROJECT ⚠️ This repository contains an archived version of an old command-line tool that is no longer maintained or supported." — [Mentat Archived Repository](https://github.com/AbanteAI/archive-old-cli-mentat)
+- "Cline is an AI coding agent that lives in your editor and your terminal. It can read and write files, run terminal commands, use a browser, and help you build features through natural conversation." — [Cline Docs](https://docs.cline.bot/)
+- "The Cline SDK is an open source framework for building agentic applications, and is the same harness used in the Cline IDE extensions and CLI." — [Cline SDK Docs](https://docs.cline.bot/sdk/overview)
+- "Coordinate multiple agents working together on complex tasks. A coordinator agent breaks the work into subtasks and delegates to specialist agents." — [Cline GitHub README](https://github.com/cline/cline)
+- "Source-controlled AI checks, enforceable in CI. Powered by the open-source Continue CLI." — [Continue GitHub](https://github.com/continuedev/continue)
+- "Continue runs AI checks on every pull request. Each check is a markdown file in your repo that shows up as a GitHub status check." — [Continue Docs](https://docs.continue.dev/)
+
+---
+
+## Source quality flags
+
+- Source 10 (Mentat PyPI): The project is marked as archived. Useful for confirming status but not for evaluating current capabilities.
+- Source 11 (Mentat Wayback Machine): Archived snapshot from Jan 2024; reflects pre-archive state. Useful context but outdated.
+- Sources 22-23 (Continue Docs): Primarily document the new CI/checks product. The legacy VS Code extension features are not well documented in current docs.
+
+---
+
+## Confidence: High
+
+All four frameworks were researched from primary sources (GitHub repos, official documentation). Goose and Cline documentation is current and comprehensive. Continue's documentation accurately reflects its pivot to CI checks. Mentat's status as archived is confirmed by both GitHub and PyPI. The key limitation is that some features (especially edge cases around hierarchy and conflict prevention) were inferred from published capabilities rather than explicit documentation, but the overall assessment is reliable.
+
+---
+
+## Gaps and open questions
+
+1. **Cline three-layer hierarchy**: Cline's multi-agent teams imply a coordinator→specialist pattern, but whether this supports full 3+ layer nesting (dispatch→orchestrator→subagent→sub-subagent) is unclear from available docs.
+2. **Goose conflict prevention**: Whether Goose provides explicit file-locking or scope-assignment mechanisms for parallel subagents is not documented beyond process isolation and the desktop app sandbox.
+3. **Continue agent features**: Continue's "Agents" (beyond checks) are cloud-based and poorly documented for self-hosted use. What local agent capabilities remain from the original IDE assistant is unclear.
+4. **New Mentat (mentat.ai)**: The task specified "Mentat (by AbanteAI)", which refers to the archived CLI tool. The current Mentat at mentat.ai is a different product (an AI GitHub bot) and was not evaluated. If that is the product the lead researcher intended, follow-up investigation is needed.
+5. **LSP integration depth**: Cline's LSP integration runs primarily through the VS Code extension host. Whether SDK-level LSP tools exist outside the IDE is unclear.
+
+---
+
+## Summary Comparison Table
+
+| Requirement | Goose | Mentat (archived) | Cline | Continue |
+|---|---|---|---|---|
+| **1. Three-layer hierarchy** | Partial | Not at all | Partial | Not at all |
+| **2. Config-driven orchestrators** | Yes | Not at all | Partial | Not at all |
+| **3. Parallel subagent execution** | Yes | Not at all | Yes | Not at all |
+| **4. Strict hierarchy communication** | Yes | Not at all | Partial | Not at all |
+| **5. User-to-agent mid-execution** | Yes | Partial | Yes | Not at all |
+| **6. Conflict prevention** | Partial | Not at all | Partial | Not at all |
+| **7. Role-scoped tooling** | Yes | Not at all | Yes | Not at all |
+| **8. Skills system** | Yes | Not at all | Yes | Not at all |
+| **9. LSP integration** | Not at all | Not at all | Yes | Partial |
+| **10. Shell with dir permissions** | Yes | Not at all | Yes | Not at all |
+| **11. Session management** | Yes | Partial | Yes | Not at all |
+| **12. Human-in-the-loop checkpoints** | Yes | Not at all | Yes | Not at all |
+| **13. State persistence** | Yes | Not at all | Yes | Not at all |
+| **14. Provider-agnostic LLM** | Yes | Partial | Yes | Yes |
+| **15. Multiple interfaces** | Yes (Desktop, CLI, API) | Partial (CLI only) | Yes (CLI, IDE, Kanban, SDK) | Partial (CLI, IDE) |
+
+---
+
+## Key Questions Summary
+
+### What is each framework's core architecture? How many layers of agent hierarchy?
+- **Goose**: Interface → Agent → Extensions (3-component). Subagents: main agent → subagent (1 level deep, cannot nest further).
+- **Mentat**: Single agent CLI. No hierarchy.
+- **Cline**: Layered SDK (shared → llms → agents → core). Multi-agent teams: coordinator → specialists. SDK allows custom hierarchies.
+- **Continue**: Single-agent check runner. No hierarchy.
+
+### How extensible/configurable without modifying source code?
+- **Goose**: Very — YAML config files, recipes, skills, extensions/MCP, env vars, custom distributions.
+- **Mentat**: Minimal — basic config file for model settings and file exclusions.
+- **Cline**: Very — SDK plugins, custom tools, hooks, rules, MCP servers, config CLI.
+- **Continue**: Limited — check files are markdown with YAML frontmatter; config via CLI/env.
+
+### Primary use case?
+- **Goose**: General-purpose AI agent (code, automation, research, writing)
+- **Mentat (archived)**: CLI-based AI coding assistant (context-aware code editing)
+- **Cline**: AI coding agent in IDE/terminal + SDK for custom agent apps
+- **Continue**: Source-controlled AI checks in CI/CD (pivoted from IDE assistant)
+
+### Activity level?
+- **Goose**: Very active — 45.5k stars, 4,541 commits, 134 releases, moved to Linux Foundation AAIF
+- **Mentat**: Archived — 2.6k stars, 560 commits, last code Jan 2024, archived Jan 2025
+- **Cline**: Very active — 62k stars, 5,919 commits, 260 releases, CLI v3.0.7 (May 18, 2026)
+- **Continue**: Active — 33.3k stars, 21,498 commits, 822 releases (but pivoted focus)
+
+### Language?
+- **Goose**: Rust + TypeScript
+- **Mentat (archived)**: Python (TypeScript for VS Code extension)
+- **Cline**: TypeScript (some Rust)
+- **Continue**: TypeScript (some Kotlin for JetBrains, Python)
+
+---
+
+## Tool calls made
+
+1. `webfetch` — github.com/block/goose (README)
+2. `webfetch` — github.com/AbanteAI/mentat (404)
+3. `webfetch` — github.com/cline/cline (README)
+4. `webfetch` — github.com/continuedev/continue (README)
+5. `webfetch` — github.com/AbanteAI/mentat (second attempt, 404)
+6. `webfetch` — github.com/continuedev/continue/blob/main/extensions/vscode/README.md
+7. `webfetch` — goose-docs.ai/docs/quickstart
+8. `webfetch` — docs.cline.bot (overview)
+9. `webfetch` — docs.cline.bot/sdk/overview
+10. `webfetch` — goose-docs.ai/docs/category/architecture-overview
+11. `webfetch` — docs.continue.dev (what is continue)
+12. `webfetch` — goose-docs.ai/docs/goose-architecture/
+13. `webfetch` — docs.cline.bot/sdk/architecture/overview
+14. `webfetch` — github.com/continuedev/continue/blob/main/core/README.md (404)
+15. `webfetch` — pypi.org/project/mentat/
+16. `webfetch` — npmjs.com/package/mentat (403)
+17. `webfetch` — goose-docs.ai/docs/guides/context-engineering/subagents
+18. `webfetch` — docs.cline.bot/sdk/tools
+19. `webfetch` — goose-docs.ai/docs/guides/context-engineering/using-skills
+20. `webfetch` — docs.cline.bot/sdk/plugins
+21. `webfetch` — goose-docs.ai/docs/guides/config-files
+22. `webfetch` — docs.continue.dev/checks/quickstart
+23. `webfetch` — goose-docs.ai/docs/guides/security/
+24. `webfetch` — docs.cline.bot/usage/cli-overview
+25. `webfetch` — github.com/AbanteAI/archive-old-cli-mentat
+26. `webfetch` — docs.cline.bot/sdk/guides/building-an-agent
+27. `webfetch` — docs.cline.bot/sdk/guides/creating-custom-tools
+28. `webfetch` — github.com/continuedev/continue/blob/main/core/package.json
+29. `webfetch` — raw.githubusercontent.com/AbanteAI/archive-old-cli-mentat/main/README.md
+30. `webfetch` — docs.continue.dev/mission-control/beyond-checks
+31. `webfetch` — raw.githubusercontent.com/AbanteAI/archive-old-cli-mentat/main/mentat/__init__.py
+32. `webfetch` — raw.githubusercontent.com/AbanteAI/archive-old-cli-mentat/main/pyproject.toml
+33. `webfetch` — web.archive.org/web/20240105225309/https://github.com/AbanteAI/mentat
+34. `webfetch` — mentat.ai (transport error)
+35. `webfetch` — docs.cline.bot/getting-started/authorizing-with-cline
+36. `webfetch` — raw.githubusercontent.com/AbanteAI/archive-old-cli-mentat/main/docs/overview.md (404)
diff --git a/research/emerging-specialized.md b/research/emerging-specialized.md
new file mode 100644
index 0000000..6be8f18
--- /dev/null
+++ b/research/emerging-specialized.md
@@ -0,0 +1,473 @@
+# Subagent Report: Emerging & Specialized AI Agent Harnesses
+
+## Research summary
+
+This report evaluates **OpenCode** (now **Crush**), **Plandex**, **GPT-Engineer**, **Claude Code**, and several other notable open-source AI coding agents against a 15-point requirements checklist for a multi-layered agent orchestration harness. Of all frameworks evaluated, **Claude Code** comes closest to the full requirements but is not fully open source and lacks a native three-layer dispatch → orchestrator → subagent hierarchy. **Plandex** is strong for large-scale plan-then-execute workflows but is single-agent. **Crush** (OpenCode's successor) is a well-architected terminal agent with LSP/MCP/skills but lacks hierarchy. **GPT-Engineer** is archived and unsuitable. **Bolt.diy** is a web app builder, not an agent harness. **Sweep** has pivoted to a JetBrains plugin.
+
+---
+
+## Findings
+
+### 1. OpenCode (archived) → Crush (successor)
+
+**GitHub**: [opencode-ai/opencode](https://github.com/opencode-ai/opencode) (archived Sep 18, 2025) → [charmbracelet/crush](https://github.com/charmbracelet/crush) (24.4k stars, active)
+
+**Language**: Go
+
+**Architecture**: OpenCode was a Go-based CLI/TUI coding agent built with Charm's Bubble Tea framework. It was archived in September 2025 and the project continued as **Crush** by the Charm team. Crush is actively maintained (v0.70.0, May 18, 2026) with 3,380+ commits.
+
+Crush's architecture is a **single-agent loop** with tool-calling capabilities. It supports spawning sub-tasks via the `agent` tool but does not have a built-in multi-layer hierarchy. It provides:
+
+- Interactive TUI + CLI + non-interactive prompt mode
+- Multi-provider LLM support (Anthropic, OpenAI, Google, Groq, OpenRouter, AWS Bedrock, Azure, local models)
+- LSP integration with configurable language server support
+- MCP protocol support (stdio, http, sse)
+- Skills via the Agent Skills open standard
+- Session management (save/load/switch sessions, SQLite-backed)
+- Plugin system through MCP and skills
+- Hooks (preliminary support)
+- Configurable permissions (`allowed_tools`, tool allow/deny)
+- Desktop notifications
+- Provider auto-updates from Catwalk database
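+
+A minimal `crush.json` sketch of the LSP and permission settings listed above. The key names follow the Crush README as recalled at the time of editing and should be checked against the current schema; the language servers and tool names are illustrative choices, not recommendations.
+
+```json
+{
+  "$schema": "https://charm.land/crush.json",
+  "lsp": {
+    "go": { "command": "gopls" },
+    "typescript": { "command": "typescript-language-server", "args": ["--stdio"] }
+  },
+  "permissions": {
+    "allowed_tools": ["view", "ls", "grep"]
+  }
+}
+```
+
+Under this sketch, the tools named in `allowed_tools` would run without a confirmation prompt, while every other tool would still ask. Note that the allowlist is keyed by tool name only, which is why the checklist below rates directory-scoped shell permissions as partial.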
+
+**Notable features for Dispatch comparison**: Crush reads the `.claude/skills/` and `.cursor/skills/` directories for compatibility with Claude Code skills. It supports `.crushignore` files. It has an `initialize_as` option for creating AGENTS.md/CRUSH.md context files.
+
+**Confidence**: High — well-documented, active project, extensive README.
+
+---
+
+### 2. Plandex
+
+**GitHub**: [plandex-ai/plandex](https://github.com/plandex-ai/plandex) (15.4k stars, active, 1,483 commits)
+
+**Language**: Go (93.4%)
+
+**Architecture**: Plandex is a terminal-based AI development tool with a **plan-and-execute** workflow. It is a single-agent system with a cumulative diff review sandbox. Its architecture is:
+
+- **Plan branch**: AI proposes changes in a sandbox, kept separate from project files
+- **Apply phase**: User reviews and applies the cumulative diff
+- **Autonomous loop**: Can auto-execute commands, detect failures, debug, and retry
+
+Plandex handles up to 2M tokens of effective context via selective loading. It uses tree-sitter for project map generation (30+ languages). It has built-in version control for plans (branches, diffs). It integrates with git for commit message generation.
+
+**Key details**:
+- CLI + REPL with fuzzy auto-complete
+- Multi-provider: Anthropic, OpenAI, Google, OpenRouter, open source models
+- Context caching for all major providers
+- Configurable autonomy levels (full auto → step-by-step review)
+- Automated debugging of terminal commands and browser apps
+- Cloud hosted mode is winding down; self-hosted/local Docker mode is primary
+- No API mode, no TUI (terminal REPL only)
+
+**Confidence**: High — well-documented, active community, recent releases.
+
+---
+
+### 3. GPT-Engineer
+
+**GitHub**: [AntonOsika/gpt-engineer](https://github.com/AntonOsika/gpt-engineer) (55.2k stars, **archived** Apr 22, 2026)
+
+**Language**: Python (98.8%)
+
+**Architecture**: GPT-Engineer was a CLI platform for code generation experimentation. It used a simple single-agent prompt→generate loop. The project was archived in April 2026 and the last release was v0.3.1 (June 6, 2024). The README explicitly directs users to "aider" for a well-maintained CLI or "lovable.dev" for the commercial evolution.
+
+**Capabilities**:
+- Single-agent, one-shot code generation from natural language prompts
+- Support for OpenAI, Anthropic, and open source models
+- Benchmark support (APPS, MBPP)
+- Pre-prompt overrides for agent identity
+- Vision support via image inputs
+- Docker support
+
+**Confidence**: High — project is archived, low relevance for a new harness build.
+
+---
+
+### 4. Claude Code (Anthropic)
+
+**GitHub**: [anthropics/claude-code](https://github.com/anthropics/claude-code) (125k stars, very active, 627 commits)
+
+**Language**: Shell 47.1%, Python 29.2%, TypeScript 17.7% (Note: this repo contains the installer, plugins, examples, and documentation; the **core agent engine is proprietary** and not fully open source.)
+
+**Architecture**: Claude Code is a sophisticated agentic coding tool with a layered architecture:
+
+1. **Main session**: The primary agent loop that the user interacts with
+2. **Subagents**: Specialized agents spawned from the main session, each with:
+ - Its own context window
+ - Custom system prompts via YAML frontmatter in Markdown files
+ - Restricted, permission-scoped tool sets
+ - Configurable models (sonnet/opus/haiku or full model IDs)
+ - Independent permission modes (default, acceptEdits, auto, dontAsk, bypassPermissions, plan)
+ - Persistent memory (user/project/local scope)
+ - Preloaded skills
+ - Optional git worktree isolation
+3. **Agent teams** (experimental, behind a feature flag): Multiple full Claude Code sessions that coordinate via a shared task list, with peer-to-peer messaging via a mailbox system
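+
+The subagent definition format described in item 2 can be sketched as a single Markdown file. This is a hedged illustration: the agent name, description, and prompt are hypothetical, and the path `.claude/agents/code-reviewer.md` plus the `name`/`description`/`tools`/`model` fields reflect the subagents documentation (source 7) as understood at the time of writing.
+
+```markdown
+---
+name: code-reviewer
+description: Reviews changed files for bugs and style issues. Use after code edits.
+tools: Read, Grep, Glob
+model: sonnet
+---
+
+You are a code reviewer. Inspect the changed files and report problems.
+Do not modify any files.
+```
+
+Omitting write-capable tools from `tools` is what makes this a read-only role. That is the role-scoped tooling Dispatch requires, but it is declared per subagent rather than derived from an orchestrator config.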
+
+**Surface area**: Terminal CLI, VS Code extension, JetBrains plugin, Desktop app, Web (claude.ai/code), Slack integration, GitHub Actions, GitLab CI/CD, iOS app
+
+**Key subsystems**:
+- **Skills system**: Full Agent Skills open standard support, directory-based (`.claude/skills/`), personal/project/plugin/managed scopes, YAML frontmatter with description/tools/context/agent fields, dynamic context injection via `` !`command` ``, argument substitution
+- **Memory system**: CLAUDE.md files (project/user/managed/org scopes), `.claude/rules/` for path-scoped instructions, auto memory (Claude writes learnings across sessions)
+- **Hooks**: PreToolUse, PostToolUse, SubagentStart/Stop, TeammateIdle, TaskCreated, TaskCompleted — shell commands at lifecycle events
+- **Permissions**: Tiered system (allow/ask/deny rules), wildcard matching, Bash/Read/Edit/WebFetch/MCP scoping, sandboxing (OS-level isolation), auto mode (ML classifier)
+- **MCP**: Model Context Protocol for external tool integration
+- **Agent SDK**: Python and TypeScript SDKs for building custom agents with Claude Code's tools and agent loop
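+
+The tiered permission rules listed above might look like the following in a project-level `.claude/settings.json`. This is a sketch under assumptions: the `permissions.allow`/`ask`/`deny` layout and the `Tool(pattern)` rule syntax follow the permissions documentation (source 12) as recalled here, and the specific commands and paths are hypothetical.
+
+```json
+{
+  "permissions": {
+    "allow": ["Bash(npm run test:*)", "Read(./src/**)"],
+    "ask": ["Bash(git push:*)"],
+    "deny": ["Read(./.env)", "Bash(curl:*)"]
+  }
+}
+```
+
+In this sketch the test command runs unprompted, pushes require confirmation, and reads of `.env` are refused outright, which corresponds to the allow/ask/deny tiers named in the Permissions bullet.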
+
+**Provider support**: Primarily uses Claude models but supports Amazon Bedrock, Google Vertex AI, and Microsoft Azure AI Foundry as third-party backends.
+
+---
+
+### 5. Bolt.new / Bolt.diy
+
+**GitHub**: [stackblitz/bolt.new](https://github.com/stackblitz/bolt.new) (16.4k stars) / [stackblitz-labs/bolt.diy](https://github.com/stackblitz-labs/bolt.diy) (19.4k stars)
+
+**Language**: TypeScript
+
+**Architecture**: Bolt.new is a web-based, AI-powered full-stack application generator built on StackBlitz's WebContainers. Bolt.diy is the community fork that supports any LLM. Both are **web application builders**, not agent harnesses or orchestrators.
+
+- Single-agent prompt→generate→preview loop
+- No agent hierarchy
+- No orchestrator or subagent concepts
+- Primarily browser-based (with Electron desktop app)
+- Supports 19+ LLM providers in bolt.diy
+- File locking system to prevent conflicts
+- Git integration for clone/import/deploy
+- MCP support
+- Diff view for AI changes
+
+**Confidence**: High — well-documented, but not applicable as an agent harness.
+
+---
+
+### 6. Sweep
+
+**GitHub**: [sweepai/sweep](https://github.com/sweepai/sweep) (7.7k stars)
+
+**Language**: Python, Jupyter Notebook, TypeScript
+
+**Note**: Sweep was originally a GitHub app for automated PR creation from issues. It has since pivoted to an AI coding assistant for JetBrains IDEs. The open source repo is largely dormant. Not applicable as a general agent harness.
+
+---
+
+### 7. Other Notable Frameworks Not Found
+
+**Kortix Suna**: Could not locate an active GitHub repository under sunaify/suna or similar names.
+
+**Devon (entelligence-ai/devon)**: Could not locate — the GitHub org/user was not found (404).
+
+---
+
+## Requirements Checklist
+
+### 1. Three-layer hierarchy (dispatch → orchestrator → subagent)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ❌ Not supported | Single-agent loop; has `agent` tool for sub-tasks but no orchestrator layer |
+| **Plandex** | ❌ Not supported | Single-agent plan→execute loop |
+| **GPT-Engineer** | ❌ Not supported | Single-agent prompt→generate |
+| **Claude Code** | ⚠️ Partial | 2 layers: main agent → subagents. Agent teams (experimental) add peer coordination but not 3-tier hierarchy |
+| **Bolt.diy** | ❌ Not supported | Single-agent web app builder |
+| **Sweep** | ❌ Not supported | Single-agent GitHub PR generator (now JetBrains plugin) |
+
+### 2. Config-driven orchestrators
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ❌ Not supported | No orchestrator concept; config is for providers/model/agents |
+| **Plandex** | ❌ Not supported | Config is limited to model providers and autonomy level |
+| **GPT-Engineer** | ❌ Not supported | Pre-prompt override only |
+| **Claude Code** | ⚠️ Partial | Subagents defined via YAML frontmatter in `.md` files with tools/models/prompts/permissions. No subagent template spawning from orchestrator config |
+| **Bolt.diy** | ❌ Not supported | Provider config only |
+| **Sweep** | ❌ Not supported | — |
+
+### 3. Parallel subagent execution
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ❌ Not supported | Sub-agent tool is sequential |
+| **Plandex** | ❌ Not supported | Single-threaded plan→execute |
+| **GPT-Engineer** | ❌ Not supported | Single-threaded |
+| **Claude Code** | ✅ Supported | Subagents run in parallel; multiple background agents can run concurrently. Agent teams spawn multiple independent sessions |
+| **Bolt.diy** | ❌ Not supported | Single prompt→response |
+| **Sweep** | ❌ Not supported | — |
+
+### 4. Strict hierarchy communication (subagents only talk to parent)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ⚠️ Partial | Agent tool calls return results; no peer-to-peer, but also no multi-layer parent chain |
+| **Plandex** | N/A | No subagents |
+| **GPT-Engineer** | N/A | No subagents |
+| **Claude Code** | ✅ Supported | Subagents report to parent only; no peer-to-peer for subagents. Note: Agent teams (experimental) intentionally allow peer-to-peer messaging |
+| **Bolt.diy** | N/A | No subagents |
+| **Sweep** | N/A | — |
+
+### 5. User-to-agent messaging mid-execution
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ❌ Not supported | User types at the main session; cannot message a sub-agent mid-task |
+| **Plandex** | ⚠️ Partial | User can interrupt and redirect during plan step, but not while AI is executing |
+| **GPT-Engineer** | ❌ Not supported | Batch process |
+| **Claude Code** | ✅ Supported | Users can interact with subagents directly via Shift+Down in in-process mode; can message agent team teammates directly |
+| **Bolt.diy** | ❌ Not supported | Single prompt-response |
+| **Sweep** | ❌ Not supported | — |
+
+### 6. Conflict prevention (non-overlapping file scopes)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ❌ Not supported | No scope assignment |
+| **Plandex** | ⚠️ Partial | Cumulative diff sandbox prevents conflicts by keeping AI changes separate until user applies them |
+| **GPT-Engineer** | ❌ Not supported | — |
+| **Claude Code** | ⚠️ Partial | Git worktrees for subagent isolation; file locking in agent teams. No orchestrator-driven scope assignment |
+| **Bolt.diy** | ✅ Supported | File locking system prevents concurrent edits during AI code generation |
+| **Sweep** | ❌ Not supported | — |
+
+### 7. Role-scoped tooling
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | Different agents can have different tool sets via `agents` config with `tools` field |
+| **Plandex** | ❌ Not supported | Single agent with all tools |
+| **GPT-Engineer** | ❌ Not supported | Single agent |
+| **Claude Code** | ✅ Supported | Full role-scoped tooling via `tools` and `disallowedTools` frontmatter in subagent definitions |
+| **Bolt.diy** | ❌ Not supported | Single agent |
+| **Sweep** | ❌ Not supported | — |
+
+### 8. Skills system (injectable markdown instructions, directory-based)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | Supports Agent Skills open standard. Reads from `.crush/skills/`, `.claude/skills/`, `.agents/skills/`, `.cursor/skills/`, plus user-level paths |
+| **Plandex** | ❌ Not supported | No skills system |
+| **GPT-Engineer** | ⚠️ Partial | Has pre-prompt override files but not a structured skills system |
+| **Claude Code** | ✅ Supported | Full Agent Skills open standard support, YAML frontmatter, personal/project/plugin/managed scopes, dynamic context injection, argument substitution, preloading into subagents |
+| **Bolt.diy** | ❌ Not supported | No skills system |
+| **Sweep** | ❌ Not supported | — |
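+
+For reference, a directory-based skill of the kind this requirement describes is a folder containing a `SKILL.md` file. The sketch below assumes the Agent Skills layout (a `name` and `description` in YAML frontmatter followed by Markdown instructions); the skill name, path `.claude/skills/release-notes/SKILL.md`, and instructions are hypothetical.
+
+```markdown
+---
+name: release-notes
+description: Drafts release notes from merged changes. Use when preparing a release.
+---
+
+1. List the commits since the last tag.
+2. Group them into Features, Fixes, and Chores.
+3. Write one line per change in plain language.
+```
+
+Because the format is just Markdown in a known directory, the same skill folder could in principle be read by both Claude Code and Crush, which is the compatibility noted in the table above.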
+
+### 9. LSP integration
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | LSP integration with multi-language support via configurable language server commands |
+| **Plandex** | ⚠️ Partial | Tree-sitter for syntax validation and project maps, but no LSP diagnostics tool |
+| **GPT-Engineer** | ❌ Not supported | — |
+| **Claude Code** | ✅ Supported | LSP through IDE integrations (VS Code, JetBrains); diagnostics tool available to agents |
+| **Bolt.diy** | ❌ Not supported | — |
+| **Sweep** | ❌ Not supported | — |
+
+### 10. Shell access with directory permissions
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ⚠️ Partial | Has an `allowed_tools` permission allowlist but no directory-based permission scoping. Crush also has a `--yolo` flag that bypasses permission prompts |
+| **Plandex** | ❌ Not supported | Has shell access but no permission system |
+| **GPT-Engineer** | ❌ Not supported | Shell execution without permissions |
+| **Claude Code** | ✅ Supported | Full permission system: allow/ask/deny rules, wildcard matching, absolute/project-relative/home-relative path patterns, sandboxing (OS-level filesystem/network isolation), auto mode with ML classifier |
+| **Bolt.diy** | ❌ Not supported | — |
+| **Sweep** | ❌ Not supported | — |
+
+### 11. Session management (forking, model switching, resume)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | Session save/load/switch, model switching mid-session, session persistence via SQLite |
+| **Plandex** | ✅ Supported | Plan version control with branching; model switching supported |
+| **GPT-Engineer** | ❌ Not supported | No session persistence |
+| **Claude Code** | ✅ Supported | Full session management: resume (`-r`, `-c`), fork (`/fork`), model switching (`/model`), chat history persistence, auto memory, `/compact` for context management |
+| **Bolt.diy** | ⚠️ Partial | Chat history via file system, no forking or model switching |
+| **Sweep** | ❌ Not supported | — |
+
+### 12. Human-in-the-loop checkpoints
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ❌ Not supported | No checkpoint system; permission prompts are per-tool |
+| **Plandex** | ✅ Supported | Configurable autonomy from full-auto to step-by-step approval; user reviews cumulative diff before applying |
+| **GPT-Engineer** | ❌ Not supported | No HITL |
+| **Claude Code** | ✅ Supported | Permission prompts, permission modes (default/acceptEdits/plan/auto), plan mode for review-before-edit. Checkpoints via hooks for custom approval workflows |
+| **Bolt.diy** | ❌ Not supported | Auto-approves all AI changes |
+| **Sweep** | ❌ Not supported | — |
+
+### 13. State persistence (sessions, plans, artifacts across restarts)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | SQLite-based session persistence; project-specific context files |
+| **Plandex** | ✅ Supported | Plans persist and can be resumed; cumulative diff sandbox stores pending changes |
+| **GPT-Engineer** | ❌ Not supported | No persistence |
+| **Claude Code** | ✅ Supported | Sessions persist in `~/.claude/projects/`; auto memory persists across sessions; CLAUDE.md files are disk-based |
+| **Bolt.diy** | ⚠️ Partial | Project snapshots via browser storage and file system |
+| **Sweep** | ❌ Not supported | — |
+
+### 14. Provider-agnostic LLM
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | 15+ providers including Anthropic, OpenAI, Google, Groq, OpenRouter, AWS Bedrock, Azure, local models, Ollama, LM Studio |
+| **Plandex** | ✅ Supported | Anthropic, OpenAI, Google, OpenRouter, open source providers; OpenRouter as primary gateway |
+| **GPT-Engineer** | ✅ Supported | OpenAI, Anthropic, open source/local models |
+| **Claude Code** | ⚠️ Partial | Primarily Claude models; supports Amazon Bedrock, Google Vertex AI, Microsoft Azure as third-party backends |
+| **Bolt.diy** | ✅ Supported | 19+ providers including OpenAI, Anthropic, Google, Ollama, OpenRouter, DeepSeek, Groq, etc. |
+| **Sweep** | N/A | — |
+
+### 15. Multiple interfaces (CLI, TUI, API)
+
+| Framework | Status | Notes |
+|-----------|--------|-------|
+| **OpenCode → Crush** | ✅ Supported | Interactive TUI, CLI (non-interactive mode with `-p`), scripting support |
+| **Plandex** | ⚠️ Partial | REPL (TUI-like interactive mode) and CLI commands. No API mode |
+| **GPT-Engineer** | ⚠️ Partial | CLI only |
+| **Claude Code** | ✅ Supported | Terminal CLI (interactive + `-p`), VS Code extension, JetBrains plugin, Desktop app, Web UI, Slack, Agent SDK (Python/TypeScript API), GitHub Actions, GitLab CI/CD |
+| **Bolt.diy** | ⚠️ Partial | Web UI, Electron desktop app, no API |
+| **Sweep** | N/A | — |
+
+---
+
+## Summary Comparison Table
+
+| Requirement | OpenCode→Crush | Plandex | GPT-Engineer | Claude Code | Bolt.diy |
+|---|---|---|---|---|---|
+| **1. Three-layer hierarchy** | ❌ | ❌ | ❌ | ⚠️ Partial | ❌ |
+| **2. Config-driven orchestrators** | ❌ | ❌ | ❌ | ⚠️ Partial | ❌ |
+| **3. Parallel subagent execution** | ❌ | ❌ | ❌ | ✅ | ❌ |
+| **4. Strict hierarchy communication** | ⚠️ | N/A | N/A | ✅ | N/A |
+| **5. Mid-execution user messaging** | ❌ | ⚠️ | ❌ | ✅ | ❌ |
+| **6. Conflict prevention** | ❌ | ⚠️ | ❌ | ⚠️ | ✅ |
+| **7. Role-scoped tooling** | ✅ | ❌ | ❌ | ✅ | ❌ |
+| **8. Skills system** | ✅ | ❌ | ⚠️ | ✅ | ❌ |
+| **9. LSP integration** | ✅ | ⚠️ | ❌ | ✅ | ❌ |
+| **10. Shell + directory permissions** | ⚠️ | ❌ | ❌ | ✅ | ❌ |
+| **11. Session management** | ✅ | ✅ | ❌ | ✅ | ⚠️ |
+| **12. HITL checkpoints** | ❌ | ✅ | ❌ | ✅ | ❌ |
+| **13. State persistence** | ✅ | ✅ | ❌ | ✅ | ⚠️ |
+| **14. Provider-agnostic LLM** | ✅ | ✅ | ✅ | ⚠️ | ✅ |
+| **15. Multiple interfaces** | ✅ | ⚠️ | ⚠️ | ✅ | ⚠️ |
+| **Open source** | ⚠️ (Crush: FSL-1.1-MIT; OpenCode was MIT) | ✅ (MIT) | ✅ (MIT) | ⚠️ (Partial) | ✅ (MIT) |
+| **Language** | Go | Go | Python | Multi | TypeScript |
+| **Activity** | ✅ Very Active | ✅ Active | ❌ Archived | ✅ Very Active | ✅ Active |
+| **Stars** | 24.4k (Crush) | 15.4k | 55.2k | 125k | 19.4k |
+
+---
+
+## Key Questions Answered
+
+### 1. Core architecture and agent hierarchy depth
+
+- **Crush**: Single-agent TUI/CLI with sub-task spawning (2 layers max). No built-in orchestrator concept.
+- **Plandex**: Single-agent plan→execute loop. No hierarchy.
+- **GPT-Engineer**: Single-agent prompt→generate. No hierarchy.
+- **Claude Code**: Main agent → subagents (2 layers). Experimental agent teams add peer-level coordination. No native 3-tier dispatch→orchestrator→subagent.
+- **Bolt.diy**: Single-agent web app builder. No hierarchy.
+
+### 2. Extensibility without source modification
+
+- **Crush**: JSON config files, skills via markdown, MCP servers, LSP config. Moderate extensibility.
+- **Plandex**: Model/provider config, autonomy level settings. Limited extensibility.
+- **GPT-Engineer**: Pre-prompt overrides only. Very limited.
+- **Claude Code**: Most extensible — YAML frontmatter subagents, skills, hooks, MCP, permission rules via JSON config files, managed settings for org-wide policy, plugin system. No source modification needed.
+- **Bolt.diy**: Provider config, some environment variables. Limited.
+
+### 3. Primary use case
+
+- **Crush**: General-purpose AI coding assistant for terminal users
+- **Plandex**: Large-scale, multi-step coding tasks requiring plan review and diff sandboxing
+- **GPT-Engineer**: (Archived) Experimental code generation platform
+- **Claude Code**: Production AI coding assistant with enterprise features, team collaboration, and multi-surface support
+- **Bolt.diy**: Rapid full-stack web application prototyping in the browser
+
+### 4. Project activity
+
+- **Crush**: Very active — v0.70.0 (May 18, 2026), 24.4k stars, 3,380+ commits, Charm ecosystem backing
+- **Plandex**: Active — v2.2.1 (Jul 16, 2025), 15.4k stars, 1,483 commits, active Discord
+- **GPT-Engineer**: Archived — last release v0.3.1 (Jun 6, 2024), 55.2k stars, read-only
+- **Claude Code**: Very active — 125k stars, 627 commits, active development, Anthropic backing
+- **Bolt.diy**: Active — v1.0.0 (May 12, 2025), 19.4k stars, community-driven
+
+### 5. Language
+
+- **Crush**: Go
+- **Plandex**: Go
+- **GPT-Engineer**: Python
+- **Claude Code**: Shell/Python/TypeScript (installer + plugins); core engine is proprietary binary
+- **Bolt.diy**: TypeScript
+
+### 6. Other notable open-source agent harnesses (2024-2025)
+
+- **Crush** (charmbracelet/crush) — The most notable new entrant, as successor to OpenCode. Go-based, well-architected, Charm ecosystem.
+- **Bolt.diy** (stackblitz-labs/bolt.diy) — Significant as a community fork enabling any LLM with Bolt.new's WebContainer architecture.
+- **Claude Code Agent SDK** — Released mid-2025, enables building custom agents with Claude Code's tool set. Available in Python and TypeScript.
+
+Could not locate active repositories for **Kortix Suna** or **Devon** as of May 2026.
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [OpenCode GitHub](https://github.com/opencode-ai/opencode) | GitHub |
+| 2 | [Crush GitHub](https://github.com/charmbracelet/crush) | GitHub |
+| 3 | [Plandex GitHub](https://github.com/plandex-ai/plandex) | GitHub |
+| 4 | [GPT-Engineer GitHub](https://github.com/AntonOsika/gpt-engineer) | GitHub |
+| 5 | [Claude Code GitHub](https://github.com/anthropics/claude-code) | GitHub |
+| 6 | [Claude Code Overview](https://code.claude.com/docs/en/overview) | Official docs |
+| 7 | [Claude Code Subagents](https://code.claude.com/docs/en/sub-agents) | Official docs |
+| 8 | [Claude Code Skills](https://code.claude.com/docs/en/skills) | Official docs |
+| 9 | [Claude Code Memory](https://code.claude.com/docs/en/memory) | Official docs |
+| 10 | [Claude Code Agent SDK](https://code.claude.com/docs/en/agent-sdk/overview) | Official docs |
+| 11 | [Claude Code Settings](https://code.claude.com/docs/en/settings) | Official docs |
+| 12 | [Claude Code Permissions](https://code.claude.com/docs/en/permissions) | Official docs |
+| 13 | [Claude Code Agent Teams](https://code.claude.com/docs/en/agent-teams) | Official docs |
+| 14 | [Claude Code Quickstart](https://code.claude.com/docs/en/quickstart) | Official docs |
+| 15 | [Bolt.new GitHub](https://github.com/stackblitz/bolt.new) | GitHub |
+| 16 | [Bolt.diy GitHub](https://github.com/stackblitz-labs/bolt.diy) | GitHub |
+| 17 | [Sweep GitHub](https://github.com/sweepai/sweep) | GitHub |
+| 18 | [Dispatch Requirements](file:///home/tradam/projects/dispatch/requirements.md) | Project file |
+
+---
+
+## Verbatim quotes
+
+- "This repository is no longer maintained and has been archived for provenance. The project has continued under the name Crush" — [OpenCode GitHub](https://github.com/opencode-ai/opencode)
+- "Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster" — [Claude Code GitHub](https://github.com/anthropics/claude-code)
+- "Subagents are specialized AI assistants that handle specific types of tasks." — [Claude Code Subagents](https://code.claude.com/docs/en/sub-agents)
+- "Agent teams let you coordinate multiple Claude Code instances working together. One session acts as the team lead, coordinating work, assigning tasks, and synthesizing results." — [Claude Code Agent Teams](https://code.claude.com/docs/en/agent-teams)
+- "Plandex is a terminal-based AI development tool that can plan and execute large coding tasks that span many steps and touch dozens of files." — [Plandex GitHub](https://github.com/plandex-ai/plandex)
+- "Crush: Your new coding bestie, now available in your favourite terminal. Your tools, your code, and your workflows, wired into your LLM of choice." — [Crush GitHub](https://github.com/charmbracelet/crush)
+- "Claude Code supports fine-grained permissions so that you can specify exactly what the agent is allowed to do and what it cannot." — [Claude Code Permissions](https://code.claude.com/docs/en/permissions)
+- "GPT-Engineer lets you: Specify software in natural language, Sit back and watch as an AI writes and executes the code" — [GPT-Engineer GitHub](https://github.com/AntonOsika/gpt-engineer)
+- "This repository was archived by the owner on Apr 22, 2026. It is now read-only." — [GPT-Engineer GitHub](https://github.com/AntonOsika/gpt-engineer)
+- "Skills extend what Claude can do. Create a SKILL.md file with instructions, and Claude adds it to its toolkit." — [Claude Code Skills](https://code.claude.com/docs/en/skills)
+
+---
+
+## Source quality flags
+
+- **Claude Code GitHub repo**: The repository (anthropics/claude-code) contains primarily installer scripts, plugins, examples, and documentation — the core agent engine is proprietary and distributed via `npm`/native installers. The docs at code.claude.com are official Anthropic documentation.
+- **GPT-Engineer**: Archived project; README explicitly directs users to other tools. Not representative of current capabilities.
+- **Sweep**: The README indicates that the project has pivoted to a JetBrains plugin. The open-source repo is not actively maintained for the use case being evaluated.
+
+---
+
+## Confidence: High
+
+All frameworks that could be located were researched through their GitHub repositories and official documentation. Claude Code's proprietary core limits the depth of source-code-level analysis, but its public documentation is extensive and detailed. Kortix Suna and Devon could not be located; they may have been renamed or deprecated, or may not be publicly available under those names.
+
+---
+
+## Gaps and open questions
+
+1. **Kortix Suna** and **Devon** could not be located — they may exist under different GitHub orgs, have been renamed, or are not publicly available.
+2. **Claude Code's core architecture** is not fully open source; the proprietary agent engine cannot be audited for architectural details. The analysis is based on publicly documented behavior.
+3. **Claude Code's Agent SDK** (released mid-2025) represents a new category — it provides programmatic access to Claude Code's agent loop. A follow-up investigation could evaluate it as an alternative to building from scratch.
+4. **Crush** is still in v0.x (v0.70.0) — some features like hooks are marked as "preliminary" and may not be production-ready.
+5. **None of the evaluated frameworks** provide a native three-layer (dispatch→orchestrator→subagent) architecture. This appears to be a novel design not present in existing open-source tools.
+
+---
+
+## Tool calls made
+
+Web fetches: 20 (OpenCode, Plandex, GPT-Engineer, Claude Code, Sweep, Devon attempted, Bolt.new, Plandex docs, Claude Code overview, sub-agents, skills, memory, Agent SDK, settings, permissions, agent teams, quickstart, Crush, Bolt.diy, Sweep README)
diff --git a/research/general-agent-platforms.md b/research/general-agent-platforms.md
new file mode 100644
index 0000000..a7f1bd4
--- /dev/null
+++ b/research/general-agent-platforms.md
@@ -0,0 +1,299 @@
+# Subagent Report: General Agent Platforms (SuperAGI, Agency Swarm, Semantic Kernel)
+
+## Research summary
+
+This report evaluates three open-source general AI agent platforms (**SuperAGI**, **Agency Swarm**, and **Microsoft Semantic Kernel**) against 15 specific requirements for an agent harness architecture. SuperAGI is a single-agent autonomous framework with a GUI and tool marketplace, not designed for hierarchical multi-agent orchestration. Agency Swarm is a Python multi-agent orchestration framework built on the OpenAI Agents SDK, supporting structured communication flows with parallel execution. Microsoft Semantic Kernel is an enterprise-grade SDK (C#/Python/Java) for building AI agents and multi-agent systems, now succeeded by Microsoft Agent Framework. None of the three fully supports all 15 requirements; Agency Swarm comes closest for hierarchical orchestration needs, while Semantic Kernel provides the richest SDK for plugin-extensible agent systems.
+
+**Confidence**: High — all findings are based on official GitHub repositories, documentation sites, and framework README files.
+
+---
+
+## Findings
+
+### 1. SuperAGI
+
+#### Core Architecture
+
+SuperAGI is a **dev-first open-source autonomous AI agent framework** written in Python. It is designed for building, managing, and running single autonomous agents with tool augmentation. The architecture uses a Celery task queue, PostgreSQL database, Redis for caching, and multiple vector store options (Pinecone, Weaviate, Chroma, Qdrant). Agents operate as independent ReAct (Reasoning + Acting) loops with goals, instructions, constraints, and tools.
+
+- **Language**: Python
+- **GitHub**: ~17.5k stars, 2,342 commits
+- **Latest release**: v0.0.11 (appears stale — README states "Under Development!")
+- **Primary use case**: Running individual autonomous agents with tool augmentation
+- **Architecture diagrams** show a single agent workflow with tool integration, not multi-agent hierarchy
+
+Key architectural components visible in the codebase:
+- `superagi/agent/` — single-agent execution loop with prompt building, tool execution, task queue
+- `config_template.yaml` — YAML-based configuration for system settings (API keys, model, DB, storage)
+- `gui/` — React-based graphical user interface
+- `tools.json` — tool definitions
+
+[Source: SuperAGI GitHub README](https://github.com/TransformerOptimus/SuperAGI)
+[Source: SuperAGI config_template.yaml](https://raw.githubusercontent.com/TransformerOptimus/SuperAGI/main/config_template.yaml)
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Explanation |
+|---|-------------|--------|-------------|
+| 1 | **Three-layer hierarchy** | **Not supported** | SuperAGI is a single-agent framework. No orchestrator-subagent hierarchy exists. Multiple agents can run but are independent. |
+| 2 | **Config-driven orchestrators** | **Not supported** | System config (API keys, model, DB) is YAML-driven, but agent definitions, tools, and workflows are defined in Python code, not config files. |
+| 3 | **Parallel subagent execution** | **Partially supported** | README states "You can run concurrent agents seamlessly" but these are independent agent runs, not orchestrated subagents. No parent-child coordination of parallelism. |
+| 4 | **Strict hierarchy communication** | **Not supported** | No inter-agent communication at all. Each agent is independent. |
+| 5 | **User-to-agent messaging mid-execution** | **Partially supported** | "Action Console" allows user interaction by giving agents input and permissions, but this is at the GUI level, not arbitrary injection to running agents. |
+| 6 | **Conflict prevention** | **Not supported** | No mechanism for assigning non-overlapping file scopes to parallel agents. |
+| 7 | **Role-scoped tooling** | **Partially supported** | Agents can be configured with different toolkits from the marketplace, but this is done through the GUI/database, not through a code-level role system. |
+| 8 | **Skills system (markdown instructions)** | **Partially supported** | Uses prompt templates with variable substitution (`{goals}`, `{instructions}`, `{constraints}`), but not a markdown file-based skills directory system. |
+| 9 | **LSP integration** | **Not supported** | No Language Server Protocol integration. |
+| 10 | **Shell access with permissions** | **Not supported** | No built-in shell tool. Tools are pre-built plugins from the marketplace. |
+| 11 | **Session management** | **Partially supported** | "Agent Memory Storage" for learning/adaptation, "Performance Telemetry" for insights. No chat forking or model switching mid-conversation. |
+| 12 | **Human-in-the-loop checkpoints** | **Partially supported** | Action Console provides some user input/permissions, but no configurable checkpoint mechanism. |
+| 13 | **State persistence** | **Partially supported** | Uses PostgreSQL for agent state and file storage for workspace artifacts (`RESOURCES_OUTPUT_ROOT_DIR`). Sessions persist via DB. |
+| 14 | **Provider-agnostic LLM** | **Supported** | Supports OpenAI, Google PaLM, Replicate, HuggingFace. Configurable via `config_template.yaml`. Local LLM support via Text Generation Web UI. |
+| 15 | **Multiple interfaces** | **Supported** | Has GUI (React web interface), CLI (`cli2.py`), and REST API (via FastAPI/Postman docs). |
+
+---
+
+### 2. Agency Swarm
+
+#### Core Architecture
+
+Agency Swarm is a **Python multi-agent orchestration framework** built on the OpenAI Agents SDK. It organizes agents into structured communication flows that mirror real-world organizational structures. Agents are defined with roles, instructions (from markdown files), tools, and MCP server support. Communication flows are directional tuples that define which agents can message which.
+
+- **Language**: Python (97.8%), some JavaScript/TypeScript for TUI
+- **GitHub**: ~4.4k stars, 2,412 commits
+- **Latest release**: v1.9.8 (May 6, 2026 — very active)
+- **Primary use case**: Multi-agent orchestration with structured communication patterns
+- **Python**: 3.12+
+
+Key features from the documentation:
+- **Communication flows**: Directional paths using `(sender, receiver)` tuples. Supports Handoff and Orchestrator-Worker patterns.
+- **Parallel execution**: "the CEO can assign tasks to both Developer and Virtual Assistant, so they will run in parallel in different threads and come back with their results to the CEO"
+- **Per-agent configuration**: Each agent has its own `instructions.md`, `files/`, `tools/`, `schemas/` directories
+- **State persistence**: `load_threads_callback` and `save_threads_callback` for thread persistence
+- **Multiple run modes**: TUI, CopilotKit/AG-UI web interface, FastAPI HTTP API, async programmatic
+
+Key architecture code pattern:
+```python
+agency = Agency(
+    ceo, dev,  # Entry points
+    communication_flows=[
+        (ceo, dev),  # Director can delegate to developer
+        (ceo, va),  # Director can delegate to virtual assistant
+    ],
+    shared_instructions="./agency_manifesto.md",
+    shared_tools=[get_current_time],
+)
+```
+
+[Source: Agency Swarm GitHub README](https://github.com/VRSEN/agency-swarm)
+[Source: Agency Swarm Docs - Overview](https://agency-swarm.ai/core-framework/agencies/overview)
+[Source: Agency Swarm Docs - Communication Flows](https://agency-swarm.ai/core-framework/agencies/communication-flows)
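+
+The directional `(sender, receiver)` tuples described above can be checked mechanically for hierarchy depth and for peer-to-peer links. The following is a minimal sketch that does not use Agency Swarm itself: agents are plain strings rather than agent objects, and `hierarchy_depth` is a hypothetical helper name introduced only for illustration.
+
+```python
+from collections import defaultdict
+
+# (sender, receiver), mirroring the tuple shape of communication flows.
+# Assumption: agents are represented by name strings, not framework objects.
+Flow = tuple[str, str]
+
+
+def hierarchy_depth(flows: list[Flow], root: str) -> int:
+    """Return the number of layers reachable from root.
+
+    Raises ValueError if any agent has more than one incoming flow or if
+    the root has an incoming flow, i.e. the flows are not a strict tree.
+    """
+    children: dict[str, list[str]] = defaultdict(list)
+    parent: dict[str, str] = {}
+    for sender, receiver in flows:
+        if receiver == root or receiver in parent:
+            raise ValueError(f"{receiver!r} has an extra incoming flow")
+        parent[receiver] = sender
+        children[sender].append(receiver)
+
+    def depth(node: str) -> int:
+        # A leaf counts as one layer; each level of children adds one more.
+        return 1 + max((depth(child) for child in children[node]), default=0)
+
+    return depth(root)
+
+
+two_layer = [("ceo", "dev"), ("ceo", "va")]
+three_layer = [
+    ("dispatch", "orchestrator"),
+    ("orchestrator", "subagent_a"),
+    ("orchestrator", "subagent_b"),
+]
+print(hierarchy_depth(two_layer, "ceo"))
+print(hierarchy_depth(three_layer, "dispatch"))
+```
+
+Adding a peer flow such as `("dev", "va")` to the two-layer example makes the check fail, which is the property the strict parent-child requirement asks for.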
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Explanation |
+|---|-------------|--------|-------------|
+| 1 | **Three-layer hierarchy** | **Partially supported** | Supports 2-layer hierarchy (entry point orchestrator → workers). The Orchestrator-Worker pattern has an orchestrator delegating to workers. A 3rd layer would require nesting agencies or custom coding — not natively supported. |
+| 2 | **Config-driven orchestrators** | **Not supported** | Agents and communication flows are defined in Python code. Instructions can be loaded from markdown files, but the orchestration structure itself is code-defined, not config-driven. |
+| 3 | **Parallel subagent execution** | **Supported** | Explicitly supported: "the CEO can assign tasks to both Developer and Virtual Assistant, so they will run in parallel in different threads and come back with their results to the CEO." [Source](https://agency-swarm.ai/core-framework/agencies/communication-flows) |
+| 4 | **Strict hierarchy communication** | **Supported** | Communication flows are directional tuples. By defining only parent→child flows and omitting peer flows, communication is restricted to parent-child only. |
+| 5 | **User-to-agent messaging mid-execution** | **Partially supported** | TUI supports `@mentions` to direct messages to specific agents. FastAPI supports `recipient_agent` parameter. However, mid-execution injection to running agents is not documented. |
+| 6 | **Conflict prevention** | **Not supported** | No built-in mechanism for assigning non-overlapping file scopes to parallel agents. |
+| 7 | **Role-scoped tooling** | **Supported** | Each agent has its own `tools=[]`, `tools_folder`, `schemas_folder`. Tools are role-specific. Shared tools can be applied via `shared_tools`. |
+| 8 | **Skills system (markdown instructions)** | **Partially supported** | Instructions can be loaded from markdown files (`instructions.md`). Shared instructions via `shared_instructions` (file path). Per-agent folder structure includes `tools/`, `files/`, `schemas/`. However, it's not a directory-based skills system with global + project level override. |
+| 9 | **LSP integration** | **Not supported** | No Language Server Protocol integration. |
+| 10 | **Shell access with permissions** | **Not supported** | No native shell execution tool. Relies on OpenAI function calling for tool execution. |
+| 11 | **Session management** | **Partially supported** | Thread persistence via `load_threads_callback`/`save_threads_callback`. `/cost` command in TUI. No chat forking or model switching mid-conversation documented. |
+| 12 | **Human-in-the-loop checkpoints** | **Partially supported** | Has input and output guardrails that can check/modify messages. No explicit configurable checkpoint mechanism for pausing execution. |
+| 13 | **State persistence** | **Partially supported** | Thread state can be persisted via callbacks. User context accessible by agents. No built-in plan/artifact persistence mechanism. |
+| 14 | **Provider-agnostic LLM** | **Supported** | Native OpenAI (GPT-5, GPT-4o). Via LiteLLM router: Anthropic, Google Gemini, Grok, Azure OpenAI, OpenRouter. |
+| 15 | **Multiple interfaces** | **Supported** | TUI (`agency.tui()`, `npx @vrsen/agentswarm`), CopilotKit/AG-UI web interface (`agency.copilot_demo()`), FastAPI HTTP API, async/sync programmatic API. |
+
+---
+
+### 3. Microsoft Semantic Kernel
+
+#### Core Architecture
+
+Semantic Kernel is an **enterprise-grade SDK** for integrating LLMs into applications and building AI agents and multi-agent systems. It is written in C# (primary), Python, and Java. The kernel acts as a dependency injection container that manages AI services and plugins. The Agent Framework (within SK) provides agent types and orchestration patterns. **Important**: the README now presents **Microsoft Agent Framework** (MAF), at v1.0, as the enterprise-ready successor to Semantic Kernel.
+
+- **Languages**: C# (66.8%), Python (31.3%), Java
+- **GitHub**: ~27.9k stars, 4,991 commits
+- **Latest release**: python-1.42.0 (May 14, 2026 — very active)
+- **Primary use case**: Building AI-powered applications with LLM integration, plugin extensibility, and multi-agent orchestration
+- **Runtime requirements**: Python 3.10+, .NET 10.0+, JDK 17+
+
+Key architectural components:
+- **Kernel**: Central DI container holding services (AI, logging, HTTP) and plugins
+- **Plugins**: Collections of functions with semantic descriptions, supporting native code, OpenAPI, and MCP
+- **Agent Framework**: `ChatCompletionAgent`, `OpenAIAssistantAgent`, `AzureAIAgent`, `OpenAIResponsesAgent`
+- **Orchestration Framework** (experimental): `ConcurrentOrchestration`, `SequentialOrchestration`, `HandoffOrchestration`, `GroupChatOrchestration`, `MagenticOrchestration`
+
+[Source: Semantic Kernel GitHub README](https://github.com/microsoft/semantic-kernel)
+[Source: SK Agent Framework Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/)
+[Source: SK Orchestration Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/)
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Explanation |
+|---|-------------|--------|-------------|
+| 1 | **Three-layer hierarchy** | **Not supported** | Orchestration patterns support 2 layers (orchestrator ↔ agents). GroupChat supports collaborative patterns but not a strict 3-tier dispatch→orchestrator→subagent chain. The experimental orchestration framework is flat. |
+| 2 | **Config-driven orchestrators** | **Not supported** | Semantic Kernel is a code-first SDK. Orchestration patterns are instantiated via code (e.g., `ConcurrentOrchestration(members=[...])`). Prompt templates can be loaded from files, but the orchestration structure is not config-driven. |
+| 3 | **Parallel subagent execution** | **Supported** | `ConcurrentOrchestration` runs multiple agents in parallel on the same task. Results are collected independently. [Source](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/concurrent) |
+| 4 | **Strict hierarchy communication** | **Not supported** | Agents in GroupChat/Orchestration all communicate via shared history. No mechanism to restrict communication to parent-child only. |
+| 5 | **User-to-agent messaging mid-execution** | **Partially supported** | Messages can be added to `ChatHistory` at any point, but not designed for injecting into a running orchestration mid-execution. |
+| 6 | **Conflict prevention** | **Not supported** | No built-in file scope assignment mechanism. |
+| 7 | **Role-scoped tooling** | **Supported** | Each agent can be configured with different plugins/function tools. Plugins can be per-agent using the kernel's DI system. |
+| 8 | **Skills system (markdown instructions)** | **Partially supported** | Has a powerful plugin system with semantic kernel functions. Prompt templates support variable substitution. Not a markdown/text-based skills directory system with global/project override. |
+| 9 | **LSP integration** | **Not supported** | No Language Server Protocol integration. |
+| 10 | **Shell access with permissions** | **Not supported** | No built-in shell execution tool. Code execution would need a custom plugin. |
+| 11 | **Session management** | **Not supported** | No built-in session management, chat forking, or model switching. ChatHistory is in-memory. |
+| 12 | **Human-in-the-loop checkpoints** | **Partially supported** | Human-in-the-loop is mentioned for "task automation functions" but there is no explicit configurable checkpoint mechanism in the agent framework. |
+| 13 | **State persistence** | **Not supported** | No built-in state persistence. ChatHistory, plans, artifacts are in-memory. Requires custom implementation for persistence. |
+| 14 | **Provider-agnostic LLM** | **Supported** | Built-in support for OpenAI, Azure OpenAI, Hugging Face, NVIDIA. Local deployment with Ollama, LMStudio, ONNX. |
+| 15 | **Multiple interfaces** | **Not supported** | SDK only. No CLI, TUI, or built-in API server. Must build your own interfaces. |
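+
+The concurrent pattern in row 3 (the same task sent to several agents, results collected independently) can be illustrated with plain `asyncio`. This is a sketch of the fan-out idea only, not Semantic Kernel's API: `AgentFn`, `run_concurrently`, and the two toy agents are hypothetical names, and real agents would call a model instead of sleeping.
+
+```python
+import asyncio
+from collections.abc import Awaitable, Callable
+
+# Hypothetical stand-in for an agent: an async callable from task text to a result.
+AgentFn = Callable[[str], Awaitable[str]]
+
+
+async def run_concurrently(agents: dict[str, AgentFn], task: str) -> dict[str, str]:
+    """Send the same task to every agent at once and collect results by name."""
+    names = list(agents)
+    # gather() schedules all coroutines together and preserves argument order.
+    results = await asyncio.gather(*(agents[name](task) for name in names))
+    return dict(zip(names, results))
+
+
+async def summarizer(task: str) -> str:
+    await asyncio.sleep(0.01)  # simulate model latency
+    return f"summary of {task}"
+
+
+async def critic(task: str) -> str:
+    await asyncio.sleep(0.01)  # simulate model latency
+    return f"critique of {task}"
+
+
+if __name__ == "__main__":
+    team = {"summarizer": summarizer, "critic": critic}
+    print(asyncio.run(run_concurrently(team, "the design doc")))
+```
+
+Because every agent sees only the task and returns only to the caller, this shape has no peer-to-peer channel; the shared-history behaviour noted in row 4 is what a strict hierarchy would have to avoid.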
+
+---
+
+## Comparison Summary Table
+
+| # | Requirement | SuperAGI | Agency Swarm | Semantic Kernel |
+|---|-------------|----------|--------------|-----------------|
+| 1 | Three-layer hierarchy | ❌ Not supported | ⚠️ Partial (2-layer) | ❌ Not supported |
+| 2 | Config-driven orchestrators | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 3 | Parallel subagent execution | ⚠️ Partial (independent runs) | ✅ Supported | ✅ Supported |
+| 4 | Strict hierarchy communication | ❌ Not supported | ✅ Supported | ❌ Not supported |
+| 5 | User-to-agent messaging mid-execution | ⚠️ Partial (Action Console) | ⚠️ Partial (@mentions, recipient_agent) | ⚠️ Partial (ChatHistory injection) |
+| 6 | Conflict prevention | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 7 | Role-scoped tooling | ⚠️ Partial (toolkit marketplace) | ✅ Supported | ✅ Supported |
+| 8 | Skills system (markdown instructions) | ⚠️ Partial (prompt templates) | ⚠️ Partial (instructions.md, shared_instructions) | ⚠️ Partial (plugin system) |
+| 9 | LSP integration | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 10 | Shell access with permissions | ❌ Not supported | ❌ Not supported | ❌ Not supported |
+| 11 | Session management | ⚠️ Partial (memory storage, telemetry) | ⚠️ Partial (thread callbacks) | ❌ Not supported |
+| 12 | Human-in-the-loop checkpoints | ⚠️ Partial (Action Console) | ⚠️ Partial (guardrails) | ⚠️ Partial (mentioned for plugins) |
+| 13 | State persistence | ⚠️ Partial (PostgreSQL, file storage) | ⚠️ Partial (thread callbacks, user context) | ❌ Not supported |
+| 14 | Provider-agnostic LLM | ✅ Supported | ✅ Supported | ✅ Supported |
+| 15 | Multiple interfaces | ✅ Supported (GUI, CLI, API) | ✅ Supported (TUI, Web, API, programmatic) | ❌ Not supported (SDK only) |
+
+---
+
+## Key Questions Answered
+
+### 1. What is each framework's core architecture? How many layers of agent hierarchy does it support?
+
+- **SuperAGI**: Single autonomous agent architecture. No hierarchy — each agent is independent. Maximum layers: 1.
+- **Agency Swarm**: Multi-agent orchestration with directional communication flows. Supports 2 layers (entry point orchestrator → worker agents). A third layer might be reachable by nesting agencies or custom code, but this is not natively supported.
+- **Semantic Kernel**: SDK for building AI agents with orchestration patterns. Supports 2 layers (orchestrator → participating agents). Orchestration framework is experimental.
+
+### 2. How extensible/configurable is each framework without modifying source code?
+
+- **SuperAGI**: System-level YAML config for API keys, models, DB. Tools via marketplace/JSON. Agent prompt templates with variable substitution. Beyond that, little can be configured without modifying Python code.
+- **Agency Swarm**: Instructions from markdown files. Tools from folder discovery. MCP server support. Communication flows via code. No config-file-based agent definition.
+- **Semantic Kernel**: Plugin system via native code, OpenAPI specs, or MCP servers. Prompt templates. Everything requires code (C#, Python, Java).
+
+### 3. What is the primary use case each framework was designed for?
+
+- **SuperAGI**: Running individual autonomous AI agents with tool augmentation for task completion.
+- **Agency Swarm**: Multi-agent collaboration with structured organizational patterns (CEO, developer, virtual assistant roles).
+- **Semantic Kernel**: Integrating LLMs into enterprise .NET/Python applications with plugin extensibility and multi-agent orchestration.
+
+### 4. How active is each project?
+
+- **SuperAGI**: Stale. v0.0.11 release, README states "Under Development!" with 182 open issues. Community: Discord, Reddit. Creator active on Twitter.
+- **Agency Swarm**: Very active. v1.9.8 (May 6, 2026). 61 releases. 2,412 commits. Active maintainer (VRSEN). Discord community. YouTube channel.
+- **Semantic Kernel**: Very active. python-1.42.0 (May 14, 2026). 269 releases. 4,991 commits. Microsoft-maintained. Discord community. Note: now superseded by Microsoft Agent Framework.
+
+### 5. What language is each framework written in?
+
+- **SuperAGI**: Python
+- **Agency Swarm**: Python (primary), JavaScript/TypeScript (TUI)
+- **Semantic Kernel**: C# (primary, 66.8%), Python (31.3%), Java
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [SuperAGI GitHub Repository](https://github.com/TransformerOptimus/SuperAGI) | GitHub / code |
+| 2 | [SuperAGI config_template.yaml](https://raw.githubusercontent.com/TransformerOptimus/SuperAGI/main/config_template.yaml) | Config file |
+| 3 | [SuperAGI agent_prompt_builder.py](https://github.com/TransformerOptimus/SuperAGI/blob/main/superagi/agent/agent_prompt_builder.py) | Source code |
+| 4 | [Agency Swarm GitHub Repository](https://github.com/VRSEN/agency-swarm) | GitHub / code |
+| 5 | [Agency Swarm Docs - Overview](https://agency-swarm.ai/core-framework/agencies/overview) | Official docs |
+| 6 | [Agency Swarm Docs - Communication Flows](https://agency-swarm.ai/core-framework/agencies/communication-flows) | Official docs |
+| 7 | [Agency Swarm Docs - Running an Agency](https://agency-swarm.ai/core-framework/agencies/running-agency) | Official docs |
+| 8 | [Agency Swarm Docs - Agents Overview](https://agency-swarm.ai/core-framework/agents/overview) | Official docs |
+| 9 | [Agency Swarm Docs - Observability](https://agency-swarm.ai/additional-features/observability) | Official docs |
+| 10 | [Agency Swarm Docs - FastAPI Integration](https://agency-swarm.ai/additional-features/fastapi-integration) | Official docs |
+| 11 | [Semantic Kernel GitHub Repository](https://github.com/microsoft/semantic-kernel) | GitHub / code |
+| 12 | [SK Agent Framework Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/) | Official docs (Microsoft Learn) |
+| 13 | [SK Orchestration Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/) | Official docs (Microsoft Learn) |
+| 14 | [SK Concurrent Orchestration Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/concurrent) | Official docs (Microsoft Learn) |
+| 15 | [SK Kernel Concepts Docs](https://learn.microsoft.com/en-us/semantic-kernel/concepts/kernel) | Official docs (Microsoft Learn) |
+| 16 | [SK Plugins Docs](https://learn.microsoft.com/en-us/semantic-kernel/concepts/plugins/) | Official docs (Microsoft Learn) |
+
+---
+
+## Verbatim quotes
+
+- "A dev-first open source autonomous AI agent framework enabling developers to build, manage & run useful autonomous agents. You can run concurrent agents seamlessly" — [SuperAGI GitHub README](https://github.com/TransformerOptimus/SuperAGI)
+- "The CEO can assign tasks to both Developer and Virtual Assistant, so they will run in parallel in different threads and come back with their results to the CEO" — [Agency Swarm Docs - Communication Flows](https://agency-swarm.ai/core-framework/agencies/communication-flows)
+- "Communication flows are defined using tuples in the communication_flows parameter: (sender, receiver) defines a directional communication path" — [Agency Swarm Docs - Communication Flows](https://agency-swarm.ai/core-framework/agencies/communication-flows)
+- "Concurrent orchestration enables multiple agents to work on the same task in parallel. Each agent processes the input independently, and their results are collected and aggregated" — [SK Concurrent Orchestration Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/concurrent)
+- "Semantic Kernel is now Microsoft Agent Framework! Microsoft Agent Framework (MAF) is the enterprise‑ready successor to Semantic Kernel" — [Semantic Kernel GitHub README](https://github.com/microsoft/semantic-kernel)
+- "Agent Orchestration features in the Agent Framework are in the experimental stage" — [SK Orchestration Docs](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/)
+
+---
+
+## Source quality flags
+
+- Source 3 (SuperAGI agent_prompt_builder.py): source code file, high quality for understanding architecture but shows framework is single-agent focused
+- Source 1 (SuperAGI README): marketing language — "dev-first", "build, manage & run useful autonomous agents" — but backed by code
+- Source 2 (SuperAGI config_template.yaml): primary source showing config capabilities
+- Sources 5-10 (Agency Swarm docs): official documentation, well-maintained, comprehensive
+- Sources 12-16 (SK Microsoft Learn): official Microsoft documentation, high quality
+- Source 11 (SK GitHub): official Microsoft repo, highly active
+
+---
+
+## Confidence: High
+
+All findings are based on direct examination of official GitHub repositories, official documentation sites, source code files, and configuration files. No third-party summaries were relied upon, and marketing language in READMEs was accepted only where backed by code.
+
+---
+
+## Gaps and open questions
+
+1. **SuperAGI's current maintenance status**: The README says "Under Development!" and the last release tag appears to be v0.0.11, but the commit history shows recent activity. Confirming the exact current maintenance pace would require deeper investigation.
+2. **Agency Swarm's 3rd layer**: Whether nesting `Agency` instances or using custom code could achieve a 3-layer dispatch→orchestrator→subagent hierarchy is not documented. This would need experimental verification.
+3. **Semantic Kernel's successor relationship**: Microsoft Agent Framework (MAF) is now the recommended path. It's unclear how long Semantic Kernel will receive updates. MAF may address some gaps listed here.
+4. **None of the three frameworks support**: LSP integration, shell access with directory permissions, chat forking/model switching, or explicit conflict prevention. These would need custom implementation regardless of framework choice.
+5. **Config-driven orchestration**: None of the three frameworks supports defining orchestrator types via config files rather than code. This appears to be a gap shared by all the frameworks evaluated in this report.
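+
+As an illustration of the custom work item 4 implies for conflict prevention, the sketch below checks whether directory scopes assigned to parallel agents overlap. It is an assumption about one possible approach, not something any of the three frameworks provides; `scopes_overlap`, `find_conflicts`, and the agent names are hypothetical.
+
+```python
+from pathlib import PurePosixPath
+
+
+def scopes_overlap(a: str, b: str) -> bool:
+    """True if one directory scope equals or contains the other."""
+    pa, pb = PurePosixPath(a), PurePosixPath(b)
+    return pa == pb or pa in pb.parents or pb in pa.parents
+
+
+def find_conflicts(assignments: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
+    """Return (agent_a, scope_a, agent_b, scope_b) for each overlapping pair
+    of scopes that belong to different agents."""
+    flat = [(agent, scope) for agent, scopes in assignments.items() for scope in scopes]
+    conflicts = []
+    for i, (agent_a, scope_a) in enumerate(flat):
+        for agent_b, scope_b in flat[i + 1:]:
+            if agent_a != agent_b and scopes_overlap(scope_a, scope_b):
+                conflicts.append((agent_a, scope_a, agent_b, scope_b))
+    return conflicts
+
+
+plan = {
+    "api_agent": ["src/api"],
+    "web_agent": ["src/web"],
+    "refactor_agent": ["src"],  # contains both scopes above
+}
+for conflict in find_conflicts(plan):
+    print(conflict)
+```
+
+Comparing path components rather than string prefixes keeps `src` from being treated as a parent of `src2`.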
+
+---
+
+## Tool calls made
+
+1. `webfetch` — https://github.com/TransformerOptimus/SuperAGI (GitHub README)
+2. `webfetch` — https://github.com/VRSEN/agency-swarm (GitHub README)
+3. `webfetch` — https://github.com/microsoft/semantic-kernel (GitHub README)
+4. `webfetch` — https://superagi.com/docs/ (failed — transport error)
+5. `webfetch` — https://agency-swarm.ai/core-framework/agencies/overview
+6. `webfetch` — https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/
+7. `webfetch` — https://agency-swarm.ai/core-framework/agencies/communication-flows
+8. `webfetch` — https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/
+9. `webfetch` — https://superagi.com/docs/architecture/ (failed — transport error)
+10. `webfetch` — https://github.com/TransformerOptimus/SuperAGI/blob/main/README.MD
+11. `webfetch` — https://agency-swarm.ai/additional-features/observability
+12. `webfetch` — https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-orchestration/concurrent
+13. `webfetch` — https://raw.githubusercontent.com/TransformerOptimus/SuperAGI/main/config_template.yaml
+14. `webfetch` — https://agency-swarm.ai/core-framework/agents/overview
+15. `webfetch` — https://agency-swarm.ai/core-framework/agencies/running-agency
+16. `webfetch` — https://github.com/TransformerOptimus/SuperAGI/blob/main/superagi/agent/agent_prompt_builder.py
+17. `webfetch` — https://learn.microsoft.com/en-us/semantic-kernel/concepts/kernel
+18. `webfetch` — https://agency-swarm.ai/additional-features/fastapi-integration
+19. `webfetch` — https://learn.microsoft.com/en-us/semantic-kernel/concepts/plugins/
+20. `webfetch` — https://raw.githubusercontent.com/microsoft/semantic-kernel/main/FEATURE_MATRIX.md
+21. `webfetch` — https://github.com/VRSEN/agency-swarm/blob/main/AGENTS.md
+22. `webfetch` — https://github.com/TransformerOptimus/SuperAGI/tree/main/superagi/agent
+23. `webfetch` — https://learn.microsoft.com/en-us/semantic-kernel/concepts/prompt-engineering/ (failed — 404)
diff --git a/research/multi-agent-orchestration.md b/research/multi-agent-orchestration.md
new file mode 100644
index 0000000..6c5c5f5
--- /dev/null
+++ b/research/multi-agent-orchestration.md
@@ -0,0 +1,235 @@
+# Subagent Report: Multi-Agent Orchestration Frameworks
+
+## Research summary
+This report evaluates three open-source multi-agent orchestration frameworks — **CrewAI**, **Microsoft AutoGen (AG2)**, and **LangGraph** — against a 15-point requirements checklist for an AI agent harness. CrewAI is the most opinionated with a built-in hierarchy and YAML-driven configuration, LangGraph is the lowest-level graph-based orchestration runtime with the most flexible persistence model, and AutoGen is a maintenance-mode framework with a layered design and no-code GUI. No framework fully satisfies all 15 requirements; each has gaps in the areas of LSP integration, shell access with directory permissions, and skills systems as defined.
+
+---
+
+## Findings
+
+### 1. CrewAI
+
+#### Core Architecture
+
+CrewAI is an open-source Python framework for orchestrating role-based AI agents. It is built from scratch, independent of LangChain. It provides two complementary paradigms: **Crews** (autonomous agent teams with role-based collaboration) and **Flows** (event-driven, stateful workflows). The latest version is 1.14.5 (as of May 2026). The codebase is 99%+ Python, with ~51.7k GitHub stars, ~2,414 commits, and 191 releases to date.
+
+**Architecture layers:**
+- **Agent** — role, goal, backstory, tools, LLM config
+- **Crew** — orchestrates a team of agents with a Process (sequential, hierarchical, or hybrid)
+- **Flow** — higher-level orchestration with `@start`, `@listen`, `@router` decorators, state management, persistence
+- **Config** — YAML files for agents and tasks (recommended approach)
+
+**Primary use case:** General-purpose multi-agent automation for enterprise workflows. Not specifically focused on coding — more general business process automation.
+
+**Project status:** Very active. Backed by a company (CrewAI Inc.). 100k+ certified developers in the community.
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Detail |
+|---|------------|--------|--------|
+| 1 | **Three-layer hierarchy** | **Partial** | CrewAI has a native hierarchical process where a manager agent coordinates sub-agents. However, it's a 2-level hierarchy (manager → agent), not 3-level (dispatch → orchestrator → subagent). Flows can chain multiple crews, achieving multi-level composition programmatically but not as a built-in dispatch architecture. |
+| 2 | **Config-driven orchestrators** | **Fully** | Agents and tasks are defined in YAML (`agents.yaml`, `tasks.yaml`) loaded via `@CrewBase` decorators. This is the recommended approach. Source: CrewAI docs "YAML Configuration (Recommended)" |
+| 3 | **Parallel subagent execution** | **Fully** | Tasks marked `async_execution=True` do not block the tasks that follow, so several can be in flight at once. A later task waits for their outputs by listing them in its `context`, which is how dependencies between parallel tasks are expressed. Source: CrewAI Tasks docs on Asynchronous Execution |
+| 4 | **Strict hierarchy communication** | **Partial** | The hierarchical process assigns a manager that delegates tasks and validates results, providing structured parent-child communication. However, there is no built-in mechanism to prevent peer-to-peer agent messaging when delegation is enabled (`allow_delegation`). |
+| 5 | **User-to-agent messaging mid-execution** | **Partial** | CrewAI supports `@human_feedback` decorator (v1.8.0+) for human-in-the-loop in Flows, and `human_input=True` on tasks. However, this is for configured approval points, not arbitrary mid-execution injection to any running agent. |
+| 6 | **Conflict prevention** | **Not at all** | No built-in mechanism for assigning non-overlapping file scopes to parallel agents. Code execution is deprecated (uses external sandboxes like E2B). |
+| 7 | **Role-scoped tooling** | **Fully** | Agents can have different tool sets based on role. Tools are assigned per-agent via the `tools` parameter. Tasks can also override tools. Source: CrewAI Agents documentation on tools |
+| 8 | **Skills system** | **Partial** | Supports custom system templates, prompt templates, and response templates per agent. The project template uses `agents.yaml` for defining agent behaviors. Has a "skills" feature via MCP/skills.sh for AI coding assistants, but no directory-based markdown instruction system for agent definition. |
+| 9 | **LSP integration** | **Not at all** | No Language Server Protocol integration for compiler diagnostics. |
+| 10 | **Shell access with directory permissions** | **Not at all** | Code execution is deprecated in favor of external sandboxes. No shell access with permission controls. |
+| 11 | **Session management** | **Partial** | Flows support `@persist` decorator for state persistence across restarts using SQLite. Supports "fork" via `restore_from_state_id`. No chat forking or model switching mid-conversation in the traditional sense. |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** | `@human_feedback` decorator on Flow methods pauses execution and collects feedback. `human_input=True` on tasks enables human review. Source: CrewAI Flows docs on human_feedback |
+| 13 | **State persistence** | **Fully** | `@persist` decorator on Flows persists state to SQLite automatically. Supports resume and fork patterns. Source: CrewAI Flows persistence docs |
+| 14 | **Provider-agnostic LLM** | **Fully** | Supports OpenAI, Azure, Anthropic, Ollama, Gemini, and many more via LiteLLM integration. The `llm` parameter accepts model strings or `LLM` instances. Source: CrewAI docs "Connecting Your Crew to a Model" |
+| 15 | **Multiple interfaces** | **Partial** | Has a CLI (`crewai create`, `crewai run`, `crewai flow kickoff`) and a Python API. No native TUI or API server in the open-source version (CrewAI AMP provides enterprise management console). |
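+
+A minimal sketch of the YAML-driven setup described in row 2, following the `agents.yaml` / `tasks.yaml` layout from the CrewAI docs. The agent and task names (`researcher`, `research_task`), the `{topic}` placeholder, and all prose values are illustrative assumptions, not taken from any project evaluated here.
+
+```yaml
+# config/agents.yaml (hypothetical agent definition)
+researcher:
+  role: >
+    {topic} Senior Researcher
+  goal: >
+    Find and summarize recent developments in {topic}
+  backstory: >
+    You are an experienced researcher who verifies claims against primary sources.
+
+# config/tasks.yaml (hypothetical task; `agent` refers to the key defined above)
+research_task:
+  description: >
+    Research {topic} and collect the most relevant findings.
+  expected_output: >
+    A bullet list of the key findings, each with its source.
+  agent: researcher
+```
+
+A class decorated with `@CrewBase` would load these files and bind each key to a method of the same name; that Python side is omitted here. The sketch also shows the limit noted in row 1: YAML describes agents and tasks, not a dispatch → orchestrator → subagent tree.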
+
+---
+
+### 2. Microsoft AutoGen (AG2)
+
+#### Core Architecture
+
+AutoGen is a Python framework for building multi-agent AI applications, developed by Microsoft Research. This report covers the `microsoft/autogen` repository; the name AG2 is also used by a separate community fork, which was not evaluated. AutoGen is currently in **maintenance mode** — no new features, community-managed only. Latest release: `python-v0.7.5` (Sep 2025). Stars: ~58.2k. Forks: ~8.8k. Commits: ~3,782. Microsoft recommends migrating to **Microsoft Agent Framework** for new projects.
+
+**Architecture layers (3-tier design):**
+- **Core API** — message passing, event-driven agents, distributed runtime, cross-language (Python + .NET)
+- **AgentChat API** — higher-level opinionated API for rapid prototyping with agents, teams, group chats
+- **Extensions API** — model clients, tools, code execution backends
+
+**Team patterns:** RoundRobinGroupChat, SelectorGroupChat, Swarm, MagenticOneGroupChat, GraphFlow
+
+**Primary use case:** Conversational multi-agent AI applications, research, prototyping. Magentic-One for web/file tasks.
+
+**Project status:** Maintenance mode. No new features. Users directed to Microsoft Agent Framework.
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Detail |
+|---|------------|--------|--------|
+| 1 | **Three-layer hierarchy** | **Partial** | AutoGen has a flat agent model with "teams" orchestrating agents. It supports Swarm and SelectorGroupChat for multi-agent coordination, and GraphFlow for workflows. No native 3-layer dispatch → orchestrator → subagent hierarchy exists. Subordination must be implemented manually via `AgentTool` wrapping. |
+| 2 | **Config-driven orchestrators** | **Partial** | AutoGen Studio provides a no-code GUI for prototyping. The framework itself is code-first — teams, agents, and termination conditions are defined in Python code. Component serialization exists (`.dump_component()`) but is not a YAML-based config system. |
+| 3 | **Parallel subagent execution** | **Partial** | The `AgentTool` pattern allows one agent to call another as a tool, but this is sequential delegation, not parallel execution. RoundRobinGroupChat is sequential (turn-based). No native parallel agent execution within a team. |
+| 4 | **Strict hierarchy communication** | **Partial** | In SelectorGroupChat and Swarm, speaker selection is controlled. `AgentTool` wrapping creates a tool-call boundary. However, no strict parent-child communication restriction mechanism is built in. |
+| 5 | **User-to-agent messaging mid-execution** | **Fully** | `UserProxyAgent` allows injecting user input during team execution (blocking). `ExternalTermination` can stop teams mid-execution. `HandoffTermination` enables handoff to user. Source: AutoGen Human-in-the-Loop docs |
+| 6 | **Conflict prevention** | **Not at all** | No built-in mechanism for non-overlapping file scopes in parallel agents. |
+| 7 | **Role-scoped tooling** | **Fully** | Each `AssistantAgent` can have its own set of tools. Agent descriptions define their role for the selector. Source: AutoGen AgentChat docs on agents and tools |
+| 8 | **Skills system** | **Not at all** | No skills system for injecting markdown/text instructions per agent type. Agents are configured via `system_message` string and `description` string in code. |
+| 9 | **LSP integration** | **Not at all** | No Language Server Protocol integration. |
+| 10 | **Shell access with directory permissions** | **Not at all** | Code execution requires external sandboxes or MCP tools. No built-in shell access with permissions. |
+| 11 | **Session management** | **Partial** | Teams support `save_state()` and `load_state()` for persisting conversation state. State can be serialized to JSON. No chat forking or model switching mid-conversation. |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** | `UserProxyAgent` for inline feedback, `HandoffTermination` for async feedback, `max_turns` for turn-based pausing. Source: AutoGen Human-in-the-Loop tutorial |
+| 13 | **State persistence** | **Fully** | `save_state()` / `load_state()` on agents and teams. State dictionaries can be serialized to file or database. Source: AutoGen Managing State docs |
+| 14 | **Provider-agnostic LLM** | **Fully** | Supports OpenAI, Azure OpenAI, Azure AI Foundry, Anthropic (experimental), Ollama (experimental), Gemini (via API), Llama API, plus Semantic Kernel adapter for even more providers. Source: AutoGen Models docs |
+| 15 | **Multiple interfaces** | **Fully** | Python API, CLI (`autogenstudio ui`), AutoGen Studio (no-code GUI web app), and FastAPI/ChainLit/Streamlit integration samples. Source: AutoGen README and FastAPI sample |
+
+---
+
+### 3. LangGraph
+
+#### Core Architecture
+
+LangGraph is a low-level orchestration framework for building stateful, long-running agents. Developed by LangChain Inc. Built as a graph-based runtime inspired by Google's Pregel and Apache Beam. Latest release: `langgraph==1.2.0` (May 2026). Stars: ~32.4k. Commits: ~6,862. Active development (534 releases total).
+
+**Architecture:**
+- **StateGraph** — defines state schema (TypedDict/Pydantic), nodes (functions), edges (conditional/static)
+- **Subgraphs** — graphs used as nodes in other graphs (supports multi-agent patterns)
+- **Checkpointer** — persistence layer for state snapshots at every super-step
+- **Store** — cross-thread memory for long-term knowledge
+- **Persistence modes:** per-invocation (default), per-thread, stateless
+
+**Primary use case:** Low-level agent orchestration for complex, stateful, long-running workflows. Used by Klarna, Replit, Elastic, Uber, J.P. Morgan. Higher-level abstraction available via Deep Agents and LangChain agents.
+
+**Project status:** Very active. Backed by LangChain Inc with commercial LangSmith platform.
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Detail |
+|---|------------|--------|--------|
+| 1 | **Three-layer hierarchy** | **Fully** | LangGraph's subgraph architecture supports arbitrary nesting. A parent graph can contain subgraphs, which can contain further subgraphs. `Command(goto=..., graph=Command.PARENT)` enables navigation between levels. Each level has its own state schema. Source: LangGraph Subgraphs documentation |
+| 2 | **Config-driven orchestrators** | **Not at all** | LangGraph is purely code-defined — graphs, nodes, edges, and state schemas are all Python code. No YAML or config file support for defining orchestrator types. LangSmith Studio provides a UI but generates code. |
+| 3 | **Parallel subagent execution** | **Fully** | Multiple outgoing edges from a single node execute in parallel (same super-step). `Send()` API enables map-reduce patterns with dynamic fan-out. Subgraphs can run in parallel. Source: LangGraph Graph API docs on edges and Send |
+| 4 | **Strict hierarchy communication** | **Fully** | Subgraphs can have private state schemas invisible to the parent graph. When a subgraph is invoked via a node function, the parent only sees what the node function returns. State isolation is achieved via separate state schemas. Source: LangGraph Subgraphs docs on different state schemas |
+| 5 | **User-to-agent messaging mid-execution** | **Fully** | `interrupt()` function pauses graph execution and returns control to the caller. The caller can inspect state and resume with `Command(resume=...)`. Supports multiple simultaneous interrupts. Source: LangGraph Interrupts documentation |
+| 6 | **Conflict prevention** | **Not at all** | No built-in mechanism for file scope conflict prevention. |
+| 7 | **Role-scoped tooling** | **Fully** | Each agent node can have its own set of tools. In multi-agent patterns, subgraphs/agents have independent tool configurations. Tools are LangChain-compatible. |
+| 8 | **Skills system** | **Not at all** | No skills system for injecting markdown instructions per agent type. No directory-based instruction organization. Agents use system prompts defined in code. |
+| 9 | **LSP integration** | **Not at all** | No Language Server Protocol integration. |
+| 10 | **Shell access with directory permissions** | **Not at all** | No built-in shell access. Code execution relies on external tools or LangChain tool integrations. |
+| 11 | **Session management** | **Partial** | Checkpointer-based threads provide conversation history via `get_state_history()`. Time-travel debugging via replay from checkpoints. `update_state()` for editing state. No explicit chat forking or model switching mid-conversation. |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** | `interrupt()` function for dynamic pausing. Static breakpoints via `interrupt_before`/`interrupt_after` at compile time. Supports approval workflows, review-and-edit, and validation loops. Source: LangGraph Interrupts documentation |
+| 13 | **State persistence** | **Fully** | Multiple checkpointers: InMemorySaver, SqliteSaver, PostgresSaver, Azure CosmosDB. Checkpoints at every super-step. Cross-thread memory via Store. Encryption support. Source: LangGraph Persistence documentation |
+| 14 | **Provider-agnostic LLM** | **Fully** | LangGraph can use any LangChain-compatible model provider (OpenAI, Anthropic, Google, Ollama, AWS Bedrock, Azure, etc.) plus standalone models without LangChain. The `Runtime` context can pass model configuration. |
+| 15 | **Multiple interfaces** | **Partial** | Python API primarily. LangSmith Studio for visual prototyping. LangGraph API for deployment. No native CLI or TUI. Deep Agents SDK provides a higher-level interface. |
+
+---
+
+## Summary Comparison Table
+
+| # | Requirement | CrewAI | AutoGen (AG2) | LangGraph |
+|---|------------|--------|---------------|-----------|
+| 1 | **Three-layer hierarchy** | Partial (2-level natively, chaining via Flows) | Partial (flat agent teams, Swarm/GraphFlow) | **Fully** (arbitrary nesting via subgraphs) |
+| 2 | **Config-driven orchestrators** | **Fully** (YAML agents.yaml/tasks.yaml) | Partial (code-first, Studio GUI, component serialization) | Not at all (purely code-defined) |
+| 3 | **Parallel subagent execution** | **Fully** (async_execution tasks) | Partial (sequential team patterns, AgentTool is blocking) | **Fully** (Send API, parallel edges, map-reduce) |
+| 4 | **Strict hierarchy communication** | Partial (manager delegates but no P2P prevention) | Partial (SelectorGroupChat controls turns, AgentTool boundary) | **Fully** (private state schemas, subgraph isolation) |
+| 5 | **User-to-agent messaging mid-execution** | Partial (@human_feedback at configured points) | **Fully** (UserProxyAgent, ExternalTermination, HandoffTermination) | **Fully** (interrupt()/Command(resume=...) anywhere) |
+| 6 | **Conflict prevention** | Not at all | Not at all | Not at all |
+| 7 | **Role-scoped tooling** | **Fully** (per-agent tools, task override) | **Fully** (per-agent tools) | **Fully** (per-node tools, LangChain-compatible) |
+| 8 | **Skills system** | Partial (agent templates, prompt customization) | Not at all | Not at all |
+| 9 | **LSP integration** | Not at all | Not at all | Not at all |
+| 10 | **Shell access with directory permissions** | Not at all | Not at all | Not at all |
+| 11 | **Session management** | Partial (Flow persist/fork) | Partial (save_state/load_state, JSON serialization) | Partial (checkpointer history, time travel, update_state) |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** (@human_feedback, human_input on tasks) | **Fully** (UserProxyAgent, HandoffTermination, max_turns) | **Fully** (interrupt(), static breakpoints, approval patterns) |
+| 13 | **State persistence** | **Fully** (@persist with SQLite, resume/fork) | **Fully** (save_state/load_state to file or DB) | **Fully** (checkpointers: Memory, SQLite, Postgres, CosmosDB) |
+| 14 | **Provider-agnostic LLM** | **Fully** (many providers via LiteLLM) | **Fully** (many providers via Extensions API + SK adapter) | **Fully** (all LangChain providers, plus standalone mode) |
+| 15 | **Multiple interfaces** | Partial (CLI + Python API, AMP enterprise console) | **Fully** (Python API, CLI, Studio web GUI, FastAPI/Streamlit) | Partial (Python API, LangSmith Studio, LangGraph API) |
+
+---
+
+## Overall Assessment
+
+### CrewAI
+**Strength:** Most opinionated framework for role-based agents with YAML-driven configuration. Excellent for enterprise automation where you want to define agents declaratively. Strong community (100k+ certified developers), active development, and a commercial offering (CrewAI AMP).
+
+**Key Gaps for this harness:** No LSP, no shell permissions, no 3-layer dispatch hierarchy natively, limited mid-execution user injection.
+
+### AutoGen (AG2)
+**Strength:** Best GUI/studio support for prototyping. Most flexible human-in-the-loop with `UserProxyAgent` and handoff patterns. Multiple interface options (Python, CLI, Studio web app). Provider-agnostic model support with Semantic Kernel adapter.
+
+**Key Gaps:** **Maintenance mode** — no new features, users directed to Microsoft Agent Framework. No native parallel agent execution within a team, code-first configuration (component serialization but no YAML config system), flat agent model.
+
+### LangGraph
+**Strength:** Most architecturally flexible — arbitrary graph topologies, arbitrary subgraph nesting, the richest persistence model (multiple checkpointers + cross-thread store), durable execution, and the most sophisticated interrupt system. Best for complex, long-running, stateful workflows.
+
+**Key Gaps:** Code-only configuration (no YAML), no built-in skills system, no CLI/TUI, steeper learning curve due to low-level nature. No file scope conflict prevention.
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [CrewAI GitHub Repository](https://github.com/crewAIInc/crewAI) | official |
+| 2 | [CrewAI Documentation - Agents](https://docs.crewai.com/concepts/agents) | official |
+| 3 | [CrewAI Documentation - Tasks](https://docs.crewai.com/concepts/tasks) | official |
+| 4 | [CrewAI Documentation - Flows](https://docs.crewai.com/concepts/flows) | official |
+| 5 | [CrewAI Documentation - Memory](https://docs.crewai.com/concepts/memory) | official |
+| 6 | [CrewAI Documentation - Processes](https://docs.crewai.com/core-concepts/Processes/) | official |
+| 7 | [AutoGen GitHub Repository](https://github.com/microsoft/autogen) | official |
+| 8 | [AutoGen Documentation - Teams](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/teams.html) | official |
+| 9 | [AutoGen Documentation - Human-in-the-Loop](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/human-in-the-loop.html) | official |
+| 10 | [AutoGen Documentation - Managing State](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/state.html) | official |
+| 11 | [AutoGen Documentation - Models](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/models.html) | official |
+| 12 | [AutoGen Documentation - Selector Group Chat](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/selector-group-chat.html) | official |
+| 13 | [LangGraph GitHub Repository](https://github.com/langchain-ai/langgraph) | official |
+| 14 | [LangGraph Documentation - Overview](https://docs.langchain.com/oss/python/langgraph/overview) | official |
+| 15 | [LangGraph Documentation - Graph API](https://docs.langchain.com/oss/python/langgraph/graph-api) | official |
+| 16 | [LangGraph Documentation - Subgraphs](https://docs.langchain.com/oss/python/langgraph/use-subgraphs) | official |
+| 17 | [LangGraph Documentation - Interrupts](https://docs.langchain.com/oss/python/langgraph/interrupts) | official |
+| 18 | [LangGraph Documentation - Persistence](https://docs.langchain.com/oss/python/langgraph/persistence) | official |
+
+---
+
+## Verbatim quotes
+
+- "CrewAI is a lean, lightning-fast Python framework built entirely from scratch—completely independent of LangChain or other agent frameworks." — [CrewAI GitHub README](https://github.com/crewAIInc/crewAI)
+- "Using YAML configuration provides a cleaner, more maintainable way to define agents. We strongly recommend using this approach in your CrewAI projects." — [CrewAI Agents docs](https://docs.crewai.com/concepts/agents)
+- "AutoGen is now in maintenance mode. It will not receive new features or enhancements and is community managed going forward." — [AutoGen GitHub README](https://github.com/microsoft/autogen)
+- "New users should start with Microsoft Agent Framework. Existing users are encouraged to migrate." — [AutoGen GitHub README](https://github.com/microsoft/autogen)
+- "LangGraph is a low-level orchestration framework for building, managing, and deploying long-running, stateful agents." — [LangGraph GitHub README](https://github.com/langchain-ai/langgraph)
+- "Subgraphs are useful for building multi-agent systems, reusing a set of nodes in multiple graphs, and distributing development." — [LangGraph Subgraphs docs](https://docs.langchain.com/oss/python/langgraph/use-subgraphs)
+- "Interrupts allow you to pause graph execution at specific points and wait for external input before continuing." — [LangGraph Interrupts docs](https://docs.langchain.com/oss/python/langgraph/interrupts)
+- "UserProxyAgent is a special built-in agent that acts as a proxy for a user to provide feedback to the team." — [AutoGen HITL docs](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/human-in-the-loop.html)
+
+---
+
+## Source quality flags
+
+- No significant quality issues found. All 18 sources are primary (official GitHub repositories, official documentation sites).
+- AutoGen's documentation is comprehensive but the project is in maintenance mode.
+
+---
+
+## Confidence: High
+
+All claims are sourced from official GitHub repositories and official documentation sites for each framework (CrewAI, AutoGen, LangGraph). The information reflects the current state as of May 2026.
+
+## Gaps and open questions
+
+- **LSP integration**: None of the three frameworks support Language Server Protocol integration. This would need to be built as a custom extension for any chosen framework.
+- **Shell access with directory permissions**: None support this natively. Shell access would require wrapping shell tools with permission checks manually in any framework.
+- **Skills system**: Only CrewAI has anything close (prompt templates, agent configuration files), but none have the directory-based markdown instruction system described in the requirements.
+- **Conflict prevention for file scopes**: No framework has built-in mechanisms for this. Would need to be implemented as a custom tool wrapper or middleware.
+- **Session management (chat forking, model switching)**: All three have basic state persistence but not the full session management features (chat forking, model switching mid-conversation).
+- **CrewAI's Allow Delegation**: When `allow_delegation=True`, agents can delegate tasks to other agents, which could include peer agents — creating implicit P2P communication. The strictness of hierarchy enforcement depends on configuration.
+- **AutoGen's successor**: Microsoft Agent Framework (MAF) is the recommended successor and may address many gaps, but it was outside this research scope.
+
+---
+
+## Tool calls made
+
+1. `webfetch` - 3 GitHub README pages (CrewAI, AutoGen, LangGraph)
+2. `webfetch` - CrewAI Processes page, Crews page
+3. `webfetch` - LangGraph overview, AutoGen tutorial, CrewAI agents, tasks
+4. `webfetch` - CrewAI flows, AutoGen HITL, LangGraph interrupts, LangGraph subgraphs (404)
+5. `webfetch` - LangGraph persistence, AutoGen state management, AutoGen selector chat, LangGraph graph API
+6. `webfetch` - LangGraph subgraphs, CrewAI memory, AutoGen models
diff --git a/research/multi-agent-roleplay.md b/research/multi-agent-roleplay.md
new file mode 100644
index 0000000..72686a7
--- /dev/null
+++ b/research/multi-agent-roleplay.md
@@ -0,0 +1,439 @@
+# Subagent Report: Multi-Agent Role-Playing Frameworks (MetaGPT, ChatDev, CAMEL)
+
+## Research summary
+
+This report evaluates three open-source multi-agent simulation/role-playing frameworks — **MetaGPT**, **ChatDev**, and **CAMEL** — against 15 specific AI agent harness requirements. MetaGPT is the most mature for software-development workflows (68.1k stars) but has a flat role-based architecture, not a deep hierarchy. ChatDev 2.0 (released Jan 2026) has pivoted to a zero-code YAML-driven DAG orchestration platform with strong config-driven design and parallel execution. CAMEL is the most general-purpose research framework (17k stars, 210+ releases) with the broadest model provider support and excellent tooling abstractions, but lacks strict hierarchy enforcement and production-oriented features like session management or shell permissions. None of the three frameworks fully satisfy all 15 requirements; each has distinct gaps in hierarchy depth, user-to-agent messaging, LSP integration, and shell permission controls.
+
+---
+
+## Findings
+
+### 1. MetaGPT
+
+#### Core Architecture
+
+MetaGPT is a Python-based multi-agent framework designed to simulate a software company. Its core philosophy is `Code = SOP(Team)` — Standard Operating Procedures materialized into a team of LLM-based agents. The architecture has three main abstractions:
+
+- **Role**: An agent with specific actions, memory, and watch/observe patterns
+- **Action**: A discrete unit of work (LLM-powered or code-only)
+- **Team**: A collection of roles operating within an Environment
+
+Agents communicate through a publish-subscribe pattern: each agent publishes messages to the Environment, and other agents pick up the ones produced by the Action types they watch.
+
+**Key metrics**: 68.1k stars, 8.7k forks, 6,367 commits, Python 3.9+ (Python 97.5%). Latest release v0.8.1 (April 2024). The project has been relatively quiet in terms of releases since April 2024, though the repo continues to receive commits. The team has shifted focus to MGX ([mgx.dev](https://mgx.dev/)) — a commercial natural language programming product.
+
+[Source: GitHub README](https://github.com/geekan/MetaGPT)
+
+#### Hierarchy Depth
+
+MetaGPT supports a **flat role pool** within a single Team/Environment. Agents are peers — there is no orchestrator/subagent distinction. The `Team` class manages a set of roles, but it does not support recursive nesting (a sub-team within a team). Communication is broadcast-based: all messages go to the Environment, and roles subscribe to specific message types via `_watch()`. There is no built-in three-layer hierarchy.
+
+[Source: MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html)
+
+#### Config System
+
+MetaGPT has a YAML config file (`~/.metagpt/config2.yaml`) that supports:
+- LLM provider settings (api_type, base_url, api_key, model)
+- Per-role LLM overrides
+- Search, browser, Redis, S3 configurations
+- Experience pool and memory settings
+
+However, orchestrator types, agent behaviors, and workflow logic are **defined in Python code**, not in configuration files. The config covers infrastructure and LLM settings but not orchestration logic.
+
+[Source: config2.example.yaml](https://github.com/geekan/MetaGPT/blob/main/config/config2.example.yaml)
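+
+A hedged sketch of what such a config file might contain, limited to the LLM settings and per-role override listed above. Every value is a placeholder, and the shape of the `roles` block is an assumption that should be checked against `config2.example.yaml` before use.
+
+```yaml
+# ~/.metagpt/config2.yaml (placeholder values throughout)
+llm:
+  api_type: "openai"
+  base_url: "YOUR_BASE_URL"
+  api_key: "YOUR_API_KEY"
+  model: "YOUR_MODEL_NAME"
+
+# Per-role LLM override (assumed shape: a list keyed by role class name)
+roles:
+  - role: "ProductManager"
+    llm:
+      api_type: "openai"
+      base_url: "YOUR_BASE_URL"
+      api_key: "YOUR_API_KEY"
+      model: "YOUR_OTHER_MODEL_NAME"
+```
+
+Nothing in this file says which roles exist, what they watch, or in what order they act; that logic stays in Python, which is why the config system only partially meets the config-driven requirement.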
+
+#### Parallel Execution
+
+MetaGPT's `Team` executes roles **sequentially in a round-robin fashion**. The `n_round` parameter controls how many rounds of interaction occur. There is no built-in parallel subagent execution — agents take turns within each round. The async implementation (`asyncio`) exists in the codebase but is used for single-agent action execution, not parallel multi-agent execution.
+
+[Source: MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html)
+
+#### Communication Restrictions
+
+MetaGPT uses a **broadcast-based** communication model. When a Role publishes a message, the Environment makes it available to all agents. Messages carry a `send_to` field, but it defaults to `["<all>"]`. There is no built-in mechanism to enforce that subagents only talk to their parent orchestrator: any agent can observe any message, and filtering is left to each role's `_watch()` subscriptions.
+
+[Source: Agent communication](https://docs.deepwisdom.ai/main/en/guide/in_depth_guides/agent_communication.html)
+
+#### User-to-Agent Messaging Mid-Execution
+
+MetaGPT supports **human engagement** via setting `is_human=True` on a Role. This causes the terminal to prompt for user input when it's that role's turn. However, this is a **replacement** of an agent role with a human, not arbitrary message injection to any running agent at any time. The human takes over a specific role in the SOP.
+
+[Source: Human Engagement](https://docs.deepwisdom.ai/main/en/guide/tutorials/human_engagement.html)
+
+#### Conflict Prevention (File Scopes)
+
+MetaGPT has **no built-in mechanism** for assigning non-overlapping file scopes to parallel agents. The generated code is written to a shared workspace directory. There is no file-locking, scope assignment, or conflict detection.
+
+#### Role-Scoped Tooling
+
+MetaGPT supports **per-role tool assignment** through the `Role` class and its `set_actions()` method. Each role can be equipped with different actions (tools). The config file supports per-role LLM configuration. However, tool configuration is done in Python code, not via config files.
+
+[Source: Agent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html)
+
+#### Skills System
+
+MetaGPT does **not** have a directory-based skills system with markdown/text instructions per agent type. It has an "experience pool" (`exp_pool`) feature that stores past experiences, but this is not a skills directory system with global + project-level organization.
+
+[Source: config2.example.yaml](https://github.com/geekan/MetaGPT/blob/main/config/config2.example.yaml)
+
+#### LSP Integration
+
+MetaGPT has **no LSP integration**. Generated code is written to disk, but there is no built-in Language Server Protocol client for compiler diagnostics or code validation.
+
+#### Shell Access with Directory Permissions
+
+The `SimpleRunCode` action shown in MetaGPT's Agent 101 tutorial executes Python code via `subprocess.run()`. There is **no permission control system** — no auto-allow lists and no prompts for out-of-scope directories. The executed code runs with the full privileges of the subprocess.
+
+[Source: Agent 101 - SimpleRunCode](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html)
+
+#### Session Management
+
+MetaGPT supports **serialization and breakpoint recovery** via `--recover_path`. It serializes team state (roles, memories, actions) to a JSON file and can resume execution. However, there is **no support for chat forking, model switching mid-conversation, or loading/resuming old chats** as separate sessions.
+
+[Source: Serialization & Breakpoint Recovery](https://docs.deepwisdom.ai/main/en/guide/in_depth_guides/breakpoint_recovery.html)
+
+#### Human-in-the-Loop Checkpoints
+
+MetaGPT has **no configurable checkpoint system** for pausing execution at predefined points. The human engagement feature (`is_human=True`) pauses at each role turn, but this replaces the agent rather than providing a checkpoint/approval mechanism. There are no approval gates or pause/resume hooks.
+
+#### State Persistence
+
+Yes — MetaGPT serializes team, environment, roles, and actions to JSON files in a `./workspace/storage` directory. This enables recovery after crashes or Ctrl-C. The serialization captures memory, role state, action progress, and environment history.
+
+[Source: Serialization & Breakpoint Recovery](https://docs.deepwisdom.ai/main/en/guide/in_depth_guides/breakpoint_recovery.html)
+
+#### Provider-Agnostic LLM
+
+Yes — MetaGPT supports multiple LLM providers via a configurable `api_type` field. Options include OpenAI, Azure, Ollama, Groq, Gemini, and others. Each role can have a different LLM provider.
+
+[Source: Config example](https://github.com/geekan/MetaGPT/blob/main/config/config2.example.yaml)
+
+#### Multiple Interfaces
+
+MetaGPT supports:
+- **CLI**: `metagpt "Create a 2048 game"` command
+- **Python library**: Import and use as a Python package
+- No dedicated TUI or API mode
+
+[Source: README](https://github.com/geekan/MetaGPT)
+
+---
+
+### 2. ChatDev (2.0 / DevAll)
+
+#### Core Architecture
+
+ChatDev has undergone a major transformation. **ChatDev 2.0 (DevAll)** — released Jan 7, 2026 — is a **zero-code multi-agent orchestration platform**. It is no longer focused solely on software development but positions itself as a platform for "Developing Everything" through configurable DAG-based workflows.
+
+The architecture is:
+- **YAML-defined workflows** describing DAGs of agent nodes
+- **Node types**: `agent` (LLM), `python`, `human`, `subgraph`, `passthrough`, `literal`, `loop_counter`, `loop_timer`
+- **Web UI** (Vue 3) for visual workflow design + execution dashboard
+- **Python SDK / PyPI package** (`chatdev`) for programmatic execution
+- **FastAPI backend** for the server component
+
+**Key metrics**: 33.1k stars, 4.1k forks, Python 68.7% / Vue 28.5%. Latest release v2.2.0 (March 23, 2026). Very active development — 190 commits on main branch for 2.0. The legacy ChatDev 1.x is preserved on the `chatdev1.0` branch.
+
+[Source: ChatDev README](https://github.com/OpenBMB/ChatDev)
+
+#### Hierarchy Depth
+
+ChatDev 2.0 supports **multi-level nesting** through its `subgraph` node type. A workflow can reference another workflow file or inline graph, enabling recursive hierarchy. Additionally, its "Tree Mode" (in dynamic execution) provides a natural fan-out/reduce pattern. The DAG structure with start/end nodes enables clear orchestration flows. However, this is a **directed acyclic graph** structure, not a strict three-level dispatch->orchestrator->subagent tree. The hierarchy is defined by edge topology, not by enforced levels.
+
+[Source: Workflow Authoring Guide](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Config-Driven Orchestrators
+
+**Yes** — this is ChatDev 2.0's core strength. Entire workflows, including agent types, models, prompts, tools, edge conditions, and execution logic, are defined in YAML configuration files under `yaml_instance/`. No Python code changes are needed to create new orchestrator types. The `DesignConfig` structure with `version`, `vars`, and `graph` keys provides a clean configuration schema.
+
+[Source: Workflow Authoring Guide](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Parallel Subagent Execution
+
+**Yes** — ChatDev 2.0 supports parallel execution through its **dynamic execution** feature. Both **Map Mode** (`type: map`, fan-out) and **Tree Mode** (`type: tree`, fan-out with recursive reduce) are supported. The `max_parallel` parameter controls concurrency limits. This enables true parallel subagent execution.
+
+[Source: Workflow Authoring Guide - Dynamic Execution](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
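+
+The fan-out pattern described above can be illustrated with a small concurrency-limited map. This is a generic TypeScript sketch, not ChatDev code: `mapWithLimit`, `worker`, and `maxParallel` are hypothetical names, and `maxParallel` only mirrors the idea behind the `max_parallel` setting.
+
+```typescript
+// Hypothetical helper: run `worker` over `items` with at most `maxParallel` calls in flight.
+async function mapWithLimit<T, R>(
+  items: T[],
+  maxParallel: number,
+  worker: (item: T, index: number) => Promise<R>,
+): Promise<R[]> {
+  const results: R[] = new Array(items.length);
+  let next = 0; // index of the next unclaimed item
+
+  // Each runner repeatedly claims the next item until none are left.
+  async function runner(): Promise<void> {
+    while (next < items.length) {
+      const i = next++;
+      results[i] = await worker(items[i], i);
+    }
+  }
+
+  const runnerCount = Math.max(1, Math.min(maxParallel, items.length));
+  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
+  return results; // results keep the input order
+}
+```
+
+Results are written by index, so the output order matches the input order even when workers finish out of order. A tree-style reduce would apply the same helper repeatedly to groups of results.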
+
+#### Strict Hierarchy Communication
+
+ChatDev 2.0's DAG-based execution **enforces communication along defined edges**. Nodes only receive messages from their upstream nodes and send to downstream nodes. However, there is no parent-child trust boundary — all nodes within a workflow are at the same trust level. The `subgraph` node type provides some isolation (child graph has its own nodes), but there is no mechanism to enforce that sub-nodes can _only_ communicate with their parent orchestrator.
+
+**Partial support** — edges define communication paths, but strict hierarchical communication with parent-only routing is not enforced.
+
+[Source: Workflow Authoring Guide - Edges](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### User-to-Agent Messaging Mid-Execution
+
+ChatDev 2.0 supports this via the **`human` node type**. When execution reaches a `human` node, the workflow pauses and waits for user input in the Web UI. However, messaging is constrained to the node level — the user interacts at predefined points in the DAG, not with arbitrary running agents at any time. The `human` node must be explicitly placed in the workflow DAG.
+
+**Partial support** — user can inject messages at designated human nodes, but not to any arbitrary agent mid-execution.
+
+[Source: Workflow Authoring Guide - Node Types](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Conflict Prevention
+
+ChatDev 2.0 has **no built-in conflict prevention** mechanism for non-overlapping file scopes. The system does not assign or track file write scopes. The `python` node type executes scripts sharing a `code_workspace/` directory, but there is no file-locking or scope isolation.
+
+#### Role-Scoped Tooling
+
+**Yes** — the `AgentConfig.tooling` field lets different agent nodes have different tool sets. Tools are configured per-node in the YAML workflow. The MCP (Model Context Protocol) is also supported for tool integration. Tooling is defined at the node level, providing role-scoped access.
+
+[Source: Workflow Authoring Guide - Agent Node Advanced Features](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Skills System
+
+ChatDev 2.0 has a **`.agents/skills`** directory with a directory-based skill organization. It currently contains the `greeting-demo`, `python-scratchpad`, and `rest-api-caller` skills. The skills system appears to be designed for injectable markdown/text instructions per agent type, with both global and project-level organization, but the documentation does not detail how skills are loaded or scoped to specific agent nodes.
+
+[Source: .agents/skills directory](https://github.com/OpenBMB/ChatDev/tree/main/.agents/skills)
+
+#### LSP Integration
+
+ChatDev has **no LSP integration** documented. The platform focuses on workflow orchestration and multi-agent collaboration, not on providing compiler diagnostics or language server features.
+
+#### Shell Access with Directory Permissions
+
+ChatDev 2.0 has **no documented shell permission controls**. The `python` node type executes scripts, and Docker support is available for safe execution (noted in legacy 1.x), but there is no auto-allow list, prompt-for-scope, or directory-based permission system in the current documentation.
+
+#### Session Management
+
+ChatDev 2.0 supports **session management** through the Web UI — workflows can be launched, monitored, and inspected. Context snapshots are saved to `WareHouse/<session>/context.json`. There is HTTP API support (`POST /api/workflow/execute`) and CLI execution (`python run.py`). However, there is **no support for chat forking, model switching mid-conversation, or loading/resuming old chats** in the documented features.
+
+[Source: Workflow Authoring Guide - CLI/API Execution Paths](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Human-in-the-Loop Checkpoints
+
+**Yes** — the `human` node type provides explicit checkpoints where the workflow pauses for user input. Conditions on edges (e.g., `keyword` condition checking for "ACCEPT") can create approval gates. The workflow only continues when the user provides the expected input.
+
+[Source: Workflow Authoring Guide - Conditions](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
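+
+As a rough illustration of how a keyword condition can act as an approval gate, the sketch below routes human input along the first matching edge. The `Edge` shape and `routeAfterHuman` function are hypothetical and written in TypeScript for illustration; ChatDev's actual YAML schema and matching rules may differ.
+
+```typescript
+// Hypothetical edge shape; ChatDev's real workflow schema may differ.
+interface Edge {
+  to: string;
+  keyword?: string; // edge is taken only if the input contains this keyword
+}
+
+// Return the target of the first edge whose condition matches,
+// or null when no edge matches and the workflow should keep waiting.
+function routeAfterHuman(input: string, edges: Edge[]): string | null {
+  for (const edge of edges) {
+    if (edge.keyword === undefined || input.includes(edge.keyword)) {
+      return edge.to;
+    }
+  }
+  return null;
+}
+
+const edges: Edge[] = [
+  { to: "deploy", keyword: "ACCEPT" },
+  { to: "revise", keyword: "REJECT" },
+];
+```
+
+With these edges, input containing "ACCEPT" routes to the hypothetical `deploy` node, while input matching neither keyword leaves the workflow paused.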
+
+#### State Persistence
+
+ChatDev 2.0 saves workflow session data to `WareHouse/` directories, including context snapshots. The SQLite database (`sync` command) stores workflow definitions. However, there is **no documented mechanism for full state persistence across restarts** that would allow resuming a workflow from where it left off after a crash (like MetaGPT's breakpoint recovery).
+
+**Partial support** — artifacts and logs persist, but workflow execution state recovery is not documented.
+
+[Source: Workflow Authoring Guide - Debugging Tips](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Provider-Agnostic LLM
+
+**Yes** — ChatDev 2.0 supports multiple LLM providers configured per-node via `provider` field (e.g., `openai`, `gemini`, etc.). The config uses `${VAR}` syntax for API keys and base URLs. The `globals.default_provider` sets a fallback. Providers can be mixed within a single workflow.
+
+[Source: Workflow Authoring Guide - Providers](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md)
+
+#### Multiple Interfaces
+
+ChatDev 2.0 supports:
+- **Web UI**: Vue 3 frontend at port 5173 with visual workflow canvas and execution dashboard (no terminal UI)
+- **CLI**: `python run.py --path yaml_instance/demo.yaml --name test_run`
+- **HTTP API**: `POST /api/workflow/execute` with session management
+- **Python SDK**: `chatdev` PyPI package for programmatic execution
+
+[Source: ChatDev README](https://github.com/OpenBMB/ChatDev)
+
+---
+
+### 3. CAMEL
+
+#### Core Architecture
+
+CAMEL is a Python-based research framework focused on "finding the scaling laws of agents." It is the most general-purpose and academically oriented of the three frameworks. The architecture includes:
+
+- **ChatAgent**: Core agent abstraction with LLM, tools, and memory
+- **Agent Societies**: Coordination layers including `RolePlaying`, `BabyAGI`, and `Workforce`
+- **Workforce**: A hierarchical agent orchestration system with task decomposition and worker assignment
+- **ModelFactory**: Abstract LLM interface supporting 45+ model platforms
+- **Toolkits**: 70+ tool integrations (search, browser, code execution, MCP, etc.)
+- **Memory & Storage**: Persistent state management
+- **Interpreters**: Code/command execution backends
+- **Human-in-the-Loop**: Tool approval and interactive components
+
+**Key metrics**: 17k stars, 1.9k forks, 2,208 commits, Python 95.9% / TypeScript 2.1%. Latest release v0.2.90 (March 22, 2026). Extremely active — 210 releases and very frequent commits. Large community with Discord, WeChat, and Reddit presence. Backed by Eigen AI research collective with 100+ researchers.
+
+[Source: CAMEL README](https://github.com/camel-ai/camel)
+
+#### Hierarchy Depth
+
+CAMEL's **Workforce** module provides a hierarchical multi-agent system. The `Workforce` class acts as an orchestrator that receives a task, decomposes it into subtasks, assigns workers (single agents or role-playing pairs), and aggregates results. The workforce supports **multiple levels** — a worker can itself be a `Workforce` (recursive nesting), enabling arbitrary hierarchy depth. The `RolePlaying` society creates two-agent task-solving pairs. However, workforce configuration is done via Python code, not config files, and the hierarchy is defined programmatically.
+
+[Source: CAMEL societies/workforce](https://github.com/camel-ai/camel/tree/master/camel/societies/workforce)
+
+#### Config-Driven Orchestrators
+
+CAMEL is **primarily Python-code-driven**. The `ModelFactory.create()` method accepts programmatic configuration. While `create_from_yaml()` and `create_from_json()` methods exist for model configuration, the orchestration logic (workforce setup, task decomposition, worker assignment) is **defined in Python code**. There is no YAML-based workflow definition system like ChatDev 2.0.
+
+**Partial support** — model config can be loaded from YAML/JSON files, but orchestration logic cannot.
+
+[Source: model_factory.py](https://github.com/camel-ai/camel/blob/master/camel/models/model_factory.py)
+
+#### Parallel Subagent Execution
+
+CAMEL's Workforce **does not appear to have built-in parallel execution** of subagents. As far as the documentation shows, the workforce processes tasks sequentially through its worker pool. While individual agents can make async LLM calls, the workforce orchestration itself does not appear to be designed for parallel fan-out execution. The documentation emphasizes scalability to "millions of agents" as a design principle but does not describe parallel execution in the current workforce implementation.
+
+**Not supported**, as far as the current documentation and released code indicate.
+
+[Source: CAMEL societies/workforce](https://github.com/camel-ai/camel/tree/master/camel/societies/workforce)
+
+#### Strict Hierarchy Communication
+
+CAMEL's Workforce uses a **centralized orchestration** pattern — the workforce assigns tasks to workers and collects results. Workers do not communicate directly with each other; they receive tasks from and return results to the workforce. The `RolePlaying` society, however, involves direct agent-to-agent communication. The framework does not explicitly enforce a "subagents only talk to parent" policy — the architecture enables this pattern through the workforce abstraction by design, but there is no system-level enforcement mechanism.
+
+**Partial support** — Workforce architecture naturally restricts communication to parent-child, but no formal enforcement mechanism exists.
+
+[Source: CAMEL Workforce codebase](https://github.com/camel-ai/camel/tree/master/camel/societies/workforce)
+
+#### User-to-Agent Messaging Mid-Execution
+
+CAMEL has a **Human toolkit** (`human_toolkit.py`) and documented **Human-in-the-Loop** features including tool approval. The framework supports interactive components for human oversight and intervention. However, these are primarily **tool approval** mechanisms (approving tool calls before execution), not free-form message injection into any running agent at any time. The messaging pattern requires explicit setup in code.
+
+**Partial support** — human-in-the-loop tool approval exists, but mid-execution arbitrary message injection is not a built-in feature.
+
+[Source: CAMEL README - Human-in-the-Loop](https://github.com/camel-ai/camel)
+
+#### Conflict Prevention
+
+CAMEL has **no built-in conflict prevention** mechanism for non-overlapping file scopes. The framework provides code execution via `Interpreters` and toolkits like `code_execution.py`, but there is no file-locking, scope assignment, or conflict detection for parallel agents writing to the same filesystem.
+
+#### Role-Scoped Tooling
+
+**Yes** — CAMEL has an extensive toolkit system with **70+ toolkits** (search, browser, code execution, GitHub, Gmail, Slack, MCP, etc.). Tools are assigned to individual agents at creation time via the `tools` parameter of `ChatAgent`. Different agents can have completely different tool sets. The skill system also provides role-scoped capabilities.
+
+[Source: CAMEL toolkits directory](https://github.com/camel-ai/camel/tree/master/camel/toolkits)
+
+#### Skills System
+
+CAMEL has a **`.camel/skills`** directory with a directory-based skill organization. It currently contains the `docs-incremental-update` and `skill-creator` skills. The `skill_toolkit.py` module provides programmatic access to skills, which can be loaded and assigned to agents. However, the system does not appear to support both global and project-level skill directories; the skills live in a single project-level `.camel/skills` directory.
+
+[Source: .camel/skills directory](https://github.com/camel-ai/camel/tree/master/.camel/skills)
+
+#### LSP Integration
+
+CAMEL has **no LSP integration** documented. The framework focuses on multi-agent research and task automation, not on providing compiler diagnostics or language server features.
+
+#### Shell Access with Directory Permissions
+
+CAMEL provides shell access through its `TerminalToolkit` (`terminal_toolkit/`), `CodeExecution` toolkit, and `Interpreters`. However, there is **no documented permission control system** — no auto-allow lists, no prompt-for-out-of-scope directories, no directory-based sandboxing.
+
+[Source: CAMEL toolkits - terminal_toolkit](https://github.com/camel-ai/camel/tree/master/camel/toolkits/terminal_toolkit)
+
+#### Session Management
+
+CAMEL supports **session and conversation management** through its `Memory` module and `Storage` module, which provide persistent context layers for chat history and tool outputs. The framework logs model request/response to JSON files when `CAMEL_MODEL_LOG_ENABLED=true`. However, there is **no support for chat forking, model switching mid-conversation, or loading/resuming old chats** as a built-in feature.
+
+[Source: CAMEL README - Model Logging](https://github.com/camel-ai/camel)
+
+#### Human-in-the-Loop Checkpoints
+
+CAMEL supports **Human-in-the-Loop** features including tool approval (approving tool calls before execution) and interactive components. The `human_toolkit.py` provides human interaction capabilities. However, there is **no checkpoint system** for pausing execution at configurable predefined points — the human involvement is primarily about tool-call approval, not workflow-level pause/resume.
+
+**Partial support** — tool approval exists, but configurable execution checkpoints do not.
+
+[Source: CAMEL README - Human-in-the-Loop](https://github.com/camel-ai/camel)
+
+#### State Persistence
+
+CAMEL provides **stateful memory** as a core design principle. Agents maintain stateful memory enabling multi-step interactions. The `Memory` and `Storage` modules provide persistent context. However, there is **no documented mechanism for full workflow/execution state persistence across restarts** (like MetaGPT's breakpoint recovery). The logging system captures request/response data, but this is not a recovery mechanism.
+
+**Partial support** — agent memory persists within a session, but full execution state recovery across restarts is not documented.
+
+[Source: CAMEL README - Statefulness](https://github.com/camel-ai/camel)
+
+#### Provider-Agnostic LLM
+
+**Yes** — this is CAMEL's strongest area. The `ModelFactory` supports **45+ model platforms** including OpenAI, Azure, Anthropic, Gemini, Mistral, Cohere, Ollama, vLLM, SGLang, Groq, DeepSeek, Qwen, and many more. The `ModelFactory.create()` method provides a clean abstract interface, and models can also be created from YAML/JSON config files via `create_from_yaml()` and `create_from_json()`.
+
+[Source: model_factory.py](https://github.com/camel-ai/camel/blob/master/camel/models/model_factory.py)
+
+#### Multiple Interfaces
+
+CAMEL supports:
+- **Python library**: Primary interface as a Python package (`pip install camel-ai`)
+- **CLI**: Via environment variable configuration and example scripts
+- **API/Server**: Apps directory contains server applications (FastAPI-based)
+- No dedicated TUI or web UI (unlike ChatDev's Vue 3 frontend)
+
+[Source: CAMEL README](https://github.com/camel-ai/camel)
+
+---
+
+## Comparison Table
+
+| Requirement | MetaGPT | ChatDev 2.0 | CAMEL |
+|---|---|---|---|
+| **1. Three-layer hierarchy** | Not supported (flat role pool) | Partial (DAG + subgraph nesting) | Partial (recursive Workforce, code-configured) |
+| **2. Config-driven orchestrators** | Partial (infra config only, logic in code) | **Yes** (full YAML workflow definitions) | Partial (model config from YAML, logic in code) |
+| **3. Parallel subagent execution** | Not supported (sequential rounds) | **Yes** (Map/Tree modes, max_parallel) | Not supported (sequential workforce) |
+| **4. Strict hierarchy communication** | Not supported (broadcast) | Partial (edge-routed but no parent-only enforcement) | Partial (Workforce centralizes but no enforcement) |
+| **5. User-to-agent messaging mid-execution** | Partial (is_human role replacement) | Partial (human nodes in DAG) | Partial (tool approval, not free injection) |
+| **6. Conflict prevention** | Not supported | Not supported | Not supported |
+| **7. Role-scoped tooling** | **Yes** (per-role actions in code) | **Yes** (per-node tooling in YAML) | **Yes** (per-agent toolkits) |
+| **8. Skills system (markdown dirs)** | Not supported | Partial (.agents/skills dir exists) | Partial (.camel/skills dir exists) |
+| **9. LSP integration** | Not supported | Not supported | Not supported |
+| **10. Shell access with permissions** | Not supported (unrestricted subprocess) | Not supported (Docker for safety only) | Not supported (no permission system) |
+| **11. Session management** | Partial (breakpoint recovery) | Partial (context snapshots) | Partial (memory storage) |
+| **12. Human-in-the-loop checkpoints** | Not supported | **Yes** (human nodes + edge conditions) | Partial (tool approval only) |
+| **13. State persistence** | **Yes** (JSON serialization + recovery) | Partial (artifacts persist, no recovery) | Partial (memory/storage, no recovery) |
+| **14. Provider-agnostic LLM** | **Yes** (multiple providers via config) | **Yes** (per-node provider config) | **Yes** (45+ platforms, broadest support) |
+| **15. Multiple interfaces** | Partial (CLI + Python lib) | **Yes** (Web UI, CLI, HTTP API, Python SDK) | Partial (Python lib + server apps) |
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [MetaGPT GitHub](https://github.com/geekan/MetaGPT) | GitHub |
+| 2 | [MetaGPT Documentation - Concepts](https://docs.deepwisdom.ai/main/en/guide/tutorials/concepts.html) | Official docs |
+| 3 | [MetaGPT Documentation - Agent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html) | Official docs |
+| 4 | [MetaGPT Documentation - MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html) | Official docs |
+| 5 | [MetaGPT Documentation - Human Engagement](https://docs.deepwisdom.ai/main/en/guide/tutorials/human_engagement.html) | Official docs |
+| 6 | [MetaGPT Documentation - Breakpoint Recovery](https://docs.deepwisdom.ai/main/en/guide/in_depth_guides/breakpoint_recovery.html) | Official docs |
+| 7 | [MetaGPT config2.example.yaml](https://github.com/geekan/MetaGPT/blob/main/config/config2.example.yaml) | GitHub source |
+| 8 | [ChatDev GitHub](https://github.com/OpenBMB/ChatDev) | GitHub |
+| 9 | [ChatDev Workflow Authoring Guide](https://github.com/OpenBMB/ChatDev/blob/main/docs/user_guide/en/workflow_authoring.md) | Official docs |
+| 10 | [ChatDev .agents/skills](https://github.com/OpenBMB/ChatDev/tree/main/.agents/skills) | GitHub source |
+| 11 | [CAMEL GitHub](https://github.com/camel-ai/camel) | GitHub |
+| 12 | [CAMEL model_factory.py](https://github.com/camel-ai/camel/blob/master/camel/models/model_factory.py) | GitHub source |
+| 13 | [CAMEL societies/workforce](https://github.com/camel-ai/camel/tree/master/camel/societies/workforce) | GitHub source |
+| 14 | [CAMEL .camel/skills](https://github.com/camel-ai/camel/tree/master/.camel/skills) | GitHub source |
+| 15 | [CAMEL toolkits directory](https://github.com/camel-ai/camel/tree/master/camel/toolkits) | GitHub source |
+| 16 | [CAMEL Documentation](https://docs.camel-ai.org/) | Official docs |
+
+---
+
+## Verbatim quotes
+
+- "MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents" — [Source 1](https://github.com/geekan/MetaGPT)
+- "Code = SOP(Team) is the core philosophy. We materialize SOP and apply it to teams composed of LLMs." — [Source 1](https://github.com/geekan/MetaGPT)
+- "ChatDev has evolved from a specialized software development multi-agent system into a comprehensive multi-agent orchestration platform." — [Source 8](https://github.com/OpenBMB/ChatDev)
+- "ChatDev 2.0 (DevAll) is a Zero-Code Multi-Agent Platform for 'Developing Everything'. It empowers users to rapidly build and customize multi-agent systems through simple configuration. No coding is required." — [Source 8](https://github.com/OpenBMB/ChatDev)
+- "CAMEL is an open-source community dedicated to finding the scaling laws of agents." — [Source 11](https://github.com/camel-ai/camel)
+- "The framework enables multi-agent systems to continuously evolve by generating data and interacting with environments." — [Source 11](https://github.com/camel-ai/camel) (Evolvability principle)
+- "The framework is designed to support systems with millions of agents, ensuring efficient coordination, communication, and resource management at scale." — [Source 11](https://github.com/camel-ai/camel) (Scalability principle)
+
+---
+
+## Source quality flags
+
+- Source 5 (MetaGPT Human Engagement): mentions that the current interaction is "through terminal input, which is inconvenient for multi-line or structured writeup" — the feature is acknowledged as limited by maintainers
+- Source 8 (ChatDev GitHub): marketing language in the README — "World's first AI agent development team", "Zero-Code Multi-Agent Platform" — these are product positioning claims
+- Source 11 (CAMEL GitHub): marketing language — "the first and the best multi-agent framework" — this is self-promotion, not an objective claim
+
+---
+
+## Confidence: Medium
+
+The information is drawn directly from GitHub repositories, official documentation, and source code. Confidence is medium because some features (especially those related to hierarchy depth, parallel execution, and session management) required interpretation of architecture and code structure rather than explicit documentation. ChatDev 2.0 is very new (released Jan 2026) and its documentation is still evolving, so some features may be present but undocumented.
+
+---
+
+## Gaps and open questions
+
+1. **Hierarchy depth**: None of the three frameworks explicitly implements a three-layer dispatch->orchestrator->subagent architecture. ChatDev's subgraph nesting comes closest but is DAG-based, not tree-based. CAMEL's recursive Workforce is close but configured in code. A follow-up investigation should assess whether any framework could be extended to support this pattern without major refactoring.
+2. **Conflict prevention**: No framework addresses file-scope conflict prevention. This would likely require custom implementation regardless of framework choice.
+3. **LSP integration**: No framework has LSP support. This would be a novel feature to add.
+4. **Shell permissions**: No framework has directory-based shell permission controls. This is a significant gap for any production deployment that needs sandboxing.
+5. **MetaGPT release cadence**: The latest release (v0.8.1) is from April 2024, roughly two years before this report was written. The project may be entering maintenance mode as the team focuses on MGX. This should be verified.
+6. **ChatDev 2.0 maturity**: ChatDev 2.0 was released in January 2026 and the documentation is still being written. Several module documentation pages (e.g., skills, tooling) are marked as "Chinese for now" or return 404. The framework's long-term stability is unproven.
+7. **CAMEL's parallel execution**: The workforce module's parallel execution capabilities need deeper investigation. The design principles mention scalability to "millions of agents" but current implementation appears sequential.
+
+
diff --git a/research/pi-dev-harness.md b/research/pi-dev-harness.md
new file mode 100644
index 0000000..8fb94b9
--- /dev/null
+++ b/research/pi-dev-harness.md
@@ -0,0 +1,322 @@
+# Subagent Report: pi.dev (Pi) — Super-Customizable AI Agent Harness
+
+## Research summary
+
+Pi (pi.dev / `@earendil-works/pi-coding-agent`) is a minimal, aggressively **extensible** terminal-based coding agent harness built in **TypeScript** by Mario Zechner (Earendil Inc.). It is **not** a multi-agent orchestration framework — it is a **single-agent CLI** (like Claude Code, opencode, or Codex) designed to be customized via TypeScript extensions, markdown skills, prompt templates, and themes. It has excellent built-in support for provider-agnostic LLMs, session management (tree-structured with branching), state persistence, user-to-agent messaging mid-execution, and multiple interfaces (TUI, CLI, RPC, SDK). However, **multi-agent hierarchy, config-driven orchestrators, LSP integration, conflict prevention, and role-scoped tooling are all absent by design** — the project philosophy is "primitives, not features." These would need to be built from scratch as extensions. Confidence: **high** — the source, docs, and blog posts are extensive and transparent about what Pi does and does not include.
+
+---
+
+## Findings
+
+### 1. Overview: What Pi Is and Is Not
+
+**What exactly IS Pi?** Pi is a monorepo containing five npm packages:
+
+| Package | Purpose |
+|---|---|
+| `@earendil-works/pi-ai` | Unified multi-provider LLM API (15+ providers: Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Groq, Cerebras, xAI, Hugging Face, OpenRouter, etc.) |
+| `@earendil-works/pi-agent-core` | Agent runtime with tool calling, state management, event streaming, compaction |
+| `@earendil-works/pi-coding-agent` | The main CLI: interactive TUI, print mode, JSON mode, RPC mode, SDK |
+| `@earendil-works/pi-tui` | Terminal UI library with differential rendering |
+| `@earendil-works/pi-web-ui` | Web components for AI chat interfaces |
+
+Pi is **not** a multi-agent orchestration framework like LangGraph or CrewAI. It is a **single-agent coding assistant CLI** that you run in a terminal. The "super customizable" claim refers to its extension system — you can add tools, commands, UI components, and behaviors via TypeScript extensions without forking the codebase.
+
+[Source: GitHub monorepo README](https://github.com/earendil-works/pi)
+
+**Language & Tech Stack:**
+
+- **TypeScript** (96.5% of the repo), JavaScript (2.9%), CSS (0.4%), Shell (0.2%)
+- Runtime: Node.js >= 22.19.0
+- Key dependencies: `jiti` (TypeScript loader for extensions), `typebox` (schema validation), `chalk`, `yaml`, `diff`, `undici`, `cross-spawn`
+- Packaging: npm ecosystem, published as `@earendil-works/pi-coding-agent`
+- Custom-built TUI framework (`pi-tui`) — not React/Ink-based
+
+[Source: package.json](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/package.json)
+
+**License:** MIT — fully open source.
+
+[Source: LICENSE file](https://github.com/earendil-works/pi/blob/main/LICENSE)
+
+**Activity & Community:** Extremely active.
+
+| Metric | Value |
+|---|---|
+| GitHub Stars | **51.4k** |
+| Forks | **6.1k** |
+| Commits | **4,188** |
+| Releases | **219** (latest: v0.75.3, May 18, 2026) |
+| Open Issues | 28 |
+| Open PRs | 6 |
+| Discord Community | Yes (linked from site) |
+| Maintainer | Mario Zechner (badlogicgames) / Earendil Inc. |
+
+Note: New contributor issues/PRs are auto-closed by default; maintainers review them daily.
+
+[Source: GitHub repo](https://github.com/earendil-works/pi)
+
+**Philosophy (from the maintainer's blog):**
+> "If I don't need it, it won't be built. And I don't need a lot of things."
+> "Pi is aggressively extensible so it doesn't have to dictate your workflow."
+> "Features that other tools bake in can be built with extensions, skills, or installed from third-party pi packages."
+
+Pi deliberately ships **without**: sub-agents, plan mode, permission popups, MCP support, background bash, and to-do lists. The maintainer believes these should be built as extensions or handled externally (tmux, containers, file-based plans).
+
+[Source: maintainer blog post](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+
+---
+
+### 2. Architecture Deep-Dive
+
+#### Core Architecture (Layered)
+
+```
+┌────────────────────────────────────────────────┐
+│ pi-coding-agent (CLI/TUI) │
+│ Session management, extensions, skills, │
+│ themes, prompt templates, commands, UI │
+├────────────────────────────────────────────────┤
+│ pi-agent-core │
+│ Agent loop, tool execution, state management │
+│ Event streaming, compaction, transport layer │
+├────────────────────────────────────────────────┤
+│ pi-ai │
+│ Unified LLM API: OpenAI, Anthropic, Google, │
+│ and 12+ more providers. Tool calling, │
+│ streaming, thinking/reasoning, context handoff │
+├────────────────────────────────────────────────┤
+│ pi-tui / pi-web-ui │
+│ Terminal UI framework / Web components │
+└────────────────────────────────────────────────┘
+```
+
+[Source: GitHub README](https://github.com/earendil-works/pi)
+
+#### Agent Creation & Management
+
+Pi's agent model is **single-agent per session**. Key classes:
+
+- **`AgentSession`** — Manages one agent's lifecycle, message history, model state, streaming. Created via `createAgentSession()`.
+- **`Agent`** (from `@earendil-works/pi-agent-core`) — The core loop: processes user messages, executes tool calls, feeds results back to LLM, repeats until done.
+- **`AgentSessionRuntime`** — Wraps `AgentSession` with session replacement capabilities (new/resume/fork/clone).
+- **`SessionManager`** — Handles persistence as tree-structured JSONL files.
+
+The agent loop is minimal: no max-steps limit, no sub-agent spawning. It loops until the model produces a non-tool-call response.
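+
+To make the loop's termination rule concrete, here is a minimal sketch. It is not Pi's actual code: the `Model`, `Tool`, `Message`, and `ModelReply` types and the `runAgentLoop` name are hypothetical, and only the control flow (run tools, feed results back, stop at the first reply without tool calls) comes from the description above.
+
+```typescript
+// Hypothetical types for illustration only; these are not Pi's interfaces.
+type ToolCall = { name: string; args: Record<string, unknown> };
+type ModelReply = { text: string; toolCalls: ToolCall[] };
+type Message = { role: "user" | "assistant" | "tool"; content: string };
+type Model = (history: Message[]) => Promise<ModelReply>;
+type Tool = (args: Record<string, unknown>) => Promise<string>;
+
+async function runAgentLoop(
+  model: Model,
+  tools: Record<string, Tool>,
+  userInput: string,
+): Promise<Message[]> {
+  const history: Message[] = [{ role: "user", content: userInput }];
+  // No step limit: the loop ends only when the model stops requesting tools.
+  while (true) {
+    const reply = await model(history);
+    history.push({ role: "assistant", content: reply.text });
+    if (reply.toolCalls.length === 0) return history;
+    for (const call of reply.toolCalls) {
+      const tool = tools[call.name];
+      // Unknown tools are reported back to the model instead of throwing.
+      const result = tool ? await tool(call.args) : `unknown tool: ${call.name}`;
+      history.push({ role: "tool", content: result });
+    }
+  }
+}
+```
+
+Because nothing in this shape bounds the iteration count, any step budget or timeout would have to be imposed from outside the loop.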
+
+#### Extension System (Plugin Model)
+
+Extensions are **TypeScript modules** auto-discovered from well-known directories:
+
+| Location | Scope |
+|---|---|
+| `~/.pi/agent/extensions/*.ts` | Global (all projects) |
+| `~/.pi/agent/extensions/*/index.ts` | Global (subdirectory) |
+| `.pi/extensions/*.ts` | Project-local |
+| `.pi/extensions/*/index.ts` | Project-local (subdirectory) |
+
+Extensions export a default factory function receiving `ExtensionAPI`:
+
+```typescript
+import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+
+export default function (pi: ExtensionAPI) {
+ // Register tools
+ pi.registerTool({ name: "my_tool", ... });
+
+ // Register commands
+ pi.registerCommand("mycmd", { handler: async (args, ctx) => { ... } });
+
+ // Register keyboard shortcuts
+ pi.registerShortcut("ctrl+x", { handler: async (ctx) => { ... } });
+
+ // Register CLI flags
+ pi.registerFlag("my-flag", { description: "..." });
+
+ // Hook events
+ pi.on("tool_call", async (event, ctx) => { ... });
+ pi.on("session_start", async (event, ctx) => { ... });
+ pi.on("before_agent_start", async (event, ctx) => { ... });
+ // ... 30+ event types
+}
+```
+
+**What extensions can do:**
+- Custom tools (or replace built-in tools entirely)
+- Intercept/block/modify tool calls via `tool_call` event
+- Custom compaction and summarization
+- Permission gates, path protection
+- Custom UI components, editors, status bars, overlays
+- SSH and sandbox execution
+- MCP server integration (via custom tool wrapping)
+- Sub-agents (spawn child `pi` processes via bash)
+- Session persistence via `pi.appendEntry()`
+- Custom providers and models
+
+Extensions are loaded via `jiti` (TypeScript compilation on-the-fly). Async factories are supported. Pi packages bundle extensions, skills, prompts, and themes for sharing via npm or git.
+
+**50+ extension examples** in the repo, including: subagent, plan-mode, permission-gate, protected-paths, sandbox, ssh, custom-compaction, snake game, Doom.
+
+[Source: Extensions documentation](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/docs/extensions.md)
+
+#### Skills System
+
+Pi implements the [Agent Skills standard](https://agentskills.io). Skills are markdown files with YAML frontmatter stored in directories:
+
+```
+~/.pi/agent/skills/my-skill/SKILL.md
+~/.agents/skills/
+.pi/skills/
+.agents/skills/ (in cwd and ancestor directories)
+```
+
+Skills follow progressive disclosure: only each skill's description is always in the context; the full SKILL.md content loads on demand, either via the `/skill:name` command or when the agent decides to read it.
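+
+A rough sketch of what progressive disclosure implies for context building follows. It assumes each SKILL.md carries flat `name` and `description` frontmatter fields; the `readSkillSummary` and `buildSkillIndex` helpers are hypothetical and are not taken from Pi's source.
+
+```typescript
+type SkillSummary = { name: string; description: string };
+
+// Minimal frontmatter reader: handles only flat `key: value` lines,
+// not full YAML. Returns null when the required fields are missing.
+function readSkillSummary(skillMd: string): SkillSummary | null {
+  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMd);
+  if (!match) return null;
+  const fields: Record<string, string> = {};
+  for (const line of match[1].split(/\r?\n/)) {
+    const idx = line.indexOf(":");
+    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
+  }
+  if (!fields.name || !fields.description) return null;
+  return { name: fields.name, description: fields.description };
+}
+
+// Only these one-line summaries would sit in the context permanently;
+// the body of each SKILL.md is read later, when the skill is triggered.
+function buildSkillIndex(skills: SkillSummary[]): string {
+  return skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
+}
+```
+
+The design trade-off is that the per-skill context cost stays at roughly one line, at the price of an extra file read when a skill is actually used.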
+
+#### Event Lifecycle
+
+Pi exposes roughly 30 events across the lifecycle, grouped by category:
+- **Resource events**: `resources_discover`
+- **Session events**: `session_start`, `session_before_switch`, `session_before_fork`, `session_before_compact`, `session_compact`, `session_shutdown`
+- **Agent events**: `before_agent_start`, `agent_start`, `agent_end`, `turn_start`, `turn_end`, `message_start/update/end`
+- **Tool events**: `tool_call` (can block), `tool_result` (can modify), `tool_execution_start/update/end`
+- **Model events**: `model_select`, `thinking_level_select`
+- **Input events**: `input` (can intercept/transform)
+- **User bash events**: `user_bash`
+
+[Source: Extensions documentation (event reference)](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/docs/extensions.md)
+
+---
+
+### 3. Requirements Checklist Evaluation
+
+For each requirement, two ratings are given:
+- **Built-in Support**: Is it present in the core today?
+- **Ease of Adding**: How hard to build on top of Pi's extension system?
+
+| # | Requirement | Built-in Support | Ease of Adding | Assessment |
+|---|---|---|---|---|
+| 1 | **Three-layer hierarchy** (dispatch→orchestrator→subagent) | **Not supported** — Pi is a single-agent system. No parent-child agent relationships. | **Hard** — No architectural concept of agent hierarchy exists. The subagent extension spawns child `pi` processes via bash, but has no orchestration layer, routing, or lifecycle management. Building a full 3-layer hierarchy with dispatch and orchestrator would require creating the entire system from scratch as an extension. | The architecture is flat: one agent session, one agent loop. Adding hierarchy means building a new abstraction layer atop the existing single-agent runtime, not extending an existing one. |
+| 2 | **Config-driven orchestrators** (orchestrator types defined via YAML/JSON config) | **Not supported** — No concept of "orchestrator types" or orchestration configs. | **Hard** — Pi uses JSON files for its own settings (settings.json, models.json), but there is no orchestrator abstraction. Would need to build a config schema and runtime interpreter for orchestrator definitions from scratch. | Pi's `settings.json` is for Pi configuration, not agent orchestration. A new config format and execution engine would be needed. |
+| 3 | **Parallel subagent execution** | **Not supported** in core. The subagent example extension does support parallel execution (up to 8 tasks, 4 concurrent). | **Moderate** — The subagent extension already exists as a working example and supports parallel `pi` process spawning. However, it spawns child processes via bash (not in-process agents), caps output at 50KB per task, and limits concurrency. It would need significant enhancement for production use. | An example exists, but it is a demo, not production-grade. Spawning subprocesses works but has limitations. |
+| 4 | **Strict hierarchy communication** (subagents only talk to parent orchestrator, no peer-to-peer) | **Not supported** — No communication framework between agents. Subagents (when used) are independent processes. | **Moderate** — Could build a message-passing protocol on top of the subagent extension or RPC mode. But Pi has no built-in mechanism for parent-child routing or peer-to-peer blocking. | Would need to implement a communication protocol and enforce routing rules. |
+| 5 | **User-to-agent messaging mid-execution** | **Supported** — Built-in message queuing. `Enter` queues a "steer" message (delivered after current tool call, interrupts remaining tools). `Alt+Enter` queues a "follow-up" (delivered when agent finishes). Escape aborts and restores queued messages. | N/A (already built-in) | **Strong feature.** Configurable delivery modes: `one-at-a-time` (default) or `all`. Also available programmatically via `session.steer()` and `session.followUp()` methods in the SDK. |
+| 6 | **Conflict prevention** (non-overlapping file scopes for parallel agents) | **Not supported** — Pi runs in YOLO mode with full filesystem access. No concept of file scope assignment. | **Hard** — Goes against Pi's core philosophy of unrestricted access. Could be approximated by running each agent in a different directory/container, but there's no built-in file-scope assignment or enforcement mechanism. | Pi's design philosophy explicitly rejects this kind of restriction. Working around it would be fighting the architecture. |
+| 7 | **Role-scoped tooling** (different agents get different tool sets based on role) | **Partial** — Pi's `--tools` flag can restrict which tools are available globally. Extensions can register custom tools. But there's no role-based assignment system. | **Moderate** — Could use the `before_agent_start` event to dynamically modify the toolset based on context. But there's no built-in concept of "agent roles" or tool-to-role mapping. | Single-agent system means no role differentiation. Would need to build role management into whatever multi-agent layer you add. |
+| 8 | **Skills system** (injectable markdown instructions per agent type, with specific directory structure) | **Partial** — Pi has a full skills system following the Agent Skills standard. Skills are markdown files with YAML frontmatter. They are discovered from `~/.pi/agent/skills/`, `~/.agents/skills/`, `.pi/skills/`, and ancestor `.agents/skills/` directories. However, the directory structure does NOT match the required `default/`, `agents/`, `project/` subdirectory scheme. | **Easy** — The skills system is mature and extensible. The directory structure is configurable through the `DefaultResourceLoader`. Adding support for additional directory conventions would be straightforward. | Skills system is one of Pi's strongest extensibility points. The directory convention difference is minor. |
+| 9 | **LSP integration** (Language Server Protocol for compiler/linter diagnostics) | **Not supported** — No LSP client. Pi has no compiler, linter, or language server integration. | **Hard** — Would need to build an LSP client as an extension, handling stdio JSON-RPC protocol, file synchronization, diagnostics display, etc. No existing LSP primitives exist in the codebase. | This is a significant feature to build, but not architecturally impossible — just no existing support. |
+| 10 | **Shell access with directory permissions** (auto-allow lists, prompt for out-of-scope directories) | **Not supported** — Pi has full unrestricted bash access by design ("YOLO mode"). There IS a `permission-gate.ts` extension example that prompts before dangerous commands, and a `protected-paths.ts` extension example. | **Moderate** — The tool_call event can intercept bash commands. Directory awareness could be added. But the permission model would need to be built from scratch (auto-allow lists, scope checking). | The extension event model makes this possible, but Pi's philosophy is deliberately against permission systems. |
+| 11 | **Session management** (chat forking, model switching mid-conversation, loading/resuming old chats) | **Supported** — Excellent built-in session management: tree-structured JSONL files, `/tree` navigation to any previous point, `/fork` (new session from user message), `/clone` (duplicate branch), `/resume` (pick from past sessions), `/model` to switch models mid-session, HTML export, share via GitHub gist. Auto-save on every message. | N/A (already built-in) | **Strong feature.** Sessions persist to `~/.pi/agent/sessions/` organized by working directory. Continue with `pi -c`. |
+| 12 | **Human-in-the-loop checkpoints** (execution pauses at configurable points for user approval) | **Partial** — No built-in checkpoint system. But the extension event model allows blocking on any tool_call event. The permission-gate extension demonstrates this pattern. | **Easy** — Extensions can block tool execution via `{ block: true, reason: "..." }` return from `tool_call` event handler. Can show confirmation dialogs via `ctx.ui.confirm()`. Configurable checkpoint logic can be implemented entirely in an extension. | The `tool_call` event's blocking capability is exactly designed for this. |
+| 13 | **State persistence** (sessions, plans, artifacts persist across restarts) | **Supported** — Sessions auto-save to disk as JSONL files. Everything persists: full message history, model state, compaction metadata, tool results. `pi -c` continues the most recent session. Sessions survive process restarts. | N/A (already built-in) | **Strong feature.** Tree structure with branching means no information is lost — old branches remain accessible via `/tree`. |
+| 14 | **Provider-agnostic LLM** (multiple providers through abstract interface) | **Supported** — 15+ built-in providers (Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Groq, Cerebras, xAI, Hugging Face, OpenRouter, Together AI, Fireworks, DeepSeek, Kimi, MiniMax, etc.), custom providers via `models.json`, custom providers via extensions with full OAuth flows. Model switching mid-session. | N/A (already built-in) | **Strong feature.** Cross-provider context handoff is built into `@earendil-works/pi-ai` — models can be switched mid-conversation. |
+| 15 | **Multiple interfaces** (CLI, TUI, API modes) | **Supported** — 4 built-in modes: **Interactive** (full TUI), **Print** (`-p` for scripts), **JSON** (`--mode json` for structured output), **RPC** (`--mode rpc` for stdin/stdout JSONL protocol). Plus an **SDK** for embedding Pi in Node.js apps. Plus a **web-ui** package for web interfaces. | N/A (already built-in) | **Strong feature.** The SDK exports `InteractiveMode`, `runPrintMode`, and `runRpcMode` utilities for building custom interfaces on top. |
+
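+Rows 10 and 12 both rest on the `tool_call` handler returning `{ block: true, reason: "..." }`. The sketch below shows only the decision logic such a handler might delegate to. `checkPathScope` and its parameters are hypothetical, and how a handler would extract the target path from the event is deliberately left out, since the event payload is not described in this report.
+
+```typescript
+import * as path from "node:path";
+
+// Mirrors the blocking return shape described above; `undefined` means "allow".
+type BlockDecision = { block: true; reason: string } | undefined;
+
+// Hypothetical helper: allow a path only if it resolves inside one of the
+// allowed directories (given relative to cwd, or as absolute paths).
+function checkPathScope(targetPath: string, cwd: string, allowedDirs: string[]): BlockDecision {
+  const resolved = path.resolve(cwd, targetPath);
+  const inScope = allowedDirs.some((dir) => {
+    const rel = path.relative(path.resolve(cwd, dir), resolved);
+    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
+    return !escapes;
+  });
+  return inScope
+    ? undefined
+    : { block: true, reason: `${resolved} is outside the allowed directories` };
+}
+```
+
+A check like this compares resolved paths only. It does not follow symlinks and cannot see paths that a bash command constructs at runtime, so it approximates an allow-list rather than enforcing a sandbox.
+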
+---
+
+### 4. Strengths and Weaknesses as a Base for the Dispatch Harness
+
+#### Strengths
+
+1. **Exceptional session management** — Tree-structured sessions with branching, forking, cloning, resume, and model switching mid-conversation. This provides a robust foundation for state persistence.
+
+2. **Excellent provider-agnostic LLM layer** — 15+ providers, cross-provider context handoff, and a clean abstraction (`@earendil-works/pi-ai`). This could directly power the LLM layer of a dispatch system.
+
+3. **Powerful extension system** — TypeScript extensions with full access to lifecycle events, tool registration, UI components, and state management. The `tool_call` event's blocking capability is ideal for human-in-the-loop checkpoints. The `before_agent_start` event allows dynamic system prompt modification. These are the primitives needed for orchestration logic.
+
+4. **Message queuing mid-execution** — Built-in steer/follow-up messaging provides the foundation for user-to-agent communication during execution.
+
+5. **Multiple interface modes** — TUI, CLI, JSON, RPC, and SDK mean the system can be used interactively, programmatically, or embedded.
+
+6. **Mature skills system** — Follows the Agent Skills standard, with progressive disclosure. Easily adaptable to different directory conventions.
+
+7. **MIT license** — No restrictions on use or modification.
+
+8. **Active community and maintenance** — 219 releases and 4,188 commits, with the latest release (v0.75.3) dated May 18, 2026. The project is under very active development.
+
+#### Weaknesses
+
+1. **No multi-agent architecture** — Pi is fundamentally single-agent. There is no concept of agent hierarchy, orchestrator, dispatch, or subagent management. Building a 3-layer hierarchy (dispatch→orchestrator→subagent) means creating an entirely new architectural layer on top of Pi, not extending an existing one. This is the single biggest gap.
+
+2. **No config-driven orchestration** — Pi has no YAML/JSON-based orchestrator definitions, no workflow DSL, no task routing. This would need to be built from scratch.
+
+3. **No role-scoped tooling** — Pi's tool model is flat. The `--tools` flag restricts the tool set globally, but there is no concept of "this agent type gets these tools." Role-based tool assignment would need to be built.
+
+4. **"YOLO by design" philosophy** — The maintainer explicitly rejects permission gates, file-scope restrictions, and safety rails. While extensions can add some of these, the architecture and philosophy push against them. Conflict prevention for parallel agents (requirement 6) is particularly at odds with Pi's design.
+
+5. **Subagent implementation is ad-hoc** — The subagent extension spawns child `pi` processes via bash, not in-process agents. This means: separate process overhead, no shared state, limited communication, 50KB output cap. Production-grade subagent management would need significant rework.
+
+6. **No LSP integration** — Building LSP support from scratch is a significant undertaking.
+
+7. **No built-in WebSocket/server mode** — While RPC mode exists, there's no persistent server/API mode that could serve as a dispatch endpoint. The SDK can be embedded, but you'd need to build the server layer.
+
+8. **Node.js-only** — The entire stack is TypeScript/Node.js. If the Dispatch system needs polyglot support, Pi cannot provide it.
+
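+The concurrency limit cited for the subagent example (8 tasks, 4 concurrent) is the kind of bound a custom orchestration layer would have to reimplement. A minimal worker-pool sketch follows; `runWithLimit` is a hypothetical helper, not code from the subagent extension, and each task stands in for whatever would launch and await a child agent.
+
+```typescript
+// Run tasks with at most `limit` in flight; results keep the input order.
+async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
+  const results: T[] = new Array(tasks.length);
+  let next = 0;
+  // Each worker repeatedly claims the next unclaimed task index.
+  // The claim is synchronous, so two workers never take the same task.
+  async function worker(): Promise<void> {
+    while (next < tasks.length) {
+      const index = next++;
+      results[index] = await tasks[index]();
+    }
+  }
+  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
+  await Promise.all(workers);
+  return results;
+}
+```
+
+In this sketch a rejected task rejects the whole batch. A production orchestrator would more likely capture per-task failures and report them to the parent.
+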
+#### Summary Verdict
+
+Pi is **not a suitable base framework** for the Dispatch requirements as stated, primarily because it lacks any multi-agent architecture. It would require building the entire hierarchy, orchestration, routing, and role systems from scratch. The pieces Pi does provide (session management, provider abstraction, extension system, state persistence, messaging) are valuable **components that could be used within** a Dispatch-like system, but Pi itself is the wrong substrate.
+
+A better approach would be to use `@earendil-works/pi-ai` and `@earendil-works/pi-agent-core` as **libraries** in a custom-built orchestration system, rather than trying to extend the Pi CLI into something it was never designed to be.
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [pi.dev website](https://pi.dev) | Official website |
+| 2 | [GitHub monorepo](https://github.com/earendil-works/pi) | Source code |
+| 3 | [Coding Agent README](https://github.com/earendil-works/pi/tree/main/packages/coding-agent) | Official docs |
+| 4 | [Extensions documentation](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/docs/extensions.md) | Official docs |
+| 5 | [Skills documentation](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/docs/skills.md) | Official docs |
+| 6 | [SDK documentation](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/docs/sdk.md) | Official docs |
+| 7 | [Subagent extension example](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/examples/extensions/subagent/README.md) | Example code |
+| 8 | [Maintainer's blog post](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/) | Blog post |
+| 9 | [package.json (coding-agent)](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/package.json) | Source metadata |
+| 10 | [package.json (agent-core)](https://github.com/earendil-works/pi/blob/main/packages/agent/package.json) | Source metadata |
+| 11 | [Permission gate extension example](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/examples/extensions/permission-gate.ts) | Example code |
+| 12 | [Subagent extension source](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/examples/extensions/subagent/index.ts) | Example code |
+
+---
+
+## Verbatim quotes
+
+- "Pi is a minimal terminal coding harness. Adapt Pi to your workflows, not the other way around." — [pi.dev](https://pi.dev)
+- "Pi ships with powerful defaults but skips features like sub-agents and plan mode. Ask Pi to build what you want, or install a package that does it your way." — [pi.dev](https://pi.dev)
+- "No sub-agents. There's many ways to do this. Spawn pi instances via tmux, or build your own with extensions, or install a package that does it your way." — [Coding Agent README](https://github.com/earendil-works/pi/tree/main/packages/coding-agent)
+- "If I don't need it, it won't be built. And I don't need a lot of things." — [Mario Zechner's blog](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+- "pi runs in full YOLO mode and assumes you know what you're doing. It has unrestricted access to your filesystem and can execute any command without permission checks or safety rails." — [Mario Zechner's blog](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+- "Spawning multiple sub-agents to implement various features in parallel is an anti-pattern in my book and doesn't work, unless you don't care if your codebase devolves into a pile of garbage." — [Mario Zechner's blog](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+- "Extensions are TypeScript modules that extend pi's behavior. They can subscribe to lifecycle events, register custom tools callable by the LLM, add commands, and more." — [Extensions documentation](https://github.com/earendil-works/pi/blob/main/packages/coding-agent/docs/extensions.md)
+- "Pi does not and will not have a built-in plan mode." — [Mario Zechner's blog](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+- "pi's system prompt and tool definitions together come in below 1000 tokens." — [Mario Zechner's blog](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+- "Submit messages while the agent works. Enter sends a steering message (delivered after current tool, interrupts remaining tools). Alt+Enter sends a follow-up (waits until the agent finishes)." — [pi.dev](https://pi.dev)
+- "I prefer Claude Code for most of my work... Over the past few months, Claude Code has turned into a spaceship with 80% of functionality I have no use for." — [Mario Zechner's blog](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
+
+---
+
+## Source quality flags
+
+- Source 8 (Maintainer's blog): **Personal blog post** — strong authority on Pi's design philosophy and rationale, but represents one person's opinion. Explicitly states design decisions that may not align with all use cases. The benchmark claims are specific to Pi and useful for comparison but should be taken as one data point.
+- Source 1 (pi.dev): **Marketing website** — the landing page is promotional, but the actual content is technically accurate and links to verifiable source code. Not marketing hype in the traditional sense, but does emphasize features positively.
+- No AI-generated summaries or paid content were used.
+
+---
+
+## Confidence: High
+
+Comprehensive primary source data was available: the full source code (GitHub), extensive documentation (extensions.md at 94KB, sdk.md at 32KB), the maintainer's detailed technical blog post, and the package.json metadata. The project is transparent about what it does and doesn't do. No conflicting information was found across sources.
+
+---
+
+## Gaps and open questions
+
+1. **Real-world multi-agent usage**: No evidence was found of anyone successfully building a production multi-agent orchestration layer on top of Pi. The subagent extension is explicitly labeled as an example/demo. It's unknown how well it holds up under production loads.
+
+2. **Performance with deeply hierarchical systems**: Since Pi's architecture is single-agent, there's no data on how it performs when one Pi instance orchestrates many child Pi instances. The subagent example caps at 8 tasks/4 concurrent.
+
+3. **LSP integration**: No community extensions or discussions about LSP support were found. The feasibility of building an LSP extension is theoretical.
+
+4. **Conflict prevention approaches**: While Pi's philosophy rejects permissions, there may be creative approaches using containers, directory-restricted `pi` instances, or RPC-level routing that were not explored in this research.
+
+5. **Community ecosystem size**: The Discord server exists and packages are listed on npm, but no hard data was found on how many third-party extensions/packages exist or how active the community is beyond the core maintainer.
+
+6. **Comparison with opencode**: The maintainer mentions opencode in his blog (using their models.dev data), and Pi was partly inspired by frustrations with Claude Code. But no direct architectural comparison with opencode was found — this would be valuable for evaluating which framework is a better base for the Dispatch requirements.