1 files changed, 235 insertions, 0 deletions
diff --git a/research/multi-agent-orchestration.md b/research/multi-agent-orchestration.md
new file mode 100644
index 0000000..6c5c5f5
--- /dev/null
+++ b/research/multi-agent-orchestration.md
@@ -0,0 +1,235 @@
+# Subagent Report: Multi-Agent Orchestration Frameworks
+
+## Research summary
+This report evaluates three open-source multi-agent orchestration frameworks, **CrewAI**, **Microsoft AutoGen**, and **LangGraph**, against a 15-point requirements checklist for an AI agent harness. CrewAI is the most opinionated, with a built-in manager/agent hierarchy and YAML-driven configuration; LangGraph is the lowest-level option, a graph-based orchestration runtime with the most flexible persistence model; AutoGen is in maintenance mode and offers a layered design plus a no-code GUI. No framework satisfies all 15 requirements: none provides LSP integration, shell access with directory permissions, or file-scope conflict prevention, and none offers the skills system as defined (CrewAI comes closest, with partial support).
+
+---
+
+## Findings
+
+### 1. CrewAI
+
+#### Core Architecture
+
+CrewAI is an open-source Python framework for orchestrating role-based AI agents. It is built entirely from scratch — independent of LangChain. It provides two complementary paradigms: **Crews** (autonomous agent teams with role-based collaboration) and **Flows** (event-driven, stateful workflows). Version 1.14.5 (latest as of May 2026). Language: Python (99%+). Stars: ~51.7k on GitHub. Commits: ~2,414. Recent releases: 191 total, active.
+
+**Architecture layers:**
+- **Agent** — role, goal, backstory, tools, LLM config
+- **Crew** — orchestrates a team of agents with a Process (sequential, hierarchical, or hybrid)
+- **Flow** — higher-level orchestration with `@start`, `@listen`, `@router` decorators, state management, persistence
+- **Config** — YAML files for agents and tasks (recommended approach)
+
+**Primary use case:** General-purpose multi-agent automation for enterprise workflows. Not specifically focused on coding — more general business process automation.
+
+**Project status:** Very active. Backed by a company (CrewAI Inc.). 100k+ certified developers in the community.
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Detail |
+|---|------------|--------|--------|
+| 1 | **Three-layer hierarchy** | **Partial** | CrewAI has a native hierarchical process where a manager agent coordinates sub-agents. However, it's a 2-level hierarchy (manager → agent), not 3-level (dispatch → orchestrator → subagent). Flows can chain multiple crews, achieving multi-level composition programmatically but not as a built-in dispatch architecture. |
+| 2 | **Config-driven orchestrators** | **Fully** | Agents and tasks are defined in YAML (`agents.yaml`, `tasks.yaml`) loaded via `@CrewBase` decorators. This is the recommended approach. Source: CrewAI docs "YAML Configuration (Recommended)" |
+| 3 | **Parallel subagent execution** | **Fully** | Tasks support `async_execution=True` for parallel execution. Multiple tasks can run concurrently when using the context mechanism for dependency management. Source: CrewAI Tasks docs on Asynchronous Execution |
+| 4 | **Strict hierarchy communication** | **Partial** | The hierarchical process assigns a manager that delegates tasks and validates results, providing structured parent-child communication. However, there is no built-in mechanism to prevent peer-to-peer agent messaging when delegation is enabled (`allow_delegation`). |
+| 5 | **User-to-agent messaging mid-execution** | **Partial** | CrewAI supports the `@human_feedback` decorator (v1.8.0+) for human-in-the-loop in Flows, and `human_input=True` on tasks. However, these work only at preconfigured approval points; they do not allow arbitrary mid-execution message injection into any running agent. |
+| 6 | **Conflict prevention** | **Not at all** | No built-in mechanism for assigning non-overlapping file scopes to parallel agents. Code execution is deprecated (uses external sandboxes like E2B). |
+| 7 | **Role-scoped tooling** | **Fully** | Agents can have different tool sets based on role. Tools are assigned per-agent via the `tools` parameter. Tasks can also override tools. Source: CrewAI Agents documentation on tools |
+| 8 | **Skills system** | **Partial** | Supports custom system templates, prompt templates, and response templates per agent. The project template uses `agents.yaml` for defining agent behaviors. Has a "skills" feature via MCP/skills.sh for AI coding assistants, but no directory-based markdown instruction system for agent definition. |
+| 9 | **LSP integration** | **Not at all** | No Language Server Protocol integration for compiler diagnostics. |
+| 10 | **Shell access with directory permissions** | **Not at all** | Code execution is deprecated in favor of external sandboxes. No shell access with permission controls. |
+| 11 | **Session management** | **Partial** | Flows support the `@persist` decorator for state persistence across restarts using SQLite, plus a fork-style restore via `restore_from_state_id`. There is no conversational chat forking and no model switching mid-conversation. |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** | `@human_feedback` decorator on Flow methods pauses execution and collects feedback. `human_input=True` on tasks enables human review. Source: CrewAI Flows docs on human_feedback |
+| 13 | **State persistence** | **Fully** | `@persist` decorator on Flows persists state to SQLite automatically. Supports resume and fork patterns. Source: CrewAI Flows persistence docs |
+| 14 | **Provider-agnostic LLM** | **Fully** | Supports OpenAI, Azure, Anthropic, Ollama, Gemini, and many more via LiteLLM integration. The `llm` parameter accepts model strings or `LLM` instances. Source: CrewAI docs "Connecting Your Crew to a Model" |
+| 15 | **Multiple interfaces** | **Partial** | Has a CLI (`crewai create`, `crewai run`, `crewai flow kickoff`) and a Python API. No native TUI or API server in the open-source version (CrewAI AMP provides an enterprise management console). |
+
+---
+
+### 2. Microsoft AutoGen
+
+#### Core Architecture
+
+AutoGen is a Python framework for building multi-agent AI applications. Developed by Microsoft Research. Currently in **maintenance mode** — no new features, community-managed only. Latest release: `python-v0.7.5` (Sep 2025). Stars: ~58.2k. Forks: ~8.8k. Commits: ~3,782. Microsoft recommends migrating to **Microsoft Agent Framework** for new projects.
+
+**Architecture layers (3-tier design):**
+- **Core API** — message passing, event-driven agents, distributed runtime, cross-language (Python + .NET)
+- **AgentChat API** — higher-level opinionated API for rapid prototyping with agents, teams, group chats
+- **Extensions API** — model clients, tools, code execution backends
+
+**Team patterns:** RoundRobinGroupChat, SelectorGroupChat, Swarm, MagenticOneGroupChat, GraphFlow
+
+**Primary use case:** Conversational multi-agent AI applications, research, prototyping. Magentic-One for web/file tasks.
+
+**Project status:** Maintenance mode. No new features. Users directed to Microsoft Agent Framework.
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Detail |
+|---|------------|--------|--------|
+| 1 | **Three-layer hierarchy** | **Partial** | AutoGen has a flat agent model with "teams" orchestrating agents. It supports Swarm and SelectorGroupChat for multi-agent coordination, and GraphFlow for workflows. No native 3-layer dispatch → orchestrator → subagent hierarchy exists. Subordination must be implemented manually via `AgentTool` wrapping. |
+| 2 | **Config-driven orchestrators** | **Partial** | AutoGen Studio provides a no-code GUI for prototyping. The framework itself is code-first — teams, agents, and termination conditions are defined in Python code. Component serialization exists (`.dump_component()`) but is not a YAML-based config system. |
+| 3 | **Parallel subagent execution** | **Partial** | The `AgentTool` pattern allows one agent to call another as a tool, but this is sequential delegation, not parallel execution. RoundRobinGroupChat is sequential (turn-based). No native parallel agent execution within a team. |
+| 4 | **Strict hierarchy communication** | **Partial** | In SelectorGroupChat and Swarm, speaker selection is controlled. `AgentTool` wrapping creates a tool-call boundary. However, no strict parent-child communication restriction mechanism is built in. |
+| 5 | **User-to-agent messaging mid-execution** | **Fully** | `UserProxyAgent` allows injecting user input during team execution (blocking). `ExternalTermination` can stop teams mid-execution. `HandoffTermination` enables handoff to user. Source: AutoGen Human-in-the-Loop docs |
+| 6 | **Conflict prevention** | **Not at all** | No built-in mechanism for non-overlapping file scopes in parallel agents. |
+| 7 | **Role-scoped tooling** | **Fully** | Each `AssistantAgent` can have its own set of tools. Agent descriptions define their role for the selector. Source: AutoGen AgentChat docs on agents and tools |
+| 8 | **Skills system** | **Not at all** | No skills system for injecting markdown/text instructions per agent type. Agents are configured in code via the `system_message` and `description` strings. |
+| 9 | **LSP integration** | **Not at all** | No Language Server Protocol integration. |
+| 10 | **Shell access with directory permissions** | **Not at all** | Code execution requires external sandboxes or MCP tools. No built-in shell access with permissions. |
+| 11 | **Session management** | **Partial** | Teams support `save_state()` and `load_state()` for persisting conversation state. State can be serialized to JSON. No chat forking or model switching mid-conversation. |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** | `UserProxyAgent` for inline feedback, `HandoffTermination` for async feedback, `max_turns` for turn-based pausing. Source: AutoGen Human-in-the-Loop tutorial |
+| 13 | **State persistence** | **Fully** | `save_state()` / `load_state()` on agents and teams. State dictionaries can be serialized to file or database. Source: AutoGen Managing State docs |
+| 14 | **Provider-agnostic LLM** | **Fully** | Supports OpenAI, Azure OpenAI, Azure AI Foundry, Anthropic (experimental), Ollama (experimental), Gemini (via API), Llama API, plus Semantic Kernel adapter for even more providers. Source: AutoGen Models docs |
+| 15 | **Multiple interfaces** | **Fully** | Python API, CLI (`autogenstudio ui`), AutoGen Studio (no-code GUI web app), and FastAPI/ChainLit/Streamlit integration samples. Source: AutoGen README and FastAPI sample |
+
+---
+
+### 3. LangGraph
+
+#### Core Architecture
+
+LangGraph is a low-level orchestration framework for building stateful, long-running agents. Developed by LangChain Inc. Built as a graph-based runtime inspired by Google's Pregel and Apache Beam. Latest release: `langgraph==1.2.0` (May 2026). Stars: ~32.4k. Commits: ~6,862. Active development (534 releases total).
+
+**Architecture:**
+- **StateGraph** — defines state schema (TypedDict/Pydantic), nodes (functions), edges (conditional/static)
+- **Subgraphs** — graphs used as nodes in other graphs (supports multi-agent patterns)
+- **Checkpointer** — persistence layer for state snapshots at every super-step
+- **Store** — cross-thread memory for long-term knowledge
+- **Persistence modes:** per-invocation (default), per-thread, stateless
+
+**Primary use case:** Low-level agent orchestration for complex, stateful, long-running workflows. Used by Klarna, Replit, Elastic, Uber, J.P. Morgan. Higher-level abstraction available via Deep Agents and LangChain agents.
+
+**Project status:** Very active. Backed by LangChain Inc., which also offers the commercial LangSmith platform.
+
+#### Requirements Checklist
+
+| # | Requirement | Status | Detail |
+|---|------------|--------|--------|
+| 1 | **Three-layer hierarchy** | **Fully** | LangGraph's subgraph architecture supports arbitrary nesting. A parent graph can contain subgraphs, which can contain further subgraphs. `Command(goto=..., graph=Command.PARENT)` enables navigation between levels. Each level has its own state schema. Source: LangGraph Subgraphs documentation |
+| 2 | **Config-driven orchestrators** | **Not at all** | LangGraph is purely code-defined — graphs, nodes, edges, and state schemas are all Python code. No YAML or config file support for defining orchestrator types. LangSmith Studio provides a UI but generates code. |
+| 3 | **Parallel subagent execution** | **Fully** | Multiple outgoing edges from a single node execute in parallel (same super-step). `Send()` API enables map-reduce patterns with dynamic fan-out. Subgraphs can run in parallel. Source: LangGraph Graph API docs on edges and Send |
+| 4 | **Strict hierarchy communication** | **Fully** | Subgraphs can have private state schemas invisible to the parent graph. When a subgraph is invoked via a node function, the parent only sees what the node function returns. State isolation is achieved via separate state schemas. Source: LangGraph Subgraphs docs on different state schemas |
+| 5 | **User-to-agent messaging mid-execution** | **Fully** | `interrupt()` function pauses graph execution and returns control to the caller. The caller can inspect state and resume with `Command(resume=...)`. Supports multiple simultaneous interrupts. Source: LangGraph Interrupts documentation |
+| 6 | **Conflict prevention** | **Not at all** | No built-in mechanism for file scope conflict prevention. |
+| 7 | **Role-scoped tooling** | **Fully** | Each agent node can have its own set of tools. In multi-agent patterns, subgraphs/agents have independent tool configurations. Tools are LangChain-compatible. |
+| 8 | **Skills system** | **Not at all** | No skills system for injecting markdown instructions per agent type. No directory-based instruction organization. Agents use system prompts defined in code. |
+| 9 | **LSP integration** | **Not at all** | No Language Server Protocol integration. |
+| 10 | **Shell access with directory permissions** | **Not at all** | No built-in shell access. Code execution relies on external tools or LangChain tool integrations. |
+| 11 | **Session management** | **Partial** | Checkpointer-based threads provide conversation history via `get_state_history()`. Time-travel debugging via replay from checkpoints. `update_state()` for editing state. No explicit chat forking or model switching mid-conversation. |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** | `interrupt()` function for dynamic pausing. Static breakpoints via `interrupt_before`/`interrupt_after` at compile time. Supports approval workflows, review-and-edit, and validation loops. Source: LangGraph Interrupts documentation |
+| 13 | **State persistence** | **Fully** | Multiple checkpointers: InMemorySaver, SqliteSaver, PostgresSaver, Azure CosmosDB. Checkpoints at every super-step. Cross-thread memory via Store. Encryption support. Source: LangGraph Persistence documentation |
+| 14 | **Provider-agnostic LLM** | **Fully** | LangGraph can use any LangChain-compatible model provider (OpenAI, Anthropic, Google, Ollama, AWS Bedrock, Azure, etc.) plus standalone models without LangChain. The `Runtime` context can pass model configuration. |
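+| 15 | **Multiple interfaces** | **Partial** | Python API primarily. LangSmith Studio for visual prototyping. LangGraph API for deployment. No native interactive CLI or TUI; the `langgraph` command from the `langgraph-cli` package covers local development and deployment only. Deep Agents SDK provides a higher-level interface. |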
+
+---
+
+## Summary Comparison Table
+
+| # | Requirement | CrewAI | AutoGen | LangGraph |
+|---|------------|--------|---------------|-----------|
+| 1 | **Three-layer hierarchy** | Partial (2-level natively, chaining via Flows) | Partial (flat agent teams, Swarm/GraphFlow) | **Fully** (arbitrary nesting via subgraphs) |
+| 2 | **Config-driven orchestrators** | **Fully** (YAML agents.yaml/tasks.yaml) | Partial (code-first, Studio GUI, component serialization) | Not at all (purely code-defined) |
+| 3 | **Parallel subagent execution** | **Fully** (async_execution tasks) | Partial (sequential team patterns, AgentTool is blocking) | **Fully** (Send API, parallel edges, map-reduce) |
+| 4 | **Strict hierarchy communication** | Partial (manager delegates but no P2P prevention) | Partial (SelectorGroupChat controls turns, AgentTool boundary) | **Fully** (private state schemas, subgraph isolation) |
+| 5 | **User-to-agent messaging mid-execution** | Partial (@human_feedback at configured points) | **Fully** (UserProxyAgent, ExternalTermination, HandoffTermination) | **Fully** (interrupt()/Command(resume=...) anywhere) |
+| 6 | **Conflict prevention** | Not at all | Not at all | Not at all |
+| 7 | **Role-scoped tooling** | **Fully** (per-agent tools, task override) | **Fully** (per-agent tools) | **Fully** (per-node tools, LangChain-compatible) |
+| 8 | **Skills system** | Partial (agent templates, prompt customization) | Not at all | Not at all |
+| 9 | **LSP integration** | Not at all | Not at all | Not at all |
+| 10 | **Shell access with directory permissions** | Not at all | Not at all | Not at all |
+| 11 | **Session management** | Partial (Flow persist/fork) | Partial (save_state/load_state, JSON serialization) | Partial (checkpointer history, time travel, update_state) |
+| 12 | **Human-in-the-loop checkpoints** | **Fully** (@human_feedback, human_input on tasks) | **Fully** (UserProxyAgent, HandoffTermination, max_turns) | **Fully** (interrupt(), static breakpoints, approval patterns) |
+| 13 | **State persistence** | **Fully** (@persist with SQLite, resume/fork) | **Fully** (save_state/load_state to file or DB) | **Fully** (checkpointers: Memory, SQLite, Postgres, CosmosDB) |
+| 14 | **Provider-agnostic LLM** | **Fully** (many providers via LiteLLM) | **Fully** (many providers via Extensions API + SK adapter) | **Fully** (all LangChain providers, plus standalone mode) |
+| 15 | **Multiple interfaces** | Partial (CLI + Python API, AMP enterprise console) | **Fully** (Python API, CLI, Studio web GUI, FastAPI/Streamlit) | Partial (Python API, LangSmith Studio, LangGraph API) |
+
+---
+
+## Overall Assessment
+
+### CrewAI
+**Strength:** Most opinionated framework for role-based agents with YAML-driven configuration. Excellent for enterprise automation where you want to define agents declaratively. Strong community (100k+ certified developers), active development, and a commercial offering (CrewAI AMP).
+
+**Key Gaps:** No LSP, no shell permissions, no native 3-layer dispatch hierarchy, limited mid-execution user injection.
+
+### AutoGen
+**Strength:** Best GUI/studio support for prototyping. Most flexible human-in-the-loop with `UserProxyAgent` and handoff patterns. Multiple interface options (Python, CLI, Studio web app). Provider-agnostic model support with Semantic Kernel adapter.
+
+**Key Gaps:** **Maintenance mode**: no new features, and users are directed to Microsoft Agent Framework. No native parallel agent execution within a team, code-first configuration with no YAML-based config system, and a flat agent model.
+
+### LangGraph
+**Strength:** Most architecturally flexible — arbitrary graph topologies, arbitrary subgraph nesting, the richest persistence model (multiple checkpointers + cross-thread store), durable execution, and the most sophisticated interrupt system. Best for complex, long-running, stateful workflows.
+
+**Key Gaps:** Code-only configuration (no YAML), no built-in skills system, no interactive CLI or TUI, and a steeper learning curve due to its low-level nature. No file scope conflict prevention.
+
+---
+
+## Source list
+
+| # | Source | Type |
+|---|--------|------|
+| 1 | [CrewAI GitHub Repository](https://github.com/crewAIInc/crewAI) | official |
+| 2 | [CrewAI Documentation - Agents](https://docs.crewai.com/concepts/agents) | official |
+| 3 | [CrewAI Documentation - Tasks](https://docs.crewai.com/concepts/tasks) | official |
+| 4 | [CrewAI Documentation - Flows](https://docs.crewai.com/concepts/flows) | official |
+| 5 | [CrewAI Documentation - Memory](https://docs.crewai.com/concepts/memory) | official |
+| 6 | [CrewAI Documentation - Processes](https://docs.crewai.com/core-concepts/Processes/) | official |
+| 7 | [AutoGen GitHub Repository](https://github.com/microsoft/autogen) | official |
+| 8 | [AutoGen Documentation - Teams](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/teams.html) | official |
+| 9 | [AutoGen Documentation - Human-in-the-Loop](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/human-in-the-loop.html) | official |
+| 10 | [AutoGen Documentation - Managing State](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/state.html) | official |
+| 11 | [AutoGen Documentation - Models](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/models.html) | official |
+| 12 | [AutoGen Documentation - Selector Group Chat](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/selector-group-chat.html) | official |
+| 13 | [LangGraph GitHub Repository](https://github.com/langchain-ai/langgraph) | official |
+| 14 | [LangGraph Documentation - Overview](https://docs.langchain.com/oss/python/langgraph/overview) | official |
+| 15 | [LangGraph Documentation - Graph API](https://docs.langchain.com/oss/python/langgraph/graph-api) | official |
+| 16 | [LangGraph Documentation - Subgraphs](https://docs.langchain.com/oss/python/langgraph/use-subgraphs) | official |
+| 17 | [LangGraph Documentation - Interrupts](https://docs.langchain.com/oss/python/langgraph/interrupts) | official |
+| 18 | [LangGraph Documentation - Persistence](https://docs.langchain.com/oss/python/langgraph/persistence) | official |
+
+---
+
+## Verbatim quotes
+
+- "CrewAI is a lean, lightning-fast Python framework built entirely from scratch—completely independent of LangChain or other agent frameworks." — [CrewAI GitHub README](https://github.com/crewAIInc/crewAI)
+- "Using YAML configuration provides a cleaner, more maintainable way to define agents. We strongly recommend using this approach in your CrewAI projects." — [CrewAI Agents docs](https://docs.crewai.com/concepts/agents)
+- "AutoGen is now in maintenance mode. It will not receive new features or enhancements and is community managed going forward." — [AutoGen GitHub README](https://github.com/microsoft/autogen)
+- "New users should start with Microsoft Agent Framework. Existing users are encouraged to migrate." — [AutoGen GitHub README](https://github.com/microsoft/autogen)
+- "LangGraph is a low-level orchestration framework for building, managing, and deploying long-running, stateful agents." — [LangGraph GitHub README](https://github.com/langchain-ai/langgraph)
+- "Subgraphs are useful for building multi-agent systems, reusing a set of nodes in multiple graphs, and distributing development." — [LangGraph Subgraphs docs](https://docs.langchain.com/oss/python/langgraph/use-subgraphs)
+- "Interrupts allow you to pause graph execution at specific points and wait for external input before continuing." — [LangGraph Interrupts docs](https://docs.langchain.com/oss/python/langgraph/interrupts)
+- "UserProxyAgent is a special built-in agent that acts as a proxy for a user to provide feedback to the team." — [AutoGen HITL docs](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/human-in-the-loop.html)
+
+---
+
+## Source quality flags
+
+- No significant quality issues found. All 18 listed sources are primary (official GitHub repositories and official documentation sites) and cover all three frameworks.
+- AutoGen's documentation is comprehensive but the project is in maintenance mode.
+
+---
+
+## Confidence: High
+
+All claims are sourced from official GitHub repositories and official documentation sites for each framework (CrewAI, AutoGen, LangGraph). The information reflects the current state as of May 2026.
+
+## Gaps and open questions
+
+- **LSP integration**: None of the three frameworks support Language Server Protocol integration. This would need to be built as a custom extension for any chosen framework.
+- **Shell access with directory permissions**: None support this natively. Shell access would require wrapping shell tools with permission checks manually in any framework.
+- **Skills system**: Only CrewAI has anything close (prompt templates, agent configuration files), but none have the directory-based markdown instruction system described in the requirements.
+- **Conflict prevention for file scopes**: No framework has built-in mechanisms for this. Would need to be implemented as a custom tool wrapper or middleware.
+- **Session management (chat forking, model switching)**: All three provide basic state persistence, but none offers chat forking or mid-conversation model switching.
+- **CrewAI's Allow Delegation**: When `allow_delegation=True`, agents can delegate tasks to other agents, which could include peer agents — creating implicit P2P communication. The strictness of hierarchy enforcement depends on configuration.
+- **AutoGen's successor**: Microsoft Agent Framework (MAF) is the recommended successor and may address many gaps, but it was outside this research scope.
+
+---
+
+## Tool calls made
+
+1. `webfetch` - 3 GitHub README pages (CrewAI, AutoGen, LangGraph)
+2. `webfetch` - CrewAI Processes page, Crews page
+3. `webfetch` - LangGraph overview, AutoGen tutorial, CrewAI agents, tasks
+4. `webfetch` - CrewAI flows, AutoGen HITL, LangGraph interrupts, LangGraph subgraphs (404)
+5. `webfetch` - LangGraph persistence, AutoGen state management, AutoGen selector chat, LangGraph graph API
+6. `webfetch` - LangGraph subgraphs, CrewAI memory, AutoGen models