# Dispatch Restructure — Living Plan
> **Status:** Planning only. No implementation has begun.
> **Purpose:** Capture the target architecture, the engineering principles that
> govern it, and the current-state map — so any agent or human picking this up
> has the full picture in one place. This is a *living* document: update it as
> decisions are made and pieces land.
---
## 0. The goal in one paragraph
Restructure Dispatch so the **kernel is the absolute minimum** — just enough to
run an agent turn and host extensions — and **every feature is an extension**.
Extensions must be creatable and loadable *from outside this project* (custom /
third-party extensions), with identical contracts to the bundled ones. For now
we are planning the **backend only**; the frontend will be reworked separately
and modularly later, so **no design decision here should be driven by the current
frontend**.
---
## 1. Engineering principles (the standard for this project)
These are adopted because each solves a **specific, named problem in this
codebase** — not because they are popular. Each carries its stopping point so we
don't over-apply it.
### P1 — Feature-as-a-library
Every feature is independently importable with a clean, documented, minimal API.
The acceptance test: *can you import just this feature and use it standalone,
without dragging in the whole app?*
- **Evidence:** `agent-manager.ts` is ~2,453 lines where no single behavior
(queueing, tool-assembly, fallback) can be extracted or reasoned about in
isolation. By contrast `chunks/transform.ts` is deliberately DB-free so the
backend *and* frontend share the same pure logic — feature-as-a-library done
right, already in the repo.
- **Stopping point:** Do **not** over-split into dozens of micro npm packages
with version-skew and `package.json` ceremony. Internal import-cleanliness
first; a separately *publishable* package only when there's a genuine outside
consumer.
### P2 — Functional core / imperative shell
Pure *decision* logic ("given this state + event, what should happen?") as pure
functions; the actual I/O (shell, fs, LLM, SQLite) lives in thin adapters
**injected** at the edges.
- **Evidence:** `wake-scheduler.ts` already does this and says so: "Pure helpers…
side-effect-free so the logic can be unit-tested without spinning up Hono or
touching SQLite." The giant `vi.mock("@dispatch/core")` blocks in
`agent-manager.test.ts` exist *because* effects are reached for instead of
passed in.
- **The honest framing:** An agent system *is* side effects — running shell,
writing files, calling the LLM are the product. The goal is **testability and
predictability, not purity for its own sake.**
- **Stopping point:** Where separating decision from effect makes a unit
obviously testable, do it. Where it would only add ceremony (DI containers,
effect-wrapper types) around an unavoidable `await spawn(cmd)`, don't. Purity
is a means; if it stops paying for itself, drop it.
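
A minimal sketch of that split, under stated assumptions: `decideWake`, `WakeState`, `WakeEffects`, and `runWakeTick` are hypothetical names for illustration, not the actual `wake-scheduler.ts` API. The decision is a pure function; the shell receives its effects as an argument, so a test can pass fakes instead of mocking a module.

```typescript
// Hypothetical names throughout; not the real wake-scheduler.ts API.
interface WakeState {
  lastWakeAt: number;    // epoch ms of the previous wake
  wakesThisHour: number; // wakes already spent in the current window
  hourlyBudget: number;  // maximum wakes allowed per window
}

type WakeDecision =
  | { kind: "wake" }
  | { kind: "skip"; reason: "budget-exhausted" | "too-soon" };

// Functional core: state + current time in, decision out. No I/O, no clock reads.
function decideWake(state: WakeState, now: number, minGapMs: number): WakeDecision {
  if (state.wakesThisHour >= state.hourlyBudget) {
    return { kind: "skip", reason: "budget-exhausted" };
  }
  if (now - state.lastWakeAt < minGapMs) {
    return { kind: "skip", reason: "too-soon" };
  }
  return { kind: "wake" };
}

// Imperative shell: the effects are injected, never reached for.
interface WakeEffects {
  now(): number;
  wakeAgent(tabId: string): Promise<void>;
}

async function runWakeTick(
  tabId: string,
  state: WakeState,
  fx: WakeEffects,
): Promise<WakeDecision> {
  const decision = decideWake(state, fx.now(), 60_000);
  if (decision.kind === "wake") await fx.wakeAgent(tabId);
  return decision;
}
```

The stopping point still applies: the shell stays a few lines long, and nothing here needs a DI container.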
### P3 — No ambient / hidden state
State is **owned and passed explicitly**, never reached for as a hidden global or
stateful singleton.
- **Evidence:** Wishlist bugs #16 ("agent tools leak across tabs") and #17
("agent/model setting changes on tab switch") are *caused by* shared mutable
singletons / frontend-held state. Explicit per-tab state ownership fixes them
structurally.
- **Stopping point:** Stateless classes-as-namespaces are fine. Stateful
god-objects (today's managers) are the thing we're killing. The tool-set for a
turn must be reproducible from `(agent profile + capabilities + active
extensions)` — pure input → output.
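
The "pure input → output" requirement could look roughly like the sketch below. All shapes (`RegisteredTool`, `AgentProfile`, `ToolSetInput`) and the function name `resolveToolSet` are assumptions for illustration; the real contracts are not defined yet.

```typescript
// Hypothetical shapes; the real contracts are not defined yet.
interface RegisteredTool {
  name: string;
  extensionId: string;            // extension that contributed the tool
  requiredCapabilities: string[]; // e.g. "fs", "shell"
}

interface AgentProfile {
  allowedTools: readonly string[] | "all";
}

interface ToolSetInput {
  profile: AgentProfile;
  grantedCapabilities: readonly string[];
  activeExtensions: readonly string[];
  registered: readonly RegisteredTool[];
}

// Same input, same output: no singleton lookup, no per-tab hidden state.
function resolveToolSet(input: ToolSetInput): string[] {
  const granted = new Set(input.grantedCapabilities);
  const active = new Set(input.activeExtensions);
  const allowed = input.profile.allowedTools;
  return input.registered
    .filter((tool) => active.has(tool.extensionId))
    .filter((tool) => tool.requiredCapabilities.every((cap) => granted.has(cap)))
    .filter((tool) => allowed === "all" || allowed.includes(tool.name))
    .map((tool) => tool.name)
    .sort();
}
```

Because nothing is read from outside the argument, two tabs with different profiles cannot leak tools into each other through this function.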
### P4 — Don't adopt by reputation (meta-principle)
Every pattern, library, or methodology — **including the "minimal kernel +
extensions" architecture itself** — earns its place by solving a specific,
named problem in *this* codebase, and we note where it stops paying off. "It's a
known good practice" is a hypothesis to test, not a justification.
### P5 — The repo is a harness, not just code
Meta-information that guides future agents is a **first-class deliverable**,
maintained like code. Modeled as a *tiered cache* of context: small
always-loaded files + larger on-demand files, so an agent gets the right info at
the right moment without burning context.
(Source: "The AI Harness" — see §7. Bounded to our scale in §7.4.)
### P6 — Document only the non-inferable
Harness docs contain **tribal knowledge and scar tissue only** — never generic
best-practice the model already knows. Test: *"Could a fresh frontier model
figure this out by reading the code? If yes, leave it out."*
(This is P4 applied to documentation — it self-limits harness bloat.)
### P7 — The harness is extension-scoped
Every extension ships **its own** constitution snippet, safety rules, feature
doc, glossary terms, and skills — portable with the code. This is P1
(feature-as-a-library) applied to documentation: import the extension, get its
harness too. Better than a repo-global harness for a modular system.
### P8 — One canonical vocabulary
A `GLOSSARY.md` with an **"aliases to avoid"** column governs naming. New code
reuses existing terms; it never invents a synonym for an existing concept.
- **Evidence:** This codebase overloads **tab / session / conversation** and
**chunk / message / turn / step** — the chunk-log refactor notes exist
precisely because those terms got tangled.
- **Live application:** "core" now has a precise meaning (the extension tier in
§2.6) — it must NOT be reused for the kernel. Kernel ≠ core.
---
## 2. Target architecture — minimal kernel + extensions
### 2.1 Layered picture
```
┌───────────────────────────────────────────────────────────────┐
│ Clients (any frontend — reworked later, out of scope now) │
└───────────────────────────────────────────────────────────────┘
▲ typed events / commands (via a transport extension)
┌───────────────────────────────────────────────────────────────┐
│ STANDARD extensions (the features people think of as Dispatch)│
│ tools (read_file, run_shell…) · agents · skills · lsp · │
│ compaction · notifications · scheduler · attachments · … │
└───────────────────────────────────────────────────────────────┘
▲ depend on kernel + core (never upward)
┌───────────────────────────────────────────────────────────────┐
│ CORE extensions (minimum glue to run ONE turn end-to-end) │
│ transport · provider · auth · session-orchestrator │
└───────────────────────────────────────────────────────────────┘
▲ register contributions ▲ receive Host API
┌───────────────────────────────────────────────────────────────┐
│ KERNEL (minimal; not an extension) │
│ │
│ Extension Host Agent Runtime Event/Hook Bus │
│ (discover/resolve/ (the turn loop, (typed pub/sub │
│ activate/registries) provider+tool + filters) │
│ agnostic) │
│ │
│ Kernel Services (exposed through Host API): │
│ • Capability/Permission gate • Config (merge + schema) │
│ • Storage + migration runner • Secret/credential vault │
│ • Conversation/chunk store • Logger │
│ │
│ Contracts (the stable ABI every extension compiles against) │
└───────────────────────────────────────────────────────────────┘
```
### 2.2 The Kernel — the "absolute minimum"
Five things, nothing more:
1. **Contracts (the stable ABI).** The only types extensions depend on, versioned
   independently from implementations. Seeded from today's `types/index.ts`:
   - `ToolContract` (today's `ToolDefinition`: `{ name, description, parameters,
     execute(args, ctx) }`) — see §3.3 for the `ctx` requirements concurrency forces.
   - `ProviderContract` (model factory + streaming + catalog/capability entries)
   - `AuthContract` (credential sources / OAuth flows feeding the vault)
   - `Extension` + `Manifest` (id, version, apiVersion range, deps, activation,
     contributions, capabilities)
   - `HostAPI` (what an extension receives on activate — see §2.3)
   - `Hook`/event taxonomy (the lifecycle surface)
   - Conversation model (`ChatMessage`, `Chunk`, turn/step)
2. **Extension Host.** Discover → validate manifest → resolve dependency DAG →
   check apiVersion compat → run migrations → activate (topological) → register
   contributions → dispose on shutdown/reload. Owns the **registries** (tools,
   providers, hooks, routes/commands, services, settings, migrations, jobs).
3. **Agent Runtime (the turn loop).** The refactored heart of today's `agent.ts`:
   takes *a resolved provider + a tool set + messages + a dispatch policy*,
   streams, dispatches tool calls (see §3.3), dedups, truncates/spills, emits
   events. **Provider-agnostic and tool-agnostic** — knows only the contracts.
   Names no concrete tool or provider.
4. **Event / Hook Bus.** Typed pub/sub plus *filters*:
   - **Observers** react (notifications, persistence, usage accounting).
   - **Filters** transform in a chain (system-prompt assembly, message pre-send,
     tool-result transform, tool-set filtering).
5. **Kernel Services (via Host API).** The kernel exposes *interfaces* and pure
   logic here — **never concrete I/O backends** (those are `core` extensions; see
   §2.8). This keeps "kernel touches no I/O" (§2.7) literally true.
   - **Config loader** — merged loader (global → project) + per-extension
     settings schema/validation. **Must be in the kernel** (not an extension):
     it's needed at boot to *find and resolve* extensions — a chicken-and-egg the
     extension system itself can't solve. Seeded from today's `config/`.
   - **Logger** — always-on, available before any extension activates.
   - **Permission rule *evaluation*** — the pure `evaluate(rules, request) →
     decision` function (today's `permission/evaluate.ts`): rules in, decision
     out, no I/O. The *interactive prompting* (asking a human, today's
     `permission-manager.ts`) is a transport/UI concern owned by a `core`/
     `standard` extension, not the kernel.
   - **Storage interface + migration runner** — the kernel defines the storage
     *contract* (namespaced KV/SQL + per-extension migration registration) and
     exposes `host.storage(ns)`, but the **concrete backend (SQLite) is a `core`
     extension** (`storage-sqlite`), swappable for an in-memory store in tests
     (serves P2 directly). Bootstrap ordering: the storage backend activates
     first (no deps) so later extensions can run their migrations.
   - **Secret/credential vault interface** — `host.secrets` (capability-gated);
     the concrete store and the *auth flows* that fill it are extensions.
   - **Conversation/chunk store** — NOTE: the kernel owns only the **conversation
     model TYPES** (`Chunk`/`ChatMessage` in contracts) and the pure
     explode/group transforms (today's DB-free `chunks/transform.ts`). The
     **persistent store itself is a `core` extension** built on `host.storage` —
     because persistence is I/O. The runtime reads/writes history *through the
     orchestrator*, which calls the store; the kernel's `runTurn` takes
     `messages` as a plain input and returns result messages (it never touches
     the DB).
> **Deliberately NOT in the kernel:** any concrete tool, any provider, any
> concrete persistence/secret backend, the persona/system-prompt text, the HTTP
> server, interactive permission prompting, tab/queue orchestration, sub-agents,
> skills, LSP, notifications, compaction, scheduling.
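
The observer/filter distinction in the Event / Hook Bus can be sketched as follows. This is an illustration only: the class name `HookBus`, the method `applyFilters`, and the demo hook names are assumptions, and the real hook taxonomy is still to be designed.

```typescript
// Illustrative sketch; HookBus, applyFilters, and the demo hooks are assumed names.
class HookBus<E, F> {
  private observers = new Map<keyof E, Array<(event: any) => void>>();
  private filters = new Map<keyof F, Array<(value: any) => any>>();

  // Observers react to an event; their return value is ignored.
  on<K extends keyof E>(hook: K, handler: (event: E[K]) => void): void {
    const list = this.observers.get(hook) ?? [];
    list.push(handler);
    this.observers.set(hook, list);
  }

  // Filters transform a value; each one receives the previous filter's output.
  addFilter<K extends keyof F>(hook: K, fn: (value: F[K]) => F[K]): void {
    const list = this.filters.get(hook) ?? [];
    list.push(fn);
    this.filters.set(hook, list);
  }

  emit<K extends keyof E>(hook: K, event: E[K]): void {
    for (const handler of this.observers.get(hook) ?? []) handler(event);
  }

  applyFilters<K extends keyof F>(hook: K, initial: F[K]): F[K] {
    let value = initial;
    for (const fn of this.filters.get(hook) ?? []) value = fn(value);
    return value;
  }
}

// Demo taxonomy (hypothetical): one event, one filterable value.
interface DemoEvents { "turn.completed": { tabId: string } }
interface DemoFilters { systemPrompt: string }

const bus = new HookBus<DemoEvents, DemoFilters>();
const completedTabs: string[] = [];
bus.on("turn.completed", (event) => { completedTabs.push(event.tabId); });
bus.addFilter("systemPrompt", (prompt) => prompt + "\n[persona]");
bus.addFilter("systemPrompt", (prompt) => prompt + "\n[skills]");
```

Filters run in registration order in this sketch; whether the real bus needs explicit priorities is an open design question, not something the text above settles.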
### 2.3 The extension model
- **What it is:** a directory or npm package with a **manifest** + entry module
exporting `activate(host)` and optional `deactivate()`.
- **Manifest shape:** `id, name, version, apiVersion (semver range), dependsOn[],
activation ("eager" | lazy event triggers), contributes {tools, providers,
routes, commands, hooks, settings, migrations, scheduledJobs, services},
capabilities {fs, shell, network, secrets, db, spawn…}, settingsSchema`.
- **Each extension's contract is two-sided (provides + expects):** what it
*exposes* (its contributions/services) and what it *expects exposed to it*
(its `dependsOn` services + `capabilities`). This two-sided contract is what
the host uses to resolve load order and what makes an extension portable.
- **Host API (what `activate(host)` receives):**
  - `host.defineTool/defineProvider/defineAuth(...)`
  - `host.defineRoute/defineCommand(...)` — for transports & UI actions
  - `host.on(hook, handler)` / `host.addFilter(hook, fn)`
  - `host.provideService(handle, impl)` / `host.getService(handle)` — typed DI
    via **typed service handles** (an exported symbol, NOT a raw string — so
    `lsp references` can compute a service's consumers; see §5)
  - `host.storage(namespace)` — scoped KV/SQL + migrations (interface; backed
    by the `storage-sqlite` core extension — see §2.8)
  - `host.config` / `host.settings`
  - `host.secrets` (capability-gated)
  - `host.permissions.check(request)`
  - `host.events.emit(...)` / `host.logger`
  - `host.scheduler.register(job)`
- **Contribution points** (replacing today's wiring):
| Point | Replaces today's | Examples |
|---|---|---|
| tools | per-turn assembly in `agent-manager` | read_file, run_shell, web_search |
| providers | `llm/provider.ts`, `models/registry` | anthropic, opencode, google |
| auth | `credentials/*` | claude OAuth, api-keys |
| context filters | `buildSystemPrompt`, skills/agents injection | persona, skills, agent profiles |
| hooks/observers | scattered wiring | notifications, usage accounting |
| routes/commands | `api/routes/*` | `/chat`, `/tabs`, `/models` |
| scheduled jobs | `wake-scheduler.ts` | cache-warm, wake probes |
| migrations | `db/index.ts` table block | each extension owns its tables |
| services | implicit singletons | LSP manager, model registry |
- **Loading / lifecycle:** search paths (precedence high→low) =
project `.dispatch/extensions` → global `~/.config/dispatch/extensions` →
installed npm packages (naming convention) → bundled first-party. Resolve DAG →
verify apiVersion → run migrations → activate topologically (lazy ones defer to
their activation event) → ready. Hot-reload via watchers (config already does
this); deactivate disposes everything the extension registered.
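
The "resolve DAG → activate topologically" step can be sketched as a depth-first topological sort over `dependsOn`. The `ManifestLite` type and `activationOrder` function are hypothetical and cover only ordering; apiVersion checks, migrations, and lazy activation are left out.

```typescript
// Hypothetical reduced manifest: only the fields needed for ordering.
interface ManifestLite {
  id: string;
  dependsOn: readonly string[];
}

// Returns extension ids so that every extension comes after its dependencies.
// Throws on an unknown dependency or a dependency cycle.
function activationOrder(manifests: readonly ManifestLite[]): string[] {
  const byId = new Map<string, ManifestLite>();
  for (const m of manifests) byId.set(m.id, m);

  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];

  const visit = (id: string, path: readonly string[]): void => {
    const manifest = byId.get(id);
    if (!manifest) {
      const requiredBy = path.length > 0 ? path[path.length - 1] : "(root)";
      throw new Error(`Unknown dependency "${id}" required by ${requiredBy}`);
    }
    const seen = state.get(id);
    if (seen === "done") return;
    if (seen === "visiting") {
      throw new Error(`Dependency cycle: ${[...path, id].join(" -> ")}`);
    }
    state.set(id, "visiting");
    for (const dep of manifest.dependsOn) visit(dep, [...path, id]);
    state.set(id, "done");
    order.push(id); // pushed only after all of its dependencies
  };

  for (const m of manifests) visit(m.id, []);
  return order;
}
```

With the core tier as input, a storage backend with no dependencies naturally lands first, which matches the bootstrap ordering described in §2.2.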
### 2.4 Extension catalog (current code → extensions, with tier)
- **core tier (the minimum to complete one turn — see §2.8):**
`storage-sqlite` (concrete backend behind `host.storage`), `conversation-store`
(append-only turn/chunk persistence on top of `host.storage`; today's
`db/chunks.ts` + `db/tabs.ts`), `transport` (accept message, stream events —
HTTP/WS, or even stdio), `provider-×1` (one LLM provider), `auth-×1` (that
provider's credentials), `session-orchestrator` (the turn-driver carved out of
`agent-manager.ts`).
- **standard tier — tools:** `tools-fs` (read_file, read_file_slice, write_file,
list_files), `tool-shell` (run_shell + background store + shell-analyze),
`tool-search` (search_code), `tool-web`, `tool-youtube`, `tool-todo`,
`tool-key-usage`.
- **standard tier — providers & auth beyond the minimum:** `provider-anthropic`,
`provider-opencode`, `provider-google`, `provider-copilot`; `auth-claude`
(OAuth), `auth-apikeys`, `models-catalog` (registry + capabilities). *(Note:
the single provider/auth required to boot is "core"; additional ones are
"standard". Which specific one is the core default is a §8 decision.)*
- **standard tier — subsystems:** `lsp` (manager service + `lsp` tool +
diagnostics-on-write filter), `agents` (sub/user-agent system + `summon`/
`retrieve`), `skills` (loader + context-filter), `session-features` (tabs,
queue, deliverMessage, auto-wake budget, `send_to_tab`/`read_tab` — the parts
beyond the minimal orchestrator), `compaction`, `notifications-ntfy`,
`wake-scheduler`, `attachments` (multimodal validation/limits).
> Result: **`agent-manager.ts` dissolves** into the kernel's turn loop + the
> core `session-orchestrator` + standard-tier contributions.
### 2.5 Proposed package layout
```
packages/
kernel/ # the kernel ONLY (NOT named "core" — see P8 / §2.6)
contracts/ # the KERNEL ABI ONLY (turn loop, HostAPI, hook/event
# mechanism, conversation model) — versioned.
# Per-extension contracts are NOT here — they live
# co-located in each extension package (see §5).
host/ # discovery/resolve/activate + registries
runtime/ # the agent turn loop (incl. tool dispatch, §3.3)
bus/ # events + filters
services/ # config loader, logger, permission eval, storage IFACE + migration runner, secrets IFACE
extensions/
core/ # core-tier: storage-sqlite, conversation-store, transport,
# provider-×1, auth-×1, session-orchestrator
standard/ # standard-tier: tools, agents, skills, lsp, compaction, …
# each extension package owns its OWN contract
# (what it exposes/requires + its hook & service
# handles) co-located inside it — see §5
host-bin/ # thin bootstrapper: make kernel, point at ext dirs, activate
sdk/ # helper toolkit + types for THIRD-PARTY ext authors
frontend/ # reworked later
```
### 2.6 Tiers: kernel → core → standard
We classify extensions into tiers. **Tiers are labels over the dependency DAG,
not a second enforcement mechanism** — the host resolves load order from each
extension's declared deps, and the capability gate enforces access. Tiers
describe *what ships in which distribution*.
| Tier | Objective test | Distribution |
|---|---|---|
| **kernel** | the ABI + turn loop; *not* an extension | always |
| **core** | required to complete one turn end-to-end | "minimal Dispatch" |
| **standard** | ships on by default; defines Dispatch-as-known | "default Dispatch" |
| *(external)* | not in this repo | community / custom |
- **No "extras" tier yet.** Empty categories are over-planning. A fourth tier
(bundled-but-off-by-default) earns existence only when a real feature is
genuinely opt-in — not by demoting an existing feature to fill a slot.
- **The one invariant that gives tiers teeth — no upward dependencies.** A `core`
extension may depend on the kernel and other `core` extensions, never on
`standard`. Checkable straight from manifests (a lint). This is what makes
"the minimal distribution still boots" *true* rather than aspirational.
- **Naming (P8):** "core" is the extension tier; the runtime primitive is the
**kernel**. Never reuse "core" for the kernel.
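
The no-upward-dependencies lint could be as small as the sketch below. It assumes each manifest carries a `tier` label, which the manifest shape in §2.3 does not list; `TieredManifest` and `findUpwardDependencies` are hypothetical names.

```typescript
// Assumption: a `tier` label is available per manifest (not part of the §2.3 shape).
type Tier = "core" | "standard";

interface TieredManifest {
  id: string;
  tier: Tier;
  dependsOn: readonly string[];
}

// Reports every edge where a core extension depends on a standard one.
function findUpwardDependencies(manifests: readonly TieredManifest[]): string[] {
  const tierOf = new Map<string, Tier>();
  for (const m of manifests) tierOf.set(m.id, m.tier);

  const violations: string[] = [];
  for (const m of manifests) {
    if (m.tier !== "core") continue; // standard may depend on anything below it
    for (const dep of m.dependsOn) {
      if (tierOf.get(dep) === "standard") violations.push(`${m.id} -> ${dep}`);
    }
  }
  return violations;
}
```

An empty result is the precondition for "the minimal distribution still boots"; dependencies on ids outside the scanned set are ignored here and would be caught by the host's own resolution step.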
**Placement test in action — `read_file` is `standard`, not `core`.** Apply the
test: remove `read_file` → the agent just replies with text; the turn still
completes. So it fails the core test → it's `standard`. The surprise that
validates the model: **tools are not the minimum.** A turn can happen with zero
tools. `read_file` being *important* is why it ships on-by-default in `standard`
— not why it's `core` (resisting "important ⇒ core" keeps `core` from regrowing
into a god-object; P4).
### 2.7 Kernel vs core boundary + how a tool plugs in
**Boundary rule (one sentence):**
> **Kernel = the pure turn mechanism** (decides nothing, touches no I/O, names no
> feature). **Core = the minimum glue** that wires real inputs into that
> mechanism and handles the results — opinionated and effectful, which is exactly
> why it can't live in the kernel.
**Example — the `session-orchestrator` (core), carved out of `agent-manager.ts`:**
```ts
host.on("message.received", async (msg) => {
  const conversation = await host.conversation.load(msg.tabId); // effect: read state
  const provider = host.providers.resolve(msg.model); // decision: pick LLM
  const tools = host.tools.resolveFor(msg.tabId); // decision: gather/filter
  const dispatch = resolveDispatchPolicy(msg); // decision: §3.3 toggle
  const result = await kernel.runTurn({ // ← call the kernel
    provider, messages: conversation.messages, tools, dispatch,
    emit: host.events.emit,
  });
  await host.conversation.append(msg.tabId, result.messages); // effect: persist
});
```
Every line is a **decision** (which provider/tools/policy) or an **effect**
(load/persist) — neither belongs in the kernel.
**How a tool builds "on top of" the kernel (inversion of control).** The kernel
never *finds* tools; it *receives* them. The dependency arrow points
tool → contract → kernel, never the reverse:
1. A tool conforms to `ToolContract` (owned by the kernel) — importing only the
contract, not the kernel internals or other tools.
2. It registers at activation: `host.defineTool(createReadFileTool(workdir))`.
3. The orchestrator gathers them: `host.tools.resolveFor(tabId)`.
4. They're handed into `runTurn`, which calls them blindly by shape
(`byName.get(call.name).execute(...)`). The kernel never knows `read_file`
exists. 0, 1, or 50 tools — the loop is identical.
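
Step 4, "calls them blindly by shape", might look like this inside the runtime. The text gives `ToolContract` as `{ name, description, parameters, execute(args, ctx) }`; the `ctx` fields, the string result type, and the names `indexTools`, `dispatchToolCall`, and `echoTool` are assumptions for illustration.

```typescript
// Assumed shapes: ctx fields and the string result are illustrative guesses.
interface ToolContext { tabId: string }

interface ToolContract {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  execute(args: Record<string, unknown>, ctx: ToolContext): Promise<string>;
}

interface ToolCall { name: string; args: Record<string, unknown> }

// Built once per turn from whatever tool set the orchestrator handed in.
function indexTools(tools: readonly ToolContract[]): Map<string, ToolContract> {
  const byName = new Map<string, ToolContract>();
  for (const tool of tools) {
    if (byName.has(tool.name)) throw new Error(`Duplicate tool name: ${tool.name}`);
    byName.set(tool.name, tool);
  }
  return byName;
}

// The loop knows only the contract. It never names a concrete tool.
async function dispatchToolCall(
  byName: ReadonlyMap<string, ToolContract>,
  call: ToolCall,
  ctx: ToolContext,
): Promise<string> {
  const tool = byName.get(call.name);
  if (!tool) return `Error: unknown tool "${call.name}"`;
  return tool.execute(call.args, ctx);
}

// A stand-in tool, defined outside the runtime and only passed in.
const echoTool: ToolContract = {
  name: "echo",
  description: "Returns its input text (illustrative only).",
  parameters: { text: "string" },
  execute: async (args) => String(args["text"]),
};
```

Passing an empty array to `indexTools` yields an empty map and the same code path, which is the "0, 1, or 50 tools" property in miniature.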
### 2.8 The Minimum Viable Turn (what "core" must contain)
Derived by tracing the **real** end-to-end path of a single message in today's
code — `POST /chat` → `deliverMessage` → `processMessage` → `getOrCreateAgentForTab`
(`new Agent`) → `for await (event of agent.run())` → `emit(event)` → `/ws`
fan-out — and stripping everything not load-bearing.
**Two readings of "send a message, get a response":**
- **(A) Absolute minimum mechanism** — one stateless request→response; needs *no
DB at all*. (Useful as the testing/embedded floor.)
- **(B) Minimum useful chat** — real multi-turn, so turn 2 sees turn 1. Adds
conversation persistence.
**DECIDED: `core` targets (B).** "Minimal Dispatch" is a usable multi-turn chat.
The single piece separating (B) from (A) is the **conversation store + storage
backend** — drop those two and you have the stateless (A) floor (which is exactly
the in-memory test configuration).
**Stripped from the real path → all of these are `standard`, NOT core** (each
confirmed removable without breaking a basic turn): key/model **fallback chain**
(`buildFallbackSequence`, rate-limit retry), **tools** entirely (empty tool list
→ turn still completes as text), **interactive permission prompting** (only
exercised *by* tools), **reasoningEffort / attachments / workingDirectory**
overrides, **skills, agents/summon, lsp, notifications, compaction, queue /
auto-wake, usage telemetry, prompt-cache warming**, and the system-prompt
**TOOL_DESCRIPTIONS + task-management** assembly (minimal = a plain/empty system
string). This concretely confirms §2.6's surprise: **tools, persona, and
permissions are all riders — the turn loop needs none of them.**
**KERNEL exposes (for the minimal turn):**
| Thing | Why kernel | From today |
|---|---|---|
| Contracts (ABI): `ProviderContract`, `ToolContract`, `AuthContract`, `Extension`/`Manifest`, `HostAPI`, event taxonomy, conversation model (`Chunk`/`ChatMessage`) | shared types everything compiles against | `types/index.ts` |
| Extension Host + registries | nothing runs without discover/resolve/activate | (new) |
| `runTurn({ provider, messages, tools, dispatch, emit, signal })` | the pure turn loop (§3.3); takes `messages` as input, returns result messages, touches no DB | `agent.ts` |
| Event bus | how the turn talks to the outside | `onEvent`/`emit` |
| Config loader | needed at boot to find extensions (chicken-and-egg) | `config/` |
| Logger | always-on, pre-extension | — |
| Permission rule *evaluation* (pure) | rules in → decision out | `permission/evaluate.ts` |
| `host.storage` / `host.secrets` *interfaces* | exposes the shape; backend injected | — |
**CORE provides (the minimum extensions to complete one turn):**
| Extension | Job on the minimal path |
|---|---|
| `storage-sqlite` | concrete backend behind `host.storage` (the (A)↔(B) piece; swap for in-memory in tests) |
| `conversation-store` | append-only turn/chunk persistence on `host.storage` (so turn 2 sees turn 1) |
| `transport` | accept the message; stream events back (HTTP/WS, or stdio) |
| `provider-×1` | call an LLM and stream tokens |
| `auth-×1` | supply that provider's credentials |
| `session-orchestrator` | wire it together (below) |
**The minimal turn, end to end (target):**
```
transport.receive(msg)
→ orchestrator: history = conversationStore.load(convId) // core (skip → (A) stateless)
→ orchestrator: provider = providers.resolve(model) // core ext + auth
→ kernel.runTurn({ provider, messages: [...history, msg], tools: [], dispatch, emit })
→ emit(events) → transport.stream(events) // core ext
→ orchestrator: conversationStore.append(convId, result) // core ext
```
Note `tools: []` — a turn completes with zero tools (text reply). Every capability
beyond this is a `standard` extension that contributes tools / filters / hooks.
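
The (A)/(B) distinction can be exercised with an in-memory store and a stand-in for the kernel call. Everything here is a sketch under assumptions: `ConversationStore`, `InMemoryConversationStore`, `fakeRunTurn`, and `handleMessage` are hypothetical names, and whether the orchestrator or `runTurn` is responsible for including the user message in what gets appended is not decided by the text above.

```typescript
interface ChatMessage { role: "user" | "assistant"; content: string }

// Hypothetical store contract; the real one would sit on host.storage.
interface ConversationStore {
  load(convId: string): Promise<ChatMessage[]>;
  append(convId: string, messages: readonly ChatMessage[]): Promise<void>;
}

class InMemoryConversationStore implements ConversationStore {
  private conversations = new Map<string, ChatMessage[]>();

  async load(convId: string): Promise<ChatMessage[]> {
    return [...(this.conversations.get(convId) ?? [])];
  }

  async append(convId: string, messages: readonly ChatMessage[]): Promise<void> {
    const existing = this.conversations.get(convId) ?? [];
    this.conversations.set(convId, [...existing, ...messages]);
  }
}

// Stand-in for kernel.runTurn: replies with how many messages it was given.
type RunTurn = (input: { messages: readonly ChatMessage[] }) => Promise<ChatMessage[]>;

const fakeRunTurn: RunTurn = async ({ messages }) => {
  const reply: ChatMessage = {
    role: "assistant",
    content: `seen ${messages.length} message(s)`,
  };
  return [reply];
};

// Minimal orchestrator path: load history, run the turn, persist the result.
async function handleMessage(
  store: ConversationStore,
  runTurn: RunTurn,
  convId: string,
  text: string,
): Promise<string> {
  const history = await store.load(convId);
  const userMessage: ChatMessage = { role: "user", content: text };
  const result = await runTurn({ messages: [...history, userMessage] });
  await store.append(convId, [userMessage, ...result]);
  return result.map((m) => m.content).join("");
}
```

Dropping the `load`/`append` calls gives the stateless (A) floor; keeping them with this in-memory store is the test configuration for (B).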
### 2.9 Contract versioning (convention now, machinery deferred)
**Reframe first (P4):** semver's machinery exists to coordinate **independent
release timelines** (a producer ships v2; consumers upgrade whenever). That
*temporal decoupling* is the problem it solves — and we mostly don't have it:
- **Internal extensions** (bundled, in-repo): no decoupling. A contract change is
found via `lsp references` (§5.3) and fixed atomically in one change set. **The
type system IS the version check** — a breaking change is a compile error.
- **External/custom extensions** (out-of-repo): decoupling is real — the compiler
can't see their code. A declared version compatibility gate earns its place
**only here.** *(And we don't support external extensions yet — see below.)*
So versioning is **asymmetric**, like §3.6 / §3.7: *internal = the type system is
the version; external = a declared version is the contract.*
**Two different "versionings" — keep them separate:**
- **Data/schema migration** (persisted-data evolution) — already decided (§2.2:
each extension owns its migrations). NOT this section.
- **Contract/API-surface versioning** — this section. Independent: a contract can
change with no migration, and vice-versa.
**DECISION — convention-only and dormant in 0.x.** Because everything is
**developed in-house today** (no external extensions), we adopt the *vocabulary*
of versioning, not the *bureaucracy*:
- **Every package self-versions.** No enforced lockstep / single repo version:
the kernel bumps when the ABI changes; an extension bumps when *its* contract
changes. Independent versioning matches one-agent-per-unit (§5) — each owner
manages its own.
- **Semver *meaning* as disciplined changelog hygiene** (and the §5.3 fan-out
  signal), using the standard terms:
  - **major** — removing or modifying the contract surface (incl. a hook/service
    payload shape change). *Breaking.* This bump is the orchestrator's cue to fan
    out to **all** consumers (found via `lsp references`).
  - **minor** — adding to the contract surface. Existing consumers unaffected.
  - **patch** — internal change only; no surface/payload change.
- **Right now the version is COMMUNICATION, not ENFORCEMENT.** With no external
consumers, the type system + `lsp references` are the actual mechanism; the
number is a changelog/fan-out signal for humans and agents — not load-bearing.
- **Stay in `0.x`** (conventionally: "no stability promised") through the rewrite,
while the ABI churns. `1.0.0` is reserved for "stable enough to invite external
extensions" — and **that** decision is the trigger to build the deferred
machinery below. We worry about it when we get there, not before.
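To make the major/minor/patch vocabulary concrete, here is a minimal sketch of the classification rule as a pure function. It assumes a contract surface can be modeled as a map from export name to a signature string; `Surface`, `Bump`, and `classifyBump` are hypothetical names used only for illustration, and this is not the deferred `.d.ts`-surface-diff machinery.

```typescript
// Hypothetical model: export name -> its type signature as text.
type Surface = ReadonlyMap<string, string>;
type Bump = "major" | "minor" | "patch";

function classifyBump(before: Surface, after: Surface): Bump {
  // A removed, renamed, or retyped export breaks existing consumers.
  for (const [name, signature] of before) {
    if (after.get(name) !== signature) return "major";
  }
  // Every old export survived unchanged, so any new name is purely additive.
  for (const name of after.keys()) {
    if (!before.has(name)) return "minor";
  }
  return "patch"; // identical surface: internal change only
}
```

A rename shows up as one removal plus one addition, so it classifies as **major**, which matches the fan-out cue described above.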
**Deliberately NOT built now (deferred until external extensions exist):**
- A load-time **version-compat gate** (external manifest pins an `apiVersion`
range; host disables+surfaces on mismatch per §3.7 fault containment).
- A mechanical **`.d.ts`-surface-diff** in CI to flag breaking changes
automatically (removes semver's human-judgment weakness).
**Harness rule this generates (scoped to contract-defining agents only; written
into agent files when those agents exist, not now):** "Follow semver on your
contract: **major** = removed/renamed/retyped export or changed hook/service
payload (and signals the orchestrator to fan out to all `lsp references`
consumers); **minor** = additive; **patch** = no surface change. Internal
consumers are caught by the compiler — the version is for the fan-out signal (and,
later, external consumers)." *(The term "patch" is training-standard vocabulary,
so it needs no glossary entry — P6.)*
---
### 2.10 Core-default provider/auth (the boot minimum + primary testbench)
**Criterion (not "best provider" — leanest, most-testable core per §2.8/§3.6):**
the one provider+auth that makes "minimal Dispatch" boot with the smallest auth
surface and the lightest test setup.
**DECISION: OpenAI-compatible provider + API-key auth is the core default** —
`provider-openai-compat` + `auth-apikey`. This is *also* the primary testbench:
**OpenCode Go (flash) IS this path.**
- In today's code it is `createProvider`'s **default branch**
(`createOpenAICompatible`, name `"opencode-zen"`) with the hardcoded defaults
`model: "deepseek-v4-flash"`, `baseURL: "https://opencode.ai/zen/go/v1"`, and a
plain **API key** — the simplest possible `AuthContract`.
- **Why it's the right core default (grounded, P4):**
1. **Simplest auth = leanest core.** `apiKey` + `baseURL`, nothing else. Claude
OAuth (token refresh, billing/beta headers, session id, account discovery)
would bloat the *minimum* tier and contradict §2.8.
2. **Most generic contract shape.** OpenAI-compatible is a near-universal wire
format (dozens of providers + local Ollama/LM Studio), so the core's one
provider is really "the protocol most of the world implements."
3. **Already the literal default** in `createProvider` — core encodes a decision
the codebase already made.
4. **Best for §3.6 testability.** API-key auth fakes trivially (a string + a
base URL at a mock server); OAuth would force token-refresh mocking — the
exact mock-sprawl we're fighting.
- **Project fit (the deciding constraint):** the two available subscriptions are
**Claude** and **OpenCode Go**. OpenCode Go has the most generous limits/API
(especially the **flash** agents) → it is the **primary test bench**. The lean
core default and the testbench are therefore the *same* path — no tension.
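The "API-key auth fakes trivially" claim can be illustrated with a short sketch. It assumes the usual OpenAI-compatible convention (a `POST` to `{baseURL}/chat/completions` with a `Bearer` token); `ApiKeyAuth` and `buildChatRequest` are hypothetical names, since the actual `AuthContract` shape is not defined in this section.

```typescript
// Hypothetical stand-in for the simplest possible auth: a key plus a base URL.
interface ApiKeyAuth {
  apiKey: string;
  baseURL: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Pure request builder: no network access, so a test needs only two strings.
function buildChatRequest(auth: ApiKeyAuth, model: string, prompt: string): ChatRequest {
  return {
    // Strip trailing slashes so the path joins cleanly.
    url: `${auth.baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      Authorization: `Bearer ${auth.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}
```

Pointing `baseURL` at a local mock server is the whole test setup; there is no token-refresh state to simulate.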
**Tier placement that follows:**
- **core:** `provider-openai-compat` + `auth-apikey` (boots minimal Dispatch; =
OpenCode Go flash via `/zen/go/v1`).
- **standard:** `provider-anthropic` + `auth-claude` (OAuth — your daily driver,
rides on top), plus the **Anthropic-format OpenCode Go models** (MiniMax/Qwen
via `isOpencodeGoAnthropicModel`, a different endpoint than flash),
`provider-google`, `provider-copilot`, etc.
- Mirrors every prior decision: the rich/preferred providers ride on top as
standard extensions; core proves the architecture with the simplest path.
**Naming (P8):** `provider-openai-compat`, `auth-apikey` — descriptive,
training-adjacent; no glossary entry needed.
---
## 3. Runtime flow
### 3.1 Boot
1. Host process starts kernel with config + extension search paths.
2. Kernel opens DB, loads merged config, builds the capability gate.
3. Extension host discovers manifests → resolves DAG → checks `apiVersion` (a
no-op until the deferred §2.9 compat gate is built) → runs migrations.
4. Activates extensions topologically; each registers tools / providers / hooks /
routes / services / jobs.
5. `transport-http` listens; `session-orchestrator` subscribes to message intake;
scheduler arms jobs. Ready.
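Steps 3 and 4 hinge on resolving the dependency DAG into an activation order. A minimal sketch, assuming each manifest exposes an `id` and a `dependsOn` list (the `Manifest` shape and `activationOrder` are hypothetical; the real manifest format is not specified here):

```typescript
interface Manifest {
  id: string;
  dependsOn: string[];
}

// Depth-first topological sort: every extension appears after its dependencies.
function activationOrder(manifests: Manifest[]): string[] {
  const byId = new Map(manifests.map((m) => [m.id, m] as const));
  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];

  const visit = (id: string, path: string[]): void => {
    const manifest = byId.get(id);
    if (!manifest) {
      throw new Error(`Unknown dependency "${id}" required via ${path.join(" -> ")}`);
    }
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      throw new Error(`Dependency cycle: ${[...path, id].join(" -> ")}`);
    }
    state.set(id, "visiting");
    for (const dep of manifest.dependsOn) visit(dep, [...path, id]);
    state.set(id, "done");
    order.push(id); // pushed only after all dependencies were pushed
  };

  for (const m of manifests) visit(m.id, []);
  return order;
}
```

Failing loudly on a cycle or a missing dependency at boot keeps a bad manifest from producing a half-activated system.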
### 3.2 A turn
1. Inbound message hits a `transport` route → emits `message.received`.
2. `session-orchestrator` resolves conversation, working dir, the
**provider+model+key** (provider registry + auth vault), the agent profile,
and the **tool-dispatch policy** (§3.3).
3. **Context-assembly filter chain** runs: persona + skills + agent profile
contribute system prompt and a tool-name filter.
4. Tool set = tool registry filtered by the **capability gate** + agent whitelist.
5. **Agent runtime loop:** `provider.stream(messages, tools)` → dispatch tool
calls per the policy (§3.3) → gate check → `tool.before` filter → execute
(exec context: shell-output streaming, cancellation, queued-message
injection) → `tool.after` filter → feed results back; repeat until done.
6. Events stream on the bus → transport pushes to clients; `notifications`
reacts; conversation store appends chunks; usage recorded.
7. `turn.sealed` hook → `compaction` may trigger; scheduler may schedule
cache-warm.
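Step 4 can be read as a pure function of its inputs. The sketch below assumes each registered tool lists the capabilities it requires and that an agent whitelist is optional; `ToolEntry`, `resolveToolSet`, and the capability names in the usage note are hypothetical.

```typescript
interface ToolEntry {
  name: string;
  requires: string[]; // capabilities the tool needs from the gate
}

// Same inputs always yield the same tool set, in the same order.
function resolveToolSet(
  registry: readonly ToolEntry[],
  granted: ReadonlySet<string>,
  whitelist: ReadonlySet<string> | null, // null = the agent does not restrict tools
): string[] {
  return registry
    .filter((tool) => tool.requires.every((cap) => granted.has(cap)))
    .filter((tool) => whitelist === null || whitelist.has(tool.name))
    .map((tool) => tool.name)
    .sort(); // stable ordering regardless of registration order
}
```

For example, with capabilities `fs.read` and `shell` granted and a whitelist of `read_file` and `web_search`, only `read_file` survives: `web_search` fails the gate and `run_shell` fails the whitelist.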
### 3.3 Kernel internals — tool dispatch (togglable: `maxConcurrent` + `eager`)
**Mechanism.** The model streams tool calls *incrementally*: each `tool-call`
event is fully formed (parsed `input`) **before** the step's `finish-step`. So
the kernel can launch a call the moment it arrives. Tool calls batched in one
step are **independent by construction** — the model sees no result until the
next step — so running them concurrently/eagerly is *semantically safe*, not a
reordering risk.
**Today (for contrast):** `agent.ts` collects all `tool-call`s during the stream,
then executes them **after** the loop, **sequentially** (`for … await execute`).
That is `{ maxConcurrent: 1, eager: false }` — the safe baseline we keep available.
**Two orthogonal axes — the toggle.** A single enum conflated two independent
questions; we split them so every combination is coherent (no invalid states):
- `maxConcurrent` (a number) = *how many tools run at once*: `0` → unlimited,
`1` → sequential (a concurrency limit of 1 is exactly serial), `2+` → that cap.
- `eager` (a boolean) = *when execution starts*: `true` → launch each call the
instant its `tool-call` streams in (overlaps with the rest of generation);
`false` → wait until the step's `finish-step`, then dispatch the batch.
| `maxConcurrent` | `eager` | Meaning |
|---|---|---|
| 1 | false | One at a time, after the stream ends → **previous (pre-rework) behavior** |
| 1 | true | **DEFAULT.** Start the first tool the instant it arrives (overlap with generation), but never run two tools at once — safe for any tool |
| 2+ | false | Up to N in parallel, after the stream ends |
| 2+ | true | Up to N in parallel, launched as they stream in |
| 0 | false | All in parallel, after the stream ends |
| 0 | true | All in parallel, launched as they stream in |
**The policy is a KERNEL INPUT, never ambient (P3):**
```ts
interface ToolDispatchPolicy {
maxConcurrent: number; // 0 = unlimited, 1 = sequential, 2+ = cap
eager: boolean; // true = launch on arrival; false = after finish-step
}
runTurn({ provider, messages, tools, dispatch /* : ToolDispatchPolicy */, emit })
```
The kernel receives a *resolved* policy; it never reads config itself.
**`eager` + a limit — exact semantics.** A streaming semaphore: launch on arrival
until `maxConcurrent` is reached, then queue; as each tool finishes, the next
(queued or newly-arrived) call starts. Well-defined for every combination above.
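The semaphore semantics can be sketched as a small synchronous state machine that only decides *which* calls to launch; executing them stays with the caller. `DispatchScheduler` and its method names are hypothetical, and the sketch assumes one scheduler instance per step.

```typescript
interface ToolDispatchPolicy {
  maxConcurrent: number; // 0 = unlimited, 1 = sequential, 2+ = cap
  eager: boolean; // true = launch on arrival; false = after finish-step
}

class DispatchScheduler {
  private readonly policy: ToolDispatchPolicy;
  private readonly queue: string[] = [];
  private readonly running = new Set<string>();
  private stepFinished = false;

  constructor(policy: ToolDispatchPolicy) {
    this.policy = policy;
  }

  /** A fully formed `tool-call` streamed in. Returns the ids to launch now. */
  onToolCall(id: string): string[] {
    this.queue.push(id);
    return this.drain();
  }

  /** The step's `finish-step` arrived; non-eager batches may start. */
  onFinishStep(): string[] {
    this.stepFinished = true;
    return this.drain();
  }

  /** A tool finished; a free slot may admit the next queued call. */
  onToolDone(id: string): string[] {
    this.running.delete(id);
    return this.drain();
  }

  private drain(): string[] {
    if (!this.policy.eager && !this.stepFinished) return [];
    const cap = this.policy.maxConcurrent === 0 ? Infinity : this.policy.maxConcurrent;
    const launched: string[] = [];
    while (this.queue.length > 0 && this.running.size < cap) {
      const id = this.queue.shift()!; // FIFO keeps emission order
      this.running.add(id);
      launched.push(id);
    }
    return launched;
  }
}
```

Because the scheduler performs no I/O, every row of the table above can be covered by a plain unit test that feeds it events and checks the returned ids.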
**Resolution (who sets it)** — mirrors the existing `reasoningEffort` precedence:
per-turn/tab override → agent definition → global config (`dispatch.toml`) →
built-in default. The `session-orchestrator` (core) resolves this and hands the
final value to the kernel.
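A minimal sketch of that precedence chain follows. It assumes the layers merge **per field** (a turn override may set only `maxConcurrent` and inherit `eager`); the text above does not say whether resolution is per field or per whole object, so treat that as an assumption. `resolveDispatchPolicy` is a hypothetical name.

```typescript
interface ToolDispatchPolicy {
  maxConcurrent: number;
  eager: boolean;
}

const BUILT_IN_DEFAULT: ToolDispatchPolicy = { maxConcurrent: 1, eager: true };

type Layer = Partial<ToolDispatchPolicy> | undefined;

// Highest precedence first: per-turn/tab override, agent definition, global config.
function resolveDispatchPolicy(
  turnOverride?: Layer,
  agent?: Layer,
  globalConfig?: Layer,
): ToolDispatchPolicy {
  const layers = [turnOverride, agent, globalConfig];
  // `??` (not `||`) so the meaningful falsy values `0` and `false` are kept.
  const maxConcurrent =
    layers.map((l) => l?.maxConcurrent).find((v) => v !== undefined) ??
    BUILT_IN_DEFAULT.maxConcurrent;
  const eager =
    layers.map((l) => l?.eager).find((v) => v !== undefined) ?? BUILT_IN_DEFAULT.eager;
  return { maxConcurrent, eager };
}
```

The result is the fully resolved value the kernel receives, so `runTurn` never needs to know that layers exist.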
**Default (DECIDED): `{ maxConcurrent: 1, eager: true }`.** Never two tools at
once (safe for any tool, incl. non-concurrency-safe ones), yet the first tool's
execution still overlaps the rest of generation: no concurrency risk and a
latency win at no extra cost (the one accepted trade-off is complication #5).
Raising `maxConcurrent` (e.g. 4) is the opt-in throughput win; `0` (unlimited) is
a deliberate, footgun-aware opt-in (see complication #2).
**Contract requirements this forces (must be in `ToolContract`/`ctx` on day one
— retrofitting later is painful):**
- `ctx.onOutput(data, stream)` — streaming output the **kernel attributes by
`toolCallId`**, so concurrent shell output doesn't interleave ambiguously
(today's `shell-output` event carries no id — fine only because exec is
sequential).
- `ctx.signal` — cancellation, so an aborted turn doesn't leak in-flight tool
work.
- **`execute` must be safe to run concurrently** with other tools (no shared
ambient state — this is just P3 paying off).
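One way to satisfy the first two requirements is for the kernel to build a per-call context that closes over the `toolCallId`, so a tool cannot emit unattributed output. This is a sketch under assumptions: the event shape, the `"stdout" | "stderr"` stream type, and `makeToolContext` are hypothetical; only `ctx.onOutput(data, stream)` and `ctx.signal` come from the text above.

```typescript
type OutputStream = "stdout" | "stderr";

interface ShellOutputEvent {
  type: "shell-output";
  toolCallId: string; // the attribution today's event lacks
  stream: OutputStream;
  data: string;
}

interface ToolContext {
  onOutput(data: string, stream: OutputStream): void;
  signal: AbortSignal;
}

// The kernel creates one context per tool call; the tool never sees the id.
function makeToolContext(
  toolCallId: string,
  emit: (event: ShellOutputEvent) => void,
  signal: AbortSignal,
): ToolContext {
  return {
    signal,
    onOutput(data, stream) {
      emit({ type: "shell-output", toolCallId, stream, data });
    },
  };
}
```

Sharing one `AbortController` per turn and handing its `signal` to every context means a single `abort()` reaches all in-flight tools.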
**Optional refinement (note, don't build yet):** a tool may declare
`concurrencySafe: false` in its contract; the kernel serializes *those* even when
`maxConcurrent` allows parallelism — so one mutating tool doesn't force the whole
batch sequential. This overrides the global setting **downward only** (never
widens parallelism).
**Complications checklist (carried from today's sequential code):**
1. **Shell-output attribution** → tag by `toolCallId` (above).
2. **Concurrency cap + dedup** → bound parallelism; populate the byte-identical-
call dedup map in emission order (the "150 identical calls" incident — do not
fire 150 effects at once). `maxConcurrent: 0` (unlimited) re-opens this
footgun for *distinct* calls, so it must stay a deliberate opt-in, never the
default.
3. **User-interrupt injection** → target the last call by **batch index**, not
completion time (results return nondeterministically under concurrency).
4. **Abort / error cleanup** → await or cancel in-flight tools via `ctx.signal`;
synthesize residual results for orphaned tool-call IDs (today's safety nets).
5. **Wasted effects on abort** → eager exec may complete a side-effecting tool
(`run_shell`) *before* an abort; the effect already happened, result
discarded. Accepted consciously for non-idempotent tools.
**Scope boundary.** This is **within a step's batch only**. Next-step tools can't
start early: they don't exist until the model sees this step's results. So
"starting tools before the turn ends" means overlapping the multiple tool calls
of one step, which is exactly the multi-tool-call case.
### 3.4 State, durability & crash recovery
**The worry (context):** a chat must survive *any* interruption — random shutdown,
token exhaustion, tool error — and the user just resumes with the same history,
never facing a "wipe it clean and start over" broken state.
**What today's code already gets right (keep this):**
- `appendChunks` wraps a whole turn's rows in **one SQLite transaction** + WAL →
**atomic**: a hard crash mid-write yields *all* those rows or *none*. No half
rows, no DB corruption. This is the most important property and it already holds.
- History is an **append-only chunk log** keyed by monotonic per-tab `seq`. Prior
history is never mutated, so a crash can't corrupt what's already written.
**The real danger window (what to fix):** the whole assistant turn is accumulated
**in memory** (`chunks: Chunk[]`) and written **once at the end** (`flushAssistant`
on seal). A mid-turn crash loses the *entire* assistant turn. Two latent issues
compound it:
1. **Orphaned `running` status** — `status` is persisted to `tabs`; a crash leaves
it `running` forever (no boot reconciliation resets stale `running → idle`).
2. **Orphaned tool-call IDs** — a crash between an assistant `tool_call` and its
`tool_result` leaves a dangling call. Anthropic **rejects** such a history
(`MissingToolResultsError`). Today's `synthesizeResidualToolResults` guards
this *in memory* only — useless once the process is dead. **This is the exact
"history the provider refuses to accept → start over" failure.**
```
user message ──► [persisted immediately ✓]
│
├─ assistant streams text/thinking/tool-calls ──► accumulates IN MEMORY ONLY
│ (50 steps, tool runs, minutes…)
│ ◄── CRASH HERE ──► entire assistant turn GONE; maybe a dangling tool_call
│
└─ turn seals ──► flushAssistant() ──► [persisted ✓]
```
**The design — make broken state *unreachable*, not just recoverable.** Four
rules, each tied to a real failure above:
- **R1 — Persist incrementally, append-only (kill the in-memory window).** Write
each step (not each delta) to the log as it completes, in its own transaction.
A crash then loses at most the *last in-flight step*, not the whole turn.
Granularity = per **step**, not per **delta** (a handful of writes per turn, not
hundreds) — keeps IO modest. Make granularity configurable.
- **R2 — Recovery is a pure function of the log (the keystone).** On load, run a
pure **`reconcile(rows) → cleanHistory`** that deterministically repairs any
partial turn:
- `tool_call` with no matching `tool_result` → synthesize an error result
("interrupted by shutdown"). This is today's `synthesizeResidualToolResults`
logic **moved to the READ path** so it runs on *every* load, not just live.
- a turn with no terminal assistant content → mark interrupted; user simply
sends the next message to continue.
- **Functional-core (P2):** rows in → clean history out, no I/O, exhaustively
unit-testable with crafted "crash-shaped" inputs. **Guarantee: whatever a
crash leaves, `reconcile` always yields a provider-acceptable history.**
"Broken state" becomes a state the rest of the system never observes — it's
repaired at the boundary.
- **R3 — Status is derived, never authoritative.** A persisted `running` flag is a
lie waiting to happen. On boot, sweep all `running → interrupted`; AND treat
live status as runtime-only (derive "is this tab live?" from "is there an
in-process turn driving it?"). A crash can't leave a tab stuck running.
- **R4 — Resume = load → reconcile → continue.** Because history is append-only
and `reconcile` guarantees validity, resuming after *any* failure is identical
and invisible to the user — no special "recovery mode". Token-exhaustion and
tool-errors already end the turn cleanly and persist (the error becomes a
chunk), so they are *already* resumable once R1 closes the crash window.
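R2's first repair can be sketched as a pure function over the log. The row shape below is hypothetical (the real chunk schema is not shown in this section), and the sketch assumes rows arrive already ordered by `seq`. It covers only the dangling `tool_call` case, not the "mark interrupted" case.

```typescript
// Hypothetical row shape; synthetic rows carry seq: null because they are never persisted.
type Row =
  | { seq: number; kind: "user"; text: string }
  | { seq: number; kind: "assistant_text"; text: string }
  | { seq: number; kind: "tool_call"; toolCallId: string }
  | { seq: number | null; kind: "tool_result"; toolCallId: string; isError: boolean; content: string };

function reconcile(rows: readonly Row[]): Row[] {
  const answered = new Set<string>();
  for (const row of rows) {
    if (row.kind === "tool_result") answered.add(row.toolCallId);
  }

  const clean: Row[] = [];
  const dangling: string[] = []; // calls still waiting for a synthetic result
  const flush = (): void => {
    for (const toolCallId of dangling.splice(0)) {
      clean.push({
        seq: null,
        kind: "tool_result",
        toolCallId,
        isError: true,
        content: "interrupted by shutdown",
      });
    }
  };

  for (const row of rows) {
    // Close the current tool batch before any non-tool row begins.
    if (row.kind !== "tool_call" && row.kind !== "tool_result") flush();
    clean.push(row);
    if (row.kind === "tool_call" && !answered.has(row.toolCallId)) {
      dangling.push(row.toolCallId);
    }
  }
  flush(); // a crash usually leaves the dangling call at the very end
  return clean;
}
```

The function is idempotent on its own output: once every call has a result, a second pass adds nothing, which is what makes it safe to run on every load.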
**Where it lives (fits the architecture):** almost entirely in the
`conversation-store` **core extension** (R1 incremental write, R2 reconcile-on-load)
+ a tiny **boot sweep** (R3). The **kernel stays pure** — `runTurn` still just
takes `messages` and emits events; it knows nothing about crashes. `reconcile` is
the canonical **functional-core** unit (P2) and the highest-value test target in
the system (feed it every crash shape).
**Cost / boundary (P4):**
- R1 trades IO for safety (more, smaller transactions vs. one-per-turn — the
current code chose one-fsync-per-turn for "constrained backends"). Per-step
batching is the mitigation; granularity configurable.
- **Out of scope here:** resuming a half-finished assistant message *mid-sentence*
(wishlist #1 "resume mid-generation" — needs in-flight streaming state). The
promise here is narrower and is what's actually wanted: **the history is never
broken, and the user can always continue the conversation.** Mid-stream
resumption can build on this foundation later.
### 3.5 The hook system (extensible without prediction)
**The goal:** features react to actions in other features (e.g. *"user sent a
message → reset the cache-warming timer"*). Hooks must be **part of the
contracts** (typed, stable, exposed) *and* **easy to add later** without
predicting features that may never exist. Those only conflict if hooks live in a
central kernel registry — so they don't.
**What today's code already does (the patterns to generalize):**
- **Observer stream.** `NotificationDispatcher` depends not on `AgentManager` but
on a minimal interface — `interface AgentEventSource { onEvent(listener):
() => void }` — and wraps every handler so *"a transport bug can never
propagate into the agent loop."* That's already a primitive hook contract
(subscribe → react → unsubscribe, errors isolated).
- **Semantic lifecycle calls (a hook in disguise).** Cache-warming exposes
`onUserMessage(tabId)` (cancel timer) and `onTurnEnded(tabId)` (re-arm),
*called explicitly* from `tabs.svelte.ts`. Hand-wired coupling we want to
dissolve into subscriptions.
**The keystone decision — decentralized hook catalog:**
> The **kernel owns the hook *mechanism*** (`emit`, `on`, `applyFilters`). Each
> **extension declares the hooks it emits** as part of its own contract. The hook
> catalog is the *union* of all extensions' declarations — never a central list.
The kernel never enumerates "the hooks that exist." This is what makes "add a
hook as required" a **local, additive** change instead of a kernel edit.
**The typed descriptor (the contract surface).** A hook is an exported, typed
descriptor — not a loose string:
```ts
// owned by the session-orchestrator (it performs message intake)
export const MessageReceived = defineHook<{ tabId: string; text: string }>("session/message.received");
// owned by the KERNEL (it owns the turn loop)
export const TurnSealed = defineHook<{ tabId: string; turnId: string }>("kernel/turn.sealed");
```
Consumers get full type inference, no central enum to edit:
```ts
// cache-warming extension (dependsOn session-orchestrator)
host.on(MessageReceived, ({ tabId }) => cancelTimer(tabId)); // payload inferred
host.on(TurnSealed, ({ tabId }) => armTimer(tabId));
```
The descriptor **is** the contract: importing it gives the id + payload type.
Adding a hook = exporting one more descriptor from its owner.
**Two hook kinds (and one thing that is NOT a hook):**
| Kind | Shape | Changes outcome? | Errors | Awaited by turn? | Example |
|---|---|---|---|---|---|
| **Event** | fire-and-forget, N listeners | No | **isolated per-handler — never breaks the turn** (today's rule) | No (optional bounded timeout) | `message.received`, `turn.sealed`, `tool.after` |
| **Filter** | chain, value in → value out, ordered | Yes (in-band) | fail-open + log by default; owner may mark a chain fail-closed | Yes (in-band; a slow filter slows the turn, by design) | system-prompt assembly, tool-result transform |
> **NOT a hook: request/response with exactly one responder** (e.g. "ask the
> human for permission"). That's a **service** (`host.provideService` /
> `getService`) — one responder, returns a value. Modeling it as a hook invites
> "which of N handlers wins?" ambiguity. (Permission-prompting is the tempting
> thing to mis-call a hook — it isn't one.)
**The workflow you actually care about — "add a hook later":**
1. Find the **owner** (the extension that performs the action).
2. Export one descriptor from its contract: `defineHook<Payload>("owner/the.action")`.
3. Emit at the action site: `host.emit(TheAction, payload)`.
4. The consumer `dependsOn` the owner and subscribes. **Kernel unchanged.**
The kernel changes *only* when the action is a kernel-intrinsic turn-loop moment
(e.g. a new `tool.before` phase) — and even then it's **+1 exported descriptor +
1 emit line**, never a structural change, because the mechanism is generic.
**Decisions baked in now (all grounded, P4):**
- **Namespacing (P8):** every hook id is `owner/name` (`kernel/turn.sealed`,
`session/message.received`) — prevents third-party collisions.
- **Event error isolation is a hard contract rule** (lifted from
`NotificationDispatcher`): a thrown/rejected event handler is caught, logged,
dropped — it can *never* fail the turn.
- **Filter ordering is deterministic:** dependency-topological registration order,
with an optional numeric `priority` escape hatch.
- **Async semantics:** events are not awaited (fire-and-forget, optional bounded
timeout); filters *are* awaited (in-band).
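The event half of the mechanism, including the error-isolation rule, can be sketched in a few lines. `HookBus` is a hypothetical stand-in for the `host` object used in the examples above, and the `faults` array stands in for real logging. Filters, ordering, and timeouts are left out of this sketch.

```typescript
// The phantom `__payload` member exists only so TypeScript can infer P.
interface Hook<P> {
  readonly id: string;
  readonly __payload?: P;
}

function defineHook<P>(id: string): Hook<P> {
  return { id };
}

type Listener = (payload: unknown) => unknown;

class HookBus {
  private readonly listeners = new Map<string, Set<Listener>>();
  readonly faults: Array<{ hookId: string; error: unknown }> = [];

  on<P>(hook: Hook<P>, handler: (payload: P) => unknown): () => void {
    const set = this.listeners.get(hook.id) ?? new Set<Listener>();
    this.listeners.set(hook.id, set);
    const wrapped: Listener = (payload) => handler(payload as P);
    set.add(wrapped);
    return () => {
      set.delete(wrapped); // unsubscribe
    };
  }

  // Fire-and-forget: a failing handler is recorded and dropped, never rethrown.
  emit<P>(hook: Hook<P>, payload: P): void {
    const set = this.listeners.get(hook.id);
    if (!set) return;
    for (const handler of set) {
      try {
        const result = handler(payload);
        // Async handlers are not awaited, but their rejections are still caught.
        if (result instanceof Promise) {
          result.catch((error: unknown) => this.faults.push({ hookId: hook.id, error }));
        }
      } catch (error) {
        this.faults.push({ hookId: hook.id, error });
      }
    }
  }
}
```

Note that the bus stores listeners by id but never enumerates which hooks exist, which is the decentralized-catalog property in miniature.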
**Deliberately NOT built yet (P4 / P6):**
- No wildcard/pattern subscriptions (`turn.*`) until something needs them.
- No hook-to-hook dependency graph — registration order + `priority` suffices.
- **Don't hook every internal function.** A hook exists only where *cross-
extension* reaction is a real need (mirrors P6 — expose only what's needed).
Over-hooking turns the codebase into spaghetti-by-events.
**The cache-warming example, fully mapped:**
| Today (coupled) | Target (hooked) |
|---|---|
| `tabs.svelte.ts` calls `cacheWarming.onUserMessage(tabId)` | cache-warming does `host.on(MessageReceived, …)`; orchestrator emits it |
| `tabs.svelte.ts` calls `cacheWarming.onTurnEnded(tabId)` | cache-warming does `host.on(TurnSealed, …)`; kernel emits it |
| frontend hard-wires the dependency | cache-warming `dependsOn` session-orchestrator; zero call-site coupling |
Both hooks it needs (`message.received`, `turn.sealed`) already have natural
owners — **no prediction required**, which is the test that the model holds up.
### 3.6 Testability enforcement (design for tests, don't just write them)
**The principle:** don't merely write tests for code — write code *specifically so
it is testable*. Crucially, this is **not directly machine-enforceable**: a tool
can catch the *symptoms* of untestable code, never the intent. So the strategy is
two-pronged — **make the testable path the path of least resistance, then
mechanically catch the worst regressions.**
**Testability is an OUTPUT of principles we already adopted** — enforce the
*causes*, not the slogan:
- **P2 (inject effects)** → code becomes input→output → testable without mocks.
- **P3 (no ambient state)** → nothing hidden to stub → testable in isolation.
- **P1 (feature-as-a-library)** → small importable surface → testable standalone.
**Evidence in today's code (the disease we enforce against):**
`packages/api/tests/agent-manager.test.ts` is **2,142 lines** with a large
`vi.mock("@dispatch/core")` block — which exists *solely because* `agent-manager.ts`
reaches for its dependencies instead of receiving them. That is not a testing
failure; it's a P2/P3 failure that *manifested* in the tests. **Mock count is a
proxy metric for design quality** — that's the lever. (Today: ~14 test files use
`vi.mock`; the kernel + each pure-core must reach **zero internal mocks**.)
**The enforcement ladder (cheapest/strongest first):**
- **Tier 1 — Structural (free, mechanical, highest leverage).** The package
boundaries we're already building *are* testability enforcement. A feature's
decision logic lives in a package with **zero effectful imports** (no
`bun:sqlite`, `node:fs`, `node:child_process`) → it is *structurally
impossible* to write untestable effectful code there; the imports don't exist.
Proven by today's deliberately DB-free `chunks/transform.ts`. **Enforce via a
dependency-direction lint** (Biome `noRestrictedImports` forbidding effect
modules in pure files). The untestable version *fails the lint gate*, which is
the real answer to "how do we enforce it."
- **Tier 2 — The no-mock smell test (the proxy metric).** Stated, reviewable rule:
*a unit test that needs to mock OUR OWN modules is a design bug, not a test to
write.* Allowed: mocking the **outermost edge** (real network, real clock).
Banned: mocking `@dispatch/*` internals. Mechanical proxy: a CI grep hard-fails
if a **kernel/core** test introduces an internal mock; the global count must
trend toward zero.
- **Tier 3 — Coverage as a FLOOR, not a target (with a caveat).** No coverage
tooling exists today — add `@vitest/coverage-v8`. But (P4): coverage is a bad
*target* (gameable — 100% of mock-heavy untestable code proves nothing) and a
useful *floor* **only on pure-core/kernel packages**, where high coverage is
cheap *because* the code is pure. **No global coverage gate** — it would
incentivize mock-heavy shell tests, the exact thing we're fighting.
- **Tier 4 — The harness layer (P5/P6 — teach the agents).** Encode the rule so
future agents inherit it: a `rules/` safety reflex (below) + a **testable-by-
default extension scaffold** in `sdk/` shipping the split pre-made: `logic.ts`
(pure, no deps) + `adapter.ts` (effects) + `logic.test.ts` (mock-free). When
the *template* is testable, the default output is testable.
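As an illustration of the `logic.ts` / `adapter.ts` split, here is a sketch using the cache-warming timer from §3.5. Both halves are shown in one block for brevity; all names (`WarmEvent`, `nextFireAt`, `Effects`, `applyWarmEvent`) are hypothetical, and the scaffold's actual contents are not specified in this document.

```typescript
// --- logic.ts: pure, no imports, no clock, no timers ---
type WarmEvent =
  | { type: "message.received" }
  | { type: "turn.sealed"; nowMs: number };

/** When the cache-warm should fire, or null for "no timer armed". */
function nextFireAt(event: WarmEvent, delayMs: number): number | null {
  return event.type === "turn.sealed" ? event.nowMs + delayMs : null;
}

// --- adapter.ts: effects arrive as arguments, so nothing is reached for ---
interface Effects {
  now(): number;
  schedule(atMs: number): void;
  cancel(): void;
}

function applyWarmEvent(
  kind: "message.received" | "turn.sealed",
  delayMs: number,
  fx: Effects,
): void {
  const fireAt =
    kind === "turn.sealed"
      ? nextFireAt({ type: "turn.sealed", nowMs: fx.now() }, delayMs)
      : nextFireAt({ type: "message.received" }, delayMs);
  if (fireAt === null) fx.cancel();
  else fx.schedule(fireAt);
}
```

The decision lives entirely in `nextFireAt`, so `logic.test.ts` needs no mocks; the adapter is thin enough that a three-method fake covers it.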
**THE KEY CAVEAT — asymmetric enforcement (strict core, lenient shell).** This is
itself an application of the AI-harness thesis (P5/P6): **scoped rules beat
general rules** — models already know "write testable code"; what they need is
*"this kind of code, in this layer, gets tested this way."*
- **Pure core / kernel:** strict — zero internal mocks, dependency-direction lint,
coverage floor. High coverage is *cheap* here, so demand it.
- **Imperative shell (orchestrator, transport, real SQLite adapter):** lenient —
it will *never* hit high pure-unit coverage, and **forcing it to is the
anti-pattern** (you'd do it by mocking everything, recreating today's mess).
The shell gets a *thin layer of integration tests* against real / in-memory
backends. A blanket rule would backfire — enforcement is asymmetric **by
design**.
**`rules/` safety reflexes to ship (Tier 4, scoped per the asymmetry):**
- *Pure-core/kernel rule:* "Writing a unit test that mocks an internal module?
The code is wrong, not the test. Move the decision logic to a pure function and
inject the effect."
- *Pure-core/kernel rule:* "This package must have zero effectful imports
(`node:fs`, `bun:sqlite`, `node:child_process`, network). Need an effect?
It belongs in the adapter/shell, injected."
- *Shell rule:* "Don't chase pure-unit coverage here. Write a few integration
tests against a real or in-memory backend; do NOT mock sibling extensions."
- *General (all):* "Mocking the outermost edge (real network/clock) is fine;
mocking `@dispatch/*` is a smell — fix the boundary."
**The enforced standard (commit to this):**
1. Every extension has a **pure core with zero effect-imports**, lint-enforced
(Tier 1) — *the load-bearing one.*
2. **No internal mocks in kernel/core tests**: CI grep; proxy metric → zero (Tier 2).
3. **Coverage floor on pure packages only**, never global (Tier 3).
4. **Scoped `rules/` reflexes + a testable-by-default scaffold** (Tier 4).
**Tooling actions (when we start):** add `@vitest/coverage-v8`; add the
dependency-direction lint (Biome `noRestrictedImports`) scoped to pure packages;
add the CI internal-mock grep for kernel/core; ship the `sdk/` scaffold.
### 3.7 Trust & isolation model (fault containment, not adversary sandboxing)
**Threat model first (P4 — defend a real threat, not an imported one).** Dispatch
is **personal, self-hosted, single-operator** today. So:
- **Malicious extension** (data theft, host attack) — **NOT the current threat.**
You run the host and choose the extensions; an installed extension is already
as trusted as code you write. The "untrusted plugin marketplace" justification
for sandboxing does not apply *yet* (revisit if Dispatch goes multi-tenant or
ships a public registry).
- **Buggy extension** (infinite loop, unhandled rejection, leak, bad migration)
taking down every other tab/agent — **REAL and present**, especially since we
want external/custom extensions. This directly threatens the §3.4 "never leave
the system broken" guarantee.
**So we defend against FAULTS, not ADVERSARIES** — until the project's nature
changes. That distinction collapses the decision.
**Options considered:**
- **A — In-process, trusted (no isolation):** simplest/fastest, rich live-object
API. But one throw / `process.exit` / leak hits everyone; capabilities are only
advisory. *Too little — contradicts §3.4.*
- **C — Hard isolation (worker/subprocess/VM per extension):** real fault *and*
adversary isolation, enforceable capabilities. But **forces the entire Host API
to be serializable** — no live `provider` handed to `runTurn`, no closure
handlers, no streaming `ctx.onOutput` without marshalling — fighting *every*
contract we designed, at real per-call IPC cost. *Too much, too early; defends
a threat we don't have, and deforms the contracts (the P4 anti-pattern).*
- **B — Soft isolation (in-process, defensively wrapped):** keep the rich
in-process API, but the host wraps every extension boundary. **CHOSEN.**
**DECISION: adopt B now; design contracts so C remains *possible* later without a
rewrite.** Concretely:
- **Host API stays rich/in-process** — live handlers, streams, objects. All prior
design holds unchanged.
- **Every extension boundary is defensively wrapped:** handler try/catch (already
§3.5), **mandatory timeouts on awaited filters** (§3.5 makes filters in-band, so
a runaway filter must be time-bounded), and **per-extension fault tracking →
auto-disable a repeatedly-faulting extension** (contains the fault instead of
letting it recur; ties to §3.4).
- **Tier-aware auto-disable (mirrors the §3.6 asymmetry — strict core, graceful
edge):** `standard`/`external` extensions *may* be auto-disabled on repeated
faults; **`core`/`kernel` faults are fatal-and-surfaced, never silently
degraded** — you want to know storage/transport is broken, not limp on. (Tools
also get a deterministic residual result per §3.4 R2, so a tool fault never
orphans a turn.)
- **Capabilities are declared + gate-enforced at the Host-API surface**
(advisory-but-checked), NOT OS-sandboxed. Honest scope: this catches accidental
overreach and documents intent; it does not stop determined native code.
- **Cheap future-proofing for optional C later:** keep contract payloads
**structured and in-principle serializable** (the typed hook/service handles of
§5.4 already push this way) — don't pass arbitrary live object *graphs* between
extensions via services. Then moving one untrusted extension into a worker is a
localized change, not an architecture rewrite.
- **Manifest `trust` field** (`bundled` | `local` | `external`) recorded now even
though all three behave identically under B — so the *policy hook* exists when
we later want to treat `external` differently (e.g. worker isolation) without
inventing the concept then.
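The tier-aware fault rule reduces to a small pure function. This is a sketch under assumptions: the sliding-window budget (`maxFaults` within `windowMs`) is one hypothetical way to define "repeatedly faulting", and `recordFault`, `FaultBudget`, and the tier names as a union type are illustrative only.

```typescript
type Tier = "kernel" | "core" | "standard" | "external";
type FaultAction = "continue" | "disable" | "fatal";

// Hypothetical budget: how many faults are tolerated inside a sliding window.
interface FaultBudget {
  maxFaults: number;
  windowMs: number;
}

interface FaultDecision {
  action: FaultAction;
  history: number[]; // timestamps of faults still inside the window
}

function recordFault(
  tier: Tier,
  history: readonly number[],
  nowMs: number,
  budget: FaultBudget,
): FaultDecision {
  // Core/kernel faults are surfaced immediately, never silently degraded.
  if (tier === "kernel" || tier === "core") {
    return { action: "fatal", history: [...history, nowMs] };
  }
  const recent = [...history.filter((t) => nowMs - t < budget.windowMs), nowMs];
  return {
    action: recent.length >= budget.maxFaults ? "disable" : "continue",
    history: recent,
  };
}
```

Because the clock is passed in as `nowMs`, the window logic can be tested without real time, in line with §3.6.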
**Harness rules this decision generates (scoped per §5.1 layered knowledge; write
into the agent files when those agents are built — NOT now, per §7.4):**
- *All extension-author agents (shared knowledge):* "Your hook/filter handlers
must never throw uncaught — the host wraps them, but a throw burns your fault
budget and can auto-disable your extension." / "Filters are awaited and
time-bounded — no unbounded work in a filter." / "Assume your extension can be
disabled/reloaded independently; don't rely on ambient process state surviving
(§3.4)."
- *Service/contract-defining agents only:* "Keep service/contract payloads
structured and serializable-friendly — no passing live object graphs across the
extension boundary (preserves the option to isolate later)."
- *Kernel/core agents only (strict):* "Core/kernel faults are fatal-and-surfaced,
NOT auto-disabled — never write graceful-degradation code that hides a
storage/transport failure."
- *Tooling-enforced → deliberately NOT in agent files (P6):* the typed-handle
rule (§5.4) is a compile error, and capability over-declaration is caught at
manifest load — neither is written down as prose.
---
## 4. Cross-cutting decisions to lock down (when we start)
- **Contract versioning:** convention-only & dormant in `0.x` (§2.9). Each package
self-versions; semver *meaning* is changelog hygiene + the §5.3 fan-out signal.
Internal safety = the type system; the compat gate / `.d.ts`-diff are deferred
until external extensions exist.
- **Trust & isolation:** **soft isolation (B)** — rich in-process Host API +
defensively-wrapped extension boundaries (handler try/catch, filter timeouts,
tier-aware auto-disable). It defends against *faults*, not adversaries; contracts are kept
serializable-friendly so hard isolation (C) stays possible later (§3.7).
- **System prompt / persona:** becomes a context-filter contribution, not a
hard-coded string — so the assistant's "feel" is swappable.
- **Migrations ownership:** each extension owns its tables; the kernel only runs
the migration runner. Defines a clean uninstall story.
- **Deterministic tool-set per turn:** reproducible from `(agent profile +
capabilities + active extensions)` — this is P3 made concrete and kills
wishlist bugs #16/#17.
- **Tool-dispatch policy:** togglable per §3.3; the default is
`{ maxConcurrent: 1, eager: true }` (decided, see §8).
- **Durability / crash recovery:** incremental append + pure `reconcile()` on load
+ derived status (§3.4). Design rule: every persisted state a crash can leave
behind must be repairable; recovery is deterministic and invisible to the user.
- **Hooks:** decentralized catalog — kernel owns the mechanism, each extension
declares the hooks it emits via typed descriptors (§3.5). Events are
error-isolated; filters are in-band; single-responder request/response is a
service, not a hook.
- **Testability enforcement:** asymmetric — strict on pure core (zero
effect-imports lint, no internal mocks, coverage floor), lenient on the shell
(thin integration tests) (§3.6). Mock-of-internals count is the proxy metric.
- **Agent workflow:** one owner-agent per extension/kernel; agents see only
others' contracts, never implementation; contract changes fan out mechanically
via `lsp references`; non-static cross-extension coupling is forbidden;
glossary terms are human-gated (§5).
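
To make "deterministic tool-set per turn" concrete, here is a minimal sketch of a pure resolver over the three inputs named above. The type names, the `requires` field, and the `allowedTools` field are assumptions and not the project's actual contracts.

```typescript
// All type and field names are illustrative assumptions.
interface ToolContribution {
  name: string;
  requires: string[]; // capabilities the tool needs
}

interface ActiveExtension {
  id: string;
  tools: ToolContribution[];
}

interface AgentProfile {
  allowedTools: string[] | "all";
}

function resolveToolSet(
  profile: AgentProfile,
  capabilities: ReadonlySet<string>,
  extensions: readonly ActiveExtension[],
): string[] {
  const names = extensions
    .flatMap((ext) => ext.tools)
    .filter((tool) => tool.requires.every((cap) => capabilities.has(cap)))
    .filter(
      (tool) =>
        profile.allowedTools === "all" ||
        profile.allowedTools.includes(tool.name),
    )
    .map((tool) => tool.name);
  // Dedupe and sort so the result does not depend on extension load order.
  return Array.from(new Set(names)).sort();
}

const files: ActiveExtension = {
  id: "files",
  tools: [
    { name: "read_file", requires: ["fs"] },
    { name: "list_files", requires: ["fs"] },
  ],
};
const web: ActiveExtension = {
  id: "web",
  tools: [{ name: "web_search", requires: ["net"] }],
};

const caps = new Set(["fs"]);
const first = resolveToolSet({ allowedTools: "all" }, caps, [files, web]);
const second = resolveToolSet({ allowedTools: "all" }, caps, [web, files]);
```

Because the function reads nothing but its arguments, the same inputs always give the same tool list, whatever order the extensions were loaded in.
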
---
## 5. Repo & agent workflow conventions (one agent per unit)
The repo's **agent-team structure is isomorphic to its module structure**: agents
communicate through exactly the same contracts the code communicates through. This
is Conway's Law made intentional, and it yields a diagnostic property:
> **Friction between agents is a signal of bad architecture.** Constant
> agent-to-agent messaging ⇒ the contract boundary is wrong. An agent needing to
> read another's implementation ⇒ that contract is underspecified. The workflow
> *surfaces* design smells instead of hiding them.
It is not a bolt-on — every row below already exists in this plan:
| This model needs… | …already provided by |
|---|---|
| Contracts as the only cross-agent surface | ABI (kernel) + two-sided per-extension contracts (§2.3) |
| One agent per unit | P1 feature-as-a-library — one library, one owner |
| Per-agent scoped knowledge | **P7 extension-scoped harness** — an extension's AGENTS.md/rules/glossary *is* its owner-agent's knowledge |
| Layered knowledge (group → file) | P5 tiered-cache layering (§7.1) |
| Persistent, messageable agents | Dispatch's own tabs + `send_to_tab` + `summon`/`retrieve` |
| Bounded cross-agent chatter | the existing `MAX_AGENT_AUTO_WAKES` budget |
| Orchestrator confirms without reading code | **§3.6 testability** — tests-at-boundaries are the trust mechanism |
The last row is the deepest synthesis: **§3.6 is the orchestrator's verification
protocol.** The orchestrator can't read code, so it confirms "everything works" from
*contracts + test results + build/diagnostics output* — which only works because
we made the boundaries testable. The keystone equivalence: **P7 harness docs ARE
the agents' scoped knowledge** — the same artifact, two views; you don't design
knowledge-scoping separately.
### 5.1 The ownership model
- **One owner-agent per unit** (each extension, and the kernel). Its file(s) are
edited by no one else → single-writer, so a (future) sleeping agent wakes
knowing its own code is current.
- **Knowledge is scoped & layered** (P5/P7): shared group knowledge (e.g. all
"frontend" agents) → per-extension knowledge → per-file specifics. An owner
loads only its layer, so it is a narrow-domain expert with lean context.
- **Visibility rule:** an agent sees **only what other extensions
expose/require** (their contracts) — never their implementation. Implementation
is **not provided by default** (P6/§3.6 caveat #3); *needing* it is a signal
the contract is incomplete — fix the contract (or ask the owner), don't grant
code access. Corollary: **a contract documents behavior & guarantees a consumer
can rely on, not just types** (P6 applied to contracts).
- **Phase note (P4):** start by **summoning fresh agents per task** — files
aren't complex enough to justify warm/persistent agents yet. Persistent
*waking* agents (and the wake-time "contract-delta since last active" sync they
require) are deferred to **after the rewrite**.
### 5.2 The workflow (build a feature)
1. User asks the **orchestrator** for feature X. (Orchestrator sees all
*contracts*, no implementation.)
2. **Overlap check first (anti-webhook-reimplementation, §7):** orchestrator
consults the GLOSSARY + feature-docs to see whether the capability already
exists under a canonical term.
3. **Boundary decision is the USER's, never silent (resolves §3.6 #5):** if X
maps to a new capability, the orchestrator **surfaces "new extension vs.
extend an existing one?" to the user** and waits — it never decides
granularity itself (this is the exact failure the article warns about; the
glossary/feature-docs are the defense, the user is the authority).
4. Orchestrator **summons the owner-agent(s)** to do the work and **messages any
extensions needing changes** (via their owner-agents).
5. Owners report back; orchestrator confirms via contracts + tests + build.
6. Clarification questions agent↔agent are *allowed but rare* — everything an
agent needs (contracts) is already exposed; a needed question usually means a
contract gap.
### 5.3 Contract changes — mechanical blast radius (resolves §3.6 #2)
A contract change is the one event that legitimately fans out. It is handled
**mechanically, not by guessing**, via the existing `lsp` tool:
1. The contract's owner edits it, then runs **`lsp references`** on the changed
symbol(s) → the complete set of consuming files.
2. The owner **reports that file list up to the orchestrator** (it can't see
other extensions itself); the **orchestrator dispatches** the affected
owner-agents to update to the new contract.
- **Ownership:** kernel-intrinsic ABI → kernel agent (most conservative, changes
rarely). Per-extension contracts → that extension's agent, **co-located in its
package** (not a central dir — see §2.5).
- **Prerequisite:** a **TypeScript language server** wired into `dispatch.toml`
(today's LSP config only has the Luau example).
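
The dispatch step in (2) could be sketched as a grouping of the reference list by owning unit. This assumes a `packages/<unit>/...` layout and one owner-agent per unit directory, and `ownersToDispatch` is a hypothetical helper. The file list itself would come from `lsp references`, which is not called here.

```typescript
// Assumption: every owned file lives under packages/<unit>/, one owner-agent per unit.
function ownersToDispatch(
  referencingFiles: readonly string[],
  changedUnit: string,
): Map<string, string[]> {
  const byUnit = new Map<string, string[]>();
  for (const file of referencingFiles) {
    const unit = /^packages\/([^/]+)\//.exec(file)?.[1];
    if (!unit) continue; // not inside any owned unit
    if (unit === changedUnit) continue; // the contract owner updates its own files
    const files = byUnit.get(unit) ?? [];
    files.push(file);
    byUnit.set(unit, files);
  }
  return byUnit;
}

// Example input, shaped like the output of a references query (paths are made up).
const refs = [
  "packages/core/tools/registry.ts",
  "packages/api/agent-manager.ts",
  "packages/api/routes/tabs.ts",
];
const dispatchPlan = ownersToDispatch(refs, "core");
```
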
### 5.4 Static-reference rule — non-static cross-extension coupling is forbidden
For §5.3 to be *sound*, `lsp references` must see every coupling. So:
> **Every cross-extension coupling is anchored to an exported typed symbol.**
> Dynamic/string-keyed cross-feature references are forbidden.
- **Enforced by the type system, not a lint:** the Host API *accepts only typed
handles* — `host.on(HookDescriptor<T>, …)`, `host.getService(ServiceHandle<T>)`
— so a raw string at a consumer site is a **compile error** (surfaced via `lsp
diagnostics`). The raw string exists in exactly one place: the owner's
`defineHook`/`defineService` declaration. `lsp references` on that exported
symbol therefore returns the true, complete blast radius. This is *why* typed
descriptors (§3.5) + typed service handles (§2.3) beat string lookups — not
aesthetics, but making the agent workflow mechanically sound.
- **Scope (P4 — don't overclaim):** this bans cross-extension **code** coupling.
Two dynamic lookups are *legitimate and stay*, because they are **data flow /
discovery inside the kernel-host, not feature-to-feature references**:
(a) the kernel routing a model's tool-call by name (`byName.get(name)`) — the
name is the LLM's runtime choice, i.e. data; (b) the host loading extensions by
scanning manifests (traced by the manifest DAG, not symbol refs).
- **The one escape hatch (named, restricted):** generic observability (e.g. a
logger wanting *every* hook) may use a single `host.onAny(listener)` firehose,
explicitly marked "observability only, never feature code."
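
A minimal sketch of the typed-handle idea follows. The names `defineHook`, `HookDescriptor<T>`, `host.on`, and `host.onAny` come from this plan. The `Host` class, the `emit` method, the phantom `_payload` field, and the `fileWritten` hook are assumptions made only so the example is self-contained.

```typescript
// Implementation details are assumptions; only the API names come from the plan.
interface HookDescriptor<T> {
  readonly id: string;
  readonly _payload?: T; // phantom field: carries the payload type, never set at runtime
}

// The raw string appears only here, in the owner's declaration.
function defineHook<T>(id: string): HookDescriptor<T> {
  return { id };
}

type Listener<T> = (payload: T) => void;

class Host {
  private listeners = new Map<string, Listener<unknown>[]>();
  private firehose: Array<(id: string, payload: unknown) => void> = [];

  on<T>(hook: HookDescriptor<T>, listener: Listener<T>): void {
    const list = this.listeners.get(hook.id) ?? [];
    list.push(listener as Listener<unknown>);
    this.listeners.set(hook.id, list);
  }

  // Observability only, never feature code.
  onAny(listener: (id: string, payload: unknown) => void): void {
    this.firehose.push(listener);
  }

  emit<T>(hook: HookDescriptor<T>, payload: T): void {
    for (const listener of this.listeners.get(hook.id) ?? []) listener(payload);
    for (const listener of this.firehose) listener(hook.id, payload);
  }
}

// Owner side: the symbol a consumer imports, and the one `lsp references` is run on.
const fileWritten = defineHook<{ path: string }>("file:written");

// Consumer side: the payload type is inferred from the descriptor.
const host = new Host();
const seen: string[] = [];
const observed: string[] = [];
host.on(fileWritten, (event) => {
  seen.push(event.path);
});
host.onAny((id) => {
  observed.push(id);
});
host.emit(fileWritten, { path: "a.ts" });

// host.on("file:written", () => {}); // would not compile: a string is not a HookDescriptor
```

Because consumers can only subscribe through the `fileWritten` symbol, a references query on that symbol finds every subscriber.
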
### 5.5 Integration bugs — the temporary multi-knowledge agent
A bug where X and Y each honor the contract yet don't work together belongs to no
single file. Resolution (resolves §3.6 #4):
- The orchestrator dispatches a **temporary multi-knowledge agent** loaded with
the **scoped knowledge AND read/write access to the 2–3 relevant files** —
unlike normal agents it *does* see implementation, because fixing integration
requires it.
- It becomes the **temporary exclusive owner** of those files for its lifetime
(the orchestrator must not let the normal owners edit them concurrently →
preserves single-writer).
- **Both trigger paths:** the orchestrator dispatches it proactively, OR a
file-owner who spots the bug **requests one from the orchestrator** (reuses the
§3.5 agent→orchestrator message path; no new mechanism).
- It leverages the existing knowledge-scoping so the agent gets *exactly* the
context to fix the seam and no more.
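
One way to picture the temporary exclusive ownership is a small table the orchestrator consults before allowing an edit. This is only a sketch under assumed names (`OwnershipTable`, `lend`, `restore`, the agent ids); the plan does not prescribe a mechanism.

```typescript
// Hypothetical bookkeeping for single-writer ownership.
class OwnershipTable {
  private owners = new Map<string, string>(); // file -> current writer
  private suspended = new Map<string, string>(); // file -> normal owner while lent out

  constructor(initial: Record<string, string>) {
    for (const [file, owner] of Object.entries(initial)) this.owners.set(file, owner);
  }

  canEdit(agent: string, file: string): boolean {
    return this.owners.get(file) === agent;
  }

  // Hand the files to a temporary agent; the normal owners lose write access.
  lend(files: readonly string[], tempAgent: string): void {
    for (const file of files) {
      if (!this.owners.has(file)) throw new Error(`unowned file: ${file}`);
      if (this.suspended.has(file)) throw new Error(`already lent: ${file}`);
    }
    for (const file of files) {
      this.suspended.set(file, this.owners.get(file) as string);
      this.owners.set(file, tempAgent);
    }
  }

  // Give every file held by the temporary agent back to its normal owner.
  restore(tempAgent: string): void {
    for (const [file, owner] of Array.from(this.suspended)) {
      if (this.owners.get(file) !== tempAgent) continue;
      this.owners.set(file, owner);
      this.suspended.delete(file);
    }
  }
}

const table = new OwnershipTable({ "x/index.ts": "agent-x", "y/index.ts": "agent-y" });
table.lend(["x/index.ts", "y/index.ts"], "temp-1");
const ownerBlocked = !table.canEdit("agent-x", "x/index.ts");
const tempAllowed = table.canEdit("temp-1", "y/index.ts");
table.restore("temp-1");
const ownerRestored = table.canEdit("agent-x", "x/index.ts");
```
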
### 5.6 The glossary is a human-gated checkpoint (strengthens P8)
This is the article's central anti-synonym-drift mechanism: the GLOSSARY's
**"aliases to avoid" column** exists so the agent never reinvents a concept under
a new name (the article's `WebhookEvent` / `WebhookHook` / `HookedWebhook`
problem), and the §5.2 step-2 overlap check is *when* it runs ("mandatory feature
overlap detection before any new feature"). The orchestrator may **never silently
coin a term.** Two cases:
**Case A — concept already exists (synonym-drift defense — the priority).** When a
request *describes* an existing concept — even by behavior, under a different name
— the orchestrator must **recognize the match and steer to the existing canonical
term, creating NO new entry.**
- *Example (the user's):* request = "implement a **web-notifier**: accept a
request from an HTTP endpoint requiring no password, then log it." The
orchestrator recognizes this *is* a **webhook** (already in the glossary) and
responds "that's a `webhook` — I'll use that name," rather than adding
"web-notifier".
- Recognition is powered by the glossary's aliases + overlap check, and works on
**behavioral descriptions**, not just name matches.
- **Still suggest-then-confirm (P4):** recognition can misfire (the user may mean
something subtly different). The orchestrator *proposes* the match ("this looks
like a `webhook` — shall I call it that?"); the user has final say. It never
silently collapses a possibly-distinct concept into an existing term. If the
user confirms it's a new alias for an existing term, add it to that term's
"aliases to avoid" column (don't make a new entry).
**Case B — genuinely new concept (name it well).** When the concept is actually
new, before adding the entry the orchestrator must:
1. State the new term and its understanding of what it means.
2. **Propose a name, defaulting to the standardized / training-baked term**
(e.g. "patch" not "bugfix"; "debounce" not "cooldown-wait"). Rationale (P6): a
name models already know costs **zero agent-file/glossary space**, so the
glossary only grows entries for genuinely project-specific concepts — it
actively fights its own bloat.
3. **Ask the user** to approve or rename. The user is the final authority: if they
prefer a different name, **always go with the user's choice** (record the
standard term, if any, under "aliases to avoid"). The "suggest the standard
name" rule applies only to a *not-yet-decided* term — never to override a name
the user already set.
This keeps the user the authority on the project's vocabulary and makes synonym
drift impossible at the source — P8 with a mandatory human in the loop, biased
toward (A) reusing existing terms and (B) names the model already knows.
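
The name-matching part of the overlap check can be sketched as a lookup over canonical terms and their "aliases to avoid". `GlossaryEntry`, `Lookup`, and `lookupTerm` are hypothetical names. The sketch covers exact name matches only: recognizing a concept from a behavioral description is outside what a table lookup can do, and any match is still a proposal for the user to confirm.

```typescript
// Hypothetical glossary shape: one canonical term plus its aliases to avoid.
interface GlossaryEntry {
  term: string;
  aliasesToAvoid: string[];
}

type Lookup =
  | { kind: "canonical"; term: string }
  | { kind: "alias"; term: string; alias: string }
  | { kind: "unknown" };

function lookupTerm(glossary: readonly GlossaryEntry[], name: string): Lookup {
  const needle = name.trim().toLowerCase();
  for (const entry of glossary) {
    if (entry.term.toLowerCase() === needle) {
      return { kind: "canonical", term: entry.term };
    }
    const alias = entry.aliasesToAvoid.find((a) => a.toLowerCase() === needle);
    if (alias !== undefined) {
      return { kind: "alias", term: entry.term, alias };
    }
  }
  return { kind: "unknown" }; // Case B: propose a name and ask the user
}

const glossary: GlossaryEntry[] = [
  { term: "webhook", aliasesToAvoid: ["web-notifier"] },
];

const hit = lookupTerm(glossary, "Web-Notifier");
const miss = lookupTerm(glossary, "debounce");
```
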
---
## 6. Current-state map (as of this plan)
Dependency direction is one-way: **`frontend → api → core`**. `core` is already
framework-agnostic (no Hono/HTTP) — the cleanest existing seam. *(Note: "core"
here is the **current** package name; under the new model the runtime primitive
is the kernel and "core" becomes the extension tier — see §2.6.)*
```
packages/
│
├── core/ → @dispatch/core — shared domain logic (the "brain"), framework-agnostic
│ │ (exported via src/index.ts barrel)
│ ├── agent/agent.ts agentic LLM loop (streamText + manual tool-call dispatch,
│ │ dedup, per-line/spill truncation, user-interrupt injection,
│ │ reasoning-effort, multimodal user content)
│ ├── llm/
│ │ ├── provider.ts createProvider() — Anthropic + OpenAI-compatible factories,
│ │ │ mcp_ tool-name prefix/unprefix
│ │ ├── anthropic-oauth-transform.ts Claude OAuth request-body transform
│ │ └── debug-logger.ts DISPATCH_DEBUG_LLM stream/loop/fetch logging
│ ├── tools/ tool implementations (each createXTool → ToolDefinition)
│ │ ├── registry.ts createToolRegistry; Zod→JSONSchema + Anthropic normalize
│ │ ├── read-file.ts, read-file-slice.ts, write-file.ts, list-files.ts
│ │ ├── run-shell.ts (+ BackgroundShellStore), shell-analyze.ts, bash-arity.ts
│ │ ├── search-code.ts, web-search.ts, youtube-transcribe.ts (+ BackgroundTranscriptStore)
│ │ ├── summon.ts, retrieve.ts sub-agent spawn / result collection
│ │ ├── send-to-tab.ts, read-tab.ts tab-to-tab comms
│ │ ├── task-list.ts (todo), key-usage.ts, lsp.ts
│ │ ├── truncate.ts universal tool-output truncator + /tmp spill
│ │ └── path-utils.ts canonicalize / workdir-containment guard
│ ├── db/ SQLite (bun:sqlite, XDG data dir)
│ │ ├── index.ts singleton DB + table DDL/migrations (credentials, api_keys,
│ │ │ usage_cache, wake_schedule, tabs, chunks, settings)
│ │ ├── tabs.ts tabs CRUD, short-prefix resolution, positions/status/title
│ │ ├── chunks.ts append-only chunk log: explode/group rows ↔ messages, usage
│ │ └── settings.ts key/value settings
│ ├── chunks/ pure conversation-model transforms (no DB import — shared w/ frontend)
│ │ ├── append.ts appendEventToChunks / applySystemEvent (stream → Chunk[])
│ │ └── transform.ts explode/group between Chunk[] and flat ChunkRow log
│ ├── compaction/index.ts head/tail selection, summary prompt + transcript render
│ ├── config/ dispatch.toml (global ~/.config + project merge)
│ │ ├── loader.ts, schema.ts, watcher.ts, index.ts load/validate/hot-reload; configToRuleset
│ ├── credentials/ claude.ts (OAuth identity/billing), api-keys.ts, opencode.ts,
│ │ copilot.ts, google.ts, anthropic-betas.ts, store.ts, index.ts
│ ├── models/ registry.ts (ModelRegistry, key states), catalog.ts,
│ │ attachments.ts (image/pdf validation + limits), index.ts
│ ├── skills/ parser.ts, loader.ts, index.ts (skill files → agent injection)
│ ├── agents/ loader.ts, index.ts (global + .dispatch/agents defs, tool-group expand)
│ ├── permission/ rules engine: evaluate.ts, service.ts, wildcard.ts, index.ts
│ ├── lsp/ manager.ts, client.ts, server.ts, language.ts, diagnostic.ts, index.ts
│ ├── notifications/ ntfy.sh: dispatcher.ts, ntfy.ts, config.ts, types.ts, index.ts
│ ├── types/index.ts ALL shared contracts: Chunk/ChatMessage, AgentEvent, AgentConfig,
│ │ ToolDefinition, ToolExecuteContext, DispatchConfig, ReasoningEffort…
│ └── index.ts public barrel (entire core API surface)
│
├── api/ → @dispatch/api — backend HTTP + WebSocket server (Hono on Bun)
│ ├── index.ts Bun.serve (+ EADDRINUSE port-fallback) + /ws WebSocket
│ │ (statuses snapshot, event fan-out, permission replies)
│ ├── app.ts Hono app + CORS; /health, /status, /chat (main entry),
│ │ /chat/cancel, /chat/stop, /chat/warm; mounts routes;
│ │ constructs agentManager + permissionManager + notificationDispatcher
│ ├── agent-manager.ts THE orchestrator (~2.4k lines): per-tab turns, message queue,
│ │ key/model fallback chain, system-prompt assembly (buildSystemPrompt
│ │ + TOOL_DESCRIPTIONS), per-turn tool assembly (perm/whitelist gated),
│ │ sub-agent spawning, LSP-on-write hook, auto-wake budget, compaction
│ ├── permission-manager.ts tool-permission prompts/replies over WS
│ ├── wake-scheduler.ts pure Claude wake-probe scheduling helpers (4 slots/hour, recovery)
│ ├── types.ts thin re-export of AgentEvent/AgentStatus from core
│ ├── routes/ /config, /tabs, /models (+ startWakeScheduler), /skills,
│ │ /agents, /notifications (each uses a setXGetter injection seam)
│ └── tests/ agent-manager, routes, permission-manager, wake-scheduler
│
└── frontend/ → Svelte 5 SPA (Vite); morphable, reworked later
├── main.ts, App.svelte, app.css
└── lib/
├── tabs.svelte.ts central store: sendMessage + WS event handling
├── ws.svelte.ts WebSocket client (auto-reconnect)
├── router.svelte.ts, config.ts, types.ts, theme.ts, settings.svelte.ts
├── context-window.ts, attachment-tokens.ts, snapshot-sequencer.ts
├── cache-warming.svelte.ts, cache-warm-storage.ts, sidebar-storage.ts
└── components/ ChatInput, ChatPanel, ChatMessage, ToolCallDisplay,
TabBar, ModelSelector, ConfigPanel, AgentBuilder,
SystemPromptPanel, SkillsBrowser, ToolPermissions,
PermissionPrompt, TaskListPanel, KeyUsage, CacheRatePanel,
ContextWindowPanel, SettingsPanel, MarkdownRenderer, … (23 total)
```
### 6.1 Key facts that matter for the rework
- **`agent-manager.ts` is the center of gravity** (~2,453 lines): per-turn tool
assembly, system-prompt building, provider/key resolution, sub-agents,
queueing all fused. This is what dissolves into kernel + core orchestrator +
standard contributions.
- **`types/index.ts` is the de-facto contract layer today** — `ToolDefinition`,
`AgentConfig`, `AgentEvent`, `DispatchConfig` all live here. Natural seed for a
real `contracts` package (kernel).
- **Routes already use a `setXGetter` injection pattern** (`setSkillsGetter`,
`setModelsGetter`, …) — a primitive form of the DI seam the extension host
would formalize.
- **Per-turn tool assembly is a giant duplicated if/else** in `agent-manager`
(parent-perms path + child-whitelist path) — prime candidate for a registry
populated by extensions.
- **Tool execution today is post-stream + sequential** (`agent.ts` ~line 1426) —
see §3.3 for the eager/concurrent redesign.
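
For readers unfamiliar with the `setXGetter` seam, the sketch below contrasts it with an explicitly passed dependency. Only the name `setSkillsGetter` comes from the codebase description above; its signature, the `Skill` type, and both route functions are assumptions.

```typescript
// `setSkillsGetter` is named in the plan; its signature and everything else here are assumed.
type Skill = { name: string };
type SkillsGetter = () => Skill[];

// Today's shape (assumed): a module-level slot that app setup fills in once.
let getSkills: SkillsGetter = () => {
  throw new Error("skills getter not injected");
};

function setSkillsGetter(getter: SkillsGetter): void {
  getSkills = getter;
}

function listSkillsRoute(): string[] {
  return getSkills().map((skill) => skill.name);
}

// Formalized shape: the dependency is passed in, so no module state is involved (P3).
function createListSkillsRoute(deps: { skills: SkillsGetter }): () => string[] {
  return () => deps.skills().map((skill) => skill.name);
}

const source: SkillsGetter = () => [{ name: "review" }, { name: "release" }];

setSkillsGetter(source);
const viaSetter = listSkillsRoute();
const viaFactory = createListSkillsRoute({ skills: source })();
```
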
---
## 7. The AI Harness (meta-information layer)
From "The AI Harness: why your AI coding agent is only as smart as the repo you
put it in" (Louai Boumediene, Activepieces). Thesis: the model is rarely the
bottleneck — the structured meta-information around the code is. Agent context is
a **tiered cache**: tiny files always loaded, big files on demand.
### 7.1 The layering (governing test: P6 — only the non-inferable)
| Layer | Size / load | Purpose |
|---|---|---|
| Root `AGENTS.md` — "constitution" | ~55 lines, **every session** | Non-obvious architecture rules only |
| Per-package/extension `AGENTS.md` | ~30–55 lines, when working there | Package-specific patterns |
| `rules/` — "safety reflexes" | 3–5 lines each, every session | Crystallized scar tissue (bugs you've reverted) |
| `features/*` — "module encyclopedia" | ~60 lines each, on demand | Entity schemas, data flow, gotchas per module |
| `skills/*` — codified workflows | slash commands, progressive disclosure | Fixed procedures for repeated tasks |
| `GLOSSARY.md` | term table + "aliases to avoid" | Fights synonym drift |
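
Read as a cache policy, the table could be sketched as a filter over harness documents. The three `Load` values paraphrase the table's "Size / load" column. `HarnessDoc`, `contextFor`, and the example paths are hypothetical, and `GLOSSARY.md` is left out because the table gives it no load policy.

```typescript
// Load policies paraphrased from the table; everything else is an assumption.
type Load = "every-session" | "when-working-there" | "on-demand";

interface HarnessDoc {
  path: string;
  load: Load;
  scope?: string; // the package a per-package doc belongs to
}

function contextFor(
  docs: readonly HarnessDoc[],
  workingIn: string,
  requested: readonly string[],
): string[] {
  return docs
    .filter((doc) => {
      switch (doc.load) {
        case "every-session":
          return true;
        case "when-working-there":
          return doc.scope === workingIn;
        case "on-demand":
          return requested.includes(doc.path);
      }
    })
    .map((doc) => doc.path);
}

const docs: HarnessDoc[] = [
  { path: "AGENTS.md", load: "every-session" },
  { path: "rules/no-execute.md", load: "every-session" },
  { path: "packages/api/AGENTS.md", load: "when-working-there", scope: "packages/api" },
  { path: "packages/core/AGENTS.md", load: "when-working-there", scope: "packages/core" },
  { path: "features/chunks.md", load: "on-demand" },
  { path: "features/tabs.md", load: "on-demand" },
];

const loaded = contextFor(docs, "packages/api", ["features/chunks.md"]);
```
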
### 7.2 Why it applies strongly to us (evidence, not fashion)
- **The layering maps 1:1 onto minimal-kernel + extensions.** "One ~60-line doc
per module" *is* "one doc per extension" — the extension boundary is the doc
boundary. The architecture gives us the harness structure for free.
- **We already have the scar tissue that becomes `rules/`:** Anthropic schema
normalization in `registry.ts` ("Claude never sees the tool and thinks
forever"), workdir-containment in `path-utils.ts`, tool-call dedup ("150+
identical calls"), `[USER INTERRUPT]` stripping, the no-`execute` tool pattern.
These are postmortems-as-comments — promote them to 3–5 line rules.
- **Real synonym-drift problem** (P8): tab/session/conversation,
chunk/message/turn/step. A glossary with "aliases to avoid" is warranted.
### 7.3 The special angle for this project (synthesis)
Dispatch is **recursive** — an AI-agent platform that itself *has* skills, agents,
and permissions. Two consequences:
- **The harness is extension-scoped (P7):** each extension carries its own
constitution snippet, rules, feature doc, glossary terms, and skills, portable
with the code. Feature-as-a-library applied to documentation.
- **"Tiered context as a cache" is already Dispatch's product behavior**
(prompt-caching, on-demand skills, compaction). The article describes from the
outside the thing we build from the inside — a strong signal the layering is
sound.
### 7.4 What we bound or reject (P4 applied)
- **Volume (40+ docs, 9 skills) and the 5-features/week cadence** — scale
artifacts of a 12-engineer, 1.6M-LOC monorepo. Our version: write a doc the
moment we touch an extension that lacks one (doc-first as the plan brief), grow
organically.
- **Worktrees / parallel sessions / weekly rhythm / MCPs** — that's *workflow*,
not *architecture*; out of scope for the structure we're designing.
(Amusingly, Dispatch's parallel tabs are its own take on parallel sessions.)
---
## 8. Open questions / where we start (TBD)
- **Starting point (proposed):** lock the **Contracts** + **Extension Host**,
then prove the whole stack with one vertical slice — e.g. extract `read_file`
into a standalone, independently-importable `standard` extension with
pure-core / injected-shell tests. That single slice validates the architecture
(P1, the contracts, the host, the tier model) and the engineering constraints
(P2, P3) before scaling out.
- **Open decisions before we begin:** none remaining — all resolved (see below).
- **Deferred to after the rewrite (P4):**
- Persistent *waking* agents + their wake-time "contract-delta since last
active" sync (§5.1) — start with fresh-summoned agents.
- TypeScript language server wired into `dispatch.toml` is a **prerequisite**
for §5.3's `lsp references` workflow (today only Luau is configured).
- **Vocabulary unification — `command` → `action` (P8; raised during the frontend design,
`notes/frontend-design.md` §9):** the frontend names a backend-invokable action
`action` / `action ref`; the backend's existing contribution point is `command`. Review
renaming `command` → `action` so both sides share ONE term. Until this review the backend
keeps `command` and the frontend uses `action`. Cheap today (the `command` contribution is
design-stage, lightly built); if pursued, fan out via `lsp references`.
- **Decided so far:**
- ~~Tool-dispatch default policy~~ — **DECIDED** (§3.3): default
`{ maxConcurrent: 1, eager: true }`.
- ~~Who drives the multi-step loop~~ — **DECIDED**: the **kernel** drives it
(the loop is the kernel's reason to exist); tools stay dumb objects it calls.
- ~~Conversation-store boundary~~ — **DECIDED** (§2.2, §2.8): the kernel keeps
only the conversation **model types** + pure transforms; the persistent store
and SQLite backend are **`core` extensions** (fixes the §2.2/§2.7 I/O
inconsistency).
- ~~"Minimum viable turn" target~~ — **DECIDED** (§2.8): `core` targets **(B)**
a usable multi-turn chat; the storage backend is the single swappable piece
that drops it to the **(A)** stateless floor (= the in-memory test config).
- ~~Crash-recovery strategy~~ — **DECIDED** (§3.4): incremental append-only
persistence (R1), pure `reconcile(rows)` repair on load (R2), derived/boot-
swept status (R3), resume = load→reconcile→continue (R4). Mid-stream
resumption (wishlist #1) explicitly deferred.
- ~~Hook system shape~~ — **DECIDED** (§3.5): decentralized typed-descriptor
catalog (kernel owns mechanism, owners declare hooks); events vs filters;
single-responder = service, not hook. Wildcards/pattern-subs deferred.
- ~~Testability enforcement~~ — **DECIDED** (§3.6): structural (zero
effect-imports in pure packages, lint-enforced) + no-internal-mocks proxy
metric + coverage floor on pure packages only + scoped `rules/` reflexes;
enforcement is **asymmetric** (strict core / lenient shell).
- ~~Agent workflow / repo conventions~~ — **DECIDED** (§5): one owner-agent per
unit; contracts are the only cross-agent surface (implementation hidden by
default; needing it = contract gap); contract changes fan out via `lsp
references` (orchestrator dispatches); **non-static cross-extension coupling
forbidden** (typed handles, type-system-enforced, `onAny` escape hatch);
temporary multi-knowledge agent for integration bugs; **glossary is
human-gated** (orchestrator must ask before coining a term).
- ~~Per-extension contract location~~ — **DECIDED** (§2.5, §5): co-located in
each extension package; only the kernel ABI is centralized in
`kernel/contracts/`.
- ~~Boundary granularity (new ext vs extend)~~ — **DECIDED** (§5.2): the
**user** decides; the orchestrator surfaces it after a glossary/feature-doc
overlap check, never silently.
- ~~Trust & isolation model~~ — **DECIDED** (§3.7): **soft isolation (B)** —
rich in-process API + defensively-wrapped boundaries; defends faults not
adversaries (single-operator threat model); tier-aware auto-disable (strict
core / graceful edge); contracts kept serializable-friendly + manifest
`trust` field so hard isolation (C) stays possible without a rewrite.
- ~~Contract-versioning policy~~ — **DECIDED** (§2.9): convention-only & dormant
in `0.x`; each package self-versions; semver meaning (major=break/fan-out,
minor=additive, patch=internal) as changelog hygiene + §5.3 signal; type
system is the internal mechanism; compat gate + `.d.ts`-diff deferred until
external extensions exist.
- ~~Core-default provider/auth~~ — **DECIDED** (§2.10): **OpenAI-compatible +
API-key** (`provider-openai-compat` + `auth-apikey`) — leanest auth surface,
most-testable, and the same setup as the **OpenCode Go flash** testbench. Claude/OAuth and the
Anthropic-format OpenCode models are `standard` extensions.
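
As a sketch of the proposed `read_file` vertical slice, the example below shows the tool logic with its one effect injected, plus an in-memory shell for tests. The shapes (`FileSystemPort`, `ReadFileResult`, `readFileTool`, `memoryFs`) are assumptions and not the project's actual `ToolDefinition`. The containment check is deliberately simplified, and the port is synchronous only to keep the example short.

```typescript
// Hypothetical port: the only effect the tool needs, injected by the shell.
interface FileSystemPort {
  readText(path: string): string;
}

type ReadFileResult =
  | { ok: true; content: string }
  | { ok: false; error: string };

// Core logic: no filesystem import, so it can be tested without mocks of internals.
function readFileTool(
  fs: FileSystemPort,
  workdir: string,
  relativePath: string,
): ReadFileResult {
  // Simplified containment guard; a real one would canonicalize the path.
  if (relativePath.startsWith("/") || relativePath.split("/").includes("..")) {
    return { ok: false, error: "path escapes workdir" };
  }
  try {
    return { ok: true, content: fs.readText(`${workdir}/${relativePath}`) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// In-memory shell used by tests; a production shell would wrap the real filesystem.
function memoryFs(files: Record<string, string>): FileSystemPort {
  return {
    readText(path) {
      const content = files[path];
      if (content === undefined) throw new Error(`not found: ${path}`);
      return content;
    },
  };
}

const fs = memoryFs({ "/work/notes.txt": "hello" });
const found = readFileTool(fs, "/work", "notes.txt");
const escaped = readFileTool(fs, "/work", "../etc/passwd");
const missing = readFileTool(fs, "/work", "absent.txt");
```
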
---
## Appendix — Principle quick-reference
- **P1** Feature-as-a-library (importable, minimal API; don't over-split)
- **P2** Functional core / imperative shell (testability not purity; inject effects)
- **P3** No ambient state (own and pass explicitly; reproducible tool-sets)
- **P4** Don't adopt by reputation (earn each pattern against real evidence)
- **P5** The repo is a harness (meta-info is a first-class, tiered deliverable)
- **P6** Document only the non-inferable (tribal knowledge / scar tissue only)
- **P7** The harness is extension-scoped (docs portable with the code)
- **P8** One canonical vocabulary (glossary + aliases-to-avoid; no synonym drift)