# Backend → FE handoff — context window + percentage-based compact

> Courier to `../dispatch-web`. Response to the context-window ask in
> `backend-handoff.md` §3 + compacting rework.

## What shipped

1. **`GET /models` now includes `contextWindow` per model** — the FE can replace
   the hardcoded `MAX_CONTEXT = 1,000,000` with the real value.
2. **Auto-compact is now percentage-based** (default 85% of `contextWindow`)
   instead of a flat token count (was 350k).

## Bump pinned deps
- `@dispatch/wire` → `0.11.0` (unchanged)
- `@dispatch/transport-contract` → `0.15.0` (unchanged — the new types are
  additive to the existing version)

## `GET /models` — now includes `modelInfo`

The response now includes an optional `modelInfo` map alongside the existing
`models` array. The `models` array is unchanged (backward compatible).

```ts
interface ModelsResponse {
  readonly models: readonly string[];
  readonly modelInfo?: Readonly<Record<string, ModelMetadata>>;
}

interface ModelMetadata {
  readonly contextWindow?: number;  // max tokens (e.g. 200000)
}
```

**Example response:**
```json
{
  "models": ["opencode/deepseek-v4-flash", "umans/umans-glm-5.2"],
  "modelInfo": {
    "opencode/deepseek-v4-flash": { "contextWindow": 128000 },
    "umans/umans-glm-5.2": { "contextWindow": 200000 }
  }
}
```

`modelInfo` is absent when no provider reports `contextWindow`. Each key is the
same `<credentialName>/<model>` string from the `models` array.

**What the FE should do:**
- When rendering the composer status bar, use `modelInfo[selectedModel].contextWindow`
  as the denominator for `contextSize / contextWindow · pct%`.
- Fall back to the current hardcoded `1,000,000` when `modelInfo` is absent or
  the selected model has no `contextWindow`.
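
The lookup and fallback described above could be sketched as follows. This is a minimal illustration, not the FE's actual code: `resolveContextWindow`, `formatContextStatus`, and `FALLBACK_CONTEXT_WINDOW` are hypothetical names, and the exact status-bar string is an assumption based on the `contextSize / contextWindow · pct%` shape.

```typescript
// Shapes copied from this handoff; the helpers below are hypothetical FE code.
interface ModelMetadata {
  readonly contextWindow?: number;
}

interface ModelsResponse {
  readonly models: readonly string[];
  readonly modelInfo?: Readonly<Record<string, ModelMetadata>>;
}

// The FE's previous hardcoded value, kept only as a fallback.
const FALLBACK_CONTEXT_WINDOW = 1_000_000;

// Returns the reported context window for the selected model, or the
// fallback when modelInfo is absent or the model has no usable value.
function resolveContextWindow(res: ModelsResponse, selectedModel: string): number {
  const reported = res.modelInfo?.[selectedModel]?.contextWindow;
  return typeof reported === "number" && Number.isFinite(reported) && reported > 0
    ? reported
    : FALLBACK_CONTEXT_WINDOW;
}

// Builds the status-bar text, with the percentage rounded to a whole number.
function formatContextStatus(contextSize: number, contextWindow: number): string {
  const pct = Math.round((contextSize / contextWindow) * 100);
  return `${contextSize} / ${contextWindow} · ${pct}%`;
}
```

Treating a zero or non-finite `contextWindow` as "not reported" is a defensive choice in this sketch, so the status bar never divides by zero.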

## Auto-compact: now percentage-based

- **Old:** flat token threshold (default 350000), triggered when `contextSize >= threshold`.
- **New:** percentage of `contextWindow` (default 85%), triggered when `contextSize >= contextWindow × (percent / 100)`.

Also fixed: the check now uses `contextSize` (true context occupancy = last
step's `inputTokens + outputTokens`) instead of the overcounted aggregate
`usage.inputTokens` (which summed every step's re-prefilled prompt).
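
For reference, the new trigger rule can be written out as a small sketch. It mirrors the formula stated in this section and is not the backend's actual implementation: `StepUsage`, `contextSizeOf`, and `shouldAutoCompact` are hypothetical names.

```typescript
// Token usage for a single step (hypothetical shape for this sketch).
interface StepUsage {
  readonly inputTokens: number;
  readonly outputTokens: number;
}

// Context occupancy as defined in this handoff: the last step's
// inputTokens + outputTokens, not a sum over every step.
function contextSizeOf(lastStep: StepUsage): number {
  return lastStep.inputTokens + lastStep.outputTokens;
}

// True when auto-compact should trigger. A percent of 0 means
// auto-compact is disabled (manual only).
function shouldAutoCompact(contextSize: number, contextWindow: number, percent: number): boolean {
  if (percent <= 0) return false;
  // Cross-multiplied form of contextSize >= contextWindow × (percent / 100),
  // which avoids a fractional intermediate value.
  return contextSize * 100 >= contextWindow * percent;
}
```

With a 200000-token window and the default 85%, the trigger point is 170000 tokens: `shouldAutoCompact(170000, 200000, 85)` is true, while 169999 is not enough.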

### `GET /conversations/:id/compact-percent` — read percent

200: `CompactPercentResponse { conversationId, percent }`
- `percent: 0`: auto-compact explicitly disabled (manual only).
- `percent: null`: nothing stored for this conversation, so the **default of 85** applies (85% of the model's context window).
- Any positive number (1-100): auto-compact triggers when `contextSize`
  reaches `percent`% of the model's `contextWindow` (the check is `>=`).
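
One way the FE might interpret the three cases is sketched below. `CompactMode`, `interpretCompactPercent`, and `DEFAULT_COMPACT_PERCENT` are hypothetical names introduced for illustration; only the 0 / `null` / 1-100 semantics come from this handoff.

```typescript
// Default trigger percentage when nothing is stored, per this handoff.
const DEFAULT_COMPACT_PERCENT = 85;

// Hypothetical FE-side view of the setting.
type CompactMode =
  | { readonly kind: "manual" }
  | { readonly kind: "auto"; readonly percent: number; readonly isDefault: boolean };

// Maps the `percent` field of the response onto a display-friendly mode.
function interpretCompactPercent(percent: number | null): CompactMode {
  if (percent === null) {
    // Not stored: the backend default applies.
    return { kind: "auto", percent: DEFAULT_COMPACT_PERCENT, isDefault: true };
  }
  if (percent === 0) {
    // Explicitly disabled: compaction only happens manually.
    return { kind: "manual" };
  }
  return { kind: "auto", percent, isDefault: false };
}
```

Keeping `isDefault` separate lets a settings screen show "85 (default)" without confusing it with a user who explicitly stored 85.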

### `PUT /conversations/:id/compact-percent` — set percent

Body: `SetCompactPercentRequest { percent: number }`
- `0` explicitly disables auto-compact.
- Any positive number (1-100) sets the trigger percentage.
- Default (when not stored) is 85.

200: `CompactPercentResponse`

**Renamed from `compact-threshold`** — the old endpoint paths, request types,
and response types are gone. Update any FE code that referenced
`compact-threshold`.
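
A minimal client sketch for the renamed endpoints follows. It assumes a global `fetch`, a JSON request body with a `Content-Type: application/json` header, and a caller-supplied `baseUrl`; none of these are confirmed by this handoff. The function names are hypothetical, and `percent` is typed `number | null` in the response here because the GET description above mentions `null` for the not-stored case.

```typescript
interface CompactPercentResponse {
  readonly conversationId: string;
  readonly percent: number | null; // null = not stored (assumption, see lead-in)
}

interface SetCompactPercentRequest {
  readonly percent: number;
}

// Builds the endpoint path for a conversation.
function compactPercentPath(conversationId: string): string {
  return `/conversations/${encodeURIComponent(conversationId)}/compact-percent`;
}

// Validates the 0-100 range client-side before building the request body.
function buildSetCompactPercentRequest(percent: number): SetCompactPercentRequest {
  if (!Number.isFinite(percent) || percent < 0 || percent > 100) {
    throw new RangeError(`percent must be between 0 and 100, got ${percent}`);
  }
  return { percent };
}

async function getCompactPercent(baseUrl: string, conversationId: string): Promise<CompactPercentResponse> {
  const res = await fetch(baseUrl + compactPercentPath(conversationId));
  if (!res.ok) throw new Error(`GET compact-percent failed: ${res.status}`);
  return (await res.json()) as CompactPercentResponse;
}

async function setCompactPercent(
  baseUrl: string,
  conversationId: string,
  percent: number,
): Promise<CompactPercentResponse> {
  const body = buildSetCompactPercentRequest(percent);
  const res = await fetch(baseUrl + compactPercentPath(conversationId), {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`PUT compact-percent failed: ${res.status}`);
  return (await res.json()) as CompactPercentResponse;
}
```

The range check in `buildSetCompactPercentRequest` is a client-side convenience only; how the backend responds to out-of-range values is not described here.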

## New types

```ts
// @dispatch/transport-contract
export interface ModelMetadata {
  readonly contextWindow?: number;
}

// ModelsResponse now has modelInfo (additive — models array unchanged)
export interface ModelsResponse {
  readonly models: readonly string[];
  readonly modelInfo?: Readonly<Record<string, ModelMetadata>>;
}

// Renamed from CompactThresholdResponse
export interface CompactPercentResponse {
  readonly conversationId: string;
  readonly percent: number | null;  // 0 = manual only; null = not stored, default 85 applies
}

// Renamed from SetCompactThresholdRequest
export interface SetCompactPercentRequest {
  readonly percent: number;
}
```

## What the FE needs to do

1. **Use real `contextWindow`** from `GET /models` → `modelInfo[model].contextWindow`
   instead of hardcoded `MAX_CONTEXT = 1,000,000`.

2. **Rename compact-threshold → compact-percent** in any FE code:
   - `GET /conversations/:id/compact-percent` (was `compact-threshold`)
   - `PUT /conversations/:id/compact-percent` (was `compact-threshold`)
   - `percent` field (was `threshold`)

3. **Settings UI**: change the number input from "token count" to "percent
   (0-100)". Default 85. 0 = manual only.

4. **No other changes** — the compact endpoint, WS message, and chain
   architecture are unchanged.