diff options
Diffstat (limited to 'notes')
| -rw-r--r-- | notes/ntfy-notifications-handoff.md | 311 | ||||
| -rw-r--r-- | notes/wake-schedule-handoff.md | 442 |
2 files changed, 753 insertions, 0 deletions
diff --git a/notes/ntfy-notifications-handoff.md b/notes/ntfy-notifications-handoff.md new file mode 100644 index 0000000..fde84c8 --- /dev/null +++ b/notes/ntfy-notifications-handoff.md @@ -0,0 +1,311 @@ +# Handoff — n2/ntfy-notifications + +## Summary + +Adds **ntfy.sh push notifications** to Dispatch: a configurable per-event +notification dispatcher that POSTs to a user-supplied ntfy topic URL when +notable events happen in the running agent process. + +The architecture is intentionally layered so a future transport (email, +Slack webhook, custom backend) plugs in without touching call sites: + +``` +AgentManager.onEvent ─┐ ┌─→ sendNtfy (fetch) + ├─→ NotificationDispatcher.notify(event) +PermissionMgr ────────┘ (filter / dedupe) └─→ (other transports later) +.onPromptAdded +``` + +### Event taxonomy + +The user toggles each one independently in Settings: + +| event | trigger | default | priority | tags | +|-----------------------|--------------------------------------------------------------------------|---------|----------|------------------| +| `turn-completed` | assistant `done` event (one per cleanly-finished turn) | on | 3 | white_check_mark | +| `turn-error` | assistant `error` event (final, after all fallback retries) | on | 4 | rotating_light | +| `permission-required` | `PermissionManager` newly admits a prompt to its pending list | on | 4 | lock | +| `agent-spawned` | `tab-created` for a **top-level user agent** (parent=null, slug present) | off | 2 | sparkles | + +Each notification carries a short tab tag (`tab-<first8>`) so multi-tab users +can tell which conversation pinged them. + +**Subagent gating**: `turn-completed` and `turn-error` from subagent tabs +(any tab with a `parentTabId`) are suppressed by default — a parent +agent that spawns 8 subagents would otherwise push 9 "Turn complete" +notifications per round. Toggle on "Include subagent tabs" in Settings +to opt in. `permission-required` is deliberately NOT gated: a +subagent's permission prompt still needs a human tap to proceed, so +suppressing it would silently hang the subagent. `agent-spawned` is +already top-level-only by construction. + +### Design notes + +- **Non-blocking**: `dispatcher.notify` does `void Promise.resolve(send(...)).catch(warn)`. + A slow or unreachable ntfy server never stalls a turn. Worst case is a + 10s per-request abort timeout in the transport. +- **Dedupe**: 5 s in-memory window keyed by `dedupeKey`. Used for + `permission-required` because the permission system rebroadcasts the + whole pending list on every change (we'd otherwise re-fire on every + unrelated mutation). +- **Master switch + per-event toggle**: both must allow before a send. + Disabled config is a fast no-op (no fetch, no `loadConfig` work past + the early return). +- **Single global config**: matches the rest of the codebase's settings + table (`perm_*`, `title_model_*` are also global). Stored as one JSON + blob under `settings.key = 'ntfy_config'`. +- **Auth token round-trip**: `GET /notifications` redacts the token but + surfaces `hasAuthToken: boolean`. `PUT /notifications` semantics: + `authToken === undefined` keeps the stored value, `""` clears it, any + other string replaces it. The frontend's "Clear stored token" button + uses the explicit-`""` path **and awaits the server response** before + flipping local UI state (post-review fix — see Review section). +- **Auth header scheme**: tokens that already start with a scheme + (`Bearer foo`, `Basic dXNlcjpwYXNz`) pass through verbatim; bare tokens + get a `Bearer ` prefix automatically. Lets users of private ntfy + servers use any HTTP auth scheme without code changes. +- **Topic-URL validation**: tightened to ntfy's actual constraints — + exactly one path segment, 1–64 chars, `[A-Za-z0-9_-]` only. Catches + topics that would silently 404 at publish time. +- **Header injection guard**: CR/LF and control chars are stripped from + every header value the transport writes (`Title`, `Tags`, `Click`, + `Authorization`). +- **Permission "added" detection**: `PermissionManager.broadcastPending` + now diffs the current pending-id set against an `announcedPromptIds` + set and fires `onPromptAdded` only for genuinely new ids. Resolved ids + are pruned. This keeps the contract "one notification per prompt". + +## Files changed / added + +``` +packages/core/src/notifications/types.ts (new) +packages/core/src/notifications/ntfy.ts (new) +packages/core/src/notifications/config.ts (new) +packages/core/src/notifications/dispatcher.ts (new) +packages/core/src/notifications/index.ts (new) +packages/core/src/index.ts (barrel re-export) +packages/core/tests/notifications/ntfy.test.ts (new, 22 tests) +packages/core/tests/notifications/config.test.ts (new, 10 tests) +packages/core/tests/notifications/dispatcher.test.ts (new, 13 tests) + +packages/api/src/permission-manager.ts (+ onPromptAdded contract) +packages/api/src/routes/notifications.ts (new — GET/PUT/POST routes) +packages/api/src/app.ts (wire dispatcher + mount routes) +packages/api/tests/permission-manager.test.ts (new, 4 tests) +packages/api/tests/routes.test.ts (add mocks for new core exports) + +packages/frontend/src/lib/components/SettingsPanel.svelte (new ntfy section) +``` + +Seven commits on `n2/ntfy-notifications`: + +``` +<new> docs: update HANDOFF.md with notifySubagents +9c93086 feat(notifications): add notifySubagents toggle to suppress subagent turn pings +4185789 docs: update HANDOFF.md with Gemini review triage + post-fix state +1870d0b fix(notifications): address Gemini review — tighten validation, sanitize Click, support Basic auth, non-optimistic UI clear +29bdd00 docs: add HANDOFF.md for ntfy notifications feature +786bc43 feat(frontend): ntfy.sh settings block in SettingsPanel +21cdb11 feat(api): wire notification dispatcher into app + /notifications routes +5e72191 feat(core): ntfy.sh notification dispatcher module +``` + +## Public surface added + +### New config (persisted in `settings` table) + +- `settings.key = "ntfy_config"` → JSON-serialized `NtfyConfig`: + ```ts + { + enabled: boolean, + topicUrl: string, + authToken: string, + events: { + "turn-completed": boolean, + "turn-error": boolean, + "permission-required": boolean, + "agent-spawned": boolean, + }, + notifySubagents: boolean, // default false; gates turn-* from subagent tabs + } + ``` + +### New API routes + +- `GET /notifications` → + `{ config: NtfyConfig & { hasAuthToken: boolean }, eventTypes: string[], defaults: NtfyConfig }` + (authToken is always returned as `""`; `hasAuthToken` reflects what's stored) +- `PUT /notifications` → accepts partial `NtfyConfig`. Validates topic URL + when `enabled === true`. Returns the saved (redacted) config or `400`. +- `POST /notifications/test` → sends a `turn-completed`-typed test + notification using the saved config. Returns `{ ok, status?, error? }`, + or `400` if disabled / invalid topic / event-type disabled, or `502` on + ntfy server failure. + +### New core exports (via `@dispatch/core` barrel) + +Types: `NotificationEvent`, `NotificationEventType`, `NtfyConfig`, +`NtfyPriority`, `NtfySendResult`, `FetchLike`, `DispatcherOptions`, +`AgentEventSource`, `PermissionPromptSource`, `TabTitleLookup`. + +Values: `NotificationDispatcher`, `sendNtfy`, `validateTopicUrl`, +`loadNtfyConfig`, `saveNtfyConfig`, `clearNtfyConfig`, +`normalizeNtfyConfig`, `defaultNtfyConfig`, `redactNtfyConfig`, +`NTFY_EVENT_TYPES`, `NTFY_DEFAULT_EVENTS`, `NTFY_DEFAULT_PRIORITIES`, +`NTFY_DEFAULT_TAGS`, `NTFY_CONFIG_KEY`. + +### New API surface on `PermissionManager` + +- `onPromptAdded(listener) => unsubscribe` — fires exactly once per + genuinely-new pending prompt id (with `{ id, permission, description, metadata }`). + +### New exported singleton in `packages/api/src/app.ts` + +- `notificationDispatcher: NotificationDispatcher` — already wired to + the module-level `agentManager` and `permissionManager`. Exposed so + tests / future callers can `dispose()` or `notify(...)` directly. + +### Frontend + +No new exported props — the change is entirely inside `SettingsPanel.svelte` +and uses its existing `{ keys, apiBase }` props. + +## Verification status + +### `bun run check` + +``` +$ biome check . +Checked 150 files in 158ms. No fixes applied. +``` + +✅ Pass (0 errors, 0 warnings). + +### `bun run test` + +``` +Test Files 28 passed (28) + Tests 451 passed (451) + Duration 2.85s +``` + +✅ Pass. Baseline was 393 tests in 24 files; this branch adds 58 tests +across 4 new files (`notifications/ntfy.test.ts` ×22, +`notifications/config.test.ts` ×13, `notifications/dispatcher.test.ts` ×19, +`permission-manager.test.ts` ×4) and modifies 0 existing tests. + +### Per-package strict typecheck + +``` +@dispatch/core tsc --noEmit — 0 errors +@dispatch/api tsc --noEmit — 0 errors +@dispatch/frontend svelte-check — 0 errors, 0 warnings +``` + +### Manual smoke test + +Verified end-to-end against the real `ntfy.sh` server twice (initial +build + post-review): + +``` +$ bun -e 'import { sendNtfy } from "./packages/core/src/notifications/ntfy.js"; ...' +validate: null +{"ok":true,"status":200} +``` + +Topics were throwaway and only used for these smoke tests. Full UI flow +(Settings → topic URL → Save → Send test → push lands in ntfy app) +was not executed because that requires a live `bun run dev:api` plus +`dev:frontend` plus a phone with the ntfy app — but the same code path +that the "Send test" button exercises is what the smoke test above hit, +and the route logic on top of it is covered by unit tests. + +## Second-opinion review (Gemini) + +After the initial implementation I ran a broad, open-ended code review +via `gemini-3-flash-preview` in YOLO mode (read-only, instructed to find +bugs/flaws/edge-cases). Findings were triaged as follows: + +| # | Severity | Finding | Action | +|---|----------|---------|--------| +| 1 | Medium | "Duplicate error+done notifications on LLM fallback retries" | **Rejected.** Verified against `agent-manager.ts:1525–1532` and `:1611`: inner per-attempt errors set `attemptError` and `break` out of the stream loop; only the final terminal error is `this.emit({type:"error"})`-ed. The dispatcher cannot see intermediate retry errors. | +| 2 | Medium | "Loose topic URL validation" | **Fixed.** `validateTopicUrl` now enforces one segment + `[A-Za-z0-9_-]{1,64}`. | +| 3 | Low | "`Click` header not sanitized" | **Fixed.** Passed through `sanitizeHeader`. | +| 4 | Low | "Use JSON publishing for UTF-8 safety in headers" | **Deferred.** Per ntfy docs UTF-8 in `Title` is supported and works in practice; the only realistic risk is non-ASCII tab titles being mangled by an exotic intermediate proxy. Switching to JSON-body publishing would re-architect the transport (and invalidate all header-shape tests) for a hypothetical edge case. Worth a follow-up if anyone reports mangled titles. | +| 5 | Low | "Optimistic UI clear of auth token" | **Fixed.** `clearNtfyAuthToken` now awaits the response and only mutates local state on success; failures surface via the existing error banner. Added a `ntfyClearingToken` loading flag so the button disables + spins during the request. | +| 6 | Nit | "DB read on every notify()" | **Skipped.** SQLite reads are sub-millisecond, events are human-scale infrequent (one per turn at most, dominated by an LLM round-trip taking seconds). Cache adds invalidation complexity for no measurable win. | +| OQ | — | "Support Basic auth, not just Bearer" | **Fixed.** Tokens with a scheme prefix already (`Bearer xyz`, `Basic dXNlcjpwYXNz`) now pass through verbatim; bare tokens still get the `Bearer ` prefix. | + +The other open questions: click-URL deep link is still a known gap +below; subagent noise was follow-up-fixed in commit 9c93086 (the +`notifySubagents` flag, off by default). + +## Assumptions / known gaps + +Decisions made without product input (the spec said "ask if ambiguous"; +each of these felt unambiguous in the context of Dispatch's current +single-user, single-process design): + +1. **Single global config, not per-user.** The existing `settings` table + is global (e.g. `title_model_*`, `perm_*` — all single-tenant). When + Dispatch grows real multi-tenancy this'll need a `user_id` column and + a load-by-user helper, but that's a much bigger refactor than this + feature. + +2. **Auth token persisted in plain text.** Same as the existing + `credentials` / `api_keys` tables in this DB; SQLite at-rest + encryption isn't a thing in this codebase. Token never leaves the + DB on the read path (`GET /notifications` redacts). + +3. **No rate-limiting or burst grouping** beyond the 5 s permission + dedupe. Notification-worthy events are human-scale infrequent (one + per turn, one per permission prompt). If someone hammers `summon` and + ships 50 user agents in 10 seconds, they'll get 50 pushes — that + matches "agent-spawned is off by default" being the right call. + +4. **No click-URL deep-link to the originating tab.** The frontend + doesn't currently route tabs by URL (`router.svelte.ts` just toggles + between `dashboard` and `agent-builder`), so I left `clickUrl` + plumbing in the transport layer (now sanitized as of the Gemini-fix + commit) but didn't synthesize one in the dispatcher. A future "open + this tab" router change would make this a 4-line addition in + `buildTurnCompleted` / etc. + +5. **Event taxonomy is intentionally small.** I considered `model-changed` + and `queue-overflow`/`auto-wake-budget-exhausted` notices but they + felt like "annoying push" rather than "useful push"; easy to add + later by extending `NotificationEventType`, `NTFY_DEFAULT_EVENTS`, + and adding a builder + dispatch hook. + +6. **No topic validation client-side.** The Settings field accepts any + non-empty string as the topic name and the transport posts to + `https://ntfy.sh/<topic>` (URL-encoded). Earlier revisions enforced + ntfy.sh's documented `^[A-Za-z0-9_-]{1,64}$` rule, but the project + relaxes those rules over time (issue #1451) and a regex here just + locks users out of valid configurations. The server is the final + authority; any rejection surfaces through the "Send test" button or + the first real notification. + +7. **Header-based publishing (vs. JSON-body publishing).** Per the + Gemini-review triage above, the transport sends `Title`/`Tags`/`Click` + as HTTP headers. UTF-8 titles work against ntfy.sh itself; non-ASCII + tab titles through an exotic intermediate proxy could theoretically + be mangled. Switching to ntfy's JSON publish mode (`Content-Type: + application/json`, body `{topic, message, title, ...}`) would side- + step that entirely — leaving as a follow-up if anyone hits it. + +Working tree is clean; seven commits on `n2/ntfy-notifications`; nothing +merged. + +## Update — topic-only input (post-merge of this branch's seventh commit) + +The `topicUrl` field was replaced with `topic`. The user now enters just +the ntfy topic name (e.g. `my-secret-topic`); the transport always posts +to `https://ntfy.sh/<topic>`. `validateTopicUrl` is gone — only an empty +check remains (server-side, and only when `enabled === true`). This +eliminates the "string does not match the expected pattern" error users +hit when entering a bare topic. Tests, the `/notifications` PUT route, +the persisted JSON shape, and the SettingsPanel UI were updated together. +Also fixed a small pre-existing bug: the `/notifications` PUT handler now +honours `notifySubagents` on save (previously it was silently dropped +because the field wasn't passed to `normalizeNtfyConfig`). diff --git a/notes/wake-schedule-handoff.md b/notes/wake-schedule-handoff.md new file mode 100644 index 0000000..3ea711a --- /dev/null +++ b/notes/wake-schedule-handoff.md @@ -0,0 +1,442 @@ +# Claude Reset / Wake Schedule — Fix Handoff + +**Branch:** `r1/claude-reset-fix` (off `dev`) +**Worktree:** `/home/tradam/projects/dispatch/r1-claude-reset-fix` +**Commits:** 4 atomic commits (see `git log r1/claude-reset-fix ^dev --oneline`). + +--- + +## Summary + +The "Claude Wake Schedule" panel (`ClaudeReset.svelte` + `/models/wake-schedule*` +routes + the in-memory backend scheduler) had several real bugs that would +silently lose wakes, drift over time, or behave erratically in the UI. It +also only probed once per marked hour, which is unreliable at rate-window +edges. + +This branch fixes the bugs *and* upgrades probing to 4× per hour +(`:00 / :15 / :30 / :45`), with same-tick coalescing so the upstream still +sees a single call. Schema changed destructively (per direction); migration +code drops the old `wake_schedule` table. + +### What was broken + +#### Backend (`packages/api/src/routes/models.ts`) +1. **Missed wakes silently lost.** `loadScheduleFromDB` saw any past + `next_wake_at`, rewrote it to the next occurrence using server-local TZ, + and never fired the missed wake. So if the API was down when a wake was + due (overnight container restart), the user lost it entirely. +2. **Server-TZ drift.** `nextOccurrenceAt15(hour)` used `new Date().setHours()` + — *server* local time. The client sends absolute Unix ms (user's local + wall-clock intent). On a UTC Docker host running for a PST user, each + reschedule re-anchored to the wrong TZ and slowly migrated the fire time. +3. **Retry storm.** Every failed wake pushed a new entry into a + `pendingRetries[]` array, all converging at the same `+5min` instant. +4. **Retry/fire race within a tick.** A freshly fired wake AND a due retry + could both hit `wakeAllClaudeAccounts()` back-to-back. +5. **No status surface.** Nothing told the user whether scheduled wakes + actually succeeded. +6. **Only one probe per hour.** A single fire at `:15` can land 14 min off + the actual rate-window reset moment. + +#### Frontend (`packages/frontend/src/lib/components/ClaudeReset.svelte`) +7. **`fadedHours` returned a function, not a Set.** The `$derived` had shape + `(): Set<number> => {...}` — `blockClass` then called `fadedHours()` + once per of the 24 buttons, rebuilding the Set 24× per render. +8. **`currentHour` was frozen.** `const currentHour = $derived(new Date().getHours())` + — `new Date()` is not a reactive read; the value never updated. After + midnight (or any hour boundary) the "now" highlight stayed on the wrong + block until reload. +9. **Out-of-order toggles.** Rapid double-clicks fired multiple requests; + the *last response* won, not the *last click* — so a slow add followed + by a fast remove could land in the wrong order. +10. **No success/failure feedback.** No surface for whether the most-recent + wake actually worked. + +### What I changed + +| # | Bug / Feature | Fix | +|---|---|---| +| 1 | Missed wake silently lost | New `recoverScheduleEntry()` helper: if missed by ≤ 2h fire on next tick; either way roll forward by 24h-multiple steps. | +| 2 | Server-TZ drift | Removed server-local `nextOccurrenceAt15`; rescheduling now uses `nextDailyAfter(previous, now)` — adds 24h × N from the *client-supplied* original ms. | +| 3 | Retry storm | Replaced `pendingRetries: []` with a single shared `pendingRetry: PendingRetry \| null` whose budget resets on subsequent failures. | +| 4 | Retry/fire race | Retry processing skipped on any tick where a fresh wake fired. | +| 5 | No status surface | `GET /wake-schedule` now returns `{ schedule, resetOffsetHours, probeSlotMinutes, lastWake, pendingRetry }`. | +| 6 | One probe/hour | A marked hour expands to 4 slots (`:00 :15 :30 :45`), each its own row. Multiple due slots in the same tick coalesce into one upstream wake. | +| 7 | `fadedHours` was a fn | Now `$derived.by(() => Set)`; passed as a value to `blockClass`. Window length is `resetOffsetHours - 1` (no longer hardcoded 4). | +| 8 | Frozen `currentHour` | Backed by `nowMs = $state(Date.now())`, bumped every 30s via `setInterval`, cleaned up in `onDestroy`. | +| 9 | Out-of-order toggles | Per-hour sequence counter (`inFlightSeq`) + `pendingHours: Set<number>` that disables in-flight buttons; stale responses dropped. | +| 10 | No feedback | New status row: "✓ Last wake N min ago" or "✗ Last wake N min ago — <error>"; pending retry row shows retries-left + next-attempt countdown. | + +Also extracted `CLAUDE_RESET_OFFSET_HOURS = 5` and `PROBE_SLOT_MINUTES = [0,15,30,45]` +to a single source of truth in `packages/api/src/wake-scheduler.ts`; the +frontend learns both from the server snapshot. + +--- + +## Files changed + +- **New:** `packages/api/src/wake-scheduler.ts` — pure helpers + (`nextDailyAfter`, `recoverScheduleEntry`, `resetHourFor`, + `isProbeSlotMinute`, `CLAUDE_RESET_OFFSET_HOURS`, + `MISSED_WAKE_GRACE_MS`, `DAILY_INTERVAL_MS`, `PROBE_SLOT_MINUTES`, + `ProbeSlotMinute`). +- **New:** `packages/api/tests/wake-scheduler.test.ts` — 12 unit tests for + the pure helpers (grace boundaries, multi-day skip, custom grace, + midnight wraparound). +- **Modified:** `packages/api/src/routes/models.ts` — full rewrite of the + wake-scheduler section (~280 LoC). Routes preserved + (`POST /models/wake`, `POST /models/wake-schedule/toggle`, + `GET /models/wake-schedule`) but request/response payloads expanded. +- **Modified:** `packages/api/tests/routes.test.ts` — +12 HTTP tests for + the wake-schedule routes (was +9 in the prior commit; rewritten for + the 4-slot payload). +- **Modified:** `packages/core/src/db/index.ts` — `wake_schedule` schema + changed; destructive migration drops the old table if the + `slot_minute` column is missing. Nothing else touched. +- **Modified:** `packages/frontend/src/lib/components/ClaudeReset.svelte` + — full rewrite of the script section; markup updated for the marked-hour + summary + the status footer. + +--- + +## Public surface changes + +### Database schema + +```sql +-- BEFORE +CREATE TABLE wake_schedule ( + hour INTEGER PRIMARY KEY CHECK (hour BETWEEN 0 AND 23), + next_wake_at INTEGER NOT NULL +) + +-- AFTER +CREATE TABLE wake_schedule ( + hour INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23), + slot_minute INTEGER NOT NULL CHECK (slot_minute IN (0, 15, 30, 45)), + next_wake_at INTEGER NOT NULL, + PRIMARY KEY (hour, slot_minute) +) +``` + +Migration on boot: if `PRAGMA table_info(wake_schedule)` lacks a +`slot_minute` column, `DROP TABLE IF EXISTS wake_schedule` then `CREATE` +the new shape. **No other table is touched** (credentials, api_keys, +usage_cache, tabs, chunks, settings preserved). + +### API: `GET /models/wake-schedule` + +```json +{ + "schedule": { + "9": { "0": 1700001500000, "15": 1700002400000, "30": 1700003300000, "45": 1700004200000 } + }, + "resetOffsetHours": 5, + "probeSlotMinutes": [0, 15, 30, 45], + "lastWake": { + "firedAt": 1700000000000, + "ok": true, + "results": [{ "label": "personal", "ok": true }] + } | null, + "pendingRetry": { + "retriesLeft": 5, + "nextRetryAt": 1700000300000, + "reason": "scheduled probe(s) 9:15" + } | null +} +``` + +### API: `POST /models/wake-schedule/toggle` + +**Add** (when hour is not yet marked): + +```json +{ + "hour": 9, + "timestamps": { "0": 1700001500000, "15": 1700002400000, "30": 1700003300000, "45": 1700004200000 } +} +``` + +All four slot keys are required; each value must be a future Unix ms. +Returns the same expanded snapshot as `GET`. + +**Remove** (when hour *is* marked): + +```json +{ "hour": 9 } +``` + +Same shape as before. Deletes all 4 slots for that hour atomically. + +Validation: hour must be an integer 0–23. Non-integer / out-of-range +hours → 400. Missing or non-object `timestamps` on add → 400. Missing, +non-finite, or past timestamp in any slot → 400. + +### Component props + +`ClaudeReset.svelte` props unchanged: still `{ apiBase?: string }`. + +### New exported helpers + +`packages/api/src/wake-scheduler.ts` exports `nextDailyAfter`, +`recoverScheduleEntry`, `resetHourFor`, `isProbeSlotMinute`, +`CLAUDE_RESET_OFFSET_HOURS`, `DAILY_INTERVAL_MS`, +`MISSED_WAKE_GRACE_MS`, `PROBE_SLOT_MINUTES`, `ProbeSlotMinute`, +`RecoveredEntry`. Not re-exported from `@dispatch/core` — they live in +`@dispatch/api` and aren't intended cross-package surface. + +--- + +## End-to-end traces + +### Happy path: mark 9 AM +1. User clicks the "9" AM block. +2. Frontend computes 4 timestamps in **user's local TZ** for the next + occurrence of `9:00`, `9:15`, `9:30`, `9:45`. +3. `POST /models/wake-schedule/toggle { hour: 9, timestamps: { "0":…, "15":…, "30":…, "45":… } }`. +4. Backend writes 4 rows to `wake_schedule`, returns snapshot. Button + turns primary, +4 trailing blocks fade. +5. Tick loop runs every 30s. At `9:00` the `0`-minute slot becomes due; + the tick advances its `next_wake_at` to tomorrow `9:00`, persists, + and fires *one* coalesced wake. Same dance at 9:15, 9:30, 9:45 — + each is a separate upstream call (different 15-min windows). +6. If two slots happen to come due in the *same* 30s tick (e.g. the + scheduler was paused), they coalesce into ONE upstream wake. + +### Recovery: API was down when 9:15 fired +1. Server boots at 11:00. Reads 4 rows for hour 9. The `9:00`, `9:15`, + `9:30`, `9:45` slots all have `next_wake_at` ≤ now, all overdue by + ≤ 2h → all `shouldFireNow: true`. +2. Each slot's `next_wake_at` is advanced to tomorrow's equivalent + wall-clock via `nextDailyAfter`. The boot-fire flag is set. +3. First tick runs immediately, sees `needsBootFire`, fires ONE coalesced + wake. `lastWake` shows ✓ on the panel. + +### Recovery: API was down for two days +1. Server boots Wed at 14:00. Slots for Mon 9:00/15/30/45 are overdue by + > 48h → `shouldFireNow: false`. +2. Each `nextWakeAt` jumps forward by `nextDailyAfter` (ceil-div, single + step — not a 48-iteration loop) to Thu 9:00/15/30/45. +3. Schedule preserved; no spurious wake; entry resumes normally. + +### Rapid double-click +1. User clicks "9" → request A (add, seq=1) in flight; button disabled. +2. User clicks "9" again → request B (remove, seq=2) in flight. +3. Response B arrives → `inFlightSeq[9] === 2` → applied. +4. Response A arrives later → `inFlightSeq[9] !== 1` → dropped. + +--- + +## Verification + +### `bun run check` +``` +$ biome check . +Checked 142 files in 155ms. No fixes applied. +``` + +### `bun run test` +``` +Test Files 25 passed (25) + Tests 417 passed (417) + Start at 09:52:16 + Duration 2.80s +``` + +(Was 393 tests at branch base; +24 net = 12 helper unit tests + 12 HTTP +route tests for the 4-slot wake schedule.) + +### TypeScript strict checks +- `bun --bun tsc -p packages/api/tsconfig.json --noEmit` → exit 0 +- `bun --bun tsc -p packages/core/tsconfig.json --noEmit` → exit 0 +- `bun run --cwd packages/frontend typecheck` → svelte-check 0 errors, 0 warnings + +--- + +## Assumptions / known gaps + +1. **TZ behavior:** absolute `timestamp` from the toggle request is the + source of truth for *first* fire. On reschedule the slot advances by + exactly 24h × N from its previous `next_wake_at`. Preserves the user's + local wall-clock intent regardless of server TZ. **DST transition days + can drift the fire by ±1h**; self-corrects when the user next toggles + the hour. A more thorough fix would store hour + IANA TZ and recompute + each cycle — punted; requires UI for TZ selection. + +2. **Missed-wake grace = 2h.** Picked because Claude's typical session + window is ~5h. Tunable in `wake-scheduler.ts:MISSED_WAKE_GRACE_MS`; + `recoverScheduleEntry` accepts a custom value (exercised in tests). + +3. **Same-tick coalescing.** Hitting `:00 :15 :30 :45` produces 4 wakes + per hour at steady state. Two slots due in the *same* 30s tick + coalesce into one upstream call — there's no value in 2 simultaneous + probes. The advancement-then-fire ordering means a slow upstream + call can't cause re-firing on the next tick. + +4. **"Reset" semantics:** I interpreted the "Reset by HH:00" label as a + display hint (wake + 5h ≈ when Claude's session window resets), not + a separate event. The scheduler only fires *wakes*. The `+5h` + constant lives in `CLAUDE_RESET_OFFSET_HOURS` if it ever needs + changing. + +5. **Recurring daily.** Matches prior behavior; no UI for one-shot + wakes. + +6. **`nowMs` ticker = 30s on the frontend.** Current-hour ring updates + within at most 30s of the hour boundary. Status "X min ago" labels + refresh at the same cadence. + +7. **Retry budget = 6 × 5min = 30min.** Unchanged from before, just + consolidated to a single shared slot. + +8. **Snapshot polling:** frontend refreshes the snapshot on mount and + after toggles. `lastWake` / `pendingRetry` rows are therefore stale + between user actions; the displayed *relative* timestamps DO refresh + live (driven by the same `nowMs` ticker). Adding a 30s poll would be + a one-liner if desired — left off to avoid quiet background traffic + for a panel that's typically only opened intentionally. + +9. **Destructive migration.** Per direction: no back-compat. Any + existing rows in `wake_schedule` from before this branch are dropped + on first boot. Users will need to re-mark their hours. No other + tables are touched. + +10. **No backend test for `loadScheduleFromDB` recovery branch.** The + module-level scheduler state is initialized at import time; covering + the boot-path recovery from a Vitest module requires either DI for + the DB or spinning up real SQLite. I covered the pure logic via + `recoverScheduleEntry` unit tests and the route surface end-to-end + via HTTP tests. A follow-up could refactor `loadScheduleFromDB` to + take a `db` parameter and write a fixture-backed integration test. + +--- + +## Review followup (Gemini review pass — `notes/claude-reset-review.md`) + +A Gemini code-review pass after the initial 4-slot work surfaced 3 real +bugs (2 High, 1 Medium) and 2 nits. All are now fixed on this branch. + +| # | Sev | Where | Symptom | Fix | +|---|---|---|---|---| +| 1 | High | `models.ts` POST toggle | `raw <= now` rejected legitimate toggles whenever client clock skew or request latency made an imminent slot land in the past → 400 → UI toggle silently fails | Dropped the `<= now` check; kept `Number.isFinite`. The scheduler's `recoverScheduleEntry` already fires within `MISSED_WAKE_GRACE_MS` and rolls forward. | +| 2 | High | `ClaudeReset.svelte` | Per-hour `inFlightSeq` couldn't stop an older snapshot from clobbering a newer one when the two requests covered *different* hours (or initial-load racing a click). `applySnapshot` replaces the whole `schedule` → newest click vanishes. | Replaced per-hour counter with a single global `SnapshotSequencer` (`src/lib/snapshot-sequencer.ts`) used by `loadFromServer` AND `postToggle`. Older `accept(seq)` calls return false and are dropped. | +| 3 | Med | `models.ts` `persistSchedule` | `DELETE` + N `INSERT`s with no transaction; an insert failure left the table empty (DELETE already committed) → schedule silently wiped on next boot, error swallowed. | Wrapped both in `db.transaction(...)`. On failure the DELETE rolls back and the previously persisted snapshot stays intact. | +| 4 | Nit | `ClaudeReset.svelte` | `inFlightSeq` was effectively dead code for user clicks (the `pendingHours.has(hour)` early-return blocks them) but still mattered for initial-load vs first-click. | Subsumed by the new global `SnapshotSequencer` (cleaner than two parallel mechanisms). | +| 5 | Nit | `models.ts` `schedulerTick` | Boot-recovery `reason` was masked whenever boot recovery + due slots coincided in the same tick. | Capture `bootFireRequested` before clearing the flag and append `" (boot recovery)"` to the reason. | + +### Files added/changed in the followup + +- **New:** `packages/frontend/src/lib/snapshot-sequencer.ts` — 47 LoC, the + reusable "most-recent request wins" race guard. Pure class, no Svelte + deps; usable from any component that fans out snapshot-style HTTP calls. +- **New:** `packages/frontend/tests/snapshot-sequencer.test.ts` — 8 unit + tests covering the core race, the initial-load-vs-click race, monotonic + ordering, equal-seq idempotency, and the watermark inspector. +- **New:** `notes/claude-reset-review.md` — the original review (kept for + audit trail). +- **Modified:** `packages/api/src/routes/models.ts` — fixes #1, #3, #5 + (~30 LoC delta). +- **Modified:** `packages/frontend/src/lib/components/ClaudeReset.svelte` + — fix #2 / nit #4 (~20 LoC delta). +- **Modified:** `packages/api/tests/routes.test.ts` — replaced the + "POST toggle rejects past timestamp" test with two new tests: + - "POST toggle ACCEPTS a slightly-past timestamp (clock skew / latency)" + regression-guards finding #1. + - "POST toggle rejects NaN / Infinity / non-number slot values" guards + that we still reject *malformed* inputs. + - Plus "snapshot remains consistent across toggle round-trips" guards + finding #3 (the transactional persist path). + +### Verification (after followup) + +``` +$ bun run check +Checked 144 files in 167ms. No fixes applied. + +$ bun run test +Test Files 26 passed (26) + Tests 427 passed (427) + +$ bun run --cwd packages/frontend typecheck +svelte-check found 0 errors and 0 warnings +``` + +### Still deferred (not addressed in followup) + +These were noted in the review as design pushback rather than bugs and +remain as documented in §"Assumptions / known gaps" above: + +- **Snapshot polling.** UI may show stale "Retrying…" forever if a retry + eventually succeeds in the background without user interaction. Adding + a slow 60s poll is still a one-liner; left off to avoid quiet background + traffic. (Documented in gap #8.) +- **DST drift.** Adding `24h` to an absolute Unix ts ignores DST + transitions; documented in gap #1. + +--- + +## Review followup — Round 2 (Gemini review pass — `notes/claude-reset-review-2.md`) + +After the round-1 fixes shipped, a second Gemini review pass surfaced one +**Critical** and one **High** finding which together exposed a real desync +hazard: the round-1 SnapshotSequencer only protected against RESPONSE +reordering, but the toggle endpoint was vulnerable to REQUEST reordering, +and the toggle endpoint itself ignored client intent — combining into +"clicks feel inverted" when the UI got desynced. + +Both are now fixed on this branch. + +| # | Sev | Where | Symptom | Fix | +|---|---|---|---|---| +| R2-1 | Critical | `ClaudeReset.svelte` + `snapshot-sequencer.ts` | If two concurrent toggle POSTs reorder on the WIRE (B reaches server first), the server's truer post-A snapshot carries the OLDER client seq → SnapshotSequencer discards it as stale → UI permanently desyncs from server. Round-1's per-hour-counter fix was replaced with a global sequencer but BOTH had this blind spot. | Replaced per-hour `pendingHours: Set<number>` with a single global `pendingHour: number \| null` mutation lock. While any POST is in flight, ALL toggle buttons are disabled — mutations are now serialized on the client, so the server never sees two concurrent toggle requests. Sequencer retained for the GET-on-mount vs first-click race (which the global lock doesn't cover). | +| R2-2 | High | `models.ts` POST `/wake-schedule/toggle` | Server decided add-vs-remove from its own in-memory state instead of an explicit request field. Any UI desync (from R2-1 or any future cause) → user clicks to turn on an hour the UI shows off, server sees it on, deletes it → "click inverted" UX, recoverable only by reload. | Toggle endpoint now requires explicit `action: 'on' \| 'off'`. Idempotent: `'off'` on already-off is a no-op success; `'on'` on already-on REPLACES timestamps (so a recovering UI can re-assert wall-clock intent without a delete-then-add round trip). Missing/invalid action → 400. | +| R2-3 | Low (deferred) | `models.ts` `wakeAllClaudeAccounts` / `processPendingRetry` | If 1 of N accounts fails a probe, the 6 × 5min retry loop re-probes ALL accounts (including the ones that already succeeded). Wastes bandwidth, but the probe payload is tiny (~16 tok) and the constant 30-min budget caps the blast radius. | **Not fixed** — explicit deliberate trade-off; per-account success tracking inside `PendingRetry` would meaningfully complicate the retry path for marginal savings. Noted in §"Assumptions / known gaps". | + +### Files changed in round 2 + +- **Modified:** `packages/api/src/routes/models.ts` — toggle endpoint + rewritten to require explicit `action` (`+27 / -6` LoC, idempotency rules + documented inline). +- **Modified:** `packages/api/tests/routes.test.ts` — `toggle()` helper + auto-derives `action` from `timestamps` presence so the existing 12 tests + stayed terse; one test (`POST toggle rejects missing timestamps on add`) + was renamed to `rejects action='on' with missing timestamps` and now + passes `action` explicitly. **+4 new contract tests** (29 / 29 routes + tests pass): + - `POST toggle requires explicit action: 'on' | 'off'` (rejects missing + action, rejects non-`'on'/'off'` strings/numbers/`null`). + - `POST toggle action='off' is idempotent on an already-off hour`. + - `POST toggle action='on' on an already-on hour REPLACES timestamps` + (the recovery-from-desync scenario). + - `POST toggle action='off' ignores timestamps payload`. +- **Modified:** `packages/frontend/src/lib/components/ClaudeReset.svelte` + — `pendingHours: Set<number>` → `pendingHour: number | null`; all 4 row + buttons gated by the global lock; `toggleHour` derives `action` from + local state; `postToggle` sends it on the wire. Per-hour + `cursor-wait` class is preserved for the in-flight hour as a UX cue. +- **New:** `notes/claude-reset-review-2.md` — the round-2 review (kept + for audit trail, parallel to `notes/claude-reset-review.md`). + +### Verification (after round-2 followup) + +``` +$ bun run check +Checked 144 files in 161ms. No fixes applied. + +$ bun run test +Test Files 26 passed (26) + Tests 431 passed (431) + +$ bun run --cwd packages/frontend typecheck +svelte-check found 0 errors and 0 warnings +``` + +(`+4` tests vs round 1: the four explicit-action contract tests.) + +### What's NOT addressed in round 2 + +- **DST drift** — unchanged design trade-off (gap #1). +- **No snapshot polling** — unchanged design trade-off (gap #8). +- **R2-3 (retry storm re-probes succeeded accounts)** — deliberate trade- + off, see table above. |
