|
When the upstream LLM API returns a retryable error (HTTP 429 / 5xx
"overloaded"), the kernel now retries provider.stream() with a stepped
backoff, visibly, until the 8h cumulative-sleep budget is exhausted — then
emits the final error and seals the turn. Retries fire only when no content
was emitted yet this step (safety invariant: never duplicate partial output).
- wire: new transient TurnProviderRetryEvent AgentEvent variant (emitted
before each sleep; not persisted to model history).
- kernel contracts: RetryStrategy (pure delayFor + injected sleep) + optional
retry? on RunTurnInput (omit = no retry, backward-compatible).
- kernel run-turn: retry loop in executeStep; providerRetryEvent constructor.
Kernel imports no timer (sleep injected).
- session-orchestrator: concrete schedule (5s..30m, repeat 30m, 8h budget) +
abortable setTimeout sleep, wired into RunTurnInput.retry.
tsc -b EXIT 0; biome clean; 1574 vitest pass (+16 new: 11 kernel retry tests
with injected fake sleep + pure delayFor, zero @dispatch/* mocks; 5 schedule
tests). Transports unchanged (transport-ws forwards AgentEvent verbatim in
chat.delta; transport-http is generic JSON.stringify).
Plan: notes/retry-with-backoff-plan.md. tasks.md updated with milestone +
optional CLI-renderer roadmap follow-up.
|