| Age | Commit message | Author |
|
reconcile drops thinking-only messages
Root cause of tool-calls-in-thinking bug: append() assigns msgIdx as a LOCAL
index (reset to 0 per call), but load() grouped chunks by msgIdx alone. Since
the orchestrator persists messages one at a time (append([user]) at turn start,
then append([assistant, ...toolResults]) per step), all single-message appends
share msgIdx=0 and collapse into one giant user-role message. The model loses
its prior assistant responses and tool-call history, falls back to text-based
tool-call syntax inside reasoning_content, and the turn ends with finish_reason
stop (no structured tool_calls detected).
Fix 1 (store.ts load()): split message boundaries on role change too, not just
msgIdx. Handles the alternating user/assistant/tool pattern correctly.
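
A minimal sketch of the boundary rule in Fix 1, assuming a simplified row shape. `Row`, `Message`, and `groupRows` are hypothetical names for illustration, not the actual store.ts code:

```typescript
// Hypothetical, simplified shapes; the real stored chunk carries more fields.
type Role = "user" | "assistant" | "tool";

interface Row {
  seq: number;    // global append order across the whole conversation
  msgIdx: number; // index LOCAL to one append() call (restarts at 0)
  role: Role;
  text: string;
}

interface Message {
  role: Role;
  parts: string[];
}

// Start a new message whenever msgIdx OR role differs from the previous row.
// Grouping on msgIdx alone would merge every single-message append (all
// msgIdx=0) into one message.
function groupRows(rows: Row[]): Message[] {
  const sorted = [...rows].sort((a, b) => a.seq - b.seq);
  const messages: Message[] = [];
  let current: Message | undefined;
  let prev: Row | undefined;

  for (const row of sorted) {
    const isBoundary =
      prev === undefined ||
      row.msgIdx !== prev.msgIdx ||
      row.role !== prev.role;

    if (current === undefined || isBoundary) {
      current = { role: row.role, parts: [] };
      messages.push(current);
    }
    current.parts.push(row.text);
    prev = row;
  }
  return messages;
}
```

In this sketch, two back-to-back single-message appends with the same role would still merge; the rule relies on the alternating user/assistant/tool pattern described above.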
Fix 2 (reconcile.ts hasContent): include thinking chunks as valid content so
thinking-only assistant messages are not silently dropped on load. The buggy
seq-14 output (assistant, thinking-only) was being deleted by reconcile,
destroying evidence of the bug.
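
A rough sketch of the Fix 2 check, assuming a tagged-union chunk shape. The `Chunk` and `AssistantMessage` types are hypothetical, and treating an error chunk as non-content is an assumption of this sketch rather than something this commit states:

```typescript
// Hypothetical chunk union; the real chunk types live in the contracts package.
type Chunk =
  | { kind: "text"; text: string }
  | { kind: "thinking"; text: string }
  | { kind: "tool-call"; toolCallId: string; input: unknown }
  | { kind: "error"; message: string };

interface AssistantMessage {
  role: "assistant";
  chunks: Chunk[];
}

// A message counts as having content if any chunk is non-empty text,
// non-empty thinking, or a tool call. Counting "thinking" is the point of
// Fix 2: a thinking-only assistant message must survive load().
function hasContent(message: AssistantMessage): boolean {
  return message.chunks.some((chunk) => {
    switch (chunk.kind) {
      case "text":
      case "thinking":
        return chunk.text.trim().length > 0;
      case "tool-call":
        return true;
      default:
        return false; // assumption: an error chunk alone is not content
    }
  });
}
```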
Verified: load() on the affected conversation now produces 9 correct messages
(previously 3 merged messages). All 1999 tests pass. See
notes/tool-call-in-thinking-bug.md.
|
|
reconcile() only repaired orphaned tool-calls. Two other broken states made
chats uncontinuable, and load() had no parse-error guard:
- A trailing assistant message whose only chunk is 'error' (a failed-generation
marker) serializes to empty content -> the provider rejects it or returns an
empty reply -> chat never continues. 6 of 140 production conversations were stuck.
- A tool-call whose input is a raw malformed-JSON string (the model emitted
broken JSON) is re-sent as OpenAI arguments -> the provider returns 400 on every
continuation (the 77574596 break).
- load() JSON.parse had no try/catch -> one corrupt row bricked the chat.
Fix = read-time repair (no DB surgery; append-only preserved). reconcile
runs on every load() BEFORE any provider sees messages, so Layer 1
protects ALL providers.
Layer 1 (conversation-store reconcile): strip error chunks from assistant
messages + drop the now-empty error-only messages (safe: never followed by
a tool message); orphaned-tool-call synthesis unchanged; ReconcileReport
+2 additive counts. loadSince (FE reads) intentionally unreconciled so the
user still SEES the error. load() wraps JSON.parse in try/catch (skip
corrupt rows).
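
The Layer 1 read-time repair could look roughly like the sketch below. It assumes, for brevity, that each stored row holds one serialized message; `parseRows`, `stripErrorChunks`, and the two count names are hypothetical, not the actual ReconcileReport fields:

```typescript
// Hypothetical simplified shapes for illustration only.
type Chunk =
  | { kind: "text"; text: string }
  | { kind: "error"; message: string };

interface Message {
  role: "user" | "assistant" | "tool";
  chunks: Chunk[];
}

interface RepairCounts {
  strippedErrorChunks: number;  // hypothetical count name
  droppedEmptyMessages: number; // hypothetical count name
}

// Parse stored rows, skipping any row whose JSON is corrupt instead of throwing.
function parseRows(rawRows: string[]): Message[] {
  const messages: Message[] = [];
  for (const raw of rawRows) {
    try {
      messages.push(JSON.parse(raw) as Message);
    } catch {
      // Corrupt row: skip it so one bad row cannot brick the whole chat.
    }
  }
  return messages;
}

// Strip error chunks from assistant messages and drop messages left empty.
// Nothing is written back; the stored rows stay append-only.
function stripErrorChunks(
  messages: Message[],
): { messages: Message[]; counts: RepairCounts } {
  const counts: RepairCounts = { strippedErrorChunks: 0, droppedEmptyMessages: 0 };
  const repaired: Message[] = [];

  for (const message of messages) {
    if (message.role !== "assistant") {
      repaired.push(message);
      continue;
    }
    const kept = message.chunks.filter((chunk) => chunk.kind !== "error");
    const removed = message.chunks.length - kept.length;
    if (removed === 0) {
      repaired.push(message);
      continue;
    }
    counts.strippedErrorChunks += removed;
    if (kept.length === 0) {
      counts.droppedEmptyMessages += 1; // error-only message: drop it
      continue;
    }
    repaired.push({ ...message, chunks: kept });
  }
  return { messages: repaired, counts };
}
```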
Layer 2 (openai-stream): serializeToolArguments ensures tool-call
arguments is always valid JSON (malformed string -> fallback object),
neutralizing already-stored malformed args.
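
As an illustration of the Layer 2 guard, a sketch of what such a serializer might do. The real serializeToolArguments signature is not shown in this log, so the parameter type and the fallback key `_malformedArguments` are assumptions:

```typescript
// Always returns a string that JSON.parse accepts.
function serializeToolArguments(input: unknown): string {
  if (typeof input === "string") {
    try {
      JSON.parse(input); // validate only; keep the original text if it parses
      return input;
    } catch {
      // Malformed JSON emitted by the model: wrap the raw text in a valid
      // object so the provider does not reject the request.
      return JSON.stringify({ _malformedArguments: input }); // hypothetical key
    }
  }
  // Objects, arrays, numbers, etc. are serialized normally. JSON.stringify
  // can yield undefined at runtime (e.g. for undefined input), so default
  // to an empty object in that case.
  return JSON.stringify(input ?? {}) ?? "{}";
}
```

Because the guard runs at serialization time, arguments that were already stored in malformed form are neutralized without rewriting the stored rows.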
Layer 2 equivalent (../claude provider-anthropic): safeJson returns a valid
object fallback on parse failure, not the raw string. (Separate repo.)
Live-verified: reproduced 77574596's real broken tail in the dev DB;
POST /chat continued it cleanly (no 400, model replied) — the provider
accepted the reconciled history.
tsc -b EXIT 0, biome clean, 1453 vitest pass.
|
|
Load-time history repair was invisible (createConversationStore got no logger).
Now: optional logger injected (extension passes host.logger); reconcile logic
moved into pure reconcileWithReport() returning a ReconcileReport (reconcile()
stays a thin byte-identical wrapper); load() emits a reconcile.repair span
(childed with conversationId, flat attrs repairedCount/firstRepairedToolCallId)
ONLY when a real repair occurs. No contract fan-out (factory is package-internal).
typecheck EXIT 0, biome clean, 550 vitest (+4) + 89 bun.
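
The split described above can be sketched as follows. The message shape, the `Logger` interface, and `loadAndReconcile` are hypothetical stand-ins; only the names reconcile, reconcileWithReport, ReconcileReport, reconcile.repair, repairedCount, and firstRepairedToolCallId come from the commit message:

```typescript
// Hypothetical simplified message shape.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; toolCallIds: string[] }
  | { role: "tool"; toolCallId: string; interrupted?: boolean };

interface ReconcileReport {
  repairedCount: number;
  firstRepairedToolCallId?: string;
}

// Hypothetical logger interface; the real host.logger API may differ.
interface Logger {
  span(name: string, attrs: Record<string, string | number>): void;
}

// Pure: synthesizes an interrupted result for each orphaned tool call and
// reports what it repaired. No I/O, no logging.
function reconcileWithReport(
  messages: Message[],
): { messages: Message[]; report: ReconcileReport } {
  const answered = new Set<string>();
  for (const m of messages) {
    if (m.role === "tool") answered.add(m.toolCallId);
  }
  const out: Message[] = [];
  const report: ReconcileReport = { repairedCount: 0 };
  for (const m of messages) {
    out.push(m);
    if (m.role !== "assistant") continue;
    for (const id of m.toolCallIds) {
      if (answered.has(id)) continue;
      out.push({ role: "tool", toolCallId: id, interrupted: true });
      report.repairedCount += 1;
      if (report.firstRepairedToolCallId === undefined) {
        report.firstRepairedToolCallId = id;
      }
    }
  }
  return { messages: out, report };
}

// Thin wrapper that keeps the original return shape.
function reconcile(messages: Message[]): Message[] {
  return reconcileWithReport(messages).messages;
}

// Emits a span only when a real repair occurred; the logger is optional.
function loadAndReconcile(
  messages: Message[],
  conversationId: string,
  logger?: Logger,
): Message[] {
  const { messages: repaired, report } = reconcileWithReport(messages);
  if (logger !== undefined && report.repairedCount > 0) {
    logger.span("reconcile.repair", {
      conversationId,
      repairedCount: report.repairedCount,
      firstRepairedToolCallId: report.firstRepairedToolCallId ?? "",
    });
  }
  return repaired;
}
```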
|
|
tool calls
Expose a per-step grouping key so a client can render a model's batched/parallel
tool calls (those emitted in one step) as one unit, on both the live stream and
replayed history. Key = branded StepId, derived turnId#stepIndex (0-based).
- [email protected]: required stepId on Turn{Tool,ToolResult}Event; optional stepId on
Tool{Call,Result}Chunk (generation provenance on the chunk, not the StoredChunk
envelope — StoredChunk unchanged). [email protected] (re-export bump).
- kernel-runtime: mint stepId per step; stamp on tool chunks + tool events.
- conversation-store: chunk-carried stepId round-trips append/load/loadSince for
free; reconcile copies it onto synthesized (interrupted) results.
- cli: stepId added to event test fixtures (renderer unchanged).
typecheck clean; 509 vitest + 89 bun; biome 0/0. FE courier reply + reference
snapshots regenerated in ../dispatch-web.
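
For illustration, a branded StepId derived as turnId#stepIndex might be sketched like this. `mintStepId`, `ToolCallEvent`, and `groupByStep` are hypothetical names; the real contracts types are not shown in this log:

```typescript
// Branded string: a plain string cannot be passed where a StepId is expected.
type StepId = string & { readonly __brand: "StepId" };

// Derive the key from the turn id and the 0-based step index.
function mintStepId(turnId: string, stepIndex: number): StepId {
  if (!Number.isInteger(stepIndex) || stepIndex < 0) {
    throw new RangeError("stepIndex must be a non-negative integer");
  }
  return `${turnId}#${stepIndex}` as StepId;
}

// Hypothetical event shape carrying the required stepId.
interface ToolCallEvent {
  toolCallId: string;
  stepId: StepId;
}

// Client-side grouping: tool calls emitted in the same step share one key,
// so they can be rendered as one unit. Map preserves first-seen key order.
function groupByStep(events: ToolCallEvent[]): Map<StepId, ToolCallEvent[]> {
  const groups = new Map<StepId, ToolCallEvent[]>();
  for (const event of events) {
    const group = groups.get(event.stepId);
    if (group !== undefined) {
      group.push(event);
    } else {
      groups.set(event.stepId, [event]);
    }
  }
  return groups;
}
```

Since the key is carried on the events and chunks themselves, the same grouping function would apply to both the live stream and replayed history.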
|
|
StorageNamespace + pure reconcile (16 tests)
|