diff options
| author | Adam Malczewski <[email protected]> | 2026-06-27 21:32:19 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-06-27 21:32:19 +0900 |
| commit | f9d1ca533ad2c5d71a3bc349934d54c09de305bf (patch) | |
| tree | 1c4605b51001af7ef76f85eddfbcf00106382c3d | |
| parent | 05ff2566eba1b4f22e2a2c22c7bd408150b0a60a (diff) | |
| download | dispatch-f9d1ca533ad2c5d71a3bc349934d54c09de305bf.tar.gz dispatch-f9d1ca533ad2c5d71a3bc349934d54c09de305bf.zip | |
fix(lsp): bound pushDiagnostics cache — evict oldest entries for unopened background files
| -rw-r--r-- | ai-review-report.md | 106 | ||||
| -rw-r--r-- | packages/lsp/src/diagnostics.test.ts | 73 | ||||
| -rw-r--r-- | packages/lsp/src/diagnostics.ts | 34 |
3 files changed, 129 insertions, 84 deletions
diff --git a/ai-review-report.md b/ai-review-report.md index 2f52074..570ba05 100644 --- a/ai-review-report.md +++ b/ai-review-report.md @@ -1,92 +1,32 @@ -# LSP Subsystem Review +# LSP Fixes Verification Report ## Executive Summary +The fixes implemented in `feature/lsp-fixes` successfully address the immediate crashes and the primary source of the 9.5 GB memory leak. The optional chaining fix, `fs.watch` error listener, bounded LRU document cache, and `initialize` timeout propagation are all correctly designed and do not introduce regressions. However, the memory leak fix for the diagnostics cache is **incomplete**: while actively opened documents are properly managed, diagnostics pushed for *unopened* background files will still accumulate and leak memory over time. -The production `dispatch.service` is destabilizing due to three distinct issues in the LSP subsystem: -1. **Crash Type 1 (The "Defence-in-Depth" TypeError):** The original `SyntaxError` from malformed JSON has been mitigated by a `try/catch` in `rpc.ts`, but the subsequent "defence-in-depth" boundary added in `client.ts` introduced a fatal `TypeError` via improper optional chaining. -2. **Crash Type 2 (Unhandled Watcher ENOENT):** The recursive `node:fs` watcher (`extension.ts`) throws an uncaught exception when it encounters deleted transient files (e.g., during `bun install`). -3. **Severe Memory Leak (Unbounded Caches):** The `LanguageServerClient` retains the full text and diagnostics of every file it ever interacts with in memory indefinitely. When an AI agent scans many files, this memory footprint grows linearly, causing the 9.5 GB usage spike. +## Per-Fix Verification ---- +### 1. Crash — TypeError from broken optional chaining (`client.ts`) +- **Correctness:** **Correct.** The addition of the second optional chaining operator (`void this.rpc?.handleMessage(msg)?.catch(...)`) correctly evaluates to `undefined` when `this.rpc` is null, preventing the synchronous `TypeError` that previously crashed the server. +- **Completeness:** **Complete.** Because `handleMessage` is an `async` function, it is guaranteed to return a Promise, meaning any internal errors are returned as rejections rather than synchronous throws. +- **New Issues/Regressions:** None. -## Root Causes & Detailed Technical Analysis +### 2. Crash — Unhandled 'error' event on `fs.watch` (`extension.ts`) +- **Correctness:** **Correct.** Attaching a no-op `'error'` listener to the `fs.watch` instance (`watcher.on("error", () => {})`) prevents Node/Bun from escalating transient filesystem errors (such as a directory vanishing during `bun install`) into unhandled exceptions that crash the main process. +- **Completeness:** **Complete.** The watcher is properly treated as best-effort. +- **New Issues/Regressions:** None. -### 1. Crash Type 1: Unhandled JSON Parse / TypeError (`client.ts`) +### 3. Memory leak — Unbounded document/diagnostic caches (`client.ts` + `diagnostics.ts`) +- **Correctness:** **Correct (for opened files).** The `evictIfOverCap` method properly acts as an LRU cache. It uses the JavaScript `Map`'s insertion-order property by grabbing the oldest key via `this.openDocuments.keys().next().value`. The `change()` method successfully maintains LRU order by deleting and re-inserting documents so they move to the tail. Evicted files correctly send `textDocument/didClose` and clear their state via `this.diagnostics.purge()`. +- **Completeness:** **Incomplete.** The fix bounds the memory for files the agent explicitly opens, but misses files analyzed passively. Language servers (like `pyright`, `rust-analyzer`, or `tsserver`) often scan the workspace in the background and emit `textDocument/publishDiagnostics` for files the client never touched. These trigger `DiagnosticsStore.setPushDiagnostics()`, which caches them unconditionally in the `pushDiagnostics` map. Because these background files are never placed in `openDocuments`, they are never reached by the `evictIfOverCap` loop. As a result, the `DiagnosticsStore` will still slowly leak memory over time as background files accumulate. +- **New Issues/Regressions:** None introduced. The `closeDocument` method is carefully guarded with `wasOpen` so it doesn't send invalid `didClose` notifications for unopened files. There are no race conditions in the polling logic (`waitForDiagnostics`). -**Location:** `packages/lsp/src/client.ts`, line 278 +### 4. Minor — Leaked promises on initialize timeout (`rpc.ts`) +- **Correctness:** **Correct.** Passing the `timeoutMs` parameter directly into `rpc.sendRequest` (rather than wrapping it in an external `Promise.race`) successfully utilizes the connection's internal timeout handler. When the timeout triggers, `rpc.ts` explicitly deletes the request ID from the `this.pending` Map, freeing the memory. +- **Completeness:** **Complete.** The leaked closure and promise are both cleared safely. +- **New Issues/Regressions:** None. -```typescript -// Current Code -void this.rpc?.handleMessage(msg).catch(() => {}); -``` +## Remaining Concerns +1. **Unbounded `DiagnosticsStore` Map:** As detailed in Bug 3, `this.pushDiagnostics` and `this.pushReceived` inside `DiagnosticsStore` have no size bounds. A server pushing diagnostics for thousands of untouched files across a large monorepo will keep those strings and objects in memory forever until the client is destroyed. -**The Bug:** The stack trace in the journal logs (`at handleMessage ... at handleBytes`) points to an error in the byte stream handler. While the developer correctly added a `try/catch` inside `rpc.ts` to prevent malformed JSON from crashing the app, they introduced a fatal flaw in `client.ts` when trying to suppress unhandled promise rejections. - -In JavaScript, optional chaining (`?.`) short-circuits the evaluation of the entire expression if the left-hand side is null/undefined. When the server process dies, `handleExit` sets `this.rpc = null`. If standard output flushes a final chunk *after* this occurs, `this.rpc?.handleMessage(msg)` evaluates to `undefined`. Appending `.catch(() => {})` to this results in `undefined.catch()`, which throws a synchronous `TypeError: Cannot read properties of undefined (reading 'catch')`. This exception is entirely unhandled and bypasses all safeguards, crashing the Bun process. - -**Recommendation:** -Properly chain the `.catch` statement using optional chaining, or use a traditional `if` block: -```typescript -// Fix option 1: -void this.rpc?.handleMessage(msg)?.catch(() => {}); - -// Fix option 2: -if (this.rpc) { - void this.rpc.handleMessage(msg).catch(() => {}); -} -``` - -### 2. Crash Type 2: Unhandled ENOENT from File Watcher (`extension.ts`) - -**Location:** `packages/lsp/src/extension.ts`, lines 63-68 - -```typescript -// Current Code -const watcher = watch(root, { recursive: true }, (eventType: string, filename: string | null) => { - // ... -}); -return { close: () => watcher.close() }; -``` - -**The Bug:** Node's native `fs.watch` backend natively emits `'error'` events when it loses track of files—such as when watching a directory that is rapidly deleted (e.g., transient `.old_modules` directories during `bun install`). Since the returned `FSWatcher` instance does not have an `.on("error", ...)` event listener attached, Node/Bun escalates the emitted `'error'` event into an Uncaught Exception in the main event loop, abruptly killing the process. - -**Recommendation:** -Attach a no-op error listener to the watcher to prevent the event loop from panicking. -```typescript -const watcher = watch(root, { recursive: true }, (eventType: string, filename: string | null) => { - // ... -}); -watcher.on("error", () => { - // Gracefully ignore transient FS errors -}); -return { close: () => watcher.close() }; -``` - -### 3. The 9.5 GB Memory Leak (`client.ts` & `diagnostics.ts`) - -**Location:** `packages/lsp/src/client.ts` and `packages/lsp/src/diagnostics.ts` - -```typescript -// client.ts -private openDocuments = new Map<string, { version: number; text: string }>(); -private lastDiagSnapshot = new Map<string, { keys: Set<string>; text: string }>(); - -// diagnostics.ts -private pushDiagnostics = new Map<string, readonly Diagnostic[]>(); -``` - -**The Bug:** The design of `LanguageServerClient` is strictly append-only. When an agent opens a file or queries diagnostics (`waitForDiagnostics`), the full string contents of the file are loaded and stored in `openDocuments` and `lastDiagSnapshot`. - -Because there is no `.close()` method implemented, no eviction policy (e.g., LRU cache), and no code that issues a `textDocument/didClose` notification, these `Map` instances hold references to every file touched throughout the server's lifespan. Over a 12-hour session where an agent iteratively scans thousands of files across a large monorepo, these caches balloon uncontrollably, resulting in the observed 9.5 GB memory footprint. - -**Recommendation:** -- Implement a document lifecycle strategy. When an agent is done with a file (or when an LRU threshold is reached), invoke a new `closeDocument` method that: - 1. Sends a `textDocument/didClose` notification to the language server. - 2. Deletes the corresponding entry from `openDocuments` and `lastDiagSnapshot`. - 3. Commands the `DiagnosticsStore` to purge the related `pushDiagnostics`. - -### Minor: Leaking Promises on Initialize Timeout (`rpc.ts`) - -**Location:** `packages/lsp/src/rpc.ts` (line 49) & `packages/lsp/src/client.ts` (line 356) - -When `initialize` executes, it wraps `rpc.sendRequest("initialize")` in a `Promise.race` with a 45-second timeout. If the server times out, the `initialize` method rejects, but the original promise inside `this.pending` inside `rpc.ts` is never cleared. It will leak (along with its closure) in the `Map` forever. While minor compared to the document cache leak, it should be addressed by allowing `timeoutMs` to be passed into the initial `sendRequest` call directly instead of relying on `Promise.race`. +## Recommendations +- **Bound passive diagnostics:** Modify `DiagnosticsStore` to implement its own LRU cache for `pushDiagnostics`, or have it coordinate with `client.ts` to ensure that even unopened files with diagnostics are eventually aged out and purged when a maximum threshold is reached. diff --git a/packages/lsp/src/diagnostics.test.ts b/packages/lsp/src/diagnostics.test.ts index 7f0365b..70b1ffa 100644 --- a/packages/lsp/src/diagnostics.test.ts +++ b/packages/lsp/src/diagnostics.test.ts @@ -1,5 +1,5 @@ import { describe, expect, it } from "vitest"; -import { DiagnosticsStore } from "./diagnostics.js"; +import { type Diagnostic, DiagnosticsStore } from "./diagnostics.js"; describe("diagnostics", () => { it("formats diagnostics with severity, location, and message", () => { @@ -84,4 +84,75 @@ describe("diagnostics", () => { // Other URIs are untouched. expect(store.format("file:///project/b.ts")).toBe(""); }); + + it("bounds pushDiagnostics — oldest background file is evicted past the cap (Bug 3 remainder)", () => { + // Language servers scan the workspace and push diagnostics for files the + // agent never opened. These never enter the client's openDocuments LRU, + // so without an independent cap the pushDiagnostics map grows forever. + const store = new DiagnosticsStore(); + const CAP = 100; + + const diag = (uri: string): readonly Diagnostic[] => [ + { + range: { start: { line: 0, character: 0 }, end: { line: 0, character: 1 } }, + severity: 1, + message: `err in ${uri}`, + }, + ]; + + // Push `CAP` distinct background files (at the cap — none evicted yet). + for (let i = 0; i < CAP; i++) { + store.setPushDiagnostics({ uri: `file:///bg/file${i}.ts`, diagnostics: diag(`file${i}`) }); + } + expect(store.hasReceivedPush("file:///bg/file0.ts")).toBe(true); + expect(store.format("file:///bg/file0.ts")).toContain("err in file0"); + + // One past the cap → the least-recently-pushed (file0) is evicted. + store.setPushDiagnostics({ + uri: "file:///bg/file100.ts", + diagnostics: diag("file100"), + }); + + // file0 (oldest) is gone — no didClose, no retained diagnostics, no flag. + expect(store.hasReceivedPush("file:///bg/file0.ts")).toBe(false); + expect(store.format("file:///bg/file0.ts")).toBe(""); + expect(store.getMerged("file:///bg/file0.ts")).toHaveLength(0); + + // The newest (file100) and a middle entry are retained. + expect(store.hasReceivedPush("file:///bg/file100.ts")).toBe(true); + expect(store.format("file:///bg/file100.ts")).toContain("err in file100"); + expect(store.format("file:///bg/file50.ts")).toContain("err in file50"); + }); + + it("re-pushing a URI refreshes its LRU position so it is not evicted", () => { + const store = new DiagnosticsStore(); + const CAP = 100; + + const diag = (uri: string): readonly Diagnostic[] => [ + { + range: { start: { line: 0, character: 0 }, end: { line: 0, character: 1 } }, + severity: 1, + message: uri, + }, + ]; + + for (let i = 0; i < CAP; i++) { + store.setPushDiagnostics({ uri: `file:///bg/file${i}.ts`, diagnostics: diag(`file${i}`) }); + } + + // Re-push file0 (an actively-edited file gets re-pushed on each change): + // this should move it to the tail (most-recently-used), NOT leave it at + // the head where it would be the first eviction candidate. + store.setPushDiagnostics({ uri: "file:///bg/file0.ts", diagnostics: diag("file0-updated") }); + + // Now push one past the cap. file1 (now the oldest untouched) should be + // evicted; file0 (refreshed) must survive. + store.setPushDiagnostics({ uri: "file:///bg/file100.ts", diagnostics: diag("file100") }); + + expect(store.hasReceivedPush("file:///bg/file0.ts")).toBe(true); + expect(store.format("file:///bg/file0.ts")).toContain("file0-updated"); + + expect(store.hasReceivedPush("file:///bg/file1.ts")).toBe(false); + expect(store.format("file:///bg/file1.ts")).toBe(""); + }); }); diff --git a/packages/lsp/src/diagnostics.ts b/packages/lsp/src/diagnostics.ts index 09ec1ae..cce261f 100644 --- a/packages/lsp/src/diagnostics.ts +++ b/packages/lsp/src/diagnostics.ts @@ -35,10 +35,44 @@ export class DiagnosticsStore { private pushDiagnostics = new Map<string, readonly Diagnostic[]>(); private pullDiagnostics = new Map<string, readonly Diagnostic[]>(); private pushReceived = new Set<string>(); + /** + * Bounded push-diagnostics set: once more than this many URIs have cached + * push diagnostics, the least-recently-pushed is evicted (just deleted — + * no didClose, since these are often background-scanned files the agent + * never opened). Language servers (tsserver, rust-analyzer, …) scan the + * workspace and emit textDocument/publishDiagnostics for files the client + * never touched; those never enter the client's openDocuments LRU, so + * without this cap the map grew without bound over a long session. + */ + private static readonly MAX_PUSH_DIAGNOSTICS = 100; setPushDiagnostics(params: PublishDiagnosticsParams): void { + // Delete-then-set to refresh LRU recency: a URI re-pushed (e.g. an + // actively edited file) moves to the tail (most-recently-used), so + // background-scanned files pushed once drift to the head and are evicted + // first. Mirrors the openDocuments LRU in client.ts. (A plain `set` on + // an existing key updates the value but leaves its insertion position — + // and thus its eviction priority — unchanged.) + this.pushDiagnostics.delete(params.uri); this.pushDiagnostics.set(params.uri, params.diagnostics); this.pushReceived.add(params.uri); + this.evictPushIfOverCap(); + } + + /** + * If the push-diagnostics set exceeds MAX_PUSH_DIAGNOSTICS, drop the + * least-recently-pushed entry (the first key in insertion order) from + * both `pushDiagnostics` and `pushReceived`. No didClose is sent: these + * entries are frequently for files the agent never opened, and even for + * an opened file the next push re-caches it. + */ + private evictPushIfOverCap(): void { + while (this.pushDiagnostics.size > DiagnosticsStore.MAX_PUSH_DIAGNOSTICS) { + const oldest = this.pushDiagnostics.keys().next().value; + if (oldest === undefined) break; + this.pushDiagnostics.delete(oldest); + this.pushReceived.delete(oldest); + } } setPullDiagnostics(uri: string, report: DocumentDiagnosticReport): void { |
